Everything is a Freaking DNS problem - raid http://127.0.0.1:8080/blog/taxonomy/term/995/0 en Implementing Raid Monitoring on a 3Ware 3w-9xxx based controller. http://127.0.0.1:8080/blog/implementing-raid-monitoring-3ware-3w-9xxx-based-controller <p>When you pull a disk out of your RAID set, the 3w-9xxx driver logs warnings like these to syslog:</p> <p><div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">Jan 27 10:18:22 EL860 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0019): Drive removed:port=1.
Jan 27 10:18:22 EL860 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=1.</pre></div></p> <p>However, if no one is watching syslog, that won't be much help.</p> <p>3ware provides a tool called tw_cli, available from their site, which can be used to manage the RAID setup from the command line:</p> <p><div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">[EL860-root@EL860 admin]# tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    REBUILDING     41%     -       -       232.82    RiW    ON

VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   232.88 GB SATA  0   -            ST3250310NS
p1    DEGRADED       u0   232.88 GB SATA  1   -            ST3250310NS</pre></div></p> <p>I figured I'd either have to write a wrapper script around that or find some other way of integrating it. Asking the question in ##infra-talk on irc.freenode.net got me a link to a <a href="http://github.com/stanaka/check_tw" rel="nofollow">check script</a> on GitHub:</p> <p><cite>koollman: sdog: something like <a href="http://github.com/stanaka/check_tw" title="http://github.com/stanaka/check_tw" rel="nofollow">http://github.com/stanaka/check_tw</a> should work. 
</cite></p> <p>With check_tw hooked into your snmpd.conf, you can query the RAID status over SNMP:</p> <p><div class="geshifilter"><pre class="text geshifilter-text" style="font-family:monospace;">[root snmp]# snmpwalk localhost -v 2c -c public .1.3.6.1.4.1.2021 | grep ext
UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extNames.1 = STRING: TW_RAID
UCD-SNMP-MIB::extCommand.1 = STRING: /usr/local/sbin/check_tw
UCD-SNMP-MIB::extResult.1 = INTEGER: 2
UCD-SNMP-MIB::extOutput.1 = STRING: CRITICAL: Unit: u0, Type: RAID-1, Status: REBUILDING
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING:
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 2073
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 11781783
UCD-DLMOD-MIB::dlmodNextIndex.0 = INTEGER: 1</pre></div></p> http://127.0.0.1:8080/blog/implementing-raid-monitoring-3ware-3w-9xxx-based-controller#comments 3w-9xxx 3ware didimentionihateraid linux monitoring opensource raid snmp Thu, 28 Jan 2010 18:41:42 +0000 Kris Buytaert 981 at http://127.0.0.1:8080/blog The machine that vanished. http://127.0.0.1:8080/blog/machine-vanished <p>Today I lost a machine, a physical one: I couldn't find it in my rack anymore. One moment I was logged on to it, but after I told it to boot off the network for a fresh installation, I couldn't find it anymore. It was gone.</p> <p>When you have several ad hoc development environments, you often grab whatever hardware is available, add it to your pool and hope it doesn't come back to bite you. Time always works against you when you have to build a fresh platform from a pool of hardware waiting to be reused.</p> <p>I had half a rack of hardware ready to be redeployed. The default boot order of most of these machines is disk first, then network, so we trigger a fresh network install by overwriting the MBR: with no bootable disk left, the machine falls through to a network boot. So, after a quick check that there was nothing relevant left on this one machine, we sent it to the reboot pool.</p> <p>The host was supposed to boot off the network, but I didn't even see a DHCP request coming in. So off to the lab it was... where was that machine... 
none of the consoles I tried was the right one, until I found one box with a really, really old installation: a machine that had come back from a different office.</p> <p>And then it all became clear. Unlike all the other machines, this one had a two-disk RAID setup, which we weren't actually using. We had indeed cleared the boot sector of the first disk, but we had never cleared the second one. So rather than falling back to the network when the first disk turned out to be unbootable, it booted the old installation on the second disk.</p> <p>Wiping that second disk solved the problem. For once it wasn't a DNS problem, but the RAID setup wasn't really helpful either :)</p> <p>PS. Yes, relabeling the machines is still on the to-do list... maybe next year :)</p> http://127.0.0.1:8080/blog/machine-vanished#comments automating linux raid raid sucks Fri, 15 May 2009 18:12:36 +0000 Kris Buytaert 908 at http://127.0.0.1:8080/blog Raid is obsolete http://127.0.0.1:8080/blog/node/713 <p>In a lot of environments.</p> <p><a href="http://www.mysqlperformanceblog.com/2008/08/21/rendundant-array-of-inexpensive-servers/" rel="nofollow">Peter</a> gives a nice overview of why you don't always need to invest in big, fat, redundant hardware.</p> <p>We <a href="http://www.krisbuytaert.be/blog/node/388">already</a> tackled the topic last year.</p> <p>Still, I often get weird looks when I dare to mention that RAID is obsolete; people fail to hear the "in a lot of environments" part.</p> <p>Obviously the catch is in that second part: you won't be doing this for the small shop around the corner with just one machine. You'll only be doing this in an environment where you can work with a redundant array of inexpensive servers, not with a server that has to sit in a remote, isolated location.</p> <p>Besides that, there are situations where you will still use RAID, not for redundancy but for disk throughput.</p> http://127.0.0.1:8080/blog/node/713#comments availability ha mysql obsolete performance raid raif rais scaling troughput Mon, 25 Aug 2008 19:02:12 +0000 Kris Buytaert 713 at http://127.0.0.1:8080/blog
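<p>To make the "redundant array of inexpensive servers" argument from the "Raid is obsolete" post a bit more concrete, here is a minimal Python sketch of the usual back-of-the-envelope availability math. It assumes that servers fail independently and that any single surviving server can carry the service. Both are simplifying assumptions, and the 99% per-server figure is purely illustrative, not a measured number.</p>

```python
def combined_availability(per_server: float, servers: int) -> float:
    """Probability that at least one of `servers` replicas is up.

    Illustrative assumptions: failures are independent, and one
    surviving server is enough to keep the service running.
    """
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be a probability between 0 and 1")
    if servers < 1:
        raise ValueError("need at least one server")
    # The service is only down when every server is down at the same time.
    all_down = (1.0 - per_server) ** servers
    return 1.0 - all_down


if __name__ == "__main__":
    # Illustrative figure: one cheap box that is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} server(s): {combined_availability(0.99, n):.6f}")
```

<p>Under these assumptions, two boxes at 99% each come out at about 99.99%, and three at about 99.9999%. Real failures are rarely fully independent (shared power, shared network, the same bad software release), so treat these numbers as optimistic.</p>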