raid | Everything is a Freaking DNS problem

Jan 28 2010

Implementing Raid Monitoring on a 3Ware 3w-9xxx based controller.

By: Kris Buytaert Tags:

When you pull a disk out of your RAID setup, a warning shows up in syslog:

Jan 27 10:18:22 EL860 kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0019): Drive removed:port=1.
Jan 27 10:18:22 EL860 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=1.

However, if no one is looking at syslog, that won't really be helpful.

3Ware provides a tool on their site called tw_cli, which can be used to manage
the RAID setup from the command line.

[EL860-root@EL860 admin]# tw_cli /c0 show 
 
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    REBUILDING     41%     -       -       232.82    RiW    ON     
 
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   232.88 GB SATA  0   -            ST3250310NS         
p1    DEGRADED       u0   232.88 GB SATA  1   -            ST3250310NS

I figured I'd either have to write a wrapper script around that or find some other way of integrating it.
Asking the question on ##infra-talk on irc.freenode.net got me the following link to a check script on GitHub:

koollman: sdog: something like http://github.com/stanaka/check_tw should work.
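
To illustrate what such a wrapper has to do, here is a minimal Python sketch. It is not the check_tw script linked above, just an assumption of how one could parse the unit table of `tw_cli /c0 show` and map it to Nagios-style exit codes. The function names are hypothetical, and treating every status other than OK as critical is a simplification.

```python
import re
import subprocess

# Nagios-style exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Unit rows in the `tw_cli /cX show` table start with u0, u1, ...
UNIT_RE = re.compile(r"^(u\d+)\s+(\S+)\s+(\S+)")


def parse_units(output):
    """Return a list of (unit, type, status) tuples found in tw_cli output."""
    units = []
    for line in output.splitlines():
        match = UNIT_RE.match(line.strip())
        if match:
            units.append(match.groups())
    return units


def evaluate(units):
    """Map parsed units to (exit_code, message).

    Assumption: anything other than status OK is reported as critical.
    """
    if not units:
        return UNKNOWN, "UNKNOWN: no units found"
    bad = [unit for unit in units if unit[2] != "OK"]
    if bad:
        details = "; ".join("Unit: %s, Type: %s, Status: %s" % unit for unit in bad)
        return CRITICAL, "CRITICAL: " + details
    return OK, "OK: %d unit(s) healthy" % len(units)


def run_check(controller="/c0"):
    """Run tw_cli (must be installed and in PATH) and evaluate its output."""
    result = subprocess.run(["tw_cli", controller, "show"],
                            capture_output=True, text=True)
    return evaluate(parse_units(result.stdout))
```

A script entry point would call `run_check()`, print the message and exit with the returned code, so that snmpd or a monitoring system can pick up both.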

With check_tw configured in your snmpd.conf, you can get the info via SNMP:

[root snmp]# snmpwalk localhost -v 2c -c public .1.3.6.1.4.1.2021 | grep ext
UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extNames.1 = STRING: TW_RAID
UCD-SNMP-MIB::extCommand.1 = STRING: /usr/local/sbin/check_tw
UCD-SNMP-MIB::extResult.1 = INTEGER: 2
UCD-SNMP-MIB::extOutput.1 = STRING: CRITICAL: Unit: u0, Type: RAID-1, Status: REBUILDING
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING: 
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 2073
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 11781783
UCD-DLMOD-MIB::dlmodNextIndex.0 = INTEGER: 1
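
The extNames and extCommand rows suggest an `exec TW_RAID /usr/local/sbin/check_tw` style line in snmpd.conf, although the exact line is an assumption here. As a rough sketch of how a poller could consume this table, the following Python parses snmpwalk text like the above and lists the checks whose extResult is non-zero. The function names are hypothetical, and it expects each row on a single line.

```python
import re

# Matches rows of the UCD-SNMP extTable as printed by snmpwalk, e.g.
# "UCD-SNMP-MIB::extResult.1 = INTEGER: 2"
EXT_ROW = re.compile(r"UCD-SNMP-MIB::(ext[A-Za-z]+)\.(\d+) = \w+: ?(.*)")


def parse_ext_rows(walk_output):
    """Group extTable columns by row index: {1: {"extNames": "TW_RAID", ...}}."""
    rows = {}
    for line in walk_output.splitlines():
        match = EXT_ROW.match(line)
        if match:
            column, index, value = match.groups()
            rows.setdefault(int(index), {})[column] = value.strip()
    return rows


def failing_checks(rows):
    """Return (name, exit_code, output) for every row with a non-zero extResult."""
    failures = []
    for index in sorted(rows):
        row = rows[index]
        result = int(row.get("extResult", 0))
        if result != 0:
            failures.append((row.get("extNames", "?"), result,
                             row.get("extOutput", "")))
    return failures
```

Fed the walk above, `failing_checks` would report the TW_RAID entry with exit code 2 and the CRITICAL message from check_tw.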

May 15 2009

The machine that vanished.

By: Kris Buytaert Tags:

Today I lost a machine, a physical one: I couldn't find it in my rack anymore. One moment I was logged on to it, and after I instructed it to boot off the network again for a fresh installation, it was gone.

When you have different ad hoc build development environments, you often grab whatever hardware is available to add to your pool and hope it doesn't come back to bite you. Time always works against you when you have to build a fresh platform from a pool of hardware ready to be reused.

I had half a rack of hardware ready to be redeployed. The default boot order of most machines is disk, then network, so we trigger a fresh network install by overwriting the MBR. So this one machine, after a quick check to see that there was nothing relevant left on it, was sent to the reboot pool.

The host was supposed to boot off the network, but I didn't even see a DHCP request coming in. So off to the lab it was .. where was that machine? None of the consoles I tried was the correct one, until I found one box with a really, really old installation, a machine that had returned from a different office.

And then it all became clear ... unlike all the other machines, this machine had a 2-disk RAID setup, which we actually weren't using. We had indeed cleared the boot sector of the first disk, but never that of the second disk. So rather than booting off the network because the first disk failed, it booted off the old copy on the second disk.

Scratching that 2nd disk solved the problem .. for once it wasn't a DNS problem, but the RAID setup wasn't really helpful either :)
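
A read-only sanity check before sending a box to the reboot pool could have caught this. The sketch below is only an illustration, not something we actually ran: it looks for the classic 0x55 0xAA boot signature at the end of the first 512-byte sector of each disk. The function names and device paths are hypothetical, reading raw devices needs root, and a missing signature is only a heuristic for "will not boot from this disk".

```python
def has_boot_signature(path):
    """Return True if the first sector ends with the 0x55 0xAA boot signature."""
    with open(path, "rb") as disk:
        sector = disk.read(512)
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"


def still_bootable(paths):
    """Return the disks (or disk images) that still carry a boot signature."""
    return [path for path in paths if has_boot_signature(path)]
```

Calling `still_bootable(["/dev/sda", "/dev/sdb"])` on the vanished machine would presumably have listed the second disk only (the device names are an assumption).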

PS. Yes, re-labeling the machines is still on the todo list .. maybe next year :)

Aug 25 2008

Raid is obsolete

By: Kris Buytaert Tags:

In a lot of environments.

Peter gives a nice overview of why you don't always need to invest in big fat redundant hardware.

We already tackled the topic last year ..

Now I often get weird looks when I dare to mention that RAID is obsolete .. people fail to hear the "in a lot of environments".

Obviously the catch is in the second part: you won't be doing this for your small shop around the corner with just one machine. You'll only be doing this in an environment where you can work with a redundant array of inexpensive disks, not with a server that has to sit in a remote and isolated location.

Next to that, there are situations where you will be using RAID not for redundancy but for disk throughput.
