I came into work late today, planning to do some work on our servers that requires shutting down the office, so I was just going to stay late tonight, after hours, to do it. Basically, I’m just installing some additional hard drives in our file server and setting up a new RAID array, as our SCSI drives are basically full (I’m installing a server-grade SATA card instead, as SATA drives are much cheaper and have larger capacities). Not ten minutes after I walked in, people started complaining that things were freezing up (we’re on a Citrix MetaFrame setup, so when something freezes, it generally affects everybody, which sucks, but the boss’s budget doesn’t allow replacements or upgrades at this point).
So I go upstairs, and the file server that I had planned on working on tonight is totally unresponsive. And since many of the profiles and some of the applications that Citrix uses (like our kludgy DOS-based reservation system) are on that server, it was causing things to freak out all around. So I reboot the file server and find the problem: one of the two drives in the system’s main RAID 1 array died.
Thankfully it was in an array, otherwise we’d be screwed right now, but I don’t like sitting here with no redundancy on our reservation data. Since it’s an old server (a Compaq ProLiant 1600, not an 800; I forgot we had an old front bezel on the thing, but it is indeed a 1600), finding the same hard drive required a trip to eBay, where I found somebody who could overnight one to me.
Meanwhile, I’m hoping and praying that nothing else dies on the system (like the boot drive) before the new one gets here on Friday. I almost considered making the new SATA array the boot array, but I have a feeling that’s going to be a bigger nightmare than it’s worth (I’d rather keep data and boot on separate drives anyway, which was my reason for setting this up). But if I can at least get the data onto the new array, I can sleep better tonight knowing that I have a bit of redundancy (and yes, I do have off-site backups, but I never like to dig into backups; I’d rather just have things keep running without issue).
Then, after I get all that crap up and going again, I have to rebuild the database for the aforementioned reservation system, which is always loads of fun. Basically, just hit a button and wait until it’s done (and watch as a bunch of ASCII characters scroll down the screen).
Hopefully tonight will go smoother than the last time I did major work on these servers (which are really past their prime, but our boss is too cheap to replace them because he doesn’t seem to understand how much technology is used in this office).
Update at 10:30: Things went mostly smoothly until I tried to reboot the system once I had the new card installed, the drives formatted, the array set up, and the data copied: Non System Disk Error. The thing was detecting the SATA card I added (a Promise TX4310) and trying to boot from it instead of the SCSI card already in the system. I tried re-ordering the cards and telling the BIOS that it was supposed to boot from the SCSI card, but every time I put the Promise card back in, it tried to read off it and refused to read the SCSI card. Lovely. So I’m taking the Promise card out, recreating the share on the old drive, and hopefully will get this figured out in the morning. People are going to be able to function, and that’s really all that matters.
But in the meantime, does anybody have any experience with these types of add-in cards and this particular server?
Update again: Just for giggles, I looked at the original product description and reviews on Newegg, and it looks like people had issues with this card if they didn’t upgrade the BIOS. So I’ll have to upgrade the BIOS, methinks, for this to work properly.
Update again the next day: My other employer uses the same servers and had an extra SCSI drive (I didn’t know they had spares), so I’ll be taking this spare and giving him my other one when I get it.