Sunday, January 08, 2006

Life of a system engineer

We had planned to replace the motherboard of one of the SF V440 production servers on Saturday, 7th January 2006, and estimated a downtime of 2 hours for the activity.




The work was scheduled to start at 14.00 hrs and expected to end at 16.00 hrs. Deepankar and Sameer from Tata Infotech had been deputed to handle it.

However, since application processes were running on the server, we could not bring it down until 15.30 hrs. After replacing the motherboard, we were unable to connect to the 3310 storage box, and the trouble started.
We escalated the case to Sun Microsystems in Bangalore, and they suggested that we must have connected a non-standard storage box to the server. However, this is a Sun 3310 storage box, and it had been working perfectly well before the motherboard was replaced.
We started troubleshooting - moved the card to a different slot to rule out a bad slot - but it wouldn't work. We tried putting the old board back, and that didn't work either! By this time, I had an idea that this was not going to end on Saturday, so I called home and asked them not to wait for me for dinner. Luckily, I had not made any plans with my friends for a weekend meet, so I was at peace to continue working overnight.
We called Sun again to arrange a replacement for the SCSI controller card. They said that the card would be dispatched immediately and would reach us before 05.00 AM on Sunday, so we decided to break for dinner and come back to the office. We had a quick bite at Status and returned to the office.

[To be continued]
