Proactive Maintenance prevents Pyrotechnics
May 11th, 2008 | by Steve O'Donnell
Way, way back in the stone age (whilst I was a student) I worked in the Edinburgh University Computer Labs keeping all of the hardware going. I had got a report of a DEC PDP 9 (drop me a mail if you know what one of these is) that kept restarting every few days all by itself. It would power down randomly and then just as strangely power itself back up again all for no apparent reason. One of the biggest pains was that when it restarted it needed an operator to program in the bootstrap loader in Octal on the front panel keys. (I have really made myself feel old now).
Well, I tried everything I could to track down this problem, focussing on the power supply (which was attached to the back door of the equipment). In total frustration after hours of leaning into the computer with a voltmeter and oscilloscope, I straightened up, turned around and started to walk away for a coffee, and at that second the power supply exploded. The door flew off its hinges, clouds of white smoke and sparks gushed from the cabinet, and the fire suppression system kicked in, sounding the alarm that it was about to release gas into the data center. I got out of there in a hurry.
I learned a few lessons that day, not all of them relating to running the 100 yard dash in record time. I learned that electricity is dangerous and data center equipment can explode if stressed and that there can be no visible evidence of what is going wrong until it is too late. You would think that a more sensible person would have changed career!
The root cause of the demise of the PDP 9 (see picture below - and no, the guy with the nerdy haircut is not me) was a bad electrical connection to a set of very large electrolytic capacitors. These have a tendency to explode if their polarity is reversed.
So what possible bearing does this have on modern data centers? Have a look at the two pictures below of the same circuit fuses, taken at different times:
They kind of look the same (mostly), except that the top picture shows the clips holding the fuses running hot (about 300 degrees Celsius), whilst the bottom picture shows the fuses after the clips have been repaired and replaced (these are still warm at 150 degrees Celsius but within design specs).
Want to know what happens if you don’t do the maintenance? Look at the picture at the top of this article: it shows a PDU exploding. Regular inspection of all electrical joints carrying heavy currents must be conducted with thermal cameras to avoid problems such as this. When buying equipment, accessibility of these joints to thermal inspection is critically important.
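If you log your thermal camera readings, the before-and-after comparison of the fuse clips can be sketched in a few lines of Python. Treat this as an illustration only: the 200 degree limit, the joint name and the `flag_hot_joints` helper are assumptions made up for the example, not figures from any datasheet. Only the 300 and 150 degree readings come from the pictures.

```python
# Hypothetical design limit for the fuse clips, in degrees Celsius.
# Assumed for illustration only; take the real figure from the equipment datasheet.
DESIGN_LIMIT_C = 200.0


def flag_hot_joints(readings_c, limit_c=DESIGN_LIMIT_C):
    """Return the names of joints whose reading exceeds the limit, hottest first."""
    hot = {name: temp for name, temp in readings_c.items() if temp > limit_c}
    return sorted(hot, key=hot.get, reverse=True)


# Readings taken from the two pictures: before and after the clips were repaired.
before_repair = {"fuse clips": 300.0}
after_repair = {"fuse clips": 150.0}

print(flag_hot_joints(before_repair))  # the hot clips are flagged
print(flag_hot_joints(after_repair))   # nothing is flagged after the repair
```

Run something like this after every inspection round and the joints that need attention show up long before they put on a fireworks display.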
Enough said?




















