August 4, 2008

NOMAD Goes Down Again Due To Power Outage

NOMAD Status

Monday, August 4, 2008 - 5:36 P.M. Central: Due to a power outage at MSFC, there is no connectivity to the NOMAD infrastructure. Although servers are operational, there is no connectivity; therefore, users with accounts hosted at MSFC have no ability to send and receive email on desktops, laptops, Outlook Web Access (OWA), Instant Messenger (IM) or handheld devices. More info to come.

Monday, August 4, 2008 - 6:41 P.M. Central: Some NOMAD customers at various Centers are having some success connecting to the resources hosted at MSFC. There are still others close to or located at MSFC who are not able to access any of the applications hosted at MSFC. Additionally, NOMAD customers who are on the JSC NOMAD cluster are able to access their NOMAD Services and have never been impacted by the events at MSFC.

Monday, August 4, 2008 - 7:26 P.M. Central: Connectivity to all NOMAD services has been re-established. Some messages that are still in queue need to be delivered to BlackBerry and Treo users, but they will be delivered shortly. This outage was caused by an Army power failure at the Redstone Arsenal.

Editor's note: A power outage at an Army base crashes NASA's NOMAD? Every modern ISP I have ever heard of has backup power - especially when critical services are hosted. I guess NASA went the cheap route. Does this mean that NASA has accepted an IT solution such that NOMAD-supported services can go down across the U.S. due to a single point of failure in Alabama - one that the Army oversees? Who came up with this ingenious plan?


Posted by kcowing at August 4, 2008 8:11 PM
Comments

They named their system after that psychotic robot from the original Star Trek series? Good to see someone over at NASA has a sense of humor.

Posted by: I am NOMAD You are the creator at August 5, 2008 1:49 AM

I believe the answer to your question is Jonathan Pettus, the dubiously qualified NASA CIO, who never saw any IT infrastructure he didn't want to consolidate.... at Marshall.

Posted by: Joe Gurman at August 6, 2008 12:25 AM

I didn't know that NOMAD was critical. I thought it was just so I could check email from home. Anyway, backup generators are cheap given the relative cost of what this system must have been.

An IT system without backup power? I agree it doesn't make sense, or, to borrow from the aforementioned Star Trek episode, non sequitur.

Posted by: roger at August 6, 2008 8:41 PM

Marshall Space Flight Center resides on the US Army's Redstone Arsenal, so naturally they oversee a lot of the infrastructure. NASA is not a modern ISP that can charge its customers whatever is required to pay for backup power; it is the custodian of tax dollars with priorities. Administrative email is not a critical service, so I assume that in general backup power is deemed unwarranted.

I do agree that it should be possible to have a network of servers that are more fault tolerant and able to provide at least degraded service in this instance. But, again, it comes down to costs when you talk about scattering servers around the country.

I would point out that service was restored within two hours.

I am sure many things would be possible if we had a year where Congress passed an actual NASA budget (never mind funding for all of the government). As always, it comes down to how to spend the 0.6% of the Federal Budget that NASA receives when there are so many things to be done.

Posted by: How About Passing a Budget Guy at August 6, 2008 9:02 PM

If you read the original message, it was not a simple power outage. The NOMAD servers themselves were up, but not reachable. Sounds like the NOMAD servers themselves were on no-break power, but some of the network infrastructure was on regular commercial power. That smells to me like they actually thought to deal with power outages, but didn't think through the network dependency, and didn't actually perform testing to see what would happen. Not sure if that's better or worse than a complete power failure.

Posted by: M at September 13, 2008 8:11 PM