December 2, 2008
Another Planned NOMAD Blackout - Cancelled
NASA NOMAD Update: CANCELLED: Planned Outage This Weekend
"In order to support computer processing related to the Shuttle Endeavour's return to Florida, the facilities outage at Marshall Space Flight Center (MSFC) scheduled for this weekend has been cancelled. Therefore, there will not be any disruption in NOMAD services (Email, Calendaring, Webmail (Outlook Web Access - OWA), Instant Messaging (IM), BlackBerry, Treo and Windows Mobile Devices). Scheduled activities to relocate VIP and Critical user accounts from the servers at MSFC to JSC will be discontinued. VIP and Critical users' mailboxes that were relocated on Monday, December 1, 2008 will be provided additional information at a later date."
NASA NOMAD Memo: Major Planned Outage Saturday, December 6, 2008 - Sunday, December 7, 2008
"Many computer systems and networks will be unavailable for 43 hours due to a facilities maintenance outage necessary to support mission critical systems at Marshall Space Flight Center (MSFC). This preventive maintenance activity is being performed to avoid placing NASA's critical Space Operations Mission Directorate (SOMD) systems at an unacceptable risk. The following services will not be available to you during the weekend outage (Dec 6-7): Email, Calendaring, Instant Messaging (IM) and Webmail (Outlook Web Access - OWA) services. You will not be able to send or receive email messages using your email client on your desktop or laptop computer. Email messages sent during the time of the outage will be queued and delivered to your mailbox once the servers have been brought online."
Posted by kcowing at December 2, 2008 6:02 PM
Stunning, isn't it? My e-mail at NASA used to be reliable. I cannot remember it ever going out for more than about fifteen minutes at a time. Now it goes out for hours, or even days, at a time. That's progress.
Posted by: Mr Squid at November 27, 2008 1:15 PM
It's wonderful that Marshall is supporting its mission-critical systems. Meanwhile, this outage will interfere with our mission-critical activity at another center. Thanks, NOMAD, for putting all of our email eggs in one basket, with almost no redundancy capability.
Posted by: CM at November 27, 2008 3:51 PM
Wow! Welcome back to the 1980s! Apparently NOMAD hasn't heard of redundant backup systems, even temporary ones. Oh, wait. Yes, they have. They just don't give a crap.
Posted by: rackopinion at November 27, 2008 8:47 PM
I have to say that it strikes me as utterly ridiculous that NASA, a national state-run institution, has only one email server and that its internal email systems are knocked out by an outage at one facility. Is it so impossible to have synchronised 'shadow' servers elsewhere that can take over if the prime system fails? Or is that too 'blue sky' thinking for the boys and girls at LBJ?
Posted by: Ben the Space Brit at November 28, 2008 5:21 AM
It's long past time for heads to start rolling over this farce. Let the firings commence.
Marshall, get your act together.
Posted by: cant.c.email@nasa.gov at November 28, 2008 12:40 PM
I second all the above comments.
I can't imagine that too many other NASA scientists find a 43-hour email outage acceptable, weekend or no weekend. People are observing over the weekend, writing proposals over the weekend, etc. and they really need to be able to communicate with other scientists in order to do their jobs.
Our old email system was completely reliable, had 5x more quota, and performed faster than NOMAD. I miss it.
It's clear that the present NASA upper management has been a disaster. Let us hope that the transition team is talking to the NASA grunts and filing the official NASA report where it belongs; HQ bullshit.
Mel Averner
Program Manager (ret)
Fundamental Biology Program (Deceased)
Can you imagine if email from Microsoft, AOL, Gmail, Yahoo, AT&T, or EarthLink went offline this way?
For all of the IT budget spent on new technology, you would think by now no large institution would have these types of issues with email.
Posted by: B at November 28, 2008 7:48 PM
It sure smells to me of the typical outages I've experienced in the private sector, where companies run Microsoft software for infrastructure (specifically, Exchange). In my companies, outages rarely lasted more than a few hours, but they were frequent.
It's a pity to see NASA depending on MS Exchange as well, and clearly not even doing a typical job of supporting it. Unix-based open-source solutions are (in my experience) far more reliable than Exchange.
Posted by: Supporter at November 30, 2008 2:34 AM
I can appreciate the priorities here: mission-critical systems are more important than most e-mail, though not more important than mission-critical e-mail. With apparently little thought to what people (other than the Crackberry-toting classes) use e-mail for in the agency, NASA has bought into a business-class e-mail solution when an operational one is needed, or is needed in addition.
I work on operational spacecraft at a NASA center, and am thankful that the powers that be have so far allowed us to keep in place our own e-mail server for mission-critical mail, including server logs. If we'd been forced to use NOMAD for those applications, we'd have no way of knowing if mission-critical systems failed over the outage period without putting eyeballs on every console.... the very old-fashioned way that no mission can now afford. (And to be fair, our flight ops team now carries Blackberries.)
Posted by: sunman42 at November 30, 2008 10:40 AM
So if we need e-mail to conduct NASA business over the weekend, we will be forced to use our personal accounts on Gmail, Yahoo, etc. This is IT security????
Posted by: Lankley at December 1, 2008 4:00 PM
The point that seems to be missed here is that this isn't a NOMAD outage; it is a data center outage. They are taking the entire data center down for maintenance: power work, chiller work, etc. If you knew anything about the old building this data center is housed in, you would want this to happen and be glad they are doing it.
I think all the complaining about NOMAD is just whining.
Posted by: msfc_drone at December 2, 2008 11:03 AM
@msfc_drone, if Marshall's data center infrastructure is so decrepit, how was it a good design choice to concentrate half of NASA's email accounts there? (It was not.) Why were the data center's power and chiller problems not presented during NOMAD's Operational Readiness Review? (They were not.)
NOMAD was sold to our center's personnel as a "replacement of an aging and decentralized email architecture." This is a direct quote from the introductory email. How is replacing an aging, decentralized and reliable system with an aging, centralized and (by your own admission) unreliable system a good choice? (My definition of reliability here includes fault tolerance.) Previous discussion on this subject revealed that NOMAD's design was rammed through by politically powerful groups. The other centers are paying the price for this choice in terms of reliability, and in terms of dollars. Each center is paying millions of dollars for this service, but for sub-par reliability?
I, for one, was not criticizing the exigencies leading to this particular outage. If NOMAD has to gnaw off its foot to escape a deadly trap, then so be it, but it should be fair game to question why NASA stepped into the trap in the first place.
False alarm, folks. The outage has already been cancelled.
Posted by: anonymous at December 2, 2008 3:56 PM
From: Nomad-Outreach
Sent: Tue 12/2/2008 15:53
To: Undisclosed recipients
Subject: CANCELLED: Planned Outage This Weekend
In order to support computer processing related to the Shuttle Endeavour's return to Florida, the facilities outage at Marshall Space Flight Center (MSFC) scheduled for this weekend has been cancelled. Therefore, there will not be any disruption in NOMAD services (Email, Calendaring, Webmail (Outlook Web Access - OWA), Instant Messaging (IM), BlackBerry, Treo and Windows Mobile Devices).
Scheduled activities to relocate VIP and Critical user accounts from the servers at MSFC to JSC will be discontinued. VIP and Critical users' mailboxes that were relocated on Monday, December 1, 2008 will be provided additional information at a later date.
Should you have any questions regarding this activity, please contact the ODIN IT Help Desk. https://www.odin.lmit.com/nomad/nomadoutreach.html
Posted by: me at December 2, 2008 4:26 PM
I used to work for a private company looking after Exchange. There we had two data centers with Exchange 5.5 servers. If there was going to be a major outage, we would migrate the mailboxes from one server to another. Total downtime was the time required to copy the data: certainly less than 43 hours.
This was Exchange 5.5, almost 10 years ago. With Exchange 2007, clustered environments, SAN storage and so on, this sort of situation should never occur.
The problem is, government has the same money-grabbing attitude as business. No one wants to pay for a Six Sigma solution, so people end up moaning about how crap the software is and how software 'x' is so much better.
Err, no it's not. Software 'x' is just as bad when you have that type of management and financial structure.
/rant
Posted by: Gary Williams at December 3, 2008 9:57 AM
"Therefore, there will not be any disruption in NOMAD services"
I guess what they mean is there will be no unplanned disruption. Just the normal mysterious disappearances of service, sudden inability to authenticate to SMTP or IMAP service, or multi-hour-long delays.
It's unfathomable how an organization of any significant size could consider deploying an architecture that was not fault-tolerant and highly available. It's not expensive, either; certainly not as expensive as outages and their business impact, or the dozens of people who must be on staff to push the red reset buttons. It ain't rocket surgery.


