This paper focuses on one particular engineering disaster: the failure of the Mars Climate Orbiter (MCO). The Mars Climate Orbiter (Figure 1) was a space probe launched in 1998 with the mission of exploring Mars. However, less than a year later, in 1999, all communication with the probe was lost. This paper attempts to answer the question of why the MCO failed. To do so, it assesses several reports and articles related to the mission: the official project report published by NASA, the official report published by the Mishap Investigation Board, and an article written by industry specialist James Oberg.
Figure 1 – An artist’s rendition of the MCO
This paper is divided into four main sections. Section 1 describes the project itself. Section 2 gives a point-by-point analysis of why the orbiter failed. Section 3 explains the disaster’s relation to engineering and the poor engineering choices involved. Lastly, Section 4 discusses the aftermath of the disaster and its impact on engineering practices.
Section 1 – Mars Climate Orbiter Project Description
In an effort to further mankind’s knowledge of our solar system, NASA started the Mars Surveyor Program in 1993. The main objective of this program was to conduct Mars exploration missions. The Mars Climate Orbiter (MCO) was one of the spacecraft built under this program. It was a space probe designed to be a smaller and less expensive way to explore Mars. Its main function was to operate in conjunction with the Mars Polar Lander (MPL). Together, these two machines were to map Mars’ surface, profile its atmosphere, detect ice reservoirs, look for traces of water beneath the surface and, more generally, determine whether Mars was capable of sustaining life.
This orbiter cost a total of US $125 million and was launched on December 11, 1998. The orbiter travelled for over nine months before it could begin entering Martian orbit. On September 23, 1999, the orbiter began its planned orbit insertion at Mars, but a mere 4 minutes and 6 seconds later it passed behind Mars and all radio contact was lost. The orbiter approached Mars at a lower-than-anticipated altitude of 57 km, as shown in Figure 2, and was unable to withstand the atmospheric stresses.
Figure 2 – Estimated vs. Actual Trajectory of the MCO
Section 2 – Causes of Disaster
NASA has published a lengthy report outlining several reasons for the failure of this probe. One reason, however, is categorized as the root cause of the disaster, while several other contributing causes played a role either directly or indirectly.
2.1 - Root Cause of Disaster
The NASA Mishap Investigation Board (MIB) determined the root cause of the accident to be the failure to use metric units in the coding of a ground software file. The project Software Interface Specification (SIS) was not properly followed, and one software component reported thruster impulse data in imperial units of pound-force seconds instead of the required metric units of newton-seconds. Because one pound-force second equals about 4.45 newton-seconds, further computations with this data understated the effect of the thruster firings by a factor of 4.45. This led to an incorrect calculation of the spacecraft’s trajectory. Figure 3 gives a visual summary of the root cause and the end result.
Figure 3 – Root cause leading to final disaster
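The size of the unit error described above can be illustrated with a short Python sketch. It is only an illustration: the function names and the sample impulse value are hypothetical and are not taken from the actual ground software. It shows that a number produced in pound-force seconds but read as newton-seconds understates the true impulse by a factor of about 4.45.

```python
# Illustrative sketch only: the names and the sample value below are
# hypothetical and are not taken from the MCO ground software.
LBF_S_TO_N_S = 4.448222  # one pound-force second expressed in newton-seconds


def lbf_s_to_n_s(impulse_lbf_s: float) -> float:
    """Convert an impulse from pound-force seconds to newton-seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S


def underestimate_factor(reported_lbf_s: float) -> float:
    """Ratio of the true impulse (in newton-seconds) to the value obtained
    when the pound-force-second number is mistaken for newton-seconds."""
    misread_n_s = reported_lbf_s  # the number taken at face value
    true_n_s = lbf_s_to_n_s(reported_lbf_s)
    return true_n_s / misread_n_s


if __name__ == "__main__":
    reported = 10.0  # made-up impulse value in pound-force seconds
    print(f"true impulse: {lbf_s_to_n_s(reported):.2f} newton-seconds")
    print(f"underestimate factor: {underestimate_factor(reported):.2f}")
```

Carrying the unit alongside the number, or converting explicitly at every software interface, is one common way to make this kind of mismatch visible before the data are used.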
2.2 - Contributing Causes of Disaster
However, as James Oberg, a space journalist and historian, said, “Far more was at fault with the Mars Climate Orbiter than a simple mix-up in converting metric and British units” (Oberg, 1999). Support for this statement can be found in NASA’s report on the project, which highlights the contributing causes of the disaster.
1- Knowledge of Spacecraft Characteristics
A separate navigation team was brought on board shortly before the launch of the orbiter. This new team was not entirely familiar with the operation of the spacecraft, and some critical information regarding the control of the MCO was not passed on to it. Furthermore, the team was present neither for the testing of the ground software nor for the critical design review of the spacecraft. This lack of knowledge meant that when something did go wrong, the team did not know how to detect the anomaly or how to act on it properly.
2- Trajectory Correction Maneuver
In the event of an emergency, the contingency plan was to execute a Trajectory Correction Maneuver (TCM) that would raise the MCO to a safe altitude. However, when the emergency arose, the TCM was not executed. The reason why is controversial, to say the least. The Institute of Electrical and Electronics Engineers (IEEE) conducted its own unofficial investigation, and in an interview with an anonymous navigation expert it learned that the MCO operators had originally predicted a flyby range of 150 to 180 km, which they stuck to and put their faith in. The navigation expert said the following:
Given expected errors in altitude targeting of about 10 km, a spread of values over a 100-km range [from 70 to 180 km] should have people screaming down the halls. This tells you that you have no idea where your spacecraft is, and therefore your trajectory has an unacceptable probability of intersecting the planet's atmosphere. To me this says 'aim high' and put another 200 km in there to be safe (Oberg, 1999).
From this evidence, it can be seen that navigational misjudgment was the reason the TCM was never executed. However, when Jet Propulsion Laboratory officials were asked about this, they declined to comment.
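The expert’s reasoning amounts to comparing the spread of the altitude estimates with the expected targeting error. A minimal Python sketch of that comparison follows. The function name and the threshold factor are hypothetical and do not come from any NASA or JPL procedure; only the 70 km, 180 km and 10 km figures are taken from the quote.

```python
# Illustrative sketch only: the function name and the threshold factor are
# hypothetical and do not come from any NASA or JPL procedure.
def spread_is_alarming(estimates_km, expected_error_km, factor=2.0):
    """Return True when the spread of flyby altitude estimates is wider
    than `factor` times the expected targeting error."""
    spread_km = max(estimates_km) - min(estimates_km)
    return spread_km > factor * expected_error_km


if __name__ == "__main__":
    # Figures quoted by the navigation expert: estimates between 70 km and
    # 180 km, against an expected targeting error of about 10 km.
    print(spread_is_alarming([70.0, 180.0], expected_error_km=10.0))
```

With the quoted figures the spread is 110 km, far wider than the expected error, which is the situation the expert says should have prompted a more conservative maneuver.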
NASA’s report gives an entirely different account of why the TCM was not executed. It states:
Analysis, tests, and procedures to commit to a TCM-5 in the event of a safety issue were not completed, nor attempted. Therefore, the operations team was not prepared for such a maneuver. The criticality to perform TCM-5 was not fully understood by the spacecraft operations or operations navigation personnel (Mars Climate Orbiter Mishap, 1999, p.19).
This evidence again points to incomplete training and an unprepared staff. Whichever account is true, it can reasonably be concluded that the mission and its operation had not been adequately considered as a total system, and this is what doomed the MCO.
3- Absence of a System Engineering Process
As stated in a report by the Mishap Investigation Board, “The Board saw strong evidence that the systems engineering team and the systems processes were inadequate on the Mars Climate Orbiter project” (Report on Project, 2000, p.20). During the operations phase of this mission, there was no mission systems engineer, which led to a “lack of understanding on the part of the navigation team of essential spacecraft design characteristics and the spacecraft team understanding of the navigation challenge” (Report on Project, 2000, p.20). Therefore, when anomalies did arise, such as the erroneous data caused by the incorrect units, there was no systems engineer to track down the likely cause.
4- Communications, Staffing and Training
Generally, communications, staffing and training were all severely lacking. As stated in the MIB’s report, “In order to accomplish the very aggressive Mars mission, the Mars Surveyor Program agreed to significant cuts in the monetary and personnel resources available to support the Mars Climate Orbiter mission, as compared to previous projects” (Report on Project, 2000, p.17). One result of this was the adoption of a culture of ‘Faster, Better, Cheaper’. This meant that the staffing of the operations team was less than adequate. Including the MCO, three missions were being run simultaneously, and that, coupled with the fact that the team was not properly trained, spelled doom for the MCO. The software development team also needed more training; without it, they were unable to detect the unit error.
Furthermore, communication between the different teams was lackluster. The project management team failed to define some of the roles and responsibilities of the teams, which led to poor communication. As the NASA report says, “It was clear that the operations navigation team did not communicate their trajectory concerns effectively to the spacecraft operations team or project management. In addition, the spacecraft operations team did not understand the concerns of the operations navigation team” (Mars Climate Orbiter Mishap, 1999, p.21). The report goes on to say that, due to the lack of training, the operations navigation team assumed that the MCO’s hardware and software were similar to those of the Mars Global Surveyor (MGS), which led to insufficient and flawed technical knowledge. Because the teams were not able to communicate properly, anomalies that were observed were not reported through the proper channels but informally, which is why they received little attention. Based on this evidence, it can reasonably be assumed that the lack of discipline within this project was one of the leading causes of its failure.
Section 3 – Relation to Engineering and Poor Engineering Practices Involved
As defined in the Introduction, an engineering disaster occurs when something built to perform a specific function fails to perform it. Since the MCO was a man-made machine that ultimately did not achieve the goals it was built to achieve, its failure can be classified as an engineering disaster.
This mission was a disaster mainly because of a couple of poor engineering practices. One of these was the general lack of discipline in communication and in reporting problems. Communication is key in any engineering project, whether it is building a simple bus stand or a complex spacecraft. If the different teams work on their own without any communication, there are bound to be discrepancies and problems. Communication must be robust not only within each team but even more so between the teams and the project heads. The fact that problems were forwarded to management informally showed this lack of discipline, which is a poor engineering practice.
Another poor engineering practice was on the side of the management team. In a project as complicated as this one, it is crucial that proper training is held and that the systems are stress tested. The lack of both doomed the MCO. Even if the teams know exactly what to do under normal conditions, without proper training they will struggle in an emergency, which is exactly what happened on this project. Furthermore, much of the necessary stress testing was absent when this mission was launched. This was evident in the unit error, which could have been avoided if the files had been properly checked. It was even more evident when the team failed to execute the Trajectory Correction Maneuver (TCM), which, as NASA stated, was because the maneuver had not been properly tested.
Section 4 – Impact on Engineering Practices
The loss of the MCO cost US $125 million. As a result of this devastating failure, the widespread culture of ‘Faster, Better, Cheaper’ at NASA was replaced with ‘Mission Success First’. NASA introduced several changes in the aftermath of this disaster.
General changes following this disaster included assigning a new senior management leader, reviewing and augmenting work plans, and holding daily teleconferences to evaluate technical progress. Furthermore, an independent peer review of all operational contingency and procedures was also introduced.
Some of the team-specific changes included establishing a proper systems engineering team that would always be on site. Its roles and responsibilities would be defined in the early stages of the project. Further improvements included defining a program architecture and establishing a comprehensive list of requirements early in the formulation phase.
With regard to project management, similar changes were introduced. These included explicitly detailing the roles and responsibilities of each team, having a cohesive and rigorously trained team from the start of the project until its completion, and creating a properly structured process for reporting and solving problems.
To improve communications, regular and frequent meetings involving every department were introduced. Furthermore, a change in the general workplace atmosphere was proposed, so that anyone could raise a question or concern and have it pursued in an open fashion.
Conclusion
The failure of the Mars Climate Orbiter meant a huge loss of US $125 million for NASA. That, coupled with the fact that the federal budget for NASA was already on the decline (Figure 4), meant that NASA was under a lot of pressure and scrutiny.
Figure 4 – Decline in NASA’s Budget
The goal of this paper was to discover the reasons for the failure of the Mars Climate Orbiter.
Through research and analysis, quite a few causes for this disaster were discovered. The root cause was the use of incorrect units in one of the mission’s software systems. The data were reported in pound-force seconds instead of the required newton-seconds, which caused a discrepancy by a factor of 4.45.
A few contributing causes also led to this disaster. The operations navigation team lacked proper knowledge of the spacecraft and was brought on shortly before the spacecraft was launched. The Trajectory Correction Maneuver (TCM) was not executed because of last-minute navigational misjudgment and the lack of proper testing and training for the maneuver. The lack of a systems engineering team and lackluster training, communications and staffing also contributed to this disaster.
Considering all the evidence gathered, it can safely be assumed that the team was not properly equipped or trained to handle a mission this complex. Even in the wake of this disaster, however, the numerous successful NASA missions should not be forgotten. Even when projects fail, there is always something to learn from them. Engineers constantly innovate and set their ambitions high, and that is what NASA represents. This disaster served as another eye-opener for NASA, which set itself in the right direction following this failure.