In our daily lives, we use software, directly or indirectly, more than we realize: paying at the supermarket, withdrawing money from the bank, visiting the doctor, driving a car. Everything seems to depend on one piece of software or another.
Now imagine there is a bug that breaks that software. In this article, I would like to categorize software as “critical” and “non-critical”.
If the software is not controlling a critical system, take a video player such as VLC, Windows Media Player or QuickTime. Let’s imagine you found a bug: when you open a video file bigger than 1 TB, the player crashes after half an hour. This is annoying, but it will not damage anyone’s reputation or health. On the other hand, if the software is critical, an overlooked bug can have much worse outcomes. Think about software that coordinates a life support system. It has to deliver oxygen according to its inputs and report the patient’s heart rate to the doctors. If this software delivers too little oxygen, or fails to warn the hospital staff about the patient’s critical condition because of a bug, lives can be lost.
And believe me, we have real-life examples in the history of software. Here are some of them:
Therac-25 Accidents (1985–1987)
Therac-25 was a radiotherapy machine for cancer patients. Radiotherapy is a procedure still in use today: the patient lies down on the machine, and the operator enters the parameters for how long and how strong the beam will be. When the procedure starts, X-ray photons are directed at the cancerous area, causing the tissue to shrink and eventually disappear.
And it seemed to be working well, until at least six patients developed unusual symptoms and injuries, and three of them died. The high-current electron beam had struck the patients with approximately 125 times the intended dose of radiation, concentrated over a narrower area, delivering a potentially lethal dose of beta radiation. Some patients ran out of the treatment rooms in pain. Yet nobody questioned a possible fault in the machine for two years.
When the company tried to find out what went wrong, they found the fault in their software.
For the sake of not losing our focus, let’s just say the software failed to prevent the undesirable outcomes of race conditions. It assumed that the operator knew the machine well and would not make mistakes; it took no preventive measures against incorrect inputs entered in incorrect states.
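To make the idea of a race condition concrete, here is a minimal C sketch. It is not the actual Therac-25 code (which ran as cooperating tasks on a PDP-11, not as pthreads); all the names and values below are hypothetical. The pattern is the point: two threads share treatment parameters without any synchronization, so the beam-setup thread can fire with a half-finished edit.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Shared treatment parameters, written by the operator-input thread and
 * read by the beam-setup thread with no lock protecting them. */
static int beam_mode = 0;  /* 0 = X-ray (high current), 1 = electron */
static int dose = 25000;   /* raw beam strength for X-ray mode */

static void *operator_edit(void *arg) {
    beam_mode = 1;   /* operator switches to electron mode...       */
    usleep(100);     /* ...but the edit takes a moment to finish... */
    dose = 200;      /* ...before the matching dose is entered      */
    return NULL;
}

static void *beam_setup(void *arg) {
    /* If this runs between the two writes above, it sees electron mode
     * paired with the old X-ray-level dose: an inconsistent state. */
    printf("firing: mode=%d dose=%d\n", beam_mode, dose);
    return NULL;
}

int main(void) {
    pthread_t editor, setup;
    pthread_create(&editor, NULL, operator_edit, NULL);
    pthread_create(&setup, NULL, beam_setup, NULL);
    pthread_join(editor, NULL);
    pthread_join(setup, NULL);
    return 0;
}
```

A mutex held across both the edit and the read, or validating the complete parameter set before firing, would close this window.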
It basically means the software was not tested well. The developers relied on their own code and tested it themselves. Code reviews were also done by themselves.
As a result of these poor choices, three lives were lost.
Ariane 5 Explosion
Ariane 5 was a launch vehicle developed by Arianespace for the European Space Agency.
On 4 June 1996, the first flight of the vehicle was initiated. Only about 37 seconds after the start of the flight sequence, at an altitude of about 4 km, the rocket suddenly veered sideways, then broke up and exploded. It caused a loss of about $370 million.
This accident was also caused by a bug in the software. The issue was traced to a value the guidance system used to track the rocket’s horizontal velocity. This value was represented as a 64-bit floating-point variable. Unfortunately, the developers decided to convert this 64-bit value into a 16-bit integer. As the rocket accelerated, this worked as long as the value stayed within the 16-bit integer’s limits. Once it passed that threshold (after 37 seconds), the value overflowed the variable’s limits and the processor raised an error. This caused the rocket to turn 90 degrees to the side and, in the end, explode.
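Here is a minimal C sketch of the failure mode. The real code was written in Ada, where the out-of-range conversion raised an unhandled Operand Error exception; in this hypothetical C version the value is silently truncated instead, which is just as fatal for guidance. The variable names and units are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Converting a growing 64-bit float into a 16-bit integer is only safe
 * while the value fits in [-32768, 32767]. */
static int16_t unchecked_convert(double horizontal_velocity) {
    int32_t wide = (int32_t)horizontal_velocity; /* fits in 32 bits here  */
    return (int16_t)wide;  /* silently truncated once it exceeds 32767 */
}

int main(void) {
    /* Simulated readings as the rocket accelerates (units hypothetical). */
    for (double v = 30000.0; v <= 40000.0; v += 2500.0) {
        int16_t stored = unchecked_convert(v);
        printf("measured %8.1f -> stored %6d%s\n", v, stored,
               v > INT16_MAX ? "   <-- overflow, garbage value" : "");
    }
    return 0;
}
```

A range check before the conversion, or simply keeping the 64-bit representation, would have caught this; it is exactly the kind of boundary case that testing against the new vehicle’s flight profile would have exposed.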
Again, the quality of the software was overlooked. Apparently, this software had worked well on the previous vehicle, Ariane 4. Because Ariane 4’s velocity was lower, the value never overflowed and the software didn’t crash in previous flights. The old software was put onto the new vehicle without being tested properly against its flight profile. Fortunately, no lives were lost in this incident, but the company lost a huge amount of money.
Undetected Hole in the Ozone Layer
These days, we are getting good news that the hole in the ozone layer is shrinking. Imagine what would have happened if scientists hadn’t discovered this horrible disaster in time. Well, that was almost the case.
The hole in the ozone layer was not discovered until 1984.
The Nimbus-7 satellite had been launched by NASA to track atmospheric changes. The software collected data about the atmosphere, but it discarded readings that deviated too far from the expected measurements, treating them as instrument errors. Because of this, it failed to detect the changes in the ozone layer.
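This filtering pattern is easy to reproduce. Below is a minimal C sketch, not the actual Nimbus-7 processing code; the threshold and values are hypothetical. Readings below a hard-coded plausibility bound are dropped as sensor errors, so a real and dramatic change in the data never shows up in the output.

```c
#include <stdio.h>

#define PLAUSIBLE_MIN 180.0 /* hypothetical "sane" ozone floor, Dobson units */

/* Averages the readings, silently discarding anything "implausible". */
static double average_accepted(const double *readings, int n) {
    double sum = 0.0;
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (readings[i] < PLAUSIBLE_MIN)
            continue; /* assumed sensor error: dropped without a trace */
        sum += readings[i];
        kept++;
    }
    return kept ? sum / kept : 0.0;
}

int main(void) {
    /* Springtime Antarctic readings: the low values ARE the ozone hole. */
    double readings[] = { 310.0, 305.0, 150.0, 140.0, 300.0 };
    printf("reported average: %.1f DU\n", average_accepted(readings, 5));
    /* Prints 305.0 DU: the anomaly never reaches anyone's screen. */
    return 0;
}
```

The safer design is to flag suspicious readings for human review rather than silently discarding them; outliers are sometimes exactly the signal you are looking for.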
If British Antarctic Survey scientists hadn’t spotted the hole in their own observations and reported it immediately, who knows, it might have been too late to take the necessary measures.
This incident can also be categorized as a “design flaw”, one that could have been prevented by good quality processes that involve quality engineers from the beginning of the project.
Crossing the Meridian
In February 2007, six American F-22 Raptors flying from Hawaii to Okinawa, Japan experienced computer crashes as they crossed the 180th meridian of longitude. Their navigation and communication systems were lost. They were able to return to Hawaii by following their tanker aircraft, partly thanks to the good weather.
When they returned to base and the engineers investigated the problem, they realized the software had crashed on crossing the 180th meridian, the International Date Line. When you cross this meridian, the date changes by a whole day, and longitude flips sign from +180 to -180. The software didn’t handle this discontinuity.
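Here is a minimal C sketch of the kind of discontinuity involved; it is not the actual F-22 avionics code, just a hypothetical illustration. A naive difference between successive longitudes produces a huge, bogus jump at the date line, while a wrapped difference stays sensible.

```c
#include <stdio.h>
#include <math.h>

/* Naive longitude delta: breaks when crossing the 180th meridian. */
static double naive_delta(double lon_prev, double lon_now) {
    return lon_now - lon_prev;
}

/* Wrapped delta: normalizes the result into [-180, 180). */
static double wrapped_delta(double lon_prev, double lon_now) {
    return fmod(lon_now - lon_prev + 540.0, 360.0) - 180.0;
}

int main(void) {
    double before = -179.9; /* just east of the date line, flying west */
    double after  =  179.9; /* just west of it, moments later          */
    printf("naive:   %7.1f degrees\n", naive_delta(before, after));   /*  359.8 */
    printf("wrapped: %7.1f degrees\n", wrapped_delta(before, after)); /*   -0.2 */
    return 0;
}
```

Any code downstream that trusts the naive value, say for dead reckoning or a sanity check, now sees an aircraft that apparently teleported around the globe, and a crash or reset is a plausible outcome.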
Fortunately, no lives were lost in this case either.
As we can see, poorly tested software can cause really big problems, costing lives and money. Keep this in mind the next time you are performing your testing activities.
Happy Testing!