Reading 05 Response
From the reading, the Therac-25 is a radiation therapy machine that treats patients with either an electron beam or x-rays. Earlier versions of the machine had many manual controls that let a technician set everything up by hand, but the Therac-25 replaced that hardware control with software control. Much of the software had been carried over from the earlier Therac models, so it was considered "battle tested" and was not revised. However, the switch to depending on software created problems that at first could not be reproduced when the machine was operated by hand. The problems eventually identified in the Therac software were a race condition that could miss a quick switch in modes from x-ray to electron, and an overflow in a one-byte variable used to check whether the hardware was configured correctly. Because that variable was incremented rather than simply set, it periodically wrapped around to zero, and the software then reported no error when there in fact was one. The reason for such trivial errors in such an important system was that "the software appeared to have been written by a programmer with little experience coding for real-time systems. There were few comments, and no proof that any timing analysis had been performed. According to AECL, a single programmer had written the software based upon the Therac-6 and 20 code." Furthermore, AECL did no testing beyond its own, which could not account for the biases in how its own people entered input, and those input patterns turned out to be the cause of two deaths.
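To make the overflow concrete, here is a minimal Python sketch of the pattern described above, assuming a single-byte flag that is incremented on each pass through the set-up check instead of being set. The function names are hypothetical and this is not AECL's actual code; it only models how an incremented one-byte value can wrap to zero and skip a check.

```python
BYTE_SIZE = 256  # a one-byte variable can only hold 0..255


def buggy_setup_pass(flag: int, hardware_ok: bool) -> tuple[int, bool]:
    """One pass of a hypothetical set-up test using an incremented one-byte flag.

    Returns the new flag value and whether treatment is allowed to proceed.
    """
    flag = (flag + 1) % BYTE_SIZE  # wraps from 255 back to 0
    if flag != 0:
        # Non-zero flag: the hardware configuration is actually checked.
        return flag, hardware_ok
    # Flag wrapped to zero: the check is skipped, so a fault goes unreported.
    return flag, True


def fixed_setup_pass(hardware_ok: bool) -> bool:
    """Safer variant: always check the hardware; there is no counter to wrap."""
    return hardware_ok


def count_missed_faults(passes: int) -> int:
    """Count passes where misconfigured hardware was reported as fine."""
    flag, missed = 0, 0
    for _ in range(passes):
        flag, allowed = buggy_setup_pass(flag, hardware_ok=False)
        if allowed:
            missed += 1
    return missed


print(count_missed_faults(1000))
```

With the hardware always misconfigured, `count_missed_faults(1000)` returns 3: the fault slips through on passes 256, 512, and 768. Checking the hardware on every pass, as `fixed_setup_pass` does, removes the wrapping counter entirely.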
The challenge for software developers working on safety-critical systems, compared to other software developers, is the added pressure of having other people's lives depend on their code. When I wrote code for my internship, I could create a buffer overflow or a race condition and no one would lose their life; in the worst case, the program would crash and I would spend hours banging my head against the keyboard while debugging it. Developers of safety-critical software should approach these projects with extreme diligence and precision: commenting plenty so that they know what they did during code review, building plenty of error checking into the code itself, and producing error codes that truly tell the user how serious an error is (unlike "Malfunction 54"). Furthermore, I believe that companies that write such software should have it tested by outside sources AFTER it passes their own tests. Not every test will be the same, and outside testers may find corner cases that the original developers are blind to after working on the code for so long.
I don't believe that the software engineer him/herself should be held responsible for errors in software on mission-critical systems. It is the company's responsibility to make sure that such code is heavily tested and fool-proofed before it is ever put into use. The software engineer in this case was seemingly inexperienced, and it is the company's fault for allowing him/her to be the only one on the project.
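As a rough illustration of error codes that convey how serious a problem is, rather than a bare "Malfunction 54", here is a small Python sketch. The `Severity` levels, the fault table, and the message wording are all hypothetical and chosen only for illustration; they are not taken from the Therac-25.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    HALT_TREATMENT = 3  # hypothetical level: unsafe to continue


@dataclass(frozen=True)
class MachineFault:
    code: int
    severity: Severity
    description: str

    def message(self) -> str:
        # Put the severity first so the operator sees the gravity immediately.
        return f"[{self.severity.name}] Fault {self.code}: {self.description}"


# Hypothetical fault table; codes and wording are illustrative only.
FAULTS = {
    54: MachineFault(
        54,
        Severity.HALT_TREATMENT,
        "Delivered dose does not match the prescribed dose; do not resume treatment.",
    ),
}


def report(code: int) -> str:
    fault = FAULTS.get(code)
    if fault is None:
        # Unknown faults are treated as the most severe, never silently ignored.
        return (f"[{Severity.HALT_TREATMENT.name}] Fault {code}: "
                "unrecognized fault; stop and inspect the machine.")
    return fault.message()


print(report(54))
```

Here `report(54)` names the severity and tells the operator not to continue, and an unrecognized code defaults to the most severe level instead of being dismissed.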