Friday, August 31, 2018

Errors in Data and Fail Safe Systems


I recently watched a very funny TV show in which "the IT guy" asked what happens when you make a system so fail safe that you can't shut it down. It was a funny bit: the system in question had logic errors that caused it to publish ridiculous name selections, but nobody could correct them because the system could never be shut down for proper maintenance. It also raises a serious question. How do we make sure that the data itself is correct and consistent across all of its storage?

Back in the day, testing software before it was released was pretty simple. And I do mean "back in the day". The first programs were simple sets of instructions that ran on computers with limited computing power. Usually, the programs were not sold, but used by the same people who built them.

It is different now. To deliver the power that our current applications and systems have, different groups of people create different pieces that all need to work together. We are in the era of "big data", and "data" can contain instructions that tell the application how to function. The same information can be, and often is, stored in multiple locations and backed up in pieces, minute by minute, to ensure it is not lost. All of this really should be tested to ensure that it is correct and working.
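To make "tested to ensure that it is correct" a little more concrete, here is a minimal sketch of one way to check that copies of the same record agree: hash each copy and see whether the digests match. The location names and the sample record are made up for illustration, and a real system would read the bytes from its actual storage rather than from a dictionary.

```python
import hashlib
from collections import defaultdict


def group_by_digest(copies):
    """Group storage locations by the SHA-256 digest of the bytes they hold.

    `copies` maps a (hypothetical) location name to the raw bytes stored there.
    """
    groups = defaultdict(list)
    for location, payload in copies.items():
        groups[hashlib.sha256(payload).hexdigest()].append(location)
    return dict(groups)


def is_consistent(copies):
    """Return True when every copy has the same digest (or there are no copies)."""
    return len(group_by_digest(copies)) <= 1


# Illustrative data only: the backup copy contains a transposed letter.
copies = {
    "primary": b"name=Alice\n",
    "replica": b"name=Alice\n",
    "backup": b"name=Alcie\n",
}

if not is_consistent(copies):
    for digest, locations in group_by_digest(copies).items():
        print(digest[:12], sorted(locations))
```

Matching digests only show that the copies agree with each other, not that the record is right. If the error was made before the data was copied, every location will faithfully hold the same wrong value, so a check like this complements testing the logic rather than replacing it.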

And since comedy often contains kernels of truth, we need to make sure that, in our efforts to be fail safe, we do not leave ourselves unable to correct the inevitable errors when they arise.