![]() For instance, F14 CADC had built-in self-test and redundancy. In the 1970s, much work happened in the field. Again, IBM developed the first computer of this kind for NASA for guidance of Saturn V rockets, but later on BNSF, Unisys, and General Electric built their own. These needed computers with massive amounts of uptime that would fail gracefully enough with a fault to allow continued operation while relying on the fact that the computer output would be constantly monitored by humans to detect faults. Hyper-dependable computers were pioneered mostly by aircraft manufacturers, : 210 nuclear power companies, and the railroad industry in the USA. The computer is still working, as of early 2022. It could detect its own errors and fix them or bring up redundant modules as needed. This computer had a backup of memory arrays to use memory recovery methods and thus it was called the JPL Self-Testing-And-Repairing computer. NASA's first machine went into a space observatory, and their second attempt, the JSTAR computer, was used in Voyager. Most of the development in the so-called LLNM (Long Life, No Maintenance) computing was done by NASA during the 1960s, in preparation for Project Apollo and other research aspects. Eventually, they separated into three distinct categories: machines that would last a long time without any maintenance, such as the ones used on NASA space probes and satellites computers that were very dependable but required constant monitoring, such as those used to monitor and control nuclear power plants or supercollider experiments and finally, computers with a high amount of runtime which would be under heavy use, such as many of the supercomputers used by insurance companies for their probability monitoring. Several other machines were developed along this line, mostly for military use. : 155 Its basic design was magnetic drums connected via relays, with a voting method of memory error detection ( triple modular redundancy). The first known fault-tolerant computer was SAPO, built in 1951 in Czechoslovakia by Antonín Svoboda. This is similar to roll-back recovery but can be a human action if humans are present in the loop. In any case, if the consequence of a system failure is so catastrophic, the system must be able to use reversion to fall back to a safe mode. However, if the consequences of a system failure are catastrophic, or the cost of making it sufficiently reliable is very high, a better solution may be to use some form of duplication. Within the scope of an individual system, fault tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them, and, in general, aiming for self-stabilization so that the system converges towards an error-free state. ![]() An example in another field is a motor vehicle designed so it will continue to be drivable if one of the tires is punctured, or a structure that is able to retain its integrity in the presence of damage due to causes such as fatigue, corrosion, manufacturing flaws, or impact. That is, the system as a whole is not stopped due to problems either in the hardware or the software. ![]() The term is most commonly used to describe computer systems designed to continue more or less fully operational with, perhaps, a reduction in throughput or an increase in response time in the event of some partial failure. Ī fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. JSTOR ( January 2008) ( Learn how and when to remove this template message)įault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components.Unsourced material may be challenged and removed. Please help improve this article by adding citations to reliable sources. This article needs additional citations for verification. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |