Failures In A Distributed System

Robert Martinez
POS 355
May 12, 2014
William Davis

Failures in a Distributed System
A distributed system is a collection of individual computers that appears to its users as a single unit. These systems share processing power, memory, and disk storage. While this type of system can be very efficient, it does have its problems.
The four categories of failure that occur in a distributed system are hardware failures, omission failures, operating system failures (crashes), and Byzantine failures. Failures are often confused with faults. Paul Krzyzanowski, in his paper titled Fault Tolerance, Dealing with an imperfect world, defines a fault as “a deviation from the expected behavior of a system: a malfunction.” Krzyzanowski also lists three types of faults: transient, intermittent, and permanent. A transient fault occurs once and then disappears, such as when a message fails to reach its destination and has to be resent. An intermittent fault recurs, appearing and disappearing repeatedly. A permanent fault persists until the faulty component is repaired or replaced.
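The three fault types can be illustrated with a short Python sketch. It is only a heuristic under stated assumptions: the function name classify_fault, the FaultType enum, and the idea of probing a component at regular intervals are all hypothetical and do not come from Krzyzanowski's paper.

```python
from enum import Enum


class FaultType(Enum):
    NONE = "none"
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"


def classify_fault(observations):
    """Classify a probe history (True = component was faulty at that probe).

    Assumption: the history is long enough to be representative.
    """
    if not any(observations):
        return FaultType.NONE

    # Count separate fault episodes, i.e. runs of consecutive faulty probes.
    episodes = 0
    previous = False
    for faulty in observations:
        if faulty and not previous:
            episodes += 1
        previous = faulty

    if episodes == 1 and observations[-1]:
        # The fault appeared once and never went away.
        return FaultType.PERMANENT
    if episodes == 1:
        # The fault appeared once and then disappeared.
        return FaultType.TRANSIENT
    # The fault keeps appearing and disappearing.
    return FaultType.INTERMITTENT
```

In practice the line between an intermittent fault and a repeated transient fault depends on how long the component is observed, so a real monitor would likely need a time window as well.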
Hardware failures are failures of a physical component in a system. These failures were once very common, but with changes in design and in how components are manufactured they are becoming fewer and fewer. Most hardware failures occur at network connections or hard drives. Distributed systems use redundant servers and backup drives so that the failure of a single component does not take down the whole system.
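The idea of falling back to a redundant server can be sketched in Python as follows. This is a minimal illustration, not a production design: read_with_failover is a hypothetical helper, each replica is assumed to be a plain callable, and a hardware failure is assumed to surface as a ConnectionError.

```python
def read_with_failover(replicas, key):
    """Try each (name, read_function) replica in order until one succeeds.

    Assumption: a failed replica raises ConnectionError when read.
    """
    errors = []
    for name, read in replicas:
        try:
            return name, read(key)
        except ConnectionError as exc:
            # Record the failure and move on to the next replica.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))
```

The caller lists the primary first and the backups after it, so a backup is only contacted when every replica ahead of it has failed.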
Red Hat characterizes an omission failure as “a component that does not respond to an input from another component, and thereby fails by not producing the expected output.” Most users recognize omission failures as a failure to send or receive a message. Distributed systems handle omission failures with measures such as the acknowledgment (ACK) response in a reliable end-to-end transmission. If the sender does not receive the ACK within a set time, it resends the transmission.
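The acknowledge-and-resend behavior described above can be sketched in Python. The sketch simulates the network rather than using real sockets. LossyChannel, send_reliably, and the max_attempts limit are hypothetical names chosen for illustration, and a dropped message stands in for a timeout with no ACK.

```python
class LossyChannel:
    """Simulated channel that loses the first `drops` transmissions."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def transmit(self, message):
        """Return True if the receiver acknowledged the message."""
        if self.drops > 0:
            self.drops -= 1
            return False  # omission failure: message lost, no ACK comes back
        self.delivered.append(message)
        return True  # receiver got the message and sent an ACK


def send_reliably(message, channel, max_attempts=5):
    """Resend until an ACK arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.transmit(message):
            return attempt
    raise TimeoutError(f"no ACK after {max_attempts} attempts")
```

With a channel that drops two transmissions, the third attempt succeeds. A real protocol would also need sequence numbers so the receiver can discard duplicates when only the ACK, and not the message, was lost.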



References:
Krzyzanowski, P. (2009, April). Fault Tolerance, Dealing with an imperfect world. Retrieved from https://www.cs.rutgers.edu/~pxk/rutgers/notes/content/ft.html
Wulf, J. (2013, October 31). JBoss Enterprise SOA Platform 4.2. Retrieved from https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_SOA_Platform/4.2/html/SOA_ESB_Programmers_Guide/SOA_ESB_Programmers_Guide-_Fault_tolerance_and_Reliability_-_Failure_classification_.html
