A. Cedillo
POS-355
Failures
In this paper I discuss failures in a distributed system. To explain the different kinds, I describe four failures that occur in and affect a distributed system, and I explain how to isolate and fix two of the four.
In a distributed system nothing is perfect or set in stone, so failures can arise. The failures that can occur are Fail-Stop, Network Failure, Timing Failure, and Byzantine Failure; I discuss each separately.
The first of the four failures in a distributed system is Fail-Stop: a halting failure that comes with some type of notification to other components. An example is a network file server telling its clients that it is about to stop executing. When the server halts, its internal state and the contents of its volatile storage can be lost.
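The Fail-Stop behavior described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class names `FileServer` and `Client`, the method names, and the use of an in-memory dictionary to stand in for volatile storage are all hypothetical and do not come from any real file server.

```python
class Client:
    """Hypothetical client that records whether its server announced a halt."""

    def __init__(self, name):
        self.name = name
        self.server_halted = False

    def on_server_halt(self, server_name):
        # Receiving this notification is what separates Fail-Stop
        # from a silent crash.
        self.server_halted = True


class FileServer:
    """Hypothetical file server whose cache stands in for volatile storage."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.volatile_cache = {}  # in-memory state, lost when the server halts
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def write(self, path, data):
        if not self.running:
            raise RuntimeError(f"{self.name} has halted")
        self.volatile_cache[path] = data

    def fail_stop(self):
        # Step 1: notify every registered client before stopping.
        for client in self.clients:
            client.on_server_halt(self.name)
        # Step 2: halt; everything held only in volatile storage is lost.
        self.volatile_cache.clear()
        self.running = False
```

After `fail_stop()` runs, every registered client knows the server is down, but any data that existed only in `volatile_cache` is gone, which matches the loss of internal state described above.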
The second type of failure in a distributed system is network failure, which can keep processors from communicating with one another. Two problems come up. The first is a one-way link, in which one processor is not able to receive messages from the other processor; this can lead to problems such as the processors slowing down. The second is a network partition, which occurs when the line connecting two sections of the network fails. It can cause a group of two processors to be able to communicate with one another but not with another group of two processors. The two groups of processors can then change a file in different ways, leaving the file inconsistent among all processors.
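Both network problems can be illustrated with a small simulation. The sketch below is my own simplification, not a real networking API: the names `Processor`, `Network`, `cut_one_way`, `partition`, `broadcast_write`, and `is_consistent` are hypothetical, and I assume messages travel only over direct links with no routing through a third processor.

```python
class Processor:
    """Hypothetical processor holding its own copy of each file."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class Network:
    """Hypothetical network modeled as a set of directed links."""

    def __init__(self, processors):
        self.processors = {p.name: p for p in processors}
        # (src, dst) in links means src can deliver messages to dst.
        self.links = {
            (a, b)
            for a in self.processors
            for b in self.processors
            if a != b
        }

    def cut_one_way(self, src, dst):
        # One-way link: dst can still send to src, but src cannot reach dst.
        self.links.discard((src, dst))

    def partition(self, group_a, group_b):
        # Network partition: remove every link between the two groups.
        for a in group_a:
            for b in group_b:
                self.links.discard((a, b))
                self.links.discard((b, a))

    def can_send(self, src, dst):
        return src == dst or (src, dst) in self.links

    def broadcast_write(self, sender, path, data):
        # The write reaches only the processors the sender can still reach.
        for name, proc in self.processors.items():
            if self.can_send(sender, name):
                proc.files[path] = data


def is_consistent(network, path):
    """True when every processor holds the same copy of the file."""
    copies = {p.files.get(path) for p in network.processors.values()}
    return len(copies) == 1
```

With four processors, calling `partition(["p1", "p2"], ["p3", "p4"])` and then having `p1` and `p3` each broadcast a different write to the same file leaves `is_consistent` returning `False`, which is the inconsistency described above.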
The third type of failure in a distributed system is Timing Failure, in which a process, or part of one, fails to meet the limit set for executing the