A Novel Roll-Back Mechanism for Performance Enhancement of Asynchronous Checkpointing and Recovery

Informatica 31 (2007) 1–13

Keywords: asynchronous checkpointing, recovery, maximum consistent state

In this paper, we present a high-performance recovery algorithm for distributed systems in which checkpoints are taken asynchronously. It quickly determines the most recent consistent global checkpoint (the maximum consistent state) of a distributed system after the system recovers from a failure.
The main feature of the proposed recovery algorithm is that it largely avoids unnecessary comparisons of checkpoints when testing them for mutual consistency. All participating processes execute the algorithm simultaneously, which keeps its execution fast. We also present an enhancement of the proposed recovery idea that limits the dynamically growing lengths of the data structures used. This enhancement further reduces the number of comparisons needed to determine a recent consistent state, and thereby further reduces the completion time of the recovery algorithm.
Finally, we show that the proposed algorithm offers better performance than some related existing works that use asynchronous checkpointing.

1 Introduction

Checkpointing and rollback-recovery are well-known techniques for providing fault tolerance in distributed systems [1]-[5]. The failures considered are essentially transient in nature, such as hardware errors [1]. Typically, in a distributed system, every site saves its local state; each saved state is known as a local checkpoint. A set of local checkpoints, one from each site, collectively forms a global checkpoint.
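To make these notions concrete, the following is a minimal Python sketch of a global checkpoint built from one local checkpoint per site, together with the standard consistency test stated in the next paragraph (every message recorded as received must also be recorded as sent). It is an illustration only, not the recovery algorithm proposed in this paper. The names `LocalCheckpoint` and `is_consistent`, the site names, and the representation of messages as identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalCheckpoint:
    """Hypothetical record of one site's saved local state."""
    site: str
    sent: frozenset = frozenset()      # ids of messages recorded as sent
    received: frozenset = frozenset()  # (sender site, id) pairs recorded as received


def is_consistent(global_checkpoint):
    """Return True if no checkpoint records a receive whose send is unrecorded.

    `global_checkpoint` maps each site name to that site's LocalCheckpoint.
    """
    for checkpoint in global_checkpoint.values():
        for sender, message_id in checkpoint.received:
            if message_id not in global_checkpoint[sender].sent:
                return False  # received here, but not sent in the sender's checkpoint
    return True


# P1 checkpoints after sending m1 and P2 after receiving it: consistent.
good = {
    "P1": LocalCheckpoint("P1", sent=frozenset({"m1"})),
    "P2": LocalCheckpoint("P2", received=frozenset({("P1", "m1")})),
}

# P1 checkpoints before sending m1, yet P2's checkpoint records its receipt:
# inconsistent.
bad = {
    "P1": LocalCheckpoint("P1"),
    "P2": LocalCheckpoint("P2", received=frozenset({("P1", "m1")})),
}

print(is_consistent(good))  # True
print(is_consistent(bad))   # False
```

This sketch compares every recorded receive against the sender's checkpoint. It is the kind of exhaustive comparison that a recovery algorithm for asynchronous checkpointing would try to reduce.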
A global checkpoint is consistent if no message is sent after a checkpoint of the set and received before another checkpoint of the set [2]-[4], that is, each message recorded as received in a checkpoint should also be recorded as sent in another checkpoint. In this context, it may be mentioned that a
