Nt1310 Unit 4 Exercise 1
3. Problem Formulation
As is evident from the related work discussed in Section 2, disk utilization is not the bottleneck when small files are stored on HDFS. Rather, the small-file problem arises when the NameNode's memory is heavily consumed by the metadata and BlockMap entries of a huge number of files. The NameNode keeps file system metadata in main memory: the metadata of one file occupies about 250 bytes, and, since three replicas are created for each block by default, the metadata of each block occupies about 368 bytes [9]. Let α denote the number of memory bytes the NameNode consumes by itself, let β denote the number of memory bytes consumed by the BlockMap, and let S denote the size of an HDFS block. Further assume that there are N
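The memory figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the 250-byte and 368-byte constants come from the text, but the NameNode's own footprint (`alpha`), the 128 MB block size, and the function name are assumptions.

```python
import math

FILE_META_BYTES = 250    # per-file metadata held by the NameNode (from the text)
BLOCK_META_BYTES = 368   # per-block BlockMap entry, 3 replicas (from the text)

def namenode_memory(n_files, avg_file_size, block_size=128 * 1024 * 1024,
                    alpha=0):
    """Approximate NameNode memory (bytes) for n_files files of a given size."""
    blocks_per_file = max(1, math.ceil(avg_file_size / block_size))
    file_meta = n_files * FILE_META_BYTES
    block_meta = n_files * blocks_per_file * BLOCK_META_BYTES
    return alpha + file_meta + block_meta

# Ten million 4 KB files each occupy one block, so metadata alone costs
# roughly 10e6 * (250 + 368) bytes ≈ 6.2 GB of NameNode heap.
print(namenode_memory(10_000_000, 4 * 1024))
```

This makes the problem concrete: the NameNode's memory grows with the number of files and blocks, not with the bytes stored, so millions of tiny files exhaust it long before the disks fill.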
FIndex: the local index file for a set of merged small files.
Phase 4: Uploading files to HDFS: Both the local index file and the merged file are written to HDFS, which avoids the overhead of keeping per-small-file information at the NameNode. The NameNode keeps information only for the merged file and the index file. File correlations are taken into account when storing the files, to improve access efficiency.
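The merge-plus-index idea can be sketched in a few lines. This is a minimal illustration of the concept, not the paper's on-disk format: the function name and the `(offset, length)` index layout are assumptions.

```python
def merge_files(files):
    """files: dict of name -> bytes. Returns (merged_bytes, index),
    where index maps each name to its (offset, length) in the merged file."""
    merged = bytearray()
    index = {}                       # plays the role of FIndex
    for name, data in files.items():
        index[name] = (len(merged), len(data))
        merged.extend(data)
    return bytes(merged), index

merged, findex = merge_files({"a.txt": b"hello", "b.txt": b"world!"})
off, length = findex["b.txt"]
print(merged[off:off + length])      # recover b.txt from the merged blob
```

Only the merged file and its small index need NameNode entries, so the per-file metadata cost is paid once per merged file rather than once per small file.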
Phase 5: File caching strategy: A caching strategy is used to cache the local index file and correlated files. With this strategy, communication with HDFS is drastically reduced when downloading files, which improves access efficiency. When a requested file misses in the cache, the client queries the NameNode for the file's metadata and, according to that metadata, connects to the appropriate DataNodes where the blocks reside. The local index file is read first; based on the recorded offset and length, the requested file is split out of the block and returned to the client.
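A client-side sketch of this read path, under assumptions: `fetch_index` and `fetch_block` stand in for the real NameNode/DataNode round trips, whose APIs the text does not specify, and the unbounded dictionaries stand in for whatever eviction policy the strategy actually uses.

```python
class FileCache:
    """Caches the local index and previously split files; on a miss it
    falls back to the (simulated) HDFS fetch functions."""

    def __init__(self, fetch_index, fetch_block):
        self.fetch_index = fetch_index   # merged-file name -> index dict
        self.fetch_block = fetch_block   # merged-file name -> raw bytes
        self.index_cache = {}
        self.file_cache = {}

    def read(self, merged_name, file_name):
        # Cache hit: no HDFS communication at all.
        if (merged_name, file_name) in self.file_cache:
            return self.file_cache[(merged_name, file_name)]
        # Miss: read the local index first (cached after the first read)...
        if merged_name not in self.index_cache:
            self.index_cache[merged_name] = self.fetch_index(merged_name)
        off, length = self.index_cache[merged_name][file_name]
        # ...then split the requested file out of the merged block.
        data = self.fetch_block(merged_name)[off:off + length]
        self.file_cache[(merged_name, file_name)] = data
        return data

calls = {"index": 0, "block": 0}
def fake_index(name):
    calls["index"] += 1
    return {"a.txt": (0, 5), "b.txt": (5, 6)}
def fake_block(name):
    calls["block"] += 1
    return b"helloworld!"

cache = FileCache(fake_index, fake_block)
print(cache.read("part-0", "b.txt"))   # first read goes to "HDFS"
print(cache.read("part-0", "b.txt"))   # second read is served from cache
```

The second read triggers no fetches, which is the access-efficiency gain the phase describes.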
5. Theoretical Validation of the Proposed Technique
Suppose there are N small files, which are merged into K merged files whose lengths are denoted LM1, LM2, …, LMK. The computational formula for the memory consumed by the NameNode under the file merging and caching technique is given
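Since the formula itself is cut off above, the sketch below shows one plausible form consistent with the constants from Section 3 (250 bytes per file, 368 bytes per block, block size S, overheads α and β). It is a reconstruction under those assumptions, not the paper's exact formula.

```python
import math

def merged_memory(lengths, block_size, alpha=0, beta=0):
    """Approximate NameNode memory (bytes) after merging:
    lengths = [LM1, ..., LMK], the sizes of the K merged files."""
    file_meta = len(lengths) * 250                                  # K files
    block_meta = sum(math.ceil(L / block_size) for L in lengths) * 368
    return alpha + beta + file_meta + block_meta

MB = 1024 * 1024
# Two merged files of 256 MB and 100 MB with S = 128 MB: 2 + 1 = 3 blocks.
print(merged_memory([256 * MB, 100 * MB], block_size=128 * MB))
```

The point of the validation is that the cost now scales with K and the number of merged-file blocks, both far smaller than the original N.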
