
Data Leakage Detection

International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE II, JUNE 2011]

[ISSN: 2231-4946]

Development of Data Leakage Detection Using Data Allocation Strategies
Rudragouda G Patil
Dept of CSE, The Oxford College of Engg, Bangalore. patilrudrag@gmail.com

Abstract - A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). If the data distributed to third parties is found in a public or private domain, identifying the guilty party is a nontrivial task for the distributor. Traditionally, such leakage is handled with watermarking, which requires modifying the data: if the watermarked copy is found at some unauthorized site, the distributor can claim ownership. To overcome the disadvantages of watermarking [2], data allocation strategies are used to improve the probability of identifying guilty third parties. In this project, we implement and analyze a guilt model that detects guilty agents using allocation strategies, without modifying the original data. A guilty agent is one who leaks a portion of the distributed data. The idea is to distribute the data intelligently to agents, based on sample data requests and explicit data requests, in order to improve the chance of detecting guilty agents. The implemented algorithms that use fake objects improve the distributor's chance of detecting guilty agents. We observe that minimizing the sum objective increases the chance of detecting guilty agents. We also developed a framework for generating fake objects.

Keywords - sensitive data; fake objects; data allocation strategies

I. INTRODUCTION

In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. We call the owner of the data,



References:
[1] P. Papadimitriou and H. Garcia-Molina, "Data leakage detection," IEEE Transactions on Knowledge and Data Engineering, vol. 23, pp. 51-63, 2011.
[2] S. Czerwinski, R. Fromm, and T. Hodes, "Digital music distribution and audio watermarking."
[3] L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression," 2002.
[4] S. U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani, "Towards robustness in query auditing," in VLDB '06.

Hence, there are many different object allocations. In every allocation, the distributor can permute the objects of T and keep the same chances of guilty agent detection. This is because the guilt probability depends only on which agents have received the leaked objects, and not on the identity of the leaked objects. Therefore, from the distributor's perspective, many of these allocations are equivalent. One object allocation that satisfies the requests but ignores the distributor's objective is to give each agent a unique subset of T of size m.

The s-max algorithm allocates to an agent the data record that yields the minimum increase of the maximum relative overlap among any pair of agents. In the steps below, $U_i$ is the agent being served, $R_i$ is the set of objects allocated to it so far, $m_i$ is the size of its request, and $t_k$ is the k-th object of T. The s-max object selection for agent $U_i$ is as follows.

Step 1: Initialize min_overlap ← 1. This is the minimum, over the candidate objects, of the maximum relative overlap that allocating each object to $U_i$ would yield.
Step 2: For each $k \in \{k' \mid t_{k'} \notin R_i\}$, initialize max_rel_ov ← 0. This is the maximum relative overlap between $R_i$ and any set $R_j$ that allocating $t_k$ to $U_i$ would yield.
Step 3: For all $j = 1, \dots, n$ with $j \neq i$ and $t_k \in R_j$, calculate the absolute overlap abs_ov ← $|R_i \cap R_j| + 1$. Then calculate the relative overlap rel_ov ← abs_ov / $\min(m_i, m_j)$.
Step 4: For each such j, update max_rel_ov ← MAX(max_rel_ov, rel_ov). After all j have been examined, if max_rel_ov ≤ min_overlap, set min_overlap ← max_rel_ov and ret_k ← k.
Return ret_k.

Let $M = \sum_i m_i$ be the total number of requested objects. It can be shown that algorithm s-max is optimal for both the sum-objective and the max-objective in problems where $M \le |T|$ and $n < |T|$. It is also optimal for the max-objective if $|T| \le M \le 2|T|$, or if all agents request data of the same size.
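The selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. The round-robin driver `allocate_sample_requests`, all helper names, and the object labels t1 to t4 are hypothetical. The sum-objective, max-objective, and guilt-probability formulas are assumed from the model of Papadimitriou and Garcia-Molina [1], which this excerpt does not restate.

```python
"""Sketch: s-max selection, a round-robin driver, the objectives, and guilt.

Assumed from [1] (not stated in this excerpt):
  sum-objective = sum_i (1/|R_i|) * sum_{j != i} |R_i & R_j|
  max-objective = max_{i != j} |R_i & R_j| / |R_i|
  Pr{G_i | S}   = 1 - prod_{t in S & R_i} (1 - (1 - p) / |V_t|)
where V_t is the set of agents holding t, and p is the probability that t
was obtained independently of any agent. Names below are hypothetical.
"""


def select_object_s_max(i: int, alloc: list[set[str]], requests: list[int],
                        objects: list[str]) -> int:
    """Return the index k of the object to give agent i under the s-max rule."""
    min_overlap = 1.0  # best (smallest) worst-case relative overlap so far
    ret_k = -1         # stays -1 only if agent i already holds every object
    for k, t_k in enumerate(objects):
        if t_k in alloc[i]:
            continue  # only consider objects agent i does not hold yet
        max_rel_ov = 0.0  # worst relative overlap if t_k goes to agent i
        for j, r_j in enumerate(alloc):
            if j == i or t_k not in r_j:
                continue
            abs_ov = len(alloc[i] & r_j) + 1
            rel_ov = abs_ov / min(requests[i], requests[j])
            max_rel_ov = max(max_rel_ov, rel_ov)
        if max_rel_ov <= min_overlap:  # ties favour the later object
            min_overlap = max_rel_ov
            ret_k = k
    return ret_k


def allocate_sample_requests(objects: list[str],
                             requests: list[int]) -> list[set[str]]:
    """Hypothetical driver: each pass gives every unsatisfied agent one object."""
    if any(m > len(objects) for m in requests):
        raise ValueError("each request m_i must satisfy m_i <= |T|")
    alloc: list[set[str]] = [set() for _ in requests]
    remaining = sum(requests)
    while remaining > 0:
        for i, m_i in enumerate(requests):
            if len(alloc[i]) < m_i:
                k = select_object_s_max(i, alloc, requests, objects)
                alloc[i].add(objects[k])
                remaining -= 1
    return alloc


def sum_objective(alloc: list[set[str]]) -> float:
    """Sum of relative overlaps; assumes every agent holds at least one object."""
    return sum(
        sum(len(r_i & r_j) for j, r_j in enumerate(alloc) if j != i) / len(r_i)
        for i, r_i in enumerate(alloc)
    )


def max_objective(alloc: list[set[str]]) -> float:
    """Largest pairwise relative overlap (0.0 when there is a single agent)."""
    return max((len(r_i & r_j) / len(r_i)
                for i, r_i in enumerate(alloc)
                for j, r_j in enumerate(alloc) if j != i), default=0.0)


def guilt_probability(i: int, leaked: set[str], alloc: list[set[str]],
                      p: float) -> float:
    """Probability that agent i leaked at least one object in `leaked`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    prob_not_guilty = 1.0
    for t in leaked & alloc[i]:
        holders = sum(1 for r in alloc if t in r)  # |V_t|, at least 1 here
        prob_not_guilty *= 1 - (1 - p) / holders
    return 1 - prob_not_guilty
```

Consider four objects and two agents that each request two. The driver yields two disjoint sets, so both objectives are 0. With requests of size three, s-max keeps the unavoidable overlap at two objects.

For the overlapping allocation [{t1, t2}, {t1, t3}] with leaked set {t1, t2}, agent 0's guilt probability is 1.0 at p = 0, 0.625 at p = 0.5, and 0.0 at p = 1. This reflects the observation that larger p makes leakers harder to pin down.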
It is observed that the relative performance of the algorithms and the main conclusions do not change. As p approaches 0, it becomes easier to find guilty agents, and the performance of the algorithms converges. On the other hand, as p approaches 1, the relative differences among the algorithms grow, since more evidence is needed to find an agent guilty. The algorithm presented implements a variety of data distribution strategies that can improve the distributor's chances of identifying a leaker. It is shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is large overlap in the
