Secure Document Similarity Detection

Document similarity detection is useful in many areas, such as copyright-infringement and plagiarism detection. However, it is difficult to test whether two documents are similar when their contents cannot be disclosed, that is, when privacy is a concern. This paper suggests a solution built around two metrics, one for utility and one for security.

Problem
Suppose that there are two parties who want to find out whether or not they hold related or similar documents. Both parties are concerned about privacy: their goal is only to discover whether their documents are similar, without disclosing the documents themselves.

Solution

a. Without privacy concerns
If the parties have no privacy concerns, there are many ways to discover similarities. One of them is “similarity of ranked lists”: given a document D held by party A, find the ranked list of the top 10 documents held by party B that are most similar to D.
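The source does not say how party B scores its documents against D. As a minimal sketch, assuming plain bag-of-words term counts and cosine similarity (both assumptions, as are the names `top_k` and `cosine` and the toy collection), the non-private ranked list from (a) could be computed like this:

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, collection: dict[str, str], k: int = 10) -> list[str]:
    """Return the IDs of the k documents in `collection` most similar to `query`."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), doc_id)
              for doc_id, text in collection.items()]
    # Highest score first; ties broken by document ID so the ranking is deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy data: party B's collection and party A's document D.
collection_b = {
    "b1": "the dog chased the cat",
    "b2": "stock prices fell sharply today",
    "b3": "a dog and a cat played together",
}
D = "my dog likes to chase the cat"
print(top_k(D, collection_b, k=2))  # ['b1', 'b3']
```

Here party B sees D in full, which is exactly what solution (b) is meant to avoid.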

b. Privacy is a concern
If the two parties do not want to disclose their documents to each other, a secure solution is needed. A suggested solution keeps the same utility metric, “similarity of ranked lists”, and adds the security metric “t-Plausibility”: given a document D, produce D’, a generalized version of D that satisfies t-Plausibility. Party A passes D’ to party B, which returns the ranked list of its documents most similar to D’.
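The text does not spell out how D’ is built. t-Plausibility is usually described, roughly, as requiring that a generalized document could plausibly have come from at least t different original documents. The sketch below models this with a hypothetical toy taxonomy and a simplified count: a generalized term stands for every specific term beneath it, and the number of plausible originals is the product of those counts over all terms. It greedily replaces terms with their parents until that count reaches t. The taxonomy, the counting rule, the greedy order, and the function names are all assumptions for illustration, not the paper's algorithm.

```python
import math
from collections import defaultdict

# Hypothetical toy taxonomy: each term maps to its more general parent.
PARENT = {
    "dog": "animal", "cat": "animal", "horse": "animal", "cow": "animal",
    "animal": "entity",
    "chase": "move", "walk": "move", "run": "move",
    "move": "act",
}
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def leaf_count(term: str) -> int:
    """Number of specific (leaf) terms that a possibly generalized term could stand for."""
    if term not in CHILDREN:
        return 1
    return sum(leaf_count(c) for c in CHILDREN[term])


def plausibility(tokens: list[str]) -> int:
    """Simplified count of token sequences that could generalize to `tokens`."""
    return math.prod(leaf_count(w) for w in tokens)


def generalize(doc: str, t: int) -> str:
    """Greedily generalize terms of `doc` until the simplified plausibility reaches t."""
    tokens = doc.lower().split()
    while plausibility(tokens) < t:
        # Only consider steps that actually widen the set of possible originals.
        candidates = [i for i, w in enumerate(tokens)
                      if w in PARENT and leaf_count(PARENT[w]) > leaf_count(w)]
        if not candidates:
            raise ValueError(f"taxonomy cannot reach t={t} for this document")
        # Generalize the most specific term first; ties go to the leftmost term.
        i = min(candidates, key=lambda j: (leaf_count(tokens[j]), j))
        tokens[i] = PARENT[tokens[i]]
    return " ".join(tokens)


D = "my dog likes to chase the cat"
print(generalize(D, t=10))  # my animal likes to move the cat
```

Raising t forces more terms up the taxonomy (with t=40, "cat" also becomes "animal"), and a t that the taxonomy cannot reach raises an error instead of looping.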

Analysis and testing
To measure the effectiveness of the suggested solution, the top-10 ranked list produced by solution (a) is compared with the top-10 list produced by solution (b). If, for a given threshold, the number of documents common to both lists reaches or comes close to that threshold, the solution can be considered sufficient.
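The comparison step could be expressed as follows. This is a sketch under one reading of the criterion: the two top-10 lists are treated as sets, and the private scheme counts as sufficient when their overlap is at least a chosen threshold. The function names, the example lists, and the threshold value are hypothetical.

```python
def overlap(list_a: list[str], list_b: list[str]) -> int:
    """Number of documents that appear in both ranked lists (order is ignored)."""
    return len(set(list_a) & set(list_b))


def is_sufficient(plain_top: list[str], private_top: list[str], threshold: int) -> bool:
    """True if the private top-k list shares at least `threshold` documents with the plain one."""
    return overlap(plain_top, private_top) >= threshold


# Hypothetical top-10 lists from solution (a) and solution (b).
plain_top = [f"b{i}" for i in range(1, 11)]     # b1 .. b10
private_top = [f"b{i}" for i in range(4, 14)]   # b4 .. b13
print(overlap(plain_top, private_top))                     # 7
print(is_sufficient(plain_top, private_top, threshold=7))  # True
```

Because the overlap ignores order, two lists with the same members in a different order score the same; a rank-aware measure would be needed if the order within the top 10 matters.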

Comments and Ideas:
- The more general D’ is, the less likely it is that D’ was generated from D specifically. However, this may make similarity detection difficult.
- The top-ranked list may contain documents from the same domain as D rather than documents that are actually similar to it. On other
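The second comment can be illustrated with a tiny example. Assuming, hypothetically, that party B compares D’ against its own documents after generalizing them with the same one-level map (the source does not say how B matches D’), a document from the same domain but with different content ends up tied with the genuinely similar one. The map, the scoring by cosine similarity, and the toy documents are assumptions.

```python
from collections import Counter
import math

# Hypothetical one-level generalization map (term -> category).
CATEGORY = {"dog": "animal", "cat": "animal", "horse": "animal", "cow": "animal"}


def generalize(text: str) -> Counter:
    """Replace every term that has a category with that category, then count terms."""
    return Counter(CATEGORY.get(w, w) for w in text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


D = "the dog chased the cat"
collection_b = {
    "b1": "the dog chased the cat",    # genuinely similar to D
    "b2": "the horse chased the cow",  # same domain, different content
}

for doc_id, text in collection_b.items():
    plain = cosine(Counter(D.split()), Counter(text.split()))
    # Assumption: B scores D' against its own documents generalized with the same map.
    private = cosine(generalize(D), generalize(text))
    print(f"{doc_id}: plain={plain:.2f} generalized={private:.2f}")
# b1: plain=1.00 generalized=1.00
# b2: plain=0.71 generalized=1.00
```

Before generalization, b1 clearly outranks b2; after it, the two are indistinguishable, so the ranked list reflects the domain of D rather than its content.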
