We discussed the importance of, and the techniques for, designing database integration systems in Chapter 4. Similar issues arise in data-sharing P2P systems.
Because of specific characteristics of P2P systems, such as the dynamic and autonomous nature of peers, approaches that rely on centralized global schemas no longer apply. The main problem is to support decentralized schema mapping so that a query expressed on one peer’s schema can be reformulated as a query on another peer’s schema. The approaches that P2P systems use for defining and creating the mappings between peers’ schemas can be classified as follows:
1- Pairwise schema mapping,
2- Mapping based on machine learning techniques,
3- Common agreement mapping,
4- Schema mapping using information retrieval (IR) techniques.
1- Pairwise Schema Mapping:
In this approach, each user defines the mapping between the local schema and the schema of any other peer that contains data of interest. Relying on the transitivity of the defined mappings, the system then tries to derive mappings between schemas for which no mapping has been defined directly.
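The transitive derivation described above can be sketched as a path search over the graph of user-defined mappings, followed by composition of the mappings along the path. This is only a minimal illustration: the peer names (P1, P2, P3), the attribute names, and the simplification of a mapping to an attribute-renaming dictionary are all assumptions, not the representation used by any particular system.

```python
from collections import deque

def find_mapping_path(mappings, source, target):
    """Breadth-first search for a chain of pairwise mappings from source to target.

    Returns the list of peers on the chain, or None if no chain exists.
    """
    if source == target:
        return [source]
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        for neighbor in mappings.get(path[-1], {}):
            if neighbor in visited:
                continue
            new_path = path + [neighbor]
            if neighbor == target:
                return new_path
            visited.add(neighbor)
            queue.append(new_path)
    return None

def compose_mapping(mappings, path):
    """Compose attribute mappings along a path of at least two peers.

    Attributes that have no counterpart at some intermediate peer are dropped.
    """
    if len(path) < 2:
        raise ValueError("a path needs at least two peers to compose a mapping")
    composed = dict(mappings[path[0]][path[1]])
    for a, b in zip(path[1:], path[2:]):
        step = mappings[a][b]
        composed = {src: step[mid] for src, mid in composed.items() if mid in step}
    return composed

# Hypothetical pairwise mappings: mappings[p][q] renames p's attributes to q's.
mappings = {
    "P1": {"P2": {"author": "writer", "title": "name"}},
    "P2": {"P3": {"writer": "creator", "name": "label"}},
}

path = find_mapping_path(mappings, "P1", "P3")
derived = compose_mapping(mappings, path)
```

No mapping between P1 and P3 was defined by any user, yet one is obtained by composing P1-to-P2 with P2-to-P3. In this sketch the mappings are directed, so nothing is derived from P3 back to P1.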
Piazza follows this approach:
An Example of Pairwise Schema Mapping in Piazza
The data are shared as XML documents, and each peer has a schema that defines the terminology and the structural constraints of the peer. When a new peer (with a new schema) joins the system for the first time, it maps its schema to the schemas of some other peers in the system. Each mapping definition begins with an XML template that matches some path or subtree of an instance of the target schema. Elements in the template may be annotated with query expressions that bind variables to XML nodes in the source.
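To make the idea of a template with variable bindings more concrete, the following sketch builds a target document from a source document using Python's standard xml.etree.ElementTree module. It does not reproduce Piazza's mapping syntax; the TEMPLATE dictionary, the element names, and the sample data are hypothetical, and the "query expressions" are reduced to simple relative paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document held by one peer.
SOURCE_XML = """
<library>
  <book><title>Distributed Databases</title><writer>Lee</writer></book>
  <book><title>Peer Data Management</title><writer>Kim</writer></book>
</library>
""".strip()

# Hypothetical mapping template describing the shape of the target instance.
# "bind" plays the role of a query expression that binds a variable to each
# matching source node; "fields" maps target elements to paths on that node.
TEMPLATE = {
    "root": "catalog",
    "item": "publication",
    "bind": "./book",
    "fields": {"name": "title", "author": "writer"},
}

def apply_mapping(source_root, template):
    """Instantiate the target template once per source node bound by 'bind'."""
    target_root = ET.Element(template["root"])
    for node in source_root.findall(template["bind"]):
        item = ET.SubElement(target_root, template["item"])
        for target_tag, source_path in template["fields"].items():
            child = ET.SubElement(item, target_tag)
            # A missing source element yields an empty target element.
            child.text = node.findtext(source_path, default="")
    return target_root

source_root = ET.fromstring(SOURCE_XML)
target_root = apply_mapping(source_root, TEMPLATE)
result = ET.tostring(target_root, encoding="unicode")
```

Each book element in the source produces one publication element in the target, with title and writer renamed to name and author according to the template.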
Active XML [Abiteboul et al., 2002, 2008b] also relies on XML documents for data sharing. The main innovation is that XML documents are active in