A focused crawler is designed to retrieve web pages relevant to a given topic when a query is issued. Building a crawler that downloads the most relevant pages from a web of this size is still a major challenge in the field of Information Retrieval Systems. Earlier web crawlers relied on keyword matching to retrieve data, with no consideration of relevance.
if (Page.web_content contains search_query) return Page.link;  // URL
else return false;  // "No searches found!"
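The keyword-matching baseline in the pseudocode above can be written as a small runnable sketch. The `Page` record, its `link` and `web_content` fields, and the `keyword_match` function are hypothetical names chosen for illustration; the point is that a page matches only if it literally contains the query string, with no notion of relevance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Hypothetical page record: its URL and its extracted text content.
    link: str
    web_content: str


def keyword_match(search_query: str, pages: List[Page]) -> Optional[List[str]]:
    """Return the URLs of pages whose text contains the query, or None if none do."""
    query = search_query.lower()
    hits = [p.link for p in pages if query in p.web_content.lower()]
    # None plays the role of "No searches found!" in the pseudocode.
    return hits or None
```

For example, with one page about "Focused crawlers and ontologies" and one about cooking, the query "crawler" returns only the first page's URL, while a semantically related query such as "semantic" returns `None` because the exact string never appears.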
This project presents the framework of a novel self-adaptive semantic focused (SASF) crawler, with the purpose of precisely and efficiently discovering, formatting, and indexing … by taking …
Keyword matching alone does not return relevant data, so optimizing the relevance of retrieved data has become a challenge for researchers.
Storing complex and up-to-date information is a significant problem and has become a concern for the research community as well. Automatically understanding the semantics of the underlying web information is another task that needs to be addressed.
3.3. Features of SASF:
When a user searches the web, there is a high likelihood of getting no results and of encountering new terminology. This may be because the ontology server has only a limited vocabulary. When the crawler cannot find the term the user has entered, the obvious output is "No results found". Hence the most useful feature of SASF is that it updates the ontology server whenever such a valid new keyword is entered (Fig. 3.1).
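The update-on-miss behaviour described above can be sketched as follows. The `OntologyServer` class, its `query` method, and especially the `is_valid_term` check are hypothetical: the text does not say how SASF decides that a new keyword is valid, so the sketch takes that test as a parameter.

```python
from typing import Callable, Iterable


class OntologyServer:
    """Toy stand-in for an ontology server: a set of known terms (hypothetical)."""

    def __init__(self, vocabulary: Iterable[str], is_valid_term: Callable[[str], bool]):
        self.vocabulary = {t.lower() for t in vocabulary}
        # Assumed validity test; the source does not define this step.
        self.is_valid_term = is_valid_term

    def query(self, term: str) -> bool:
        """Return True if the term is known; otherwise report a miss and
        add the term to the vocabulary if it passes the validity test."""
        term = term.lower()
        if term in self.vocabulary:
            return True
        # The user still sees "No results found" for this query, but a valid
        # new keyword is learned so that later queries can match it.
        if self.is_valid_term(term):
            self.vocabulary.add(term)
        return False
```

With a validity test such as `lambda t: t.isalpha() and len(t) > 2`, the first query for "metadata" misses and teaches the server the term, so a second query for "metadata" succeeds, while a malformed term like "x1" is never added.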
The quality of an ontology may be questioned because of discrepancies between how different experts understand the domain knowledge. Unsupervised learning is therefore used.
Fig. 3.1. No Searches Found!
3.4. Architecture & Explanation
5. Metadata association and ontology learning: First, a direct string matching process checks whether the contents of the metadata are contained in those of a concept. If they are, the concept and the metadata are regarded as semantically relevant, and the metadata goes through the metadata generation and association process: it is stored in the mining service metadata base and associated with the concept. If they are not, an algorithm-based string matching process is invoked to check the semantic relatedness between the metadata and the concept, using a concept-based metadata semantic similarity algorithm. If the concept and the metadata are found to be semantically relevant, the contents of the metadata are regarded as a new value for the concept, and the metadata likewise goes through the metadata generation and association process; otherwise, the metadata is regarded as semantically irrelevant to the concept. This process is repeated until all the concepts in the mining service ontology have been compared with those
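The two-stage matching described in step 5 can be sketched as below. The concept-based metadata semantic similarity algorithm is not specified in this text, so the sketch swaps in plain token-set Jaccard similarity with an assumed threshold of 0.5; `Concept`, `jaccard`, `associate`, and the dictionary used as the metadata base are all hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    # Hypothetical concept record: a name plus the textual values describing it.
    name: str
    values: List[str] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a stand-in for the unspecified
    concept-based metadata semantic similarity algorithm."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def associate(metadata: str, concepts: List[Concept],
              threshold: float = 0.5) -> Dict[str, List[str]]:
    """Compare one metadata item with every concept; return a toy metadata
    base mapping concept name -> associated metadata."""
    metadata_base: Dict[str, List[str]] = {}
    text = metadata.lower()
    for concept in concepts:
        candidates = [concept.name, *concept.values]
        # Stage 1: direct string matching (metadata contained in the concept).
        relevant = any(text in c.lower() for c in candidates)
        if not relevant:
            # Stage 2: algorithm-based matching via a similarity score.
            relevant = max(jaccard(metadata, c) for c in candidates) >= threshold
            if relevant:
                # Ontology learning: the metadata becomes a new concept value.
                concept.values.append(metadata)
        if relevant:
            # Metadata generation and association: store it against the concept.
            metadata_base.setdefault(concept.name, []).append(metadata)
        # Otherwise the metadata is treated as irrelevant to this concept.
    return metadata_base
```

For instance, with a concept "web crawler" (value "focused crawler"), the metadata "focused crawler" matches directly, whereas "semantic web crawler" fails the direct check but scores 2/3 against "web crawler", so it is associated and also learned as a new value of that concept. Swapping in a different similarity measure only changes `jaccard` and the threshold.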