Search Engines
Web Spiders
Before a search engine can tell you where a file or document is, that file has to be found. To find information on the hundreds of millions of Web pages that exist, a search engine uses special software robots, called spiders, to build lists of the words found on websites. The process of building these lists is called Web crawling. To build and maintain a useful list of words, a search engine’s spiders have to go through a lot of sites.
The process is quite simple. When a spider visits an HTML page, it takes note of two things: the words within the page, and where those words are found. Words in the title, subtitles, meta tags and other prominent positions are noted for special consideration, so that a later search by a user for a phrase such as ‘Suffolk One’ can be matched against them. Most spiders are built to ignore insignificant words such as ‘a’, ‘an’ and ‘the’.
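The idea of recording each word along with where it appeared, while skipping insignificant words, can be sketched in a few lines of Python. This is a hypothetical illustration, not any real engine’s code; the tag names and stop-word list are assumptions for the example.

```python
from html.parser import HTMLParser

STOP_WORDS = {"a", "an", "the"}  # insignificant words the spider ignores

class WordSpider(HTMLParser):
    """Records each significant word and where on the page it appeared."""
    def __init__(self):
        super().__init__()
        self.where = "body"   # current position on the page
        self.words = []       # list of (word, position) pairs

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2"):
            self.where = tag  # words inside these tags get special note

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2"):
            self.where = "body"

    def handle_data(self, data):
        for word in data.lower().split():
            if word not in STOP_WORDS:
                self.words.append((word, self.where))

spider = WordSpider()
spider.feed("<html><head><title>Suffolk One</title></head>"
            "<body><p>The college in Ipswich</p></body></html>")
print(spider.words)
# [('suffolk', 'title'), ('one', 'title'), ('college', 'body'), ('in', 'body'), ('ipswich', 'body')]
```

Note how ‘the’ is dropped entirely, while ‘suffolk’ and ‘one’ are tagged as title words ready for special consideration.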
As the spider visits different pages, it builds a list of words and notes where each one was found. It then builds an index of these websites using a system of weighting. The more times a term such as ‘BBC’ is mentioned on a website, the higher that website will rank for it. A website that is linked to from the BBC will also feature higher in the search results than one linked to from a less well-known website. After the spider has created the index, it encodes the data to save space and stores it for users to access.
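The weighting system described above can be sketched as a small inverted index, where each word maps to the pages that contain it and a weight based on how often it appears. The page contents and URLs here are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical crawled pages: URL -> the text the spider found there
pages = {
    "bbc.co.uk":   "bbc news bbc sport bbc weather",
    "example.com": "news about the bbc",
}

# Inverted index: word -> {url: weight}, where weight = number of mentions
index = defaultdict(dict)
for url, text in pages.items():
    for word, count in Counter(text.split()).items():
        index[word][url] = count

def rank(word):
    """Return URLs ordered by weight: more mentions rank higher."""
    return sorted(index[word], key=index[word].get, reverse=True)

print(rank("bbc"))
# ['bbc.co.uk', 'example.com']  -- bbc.co.uk mentions 'bbc' three times
```

Real engines combine many more signals (such as incoming links from well-known sites, as mentioned above), but the core structure is this kind of weighted index.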
Metatags
A metatag is a special HTML tag which provides information about a Webpage. Metatags don’t affect how the page looks, unlike normal HTML tags. Instead, they provide information such as who created the page, how often it’s updated, what the page is about, and which keywords represent the page’s content.
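Because metatags carry their information in attributes rather than visible text, a program can read them without rendering the page. A minimal sketch, using Python’s standard HTML parser on a made-up page:

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collects the name/content pairs from a page's <meta> tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]

reader = MetaReader()
reader.feed('<head>'
            '<meta name="author" content="Suffolk One">'
            '<meta name="keywords" content="college, Ipswich, courses">'
            '</head>')
print(reader.meta)
# {'author': 'Suffolk One', 'keywords': 'college, Ipswich, courses'}
```

None of this information would appear on the rendered page, which is exactly why spiders give metatags special attention.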
Boolean
Boolean is a data type which has two values, usually true and false. With search engines, Boolean operators are used to get better search results. If ‘AND’ is included in the search, the engine will only return pages that contain the words on either side of it.
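A Boolean ‘AND’ search amounts to taking the intersection of the sets of pages that contain each word. A toy sketch, with a hypothetical index of made-up URLs:

```python
# Hypothetical tiny index: word -> set of pages that contain it
index = {
    "suffolk": {"suffolkone.ac.uk", "visitsuffolk.com"},
    "college": {"suffolkone.ac.uk", "colleges.org"},
}

def search(query):
    """'x AND y' returns only the pages containing the words on either side."""
    left, _, right = query.partition(" AND ")
    return index.get(left, set()) & index.get(right, set())

print(search("suffolk AND college"))
# {'suffolkone.ac.uk'} -- the only page that contains both words
```

Set intersection is why ‘AND’ narrows a search: each extra term can only shrink the result.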