Overview
The Porter Stemmer is a conflation stemmer developed by Martin Porter at the University of Cambridge in 1980. It is a context-sensitive suffix-removal algorithm. It is one of the most widely used stemmers for English, and implementations are available in many programming languages. This native functor creates a module that exports a function which stems words by means of the Porter stemming algorithm. Quoting Martin Porter himself:
The Porter stemming algorithm (or 'Porter stemmer') is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.
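To illustrate the suffix-rewriting style of the algorithm, the following Python sketch implements only step 1a of Porter's algorithm, which handles plural endings (SSES → SS, IES → I, SS → SS, S → nothing). The name `step_1a` is hypothetical and is not the function exported by this functor. The sketch assumes lowercase input, and it omits the later steps, which add conditions on the remaining stem such as its measure.

```python
# Hypothetical sketch of Porter's step 1a only; assumes a lowercase English word.
# Rules are listed longest suffix first, so the first match is the longest match.
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (stops the plain "s" rule from firing)
    ("s", ""),       # cats     -> cat
]


def step_1a(word: str) -> str:
    """Rewrite the longest matching plural suffix; return the word unchanged otherwise."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "ties", "caress", "cats"]:
        print(w, "->", step_1a(w))
```

Running it prints `caresses -> caress`, `ponies -> poni`, `ties -> ti`, `caress -> caress` and `cats -> cat`. Note that stems such as `poni` are not dictionary words; stemming only aims to map related forms to the same string.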
Algorithm
Porter's algorithm relies on the measure m of a stem, which counts how many times a sequence of vowels is followed by a sequence of consonants. Many rules are applied only when the measure of the stem that would remain exceeds a given value (for example m > 0 or m > 1). To define the measure, note that every word can be written as an alternation of consonants and vowels. A consonant is a letter other than A, E, I, O and U, and other than Y when Y is preceded by a consonant. For example, in the word boy the consonants are B and Y, but in try they are T and R. A vowel is any letter that is not a consonant. A sequence of one or more consonants will be denoted by C, and a sequence of one or more vowels by V.
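The consonant/vowel definition above, including the special treatment of Y, can be sketched in Python as follows. The helper name `is_consonant` is hypothetical, and the sketch assumes lowercase ASCII input.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Return True if word[i] is a consonant in Porter's sense (lowercase input assumed)."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


if __name__ == "__main__":
    for w in ["boy", "try"]:
        print(w, [c for i, c in enumerate(w) if is_consonant(w, i)])
```

This prints `boy ['b', 'y']` and `try ['t', 'r']`, matching the examples in the text. The recursion only walks back through consecutive Y letters, so its depth stays small for real words.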
A single consonant will be denoted by c and a single vowel by v. A sequence ccc... of one or more consonants is denoted by C, and a sequence vvv... of one or more vowels by V. Because words vary in length, every word has one of four forms:
CVCV ... C
CVCV ... V
VCVC ... C
VCVC ... V
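To see which of the four forms a word takes, one can label each letter as consonant or vowel and collapse runs of equal labels into a single C or V. The helper names `is_consonant` and `cv_form` are hypothetical; the consonant test is repeated so the snippet runs on its own, and lowercase input is assumed.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's consonant test: 'y' after a consonant counts as a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def cv_form(word: str) -> str:
    """Collapse a word into its alternating C/V form, e.g. 'trouble' -> 'CVCV'."""
    collapsed = []
    for i in range(len(word)):
        label = "C" if is_consonant(word, i) else "V"
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)  # start a new run only when the label changes
    return "".join(collapsed)


if __name__ == "__main__":
    # One word per form: C...C, C...V, V...C, V...V
    for w in ["troubles", "trouble", "oats", "ivy"]:
        print(w, cv_form(w))
```

The four example words print `CVCVC`, `CVCV`, `VC` and `VCV`, one for each of the forms listed above.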
These may all be represented by the single form [C]VCVC ... [V].
This single form can in turn be written as $[C](VC)^{m}[V]$.
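Since every word has the form $[C](VC)^{m}[V]$, the measure m can be computed by counting each point where a vowel run is followed by a consonant run. The Python sketch below does this and checks itself against the example words Porter lists for m = 0, 1 and 2. It also shows how a rule guarded by a measure condition might be applied, using Porter's (m > 1) EMENT rule as the example. The names `measure` and `strip_if_measure_above` are hypothetical, and lowercase input is assumed.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's consonant test: 'y' after a consonant counts as a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word: str) -> int:
    """Return m in [C](VC)^m[V], i.e. the number of vowel-to-consonant transitions."""
    m = 0
    prev_is_vowel = False
    for i in range(len(word)):
        cons = is_consonant(word, i)
        if cons and prev_is_vowel:
            m += 1  # a V run has just been followed by a C run
        prev_is_vowel = not cons
    return m


def strip_if_measure_above(word: str, suffix: str, threshold: int) -> str:
    """Hypothetical helper: drop `suffix` only if the remaining stem has m > threshold."""
    if word.endswith(suffix):
        stem = word[: -len(suffix)]
        if measure(stem) > threshold:
            return stem
    return word


if __name__ == "__main__":
    examples = {
        0: ["tr", "ee", "tree", "y", "by"],
        1: ["trouble", "oats", "trees", "ivy"],
        2: ["troubles", "private", "oaten", "orrery"],
    }
    for expected, words in examples.items():
        for w in words:
            assert measure(w) == expected, (w, measure(w))
    print("all measure examples pass")
    # (m > 1) EMENT: 'replac' has m = 2, so the suffix is removed;
    # 'c' has m = 0, so 'cement' is left alone.
    print(strip_if_measure_above("replacement", "ement", 1))
    print(strip_if_measure_above("cement", "ement", 1))
```

The last two lines print `replac` and `cement`. The measure condition is what stops the algorithm from stripping a suffix that would leave a stem too short to be meaningful.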
The superscript m in the equation, which is the measure, indicates the number of VC sequences. Square brackets