N.M. Revathi, G.P. Shanthi, Elanchezhiyan K., T.V. Geetha, Ranjani Parthasarathi & Madhan Karky
Tamil Computing Lab (TaCoLa), College of Engineering Guindy, Anna University, Chennai.
haisweety18@gmail.com, jijutodo@gmail.com, madhankarky@gmail.com
Abstract
Tamil is steadily becoming the language of online communication and mobile text messaging for many Tamils around the world. Social networks and mobile platforms now extensively support Unicode and applications for keying in Tamil text. The number of characters in a message is limited on some social networks and in mobile text messages. Compacting the text therefore becomes essential, as it saves online storage space and cost, among other factors. This paper proposes a text compaction system for Tamil, the first of its kind. The system handles common Tamil words, acronyms/abbreviations and numbers. A morphological analyzer [1] and a morphological generator are used to stem inflected words and replace them with compact forms using a mapping repository. The proposed system was tested with over 10,000 words, and the final result was found to be reduced to 40% of the original text. The paper concludes by discussing possible extensions to the system.
1. Introduction:
In all languages, the use of compact or short forms of words in text messages, emails, and blogs is increasing rapidly. It is particularly popular among young urbanites because it allows voiceless communication, which is useful in noisy environments that would defeat a voice conversation, and buffered communication, since the message the sender wants to convey can be read by the receiver at any time. Compacting text is also necessary because of the limited message length on blog sites and the tiny user interface of mobile phones. There is no fixed rule for obtaining the shortest form of a word; the main requirement is that the compact word can be understood by everyone. Compact words can be obtained by omitting letters and by replacing prefixes and suffixes with suitable symbols and numbers. As a result, such compaction is often credited with creating a language of its own. This paper proposes a Text Compaction system for Tamil, the first of its kind for the language.
2. Background:
Tamil is perhaps the only classical language whose literature dates back to the pre-Christian era and which has remained in continuous use for millennia. Due to the untiring efforts of scholars, researchers and enthusiasts, it has also evolved creatively over the years, to the extent that it is used profusely today on computers, the internet, mobile phones and so on. Diverse creative efforts are under way that would pave the way for a quantum jump in the usage of Tamil in Information Technology; "Tamil Virtual University", "Centre for Research and Applications of Tamil in Internet" and "Tamil Software Development Fund" are a few examples. These efforts motivated the proposal of a text compaction system for Tamil. Many compaction systems have been developed for English and other languages. Lee Ming Fung [2] proposed a Short Form Identification and Categorization model based on maximum entropy, which distinguishes short forms from actual words and acronyms/abbreviations and categorizes the short forms into those formed by letter omission and those formed by phonetic substitution of parts of words. In the proposed system, compact words are formed in a variety of ways, such as omission, truncation and phonetic substitution. Acronym identification and detection has been much researched. Acrophile [3] automatically searches for acronym-expansion pairs in domain-specific databases. An acronym-expansion pair consists of an acronym and its full expanded form or meaning. This paper makes use of acronym-expansion pairs to replace the full expanded form with the acronym.
3. Text Compaction Framework:
The figure below presents the various components of the framework.
3.1 Input Processing
The input text is tokenized on a delimiter and passed to the Morphological Analyzer. The analyzer removes the suffix (if present) attached to each word and delivers the root word (RW). For example, if the input to the analyzer is கணி பொறியி, the output is கணி பொறி.
3.2 Identification of the type
The proposed system handles three categories of words: common Tamil words, abbreviations/acronyms, and numbers. The category to which the RW belongs must now be identified. The RW is first checked against the abbreviations/acronyms category by comparing it with the keys of the hash map (Section 3.3). If a match is found, the RW is considered an abnormal word (AW), i.e. it belongs to the category of acronyms/abbreviations; otherwise it is treated as a normal word (NW), i.e. it belongs to either the first or the third category.
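The input processing and type identification steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy root table standing in for the Morphological Analyzer, and the sample key sets are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical stand-in for the Morphological Analyzer: a tiny lookup table
# from an inflected form to its root word. A real analyzer is far richer.
TOY_ROOTS = {"கணிப்பொறியில்": "கணிப்பொறி"}


def toy_analyze(token: str) -> str:
    """Return the root word (RW) for a token, or the token itself if unknown."""
    return TOY_ROOTS.get(token, token)


def classify_root(root: str, abbreviation_keys: Set[str],
                  number_names: Set[str]) -> str:
    """Decide the category of a root word.

    A root that matches a key of the abbreviation hash map is an
    abnormal word (AW); everything else is a normal word (NW), which
    is either a number name or a common word.
    """
    if root in abbreviation_keys:
        return "abbreviation"
    if root in number_names:
        return "number"
    return "common"


def process_input(text: str, analyze: Callable[[str], str],
                  abbreviation_keys: Set[str], number_names: Set[str],
                  delimiter: str = " ") -> List[Tuple[str, str, str]]:
    """Tokenize on the delimiter, stem each token, and tag its category."""
    result = []
    for token in text.split(delimiter):
        if not token:          # skip empty tokens from repeated delimiters
            continue
        root = analyze(token)
        category = classify_root(root, abbreviation_keys, number_names)
        result.append((token, root, category))
    return result
```

For instance, calling `process_input` on a two-token string with `toy_analyze`, an abbreviation key set containing the second token, and any set of number names yields one `(token, root, category)` triple per token, with the second tagged as an abbreviation.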
3.3 Extraction of the compact word
If the word is identified as a normal word, it is passed to a binary search tree that is built dynamically from the set of words already stored in the dictionary. The NW is searched for in the tree and, when it is found, its compact word is retrieved with an efficient mapping algorithm that maps each normal word to its compact word. If the word is an abnormal word, its compact word is retrieved as follows. A linked hash map is built for all the abbreviated words, using the first word of each abbreviation's expanded form as its key. Again, with the help of an efficient mapping algorithm, the compact word is retrieved. If the NW is a number name, it is replaced with numerals based on the place value system.
3.4 Output Processing
The extracted compact word is passed to the Tamil Morphological Generator, which adds the suitable suffix so that the output conforms to the rules of the language.
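The extraction step can be sketched as below. This is an illustration only: the class and function names, the sample dictionary entries, the compact forms, and the simplified place-value rule are assumptions made for the example, and a Python dict (which preserves insertion order) stands in for the linked hash map. Suffix re-attachment by the Morphological Generator is left out.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, word: str, compact: str):
        self.word, self.compact = word, compact
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class CompactWordTree:
    """Binary search tree keyed by normal word, storing its compact word."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str, compact: str) -> None:
        if self.root is None:
            self.root = Node(word, compact)
            return
        node = self.root
        while True:
            if word == node.word:
                node.compact = compact
                return
            branch = "left" if word < node.word else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(word, compact))
                return
            node = child

    def search(self, word: str) -> Optional[str]:
        node = self.root
        while node is not None:
            if word == node.word:
                return node.compact
            node = node.left if word < node.word else node.right
        return None


# Hypothetical entries: first word of the expanded form -> candidate expansions.
ABBREVIATIONS: Dict[str, List[Tuple[Tuple[str, ...], str]]] = {
    "ஐக்கிய": [(("ஐக்கிய", "நாடுகள்"), "ஐ.நா.")],
}
# Root forms of a few number names (assumed already stemmed by the analyzer).
NUMBER_VALUES = {"ஒன்று": 1, "இரண்டு": 2, "மூன்று": 3,
                 "பத்து": 10, "நூறு": 100, "ஆயிரம்": 1000}


def number_name_to_numeral(names: List[str]) -> int:
    """Toy place-value rule: small values add, 100 and 1000 multiply."""
    total, current = 0, 0
    for name in names:
        value = NUMBER_VALUES[name]
        if value < 100:
            current += value
        elif value == 100:
            current = max(current, 1) * value
        else:
            total += max(current, 1) * value
            current = 0
    return total + current


def compact_roots(roots: List[str], tree: CompactWordTree) -> List[str]:
    """Replace abbreviations, number names and known common words."""
    out, i = [], 0
    while i < len(roots):
        matched = False
        for phrase, short in ABBREVIATIONS.get(roots[i], []):
            if tuple(roots[i:i + len(phrase)]) == phrase:
                out.append(short)
                i += len(phrase)
                matched = True
                break
        if matched:
            continue
        j = i
        while j < len(roots) and roots[j] in NUMBER_VALUES:
            j += 1
        if j > i:  # a run of number names becomes one numeral
            out.append(str(number_name_to_numeral(roots[i:j])))
            i = j
            continue
        out.append(tree.search(roots[i]) or roots[i])  # unknown words pass through
        i += 1
    return out
```

Keying the abbreviation map by the first word keeps the check in Section 3.2 to a single hash lookup per root; the full expansion is compared only when that first word matches.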
4. Results and Analysis:
The paper proposes the following layout for displaying the results to the user. It has two text areas: the one on the left is for entering the input text and the one on the right is for displaying the output. The user can also view the number of characters by which the output text has been reduced.
The efficiency of the system can be calculated as (number of characters in the input text / number of characters in the output text) × 100%. The proposed system was tested with over 10,000 words, and the final result was found to be reduced to 40% of the original text.
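As a worked illustration of the measure, the sketch below computes the efficiency exactly as defined above, together with the size of the output relative to the input, which is the quantity that a statement such as "reduced to 40%" describes. The function names are hypothetical, and characters are counted with Python's `len`, which counts Unicode code points; a Tamil letter with a vowel sign therefore counts as more than one character, and a different counting unit would give different figures.

```python
def efficiency_percent(input_text: str, output_text: str) -> float:
    """Efficiency as defined in the text: input length over output length, times 100."""
    if not output_text:
        raise ValueError("output text must not be empty")
    return 100 * len(input_text) / len(output_text)


def relative_size_percent(input_text: str, output_text: str) -> float:
    """Size of the output as a percentage of the input length."""
    if not input_text:
        raise ValueError("input text must not be empty")
    return 100 * len(output_text) / len(input_text)


def characters_saved(input_text: str, output_text: str) -> int:
    """Number of characters removed by compaction."""
    return len(input_text) - len(output_text)
```

Under these definitions, a 100-character input compacted to 40 characters has a relative size of 40%, an efficiency of 250%, and 60 characters saved.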
5. Conclusion and Future work:
The paper describes the Tamil Compaction System, a framework for shrinking text such that its meaning remains the same. The different subsystems and components of the framework are described in detail. Results from the implementation of this framework are provided and compared against the third-party compaction applications of social networking sites, which are implemented for the English language. Improving the mapping for frequently used words, conceptual reduction, and integrating a numerical analyser will take this system to its next level.
References:
[1] Anandan, R. Parthasarathi, and T.V. Geetha, Morphological Analyser for Tamil. ICON 2002, 2002.
[2] Fung, L. M. (2005). SMS short form identification and codec. Unpublished master's thesis, National University of Singapore, Singapore.
[3] L.S. Larkey, P. Ogilvie, M.A. Price, and B. Tamilio (2000). Acrophile, a system that automatically searches acronym expansion pairs.
[4] Robert E. Beasley, Short Message Service (SMS) Texting Symbols: A Functional Analysis of 10,000 Cellular Phone Text Messages. Franklin College.