Analysis and Comparative Study of POS Tagging Techniques for the National Language (Urdu) and Other Regional Languages of Pakistan

Rahmat Ali Rajper
Samina Rajper
Abdullah Maitlo
Ghulam Nabi


Natural Language Processing (NLP) is the field concerned with defining algorithms and techniques that enable computers to understand human language, and it is an integral part of speech recognition. Part-of-Speech (POS) tagging is considered one of the well-understood problems of NLP: each word in a natural-language sentence is assigned a grammatical class. Because tagging words by hand is time-consuming and tedious, automating the task is a way to build annotated lexical resources for a language's text automatically. Many languages already have well-developed POS tagging systems. The regional languages of Pakistan, however, are less developed for several reasons, and substantial work on POS tagging is still needed. Some of these languages have POS tagging systems that still need further refinement, while others must be developed from scratch; Balochi, for example, has no POS tagging system at all. This study presents a comparative analysis of POS tagging approaches for the national language (Urdu) and the other regional languages of Pakistan, describing each approach, the data sets it uses, and its reported results.
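To make the tagging task concrete, the following is a minimal sketch of a most-frequent-tag (unigram) baseline, which is not one of the approaches surveyed in this study. It learns, from a tagged corpus, the tag each word form carries most often and assigns that tag to new text. The toy Urdu sentences, the Universal POS tag labels, and the choice of `NOUN` as the fallback tag for unseen words are all illustrative assumptions, not data or settings from the study.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Map each word form to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag(sentence, model, default="NOUN"):
    """Tag each token with its learned tag; unseen words get the (assumed) default tag."""
    return [(word, model.get(word, default)) for word in sentence]


# Hypothetical toy training data (not from the study), using Universal POS tags.
train = [
    [("وہ", "PRON"), ("کتاب", "NOUN"), ("پڑھتا", "VERB"), ("ہے", "AUX")],
    [("یہ", "PRON"), ("کتاب", "NOUN"), ("ہے", "AUX")],
]

model = train_unigram_tagger(train)
# "قلم" (pen) never appears in training, so it falls back to the default tag.
print(tag(["وہ", "قلم", "ہے"], model))
```

Baselines of this kind ignore context entirely, so they cannot resolve words whose grammatical class depends on their neighbours; that limitation is part of what motivates the statistical and rule-based taggers compared in the literature.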

