Sindhi Stemmer for Information Retrieval System Using Rule-Based Stripping Approach

M. R. SHAH
H. SHAIKH
J. A. MAHAR
S. A. MAHAR

Abstract

Over the last few years, a huge amount of information in the Sindhi language has been made available online in the form of e-data.
Information retrieval systems are used to provide easy and efficient access to such stored information. A stemmer is the tool an
information retrieval system uses to reduce the morphological variants of a word to its root or stem. As yet, no information
retrieval system or stemmer is available for the Sindhi language; hence, access to these data resources is not possible. In this
paper, a stemming algorithm based on a rule-based approach is proposed. The proposed algorithm depends on a lexicon and a set of
linguistic rules developed by the authors. The lexicon contains 5,327 words, of which 2,142 have prefix or suffix morphemes, and
the rule repository contains 38 rules. Performance is calculated separately for Sindhi words with prefixes, with suffixes, and with
combined prefixes and suffixes. The developed stemmer achieves a cumulative accuracy of 84.85%. This stemmer is expected to benefit
developers of automatic Sindhi information retrieval systems.
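The lexicon-plus-rules stripping approach summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the affix lists, the romanized toy words, and the function names `stem` and `accuracy` are hypothetical, and both the order in which prefix, suffix and combined stripping are tried and the pooling of all test words into one cumulative score are assumptions. Each candidate stem is accepted only if it appears in the lexicon, which is one plausible way a lexicon can guard against over-stripping.

```python
from typing import List, Sequence, Set, Tuple

# HYPOTHETICAL romanized affixes for illustration only. These are not the
# authors' 38 Sindhi rules, which the abstract does not list.
PREFIXES: Sequence[str] = ("na", "be")
SUFFIXES: Sequence[str] = ("un", "ee", "a")


def stem(word: str, lexicon: Set[str],
         prefixes: Sequence[str] = PREFIXES,
         suffixes: Sequence[str] = SUFFIXES) -> str:
    """Return a stem for `word`, validated against `lexicon`.

    Assumed strategy: accept the word if it is already a lexicon entry;
    otherwise try prefix-only, then suffix-only, then combined
    prefix+suffix stripping (longest affix first) and return the first
    candidate found in the lexicon. Unresolved words are returned unchanged.
    """
    if word in lexicon:
        return word

    # Affixes that actually match this word, longest first.
    pres = [p for p in sorted(prefixes, key=len, reverse=True) if word.startswith(p)]
    sufs = [s for s in sorted(suffixes, key=len, reverse=True) if word.endswith(s)]

    candidates: List[str] = []
    candidates += [word[len(p):] for p in pres]                # prefix stripped
    candidates += [word[:len(word) - len(s)] for s in sufs]    # suffix stripped
    candidates += [word[len(p):len(word) - len(s)]             # both stripped
                   for p in pres for s in sufs
                   if len(p) + len(s) < len(word)]

    for cand in candidates:
        if cand in lexicon:
            return cand
    return word


def accuracy(gold: Sequence[Tuple[str, str]], lexicon: Set[str]) -> float:
    """Percentage of (word, expected_stem) pairs that are stemmed correctly.

    Passing prefix, suffix and combined test words together in one list
    yields a pooled (cumulative) figure; this pooling is an assumption.
    """
    if not gold:
        return 0.0
    correct = sum(1 for word, expected in gold if stem(word, lexicon) == expected)
    return 100.0 * correct / len(gold)
```

With the toy lexicon `{"kar", "dil"}`, `stem("bedilee", lexicon)` returns `"dil"`: neither the prefix-only candidate `"dilee"` nor the suffix-only candidate `"bedil"` is in the lexicon, so the combined strip is used. Checking candidates against the lexicon trades coverage (unknown stems are left unchanged) for precision.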

Article Details

How to Cite
M. R. SHAH, H. SHAIKH, J. A. MAHAR, & S. A. MAHAR. (2016). Sindhi Stemmer for Information Retrieval System Using Rule-Based Stripping Approach. Sindh University Research Journal - SURJ (Science Series), 48(4). Retrieved from https://sujo.usindh.edu.pk/index.php/SURJ/article/view/4729