Extracting Temporal Entity from Urdu Language Text

Daler Ali; Malik Muhammad Saad Missen; Muhammad Ali Memon; Muhammad Ali Nizamani; Asadullah Shaikh

Keywords:

Entity Extraction, Urdu Language Text, Dates

Abstract

The detection of temporal entities in natural language text is an interesting information extraction
problem. Temporal entities help to estimate authorship dates, enhance information retrieval capabilities, detect and
track topics in news articles, and augment the electronic news reader experience. Research has been performed on the
detection and normalization of English temporal entities and on annotation guidelines for them. However, research on
the Urdu language lags far behind, and a lot of work remains to be done, especially as huge quantities of Urdu data
are being generated on online social networks on a daily basis. In this paper, we propose a rule-based approach to
temporal entity extraction for the Urdu language. Compared with existing Urdu temporal entity extraction approaches,
our approach achieves higher accuracy and handles all types of Urdu temporal entities. For our experiments we use a
publicly available Urdu corpus, which contains 206 date tags. We extend this dataset by adding 200 Urdu Fully
Qualified Date (UFQD) tags. We also introduce a new date type for the Urdu language called Urdu Partially Fully
Qualified Date. Our proposed system achieved an average precision, recall, and F1-measure of 0.97, 0.98, and 0.98,
respectively, for UFQD and Urdu Partially Fully Qualified Date. Challenges and issues posed by other Urdu date types,
namely deictic and anaphoric dates, are also discussed in detail.
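The abstract does not spell out the extraction rules themselves. As a rough illustration only, the sketch below assumes that a UFQD consists of a day number, an Urdu month name, and a four-digit year, and encodes that single assumed rule as a regular expression. It also shows the standard exact-match definitions of precision, recall, and F1-measure (F1 = 2PR / (P + R)) that are typically used to report results such as those in the abstract. The month list, the pattern, and the names URDU_MONTHS, UFQD_RULE, extract_ufqd, and precision_recall_f1 are hypothetical and are not taken from the authors' system.

```python
import re

# Hypothetical rule inventory: common Urdu spellings of the Gregorian months.
URDU_MONTHS = [
    "جنوری", "فروری", "مارچ", "اپریل", "مئی", "جون",
    "جولائی", "اگست", "ستمبر", "اکتوبر", "نومبر", "دسمبر",
]

# A digit may be ASCII (0-9) or Extended Arabic-Indic (U+06F0-U+06F9), as often used in Urdu text.
DIGIT = r"[0-9\u06F0-\u06F9]"
MONTH = "|".join(map(re.escape, URDU_MONTHS))

# Assumed UFQD rule: day (1-2 digits), month name, year (4 digits), e.g. "12 مارچ 2020".
# The lookbehind/lookahead stop the rule from matching inside longer digit runs.
UFQD_RULE = re.compile(
    rf"(?<!{DIGIT}){DIGIT}{{1,2}}\s+(?:{MONTH})\s+{DIGIT}{{4}}(?!{DIGIT})"
)


def extract_ufqd(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character spans matched by the assumed UFQD rule."""
    return [m.span() for m in UFQD_RULE.finditer(text)]


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match scoring of predicted spans against gold-standard spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "اجلاس 12 " + URDU_MONTHS[2] + " 2020 کو ہوا"
    for start, end in extract_ufqd(sentence):
        print(sentence[start:end])
```

Under this assumption, a string such as "12 مارچ 2020" is tagged as one UFQD, while "مارچ 2020" (no day) is not; partially qualified, deictic, and anaphoric dates would each need additional rules.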

Published

2020-10-31

How to Cite

Daler Ali, Malik Muhammad Saad Missen, Muhammad Ali Memon, Muhammad Ali Nizamani, & Asadullah Shaikh. (2020). Extracting Temporal Entity from Urdu Language Text. University of Sindh Journal of Information and Communication Technology, 4(3), 181–188. Retrieved from https://sujo.usindh.edu.pk/index.php/USJICT/article/view/2886

Issue

Vol. 4 No. 3 (2020): University of Sindh Journal of Information and Communication Technology (USJICT)

Section

Articles

License

University of Sindh Journal of Information and Communication Technology (USJICT) follows an Open Access Policy under the Attribution-NonCommercial CC-BY-NC license. Researchers can copy and redistribute the material in any medium or format, for any purpose. Authors can self-archive the publisher's version of the accepted article in digital repositories and archives.

Upon acceptance, the author must transfer the copyright of this manuscript to the Journal for publication on paper, on data storage media, and online, with distribution rights to USJICT, University of Sindh, Jamshoro, Pakistan. Please download the copyright form below and attach it as a supplementary file during article submission.
