Quantitative Prediction of Offensiveness using Text Mining of Twitter Data


Virtual communities reflect worldwide connectivity and enable real-time information sharing and targeted advertising. Twitter has emerged as one of the most widely used microblogging services. It is a platform for sharing ideas, feelings, and views about any event, and people are free to post Tweets about a particular event. The success of an event can be predicted from users’ responses, and individual interaction patterns can strongly indicate personalities. Garbage or nonsensical replies can harm the credibility of an event. To make event feedback more trustworthy, we performed sentiment analysis to predict offensiveness in Tweets. We collected data through the Twitter Search and Streaming APIs. Text mining techniques (preprocessing, stemming, a negation rule, tokenization, and stop word removal) were used to clean the data. Our approach can predict offensiveness in Tweets effectively. We also performed a comparative analysis of different machine learning classifiers, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR), to find sentiment polarity, and found that SVM outperforms the others. An in-house tool, ‘Interaction Pattern Predictor’, was developed in the Python programming language. We consider our results trustworthy because three large data dictionaries were used to train the tool.
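
The cleaning steps named in the abstract (tokenization, stop word removal, a negation rule, and stemming) can be illustrated with a minimal sketch. This is not the Interaction Pattern Predictor itself. The stop word list, the negation words, the `NOT_` marker, and the crude suffix-stripping `stem` function are all assumptions made for illustration; a real pipeline would more likely use a full stop word list and an established stemmer.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions, not from the article).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of",
              "and", "in", "this", "that", "it", "i", "do"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase a Tweet, strip URLs, mentions and hashtags, return word tokens."""
    text = text.lower().replace("\u2019", "'")   # normalize curly apostrophes
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)         # drop @mentions and #hashtags
    text = re.sub(r"n't\b", " not", text)        # expand "don't" -> "do not"
    return re.findall(r"[a-z]+", text)


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, stem, and mark the word after a negation."""
    cleaned = []
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True            # flag the next content word as negated
            continue
        if token in STOP_WORDS:
            continue
        stemmed = stem(token)
        cleaned.append("NOT_" + stemmed if negate else stemmed)
        negate = False
    return cleaned


if __name__ == "__main__":
    print(preprocess("@user This event is not amazing! http://t.co/x"))
    print(preprocess("I don't like the speakers"))
```

In this sketch the negation rule only marks the first content word after a negation, so "not amazing" and "amazing" produce different features for a downstream classifier. Token lists of this kind would typically be converted to feature vectors before training classifiers such as NB, SVM, or LR.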
