The Efficient Way of Detecting Anomalies in Large Scale Streaming Data
Abstract
These days many companies have marketed big data streams in numerous applications, including industry, the Internet of Things, and telecommunications. The data streams produced by these applications may contain values that are not normal; such values are called anomalies. A lot of work has been done on anomaly detection for batch data, but detecting anomalies in streaming data remains a largely open issue. In streaming data, finding anomalies has become more challenging over time because the data change dynamically as a result of the different methods applied in data streaming infrastructures. Anomaly detection first requires a way to characterize the normal behavior of the data; once that is known, dynamic behavior or changes in the data are easier to recognize. In this context, clustering is a prominent technique. Clustering is commonly applied to analyze static data, but in the field of data mining it remains a key problem, especially on streaming data. In this paper, we apply a streaming version of the KMeans clustering algorithm to anomaly detection. The algorithm is analyzed in both single-machine and distributed environments. Furthermore, we evaluate it on the data stream in terms of accuracy, anomaly detection time, true positive rate, and false positive rate. The data stream used in our analysis is generated from the Kddcup99 dataset, which is widely used in the field of intrusion detection.
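The idea described in the abstract can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes a sequential (online) k-means update, seeds the centres from the first k points, and flags a point as anomalous when its distance to the nearest centre exceeds a fixed threshold. The class name StreamingKMeansDetector, the helper rates, the threshold value, and the toy stream are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


class StreamingKMeansDetector:
    """Online k-means with a distance threshold (illustrative assumption)."""

    def __init__(self, k: int, threshold: float) -> None:
        self.k = k
        self.threshold = threshold  # hypothetical cut-off, not from the paper
        self.centers: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, point: Sequence[float]) -> bool:
        """Consume one point; return True if it is flagged as an anomaly."""
        # Warm-up: the first k points seed the centres and are never flagged.
        if len(self.centers) < self.k:
            self.centers.append(list(point))
            self.counts.append(1)
            return False

        # Find the nearest centre by Euclidean distance.
        dists = [math.dist(c, point) for c in self.centers]
        j = min(range(self.k), key=dists.__getitem__)

        # Far from every centre: flag it and leave the model unchanged.
        if dists[j] > self.threshold:
            return True

        # Otherwise move the nearest centre towards the point (running mean).
        self.counts[j] += 1
        step = 1.0 / self.counts[j]
        self.centers[j] = [c + step * (x - c)
                           for c, x in zip(self.centers[j], point)]
        return False


def rates(flags: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for flagged points."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    # Toy stream: two normal groups and one injected outlier (made-up data).
    stream = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.2),
              (9.8, 10.1), (50.0, 50.0), (0.2, -0.1)]
    labels = [False, False, False, False, True, False]

    detector = StreamingKMeansDetector(k=2, threshold=3.0)
    flags = [detector.update(p) for p in stream]
    print(flags, rates(flags, labels))
```

Skipping the centre update for flagged points keeps outliers from dragging the clusters, at the cost of adapting more slowly when the stream genuinely drifts; a decay factor on the counts would be one way to trade these off.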
Copyright (c) 2018 University of Sindh, Jamshoro
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
University of Sindh Journal of Information and Communication Technology (USJICT) follows an Open Access Policy under Attribution-NonCommercial CC-BY-NC license. Researchers can copy and redistribute the material in any medium or format, for any purpose. Authors can self-archive publisher's version of the accepted article in digital repositories and archives.
Upon acceptance, the author must transfer the copyright of this manuscript to the Journal for publication on paper, on data storage media, and online, with distribution rights to USJICT, University of Sindh, Jamshoro, Pakistan. Kindly download the copyright form below and attach it as a supplementary file during article submission.