Database Technology on the Web: Query Interface Determining Algorithm for Deep Web Based on HTML Features and Hierarchical Clustering

R. A. SHAIKH
I. MEMON
J. A. MAHAR
H. SHAIKH

Abstract

Interactive elements in Hypertext Markup Language (HTML) pages sit at the terminal nodes of the Document Object Model (DOM) tree and lie close to one another within a local region. Based on these features, we propose a method for finding web query interfaces that combines models and rules. After building a tree model of the HTML page, the method locates candidate interface regions by interaction density and hierarchically clusters groups of interactive elements by the similarity of their local structure. A content filter composed of rules then removes non-query interfaces. The method avoids excessive dependence on the “form” tag and outperforms traditional methods in accuracy and generality. In experiments, accuracy reached 90.1% on the common TEL-8 dataset and 92% on a self-organized dataset.
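The density step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define interaction density, so the code below assumes it is the share of interactive elements among all element nodes in a subtree. The names `INTERACTIVE`, `VOID`, `min_interactive`, and `find_query_region` are hypothetical, and the clustering and rule-based filtering stages are not shown.

```python
from html.parser import HTMLParser

# Assumption: these tags count as "interactive"; the paper may use another set.
INTERACTIVE = {"input", "select", "textarea", "button"}
# Tags that never have children, so the parser must not descend into them.
VOID = {"input", "br", "img", "hr", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree that does not rely on the form tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def count(node):
    """Return (interactive elements, all elements) in the subtree at node."""
    interactive = 1 if node.tag in INTERACTIVE else 0
    total = 1
    for child in node.children:
        i, t = count(child)
        interactive += i
        total += t
    return interactive, total


def find_query_region(html, min_interactive=2):
    """Return (node, density) for the densest subtree, or (None, 0.0)."""
    builder = TreeBuilder()
    builder.feed(html)
    best, best_density = None, 0.0
    stack = list(builder.root.children)
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        interactive, total = count(node)
        if interactive < min_interactive:
            continue  # a lone control is not treated as an interface here
        density = interactive / total
        if density > best_density:
            best, best_density = node, density
    return best, best_density


page = """
<html><body>
<div id="nav"><a>Home</a><a>About</a><p>text</p></div>
<div id="search"><input type="text"><select><option>a</option></select>
<button>Go</button></div>
</body></html>
"""
region, density = find_query_region(page)
print(region.tag, region.attrs.get("id"), density)
```

Note that the sample page contains no form tag at all, so any region found comes from the local concentration of controls alone. Recomputing counts for every node is quadratic in the worst case; a single bottom-up pass would be preferable for large pages.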

Article Details

How to Cite
R. A. SHAIKH, I. MEMON, J. A. MAHAR, & H. SHAIKH. (2016). Database Technology on the Web: Query Interface Determining Algorithm for Deep Web Based on HTML Features and Hierarchical Clustering. Sindh University Research Journal - SURJ (Science Series), 48(1). Retrieved from https://sujo.usindh.edu.pk/index.php/SURJ/article/view/5033