Data Extraction and Alignment of Search Results by Combining Tag Value Structure

Tushar Jadhav, Santosh Chobe


The number of databases accessible on the Web through HTML
form-based search interfaces has increased tremendously over the
years. For every query submitted to such a website, the results
retrieved from the underlying database are dynamically embedded
into result pages for human browsing. To make the embedded data
machine-processable, which is necessary for many applications such
as comparison shopping and deep web data collection, the data must
be extracted and assigned relevant labels. A multi-annotator
approach is implemented that aligns the corresponding data units on
result pages into groups, annotates those groups from different
perspectives, and combines the various annotations to predict a
final label for each group of data. Lastly, an annotation wrapper is
constructed automatically for the search website. A new technique is
applied to handle the case in which the search results are not
contiguous, which happens due to advertisements, comments, etc., and
to handle nested tag structures that may be present in the search
results. The experimental results show that the implemented
algorithm performs considerably better than existing methods.
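To illustrate only the final step of the multi-annotator approach (combining several annotations into one label per aligned group), the following Python sketch assumes a simple weighted vote. The function `combine_annotations`, the annotator names, and the weights are hypothetical and are not taken from this work; the actual combination scheme used here may differ.

```python
from collections import defaultdict
from typing import Dict, Optional


def combine_annotations(
    annotations: Dict[str, Optional[str]],
    weights: Dict[str, float],
) -> Optional[str]:
    """Pick one label for an aligned data group by weighted voting.

    annotations: maps a (hypothetical) annotator name to the label it
                 proposed for the group, or None if it proposed nothing.
    weights:     maps an annotator name to an assumed confidence weight.
    """
    scores: Dict[str, float] = defaultdict(float)
    for annotator, label in annotations.items():
        if label is not None:
            # Each annotator adds its weight to the label it voted for.
            scores[label] += weights.get(annotator, 0.0)
    if not scores:
        return None  # no annotator produced a label for this group
    # The label with the highest accumulated weight wins.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    group_votes = {"table_header": "Title", "query_term": "Title", "frequency": "Author"}
    annotator_weights = {"table_header": 0.9, "query_term": 0.7, "frequency": 0.5}
    print(combine_annotations(group_votes, annotator_weights))  # prints "Title"
```

Here "Title" accumulates 0.9 + 0.7 = 1.6 against 0.5 for "Author", so a single low-weight annotator cannot override two agreeing ones. A weighted vote is used only because it is the simplest way to show annotations being combined.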

