An Experimental Evaluation of Adaptive Real-Time Web Crawler
Abstract
The World Wide Web holds a vast amount of information spread across
multiple servers.
The sheer size of this collection is a daunting obstacle to finding
necessary and relevant information. This is where search engines come
in: they strive to retrieve relevant information and serve it to the
user. A web crawler is one of the basic building blocks of a search
engine. It is a program that browses the World Wide Web in order to
index pages and store their data in a database for further analysis
and organization. This paper aims to create an adaptive real-time web
crawler (ARTWC) that retrieves web links from a dataset and then
achieves fast in-site searching by extracting the most relevant links
with a flexible, dynamic link re-ranking scheme. The experimental
evaluation indicates that ARTWC is more effective than existing
baseline crawlers and also achieves greater coverage.
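The abstract does not specify how ARTWC scores or re-ranks links. As a rough illustration only, the Python sketch below shows the general shape of in-site crawling with a dynamically re-ranked link frontier. It is not the paper's method. The scoring rule is an assumption: the fraction of query keywords in the anchor text, plus the same score for the page the link was found on. The in-memory `site` dictionary standing in for fetched pages is also hypothetical. The names `keyword_score` and `crawl_site` are illustrative, not from the source.

```python
import heapq
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL stand-in for fetched pages: URL -> (page text, [(anchor text, target URL)]).
# A real crawler would download and parse HTML here instead.
Site = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def keyword_score(text: str, keywords: Set[str]) -> float:
    """Assumed relevance measure: fraction of query keywords present in `text`."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


def crawl_site(site: Site, seed: str, keywords: Set[str], budget: int) -> List[str]:
    """Visit up to `budget` pages of one site, always expanding the best-ranked link.

    Assumed re-ranking rule (not taken from the paper): a link's priority is the
    keyword score of its anchor text plus the keyword score of the page it was
    found on, so links discovered on relevant pages move up the frontier.
    """
    frontier: List[Tuple[float, str]] = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    best: Dict[str, float] = {seed: 1.0}  # highest priority queued so far per URL
    seen: Set[str] = set()
    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url in seen or url not in site:  # skip stale entries and unknown URLs
            continue
        seen.add(url)
        visited.append(url)
        text, links = site[url]
        page_relevance = keyword_score(text, keywords)
        for anchor, target in links:
            if target in seen:
                continue
            priority = keyword_score(anchor, keywords) + page_relevance
            # Re-rank: queue the link again only if it now ranks higher than before;
            # the older, lower-ranked entry is discarded when popped later.
            if priority > best.get(target, -1.0):
                best[target] = priority
                heapq.heappush(frontier, (-priority, target))
    return visited


if __name__ == "__main__":
    demo_site: Site = {
        "/": ("home", [("about us", "/about"), ("crawler research", "/research")]),
        "/about": ("company history", [("contact", "/contact")]),
        "/research": ("web crawler research", [("crawler papers", "/papers"), ("team", "/team")]),
        "/papers": ("crawler papers list", []),
        "/contact": ("contact form", []),
        "/team": ("team members", []),
    }
    print(crawl_site(demo_site, "/", {"crawler", "research"}, budget=4))
    # prints ['/', '/research', '/papers', '/team']
```

A plain FIFO (breadth-first) frontier would visit `/about` before `/papers` on the demo site. With the assumed re-ranking, links found on the relevant `/research` page are pulled ahead. A priority queue with lazy deletion of stale entries is one common way to support such re-ranking cheaply.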
Copyright © IJETT, International Journal on Emerging Trends in Technology