Purpose
The purpose of this paper is to describe the development of an algorithm for web crawlers that automatically collect dynamically generated webpages from the deep web.

Design/methodology/approach
This study proposes and develops an algorithm that manages script commands as links, so that the web crawler collects web information as if it were gathering static webpages. The algorithm is tested experimentally with a web crawler that collects deep webpages.

Findings
When search results are provided as script pages, an ordinary crawling process collects only the first page. The proposed algorithm can collect the deep webpages in this case.

Research limitations/implications
To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script or if the web document contains script errors.

Practical implications
Research indicates that the deep web is estimated to hold 450 to 550 times more information than surface webpages, and that its documents are difficult to collect. The proposed algorithm helps enable deep web collection by running scripts.

Originality/value
This study presents a new method that uses script links instead of the keywords adopted in previous work. With the proposed algorithm, a script link can be used like an ordinary URL. The experiment shows that the scripts on each individual website must be analyzed before they can be employed as links.
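The idea of managing script commands as links can be illustrated with a minimal Python sketch. It is not the paper's implementation: the study launches scripts through the web browser object in Microsoft Visual Studio, whereas here `fetch` and `run_script` are hypothetical callables supplied by the caller, and links are pulled out with a simple regular expression. The sketch only shows how a `javascript:` command can sit in the same crawl queue as an ordinary URL.

```python
import re
from collections import deque
from typing import Callable, Dict, Optional

SCRIPT_PREFIX = "javascript:"
# Simplified link extraction (assumption); a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href="([^"]+)"')


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    run_script: Callable[[str, str], Optional[str]],
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl that queues script commands alongside URLs.

    fetch(url) returns the HTML of an ordinary URL.
    run_script(base_url, command) is a hypothetical script launcher that
    returns the HTML produced by running `command` on the page at base_url.
    Either callable may return None when nothing could be collected.
    """
    collected: Dict[str, str] = {}
    # Each queue entry is (base_url, href); base_url is "" for ordinary URLs.
    queue = deque([("", start_url)])
    seen = {("", start_url)}

    while queue and len(collected) < max_pages:
        base, href = queue.popleft()
        if href.startswith(SCRIPT_PREFIX):
            # Script link: run it in the context of the page it belongs to.
            command = href[len(SCRIPT_PREFIX):]
            page_id = f"{base}#{command}"
            html = run_script(base, command)
        else:
            # Ordinary link: the URL itself becomes the base for its scripts.
            base = href
            page_id = href
            html = fetch(href)
        if html is None:
            continue  # launcher or fetch failed; skip this link
        collected[page_id] = html

        for found in HREF_RE.findall(html):
            is_script = found.startswith(SCRIPT_PREFIX)
            key = (base, found) if is_script else ("", found)
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return collected
```

Keying each script link by the page it belongs to lets a paging command such as a hypothetical `goPage(2)` be visited once per result list, which is how pages beyond the first could be reached. Relative URLs and script errors are not handled in this sketch.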
Data Technologies and Applications – Emerald Publishing
Published: Mar 22, 2018
Keywords: Archives; Web archiving; Automatic crawler; Deep web; Link; Web information