CT-FC: more Comprehensive Traversal Focused Crawler
Abstract: In today’s world, people depend increasingly on information from the
WWW, including professionals who must analyze data in their domain to maintain
and improve their business. Such data analysis requires information that is
both comprehensive and relevant to the domain. A focused crawler, a topic-based
Web indexing agent, is used to meet this information need. However, in
increasing precision, focused crawlers face the problem of low recall. Studies
of the WWW hyperlink structure indicate that many Web documents are not
strongly connected but are related only through co-citation and co-reference.
A conventional focused crawler that uses a forward crawling strategy cannot
visit documents linked in this way. This study proposes a more comprehensive
traversal framework. As a proof of concept, CT-FC (a focused crawler with the
new traversal framework) was run on DMOZ data, which is representative of WWW
characteristics. The results show that this strategy can increase recall
significantly.
Author: Siti Maimunah, Husni S Sastramihardja, Dwi H Widyantoro, Kuspriyanto
Journal Code: jptkomputergg120040
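
The abstract's point about forward crawling can be illustrated with a minimal sketch. This is not the authors' CT-FC implementation: the toy link graph, the page names, and the functions `reverse_links`, `crawl`, and `recall` are all hypothetical, and the sketch assumes that in-links are available (here computed from the toy graph itself) and omits topical relevance scoring entirely. It only shows why a crawler that follows out-links alone misses co-cited and co-referencing pages, while a traversal that also follows in-links reaches them.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "hub": ["seed", "sibling"],  # "hub" cites both pages (co-citation)
    "seed": ["target"],
    "peer": ["target"],          # "seed" and "peer" cite the same page (co-reference)
    "sibling": [],
    "target": [],
}


def reverse_links(links):
    """Build the in-link map: page -> pages that link to it."""
    incoming = {page: [] for page in links}
    for source, targets in links.items():
        for target in targets:
            incoming.setdefault(target, []).append(source)
    return incoming


def crawl(links, seeds, follow_backlinks=False):
    """Breadth-first traversal from the seeds; optionally also follow in-links."""
    incoming = reverse_links(links) if follow_backlinks else {}
    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        neighbours = list(links.get(page, []))
        if follow_backlinks:
            neighbours += incoming.get(page, [])
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


def recall(visited, relevant):
    """Fraction of the relevant pages that the crawl reached."""
    return len(visited & relevant) / len(relevant)


# Assumption for the example: these four pages are the on-topic ones.
RELEVANT = {"seed", "sibling", "peer", "target"}

forward_only = crawl(LINKS, ["seed"])
comprehensive = crawl(LINKS, ["seed"], follow_backlinks=True)

print(recall(forward_only, RELEVANT))   # 0.5
print(recall(comprehensive, RELEVANT))  # 1.0
```

In this toy graph the forward-only crawl reaches just "seed" and "target", whereas following in-links also reaches "peer" (through "target") and "sibling" (through "hub"). A real crawler would need an external source of in-links, which this sketch does not model.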