frankdavid / irwebcrawler
Given a seed webpage, the crawler autonomously traverses the Internet. When it encounters an unseen page, that page is crawled and analyzed. It counts the distinct URLs, the exact-duplicate pages, the near-duplicate pages, and the pages written in English.
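
The counting described above can be illustrated with a minimal sketch. It is not the repository's actual implementation: the function names, the `fetch` callback, the SHA-256 fingerprint for exact duplicates, and the word-shingle Jaccard similarity for near duplicates are all assumptions chosen for illustration. English-language detection is left out, and the page fetcher is passed in so the example runs without network access.

```python
import hashlib
from collections import deque


def fingerprint(text):
    """Exact-duplicate key: SHA-256 of the whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of k-word shingles used to compare pages for near duplication."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def crawl(seed, fetch, max_pages=100, threshold=0.9):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` is a hypothetical caller-supplied function that returns
    a `(text, links)` pair for the page at `url`.
    """
    seen_urls = {seed}      # every URL ever queued, so none is visited twice
    queue = deque([seed])
    hashes = set()          # fingerprints of pages seen so far
    shingle_sets = []       # shingle sets of the unique pages seen so far
    stats = {"distinct_urls": 0, "exact_duplicates": 0, "near_duplicates": 0}

    while queue and stats["distinct_urls"] < max_pages:
        url = queue.popleft()
        stats["distinct_urls"] += 1
        text, links = fetch(url)

        fp = fingerprint(text)
        if fp in hashes:
            stats["exact_duplicates"] += 1
        else:
            sh = shingles(text)
            if any(jaccard(sh, other) >= threshold for other in shingle_sets):
                stats["near_duplicates"] += 1
            hashes.add(fp)
            shingle_sets.append(sh)

        # Queue only links that have never been seen before.
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)

    return stats
```

Calling `crawl(seed_url, fetch)` with a real HTTP-based `fetch` would return the three counters as a dictionary. Comparing each new page against every stored shingle set is quadratic in the number of pages, so a larger crawl would likely need a sketching technique such as MinHash or SimHash instead.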