WWW-Crawl

The WWW::Crawl module provides a simple web crawling utility for
extracting links and other resources from web pages within a single
domain. It can be used to recursively explore a website and retrieve
URLs, including those found in HTML href attributes, form actions,
external JavaScript files, and JavaScript window.open() links.
WWW::Crawl will not stray outside the supplied domain.

EXAMPLES

Basic crawling with HTTP::Tiny:

    use WWW::Crawl;

    my $crawler = WWW::Crawl->new();
    my @visited = $crawler->crawl('https://example.com', \&process_page);

    sub process_page {
        my $url = shift;
        print "Visited: $url\n";
    }

Crawling JavaScript-rendered pages with Chromium:

    use WWW::Crawl::Chromium;

    my $crawler = WWW::Crawl::Chromium->new(
        chromium_path    => '/usr/bin/chromium',
        chromium_timeout => 30,
    );
    my @visited = $crawler->crawl('https://example.com', \&process_page);

    sub process_page {
        my $url = shift;
        print "Visited: $url\n";
    }

A further example, using an anonymous callback to log visited URLs,
appears at the end of this file.

INSTALLATION & TESTING

To run the author tests, set the environment variable RELEASE_TESTING
(for example, RELEASE_TESTING=1 make test).

Installation tests are only run if Test::Mock::HTTP::Tiny is
installed. If you wish to run the full set of tests, ensure this
module is installed before installing WWW::Crawl.

To install this module, run the following commands:

    perl Makefile.PL
    make
    make test
    make install

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc WWW::Crawl

You can also look for information at:

    RT, CPAN's request tracker (report bugs here)
        https://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Crawl

    Search CPAN
        https://metacpan.org/release/WWW-Crawl

LICENSE AND COPYRIGHT

This software is Copyright (c) 2023 by Ian Boddison.

This program is released under the following license:

    Perl
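
FURTHER EXAMPLE

As a complement to the EXAMPLES above, here is a minimal sketch of
logging visited URLs from the crawl callback. It assumes only the
new() and crawl() calls shown earlier; the anonymous subroutine and
the crawl.log filename are illustrative choices, not part of the
module's API.

    use strict;
    use warnings;
    use WWW::Crawl;

    my $crawler = WWW::Crawl->new();

    # Record each visited URL as the crawl proceeds. The log file
    # name is arbitrary.
    open my $log, '>', 'crawl.log' or die "Cannot open crawl.log: $!";

    my @visited = $crawler->crawl('https://example.com', sub {
        my $url = shift;
        print {$log} "$url\n";
    });

    close $log;

    # Per the examples above, crawl() returns the URLs visited
    # within the domain.
    printf "Crawled %d page(s)\n", scalar @visited;

An anonymous subroutine is interchangeable with the \&process_page
style used in EXAMPLES; both pass a code reference to crawl().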