Some tools that are likely to be useful:
grab-site, the archival web crawler used by ArchiveTeam
Heritrix, the archival web crawler used by the Internet Archive