Article 11300 of comp.infosystems.www:
Newsgroups: comp.infosystems.www
Path: feenix.metronet.com!news.ecn.bgu.edu!mp.cs.niu.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!EU.net!Germany.EU.net!netmbx.de!zrz.TU-Berlin.DE!zib-berlin.de!uni-paderborn.de!urmel.informatik.rwth-aachen.de!news.dfn.de!scsing.switch.ch!swidir.switch.ch!news.unige.ch!usenet
From: oscar@cui.unige.ch (Oscar Nierstrasz)
Subject: htget -- script to MIRROR WWW files and directories
Message-ID: <1994Mar25.085107.20525@news.unige.ch>
Sender: usenet@news.unige.ch
Reply-To: oscar@cui.unige.ch
Organization: University of Geneva, Switzerland
Date: Fri, 25 Mar 1994 08:51:07 GMT
Lines: 62

Something that's been on my "to do" list for some time now ...

This is a first announcement for `htget', a Perl script for mirroring
WWW files and directories (using HTTP only).  It is an evolution of
`hget', an earlier script that can retrieve individual files only.

	`htget URL' makes a copy of the remote file in the current
	directory

	`htget -s URL' copies the file to stdout (as hget does by default)

	`htget -abs URL' converts all relative URLs to absolute URLs
	(so that the local file will contain correct links)

	`htget -r URL' will *recursively* retrieve the file and all
	other files reachable from that URL *provided* they have the
	same prefix (i.e., reside in the same directory hierarchy)

The interesting case is the last one.  Htget will try to re-create
the required directory hierarchy, and will convert all relative URLs
to absolute ones, *except* those of retrieved pages, which will be
made relative (so all links will be to the mirrored pages, not the
original ones).

Htget also tries to make intelligent decisions about which files
should be called "index.html" (and tries to recover if trailing
slashes are left off directory URLs).

htget can be found at:

	http://cui_www.unige.ch/ftp/PUBLIC/oscar/scripts/README.html

You will also need url.pl and ftplib.pl (at the same location).

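To make the link-rewriting rule for `htget -r' concrete, here is a
minimal sketch in Python (not Perl, and not htget's actual code).  It
assumes the mirror prefix ends in "/" and that a directory URL is
stored as "index.html"; it does not try to recover from missing
trailing slashes.  The names local_path and rewrite_link and the host
"host" are hypothetical, chosen only for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def local_path(url, prefix):
    """Map a URL under `prefix` to a file path relative to the mirror root.

    Assumption: `prefix` ends with "/" and `url` starts with `prefix`.
    """
    path = urlsplit(url).path[len(urlsplit(prefix).path):]
    if path == "" or path.endswith("/"):
        # Directory URLs are stored as index.html (assumed convention).
        path += "index.html"
    return path


def rewrite_link(page_url, href, prefix):
    """Rewrite one link found in the page fetched from `page_url`.

    Links outside `prefix` become absolute URLs; links inside it become
    paths relative to the local copy of the page.
    """
    target = urljoin(page_url, href)
    if not target.startswith(prefix):
        return target  # not mirrored: point at the original server
    here = posixpath.dirname(local_path(page_url, prefix))
    # Anchor both paths at "/" so the result does not depend on the cwd.
    rel = posixpath.relpath("/" + local_path(target, prefix), start="/" + here)
    fragment = urlsplit(target).fragment
    return rel + ("#" + fragment if fragment else "")


if __name__ == "__main__":
    prefix = "http://host/docs/"
    page = "http://host/docs/a/page.html"
    for href in ("../b/other.html", "/elsewhere/x.html", "sub/"):
        print(href, "->", rewrite_link(page, href, prefix))
```

A recursive mirror would apply rewrite_link to every link in each
retrieved page and queue only those targets that start with the prefix.
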
htget has been used experimentally to mirror the WWW 94 and OOPSLA 94
conference servers:

	http://cui_www.unige.ch/WWW94/CERN/
	http://cui_www.unige.ch/OSG/OOinfo/Conf/OOPSLA/

Naturally you should only use this when you are sure you really want
to mirror a whole directory!  htget does give you feedback about files
it is retrieving and their sizes.

If you use this script and discover any problems, please let me know.

Oscar Nierstrasz
World Wide Web 94 Programme Chair
__________________________________________________________________________
Attend WWW 94 in Geneva!  Contribute a hypertalk!
Being held at CERN, May 25-27, 1994.
See: http://www1.cern.ch/WWW94/Welcome.html
__________________________________________________________________________
Oscar Nierstrasz -- M.E.R. (Assistant Professor)           | Prefix: +41 22
Centre Universitaire d'Informatique, University of Geneva  | Tel:  705.7664
24, rue General-Dufour -- CH-1211 Geneva 4 -- SWITZERLAND  | Sec:  705.7770
E-mail: oscar@cui.unige.ch                                 | Fax:  320.2927
Ftp: cui.unige.ch:/OO-articles                             |
WWW: http://cui_www.unige.ch/OSG/Oscar/home.html           | Home: 733.9568
__________________________________________________________________________