Questions tagged [httrack]

HTTrack (Website copier)

HTTrack is a free and open-source web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer. By default, HTTrack arranges the downloaded site by the original site's relative link structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.

HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command-line version and two GUI versions (WinHTTrack and WebHTTrack); the former can be used in scripts and cron jobs.

HTTrack uses a web crawler to download a website. Some parts of the website may not be downloaded by default, due to the robots exclusion protocol, unless that protocol is disabled in the program's options. HTTrack can follow links that are generated with basic JavaScript and inside applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.
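For instance, ignoring robots.txt is exposed as a command-line option. A minimal sketch (the URL and output directory are placeholders; -s0 is HTTrack's "never follow robots.txt rules" setting):

    # mirror a site while ignoring robots.txt rules (use responsibly)
    httrack "https://example.com/" -O ./mirror -s0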

References:

http://www.httrack.com/

http://en.wikipedia.org/wiki/HTTrack

61 questions
18 votes, 5 answers

mirror single page with httrack

I am trying to use httrack (http://www.httrack.com/) to download a single page, not the entire site. So, for example, when using httrack to download www.google.com, it should only download the HTML found under www.google.com along…
Max • 14,808
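A minimal sketch of the kind of command this question is after, assuming -r1 limits the mirror depth to the starting page and -n additionally fetches "near" non-HTML files such as images and stylesheets (the output directory is a placeholder):

    # download only the top page, plus the non-HTML files it references
    httrack "https://www.google.com/" -O ./google-page -r1 -n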
11 votes, 2 answers

httrack wget curl scrape & fetch

There are a number of tools on the internet for downloading a static copy of a website, such as HTTrack. There are also many tools, some commercial, for “scraping” content from a website, such as Mozenda. Then there are tools which are apparently…
Malik A. Rumi • 1,361
7 votes, 2 answers

Error with Capture URL / Catch URL in HTTrack

I have a problem when clicking Capture URL in HTTrack: it generates an incorrect proxy address. This is the result: Please TEMPORARILY set your browser's proxy preferences to: Proxy's address: fe80::141b:2ce3:3f57:fefb Proxy's port: …
Hoc N • 143
6 votes, 3 answers

How can I make HTTrack only download files on the current domain?

No matter how hard I try, I can't seem to get httrack to leave links going to other domains intact. I've tried using the --stay-on-same-domain argument, and that doesn't seem to do it. I've also tried adding a filter, and that doesn't do it either. There simply…
Alex • 22,845
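A hedged sketch of one way to attack this, assuming -d restricts the crawl to the principal domain and -K makes HTTrack keep original absolute links instead of rewriting them locally:

    # stay on example.com; leave links to other domains pointing at their original URLs
    httrack "https://www.example.com/" -O ./mirror -d -K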
5 votes, 3 answers

Use httrack to download just one site, not external sites

I tried using httrack to download my phpbb forum, but no matter what setup I use, I cannot get it to stop downloading the entire Wikipedia site as well, along with many other websites whose links appear anywhere in the forum... What I managed to do is make it…
Predrag Stojadinović • 3,121
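Explicit scan rules are the usual way to pin HTTrack to a single host. A sketch with the forum URL as a placeholder: "-*" excludes everything first, then the "+" pattern re-allows only the forum itself:

    # exclude everything by default, then allow only the forum's own host
    httrack "https://forum.example.com/" -O ./forum-mirror "-*" "+forum.example.com/*"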
5 votes, 1 answer

How to bundle httrack into a python 3 executable

There is a great website copier that I would like to bundle into my executable, created with Python 3 and py2exe. On the official HTTrack website, the FAQ says there is a DLL/library version available, but I don't know where to…
yuval • 1,948
5 votes, 3 answers

Compiling HTTrack on Mac OS X

I'm trying to compile httrack on my Mac. ./configure is successful, but while compiling the package I'm getting the following error and am not able to resolve it. In file included from htscore.c:40: In file included from ./htscore.h:81: In file included…
user3730989 • 51
4 votes, 1 answer

Retrieving a complete webpage including dynamically loaded links/images

Problem: Downloading a complete working offline copy of a website that loads links/images dynamically. Research: There are questions (e.g. [1], [2], [3]) on Stack Overflow addressing this issue, most of which have top answers using wget or httrack,…
Nader Alexan • 1,802
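Since wget and HTTrack do not execute JavaScript, one common workaround is to let a headless browser render the page and save the resulting DOM. A sketch, assuming a Chromium/Chrome binary is installed (URL and filename are placeholders):

    # render the page with JavaScript executed, then dump the final DOM to a file
    chromium --headless --dump-dom "https://example.com/page" > page.html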
4 votes, 2 answers

Issue downloading a complete website for offline use with HTTrack

I downloaded sonst.cc with HTTrack, but when viewing it offline there’s no content. Every single tab is empty. Why is that? Is there any other app with which I could download the whole thing? I’m losing my mind over here. Thanks. Edit: When I open…
jay • 41
4 votes, 2 answers

HTTrack possible using cookies

I want to download the page from a URL, easy enough. But on the first page I have to log in, as I normally do from a normal browser. HTTrack, however, downloads from the first page since it can't use my cookies or log in. Is there any way for me to get…
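One commonly suggested approach, hedged here because it relies on HTTrack picking up a Netscape-format cookies.txt from the project directory: log in with a normal browser, export the cookies, and drop the file into the mirror folder before crawling (paths are placeholders):

    # create the project directory so the cookie file has somewhere to live
    mkdir -p ./members-mirror
    # cookies.txt: Netscape-format cookie file exported from the logged-in browser
    cp ~/Downloads/cookies.txt ./members-mirror/cookies.txt
    httrack "https://example.com/members/" -O ./members-mirror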
4 votes, 2 answers

httrack follow redirects

I'm trying to mirror webpages recursively, starting from a URL supplied by the user (there is a depth limit set, of course). Wget didn't catch links from CSS/JS, so I decided to use httrack. I try to mirror some site like this: # httrack -r6…
neutrinus • 1,429
3 votes, 2 answers

How to block HTTrack and similar programs?

All HTTrack user agent requests: Mozilla/2.0 (compatible; MS FrontPage Express 2.0) Mozilla/4.05 [fr] (Win98; I) Lynx/2.8rel.3 libwww-FM/2.14 Java1.1.4 Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98) HyperBrowser (Cray; I; OrganicOS…
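Server-side blocking usually comes down to matching the User-Agent header, with the obvious caveat that HTTrack users can spoof any agent on the list above. A sketch for an Apache host with mod_rewrite enabled (the stack is an assumption), written as a shell snippet that appends the rule to .htaccess:

    # deny requests whose User-Agent contains "HTTrack" (case-insensitive);
    # trivially bypassed by a spoofed agent, so treat it as a speed bump only
    cat >> .htaccess <<'EOF'
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
    RewriteRule .* - [F,L]
    EOF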
3 votes, 1 answer

HTTrack: How to download folders only from a certain subfolder level?

HTTrack gives filter options but I cannot figure out how to download a certain subfolder level and ignore all other subfolders. Example:…
Avatar • 11,039
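A hedged sketch of the filter mechanics, assuming the goal is to keep one subfolder and skip its siblings (host and paths are placeholders):

    # drop everything, then re-allow only the wanted subfolder
    httrack "https://www.example.com/docs/manual/" -O ./manual \
        "-*" "+www.example.com/docs/manual/*"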
2 votes, 1 answer

How can I mirror the results of MOSS plagiarism detection?

MOSS is a well-known server for checking software plagiarism. It allows teachers to send homework submissions, calculates the similarity between different submissions, and colors code blocks that are very similar. Here is an example of the results…
Erel Segal-Halevi • 26,318
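Result pages like these are typically static HTML, so a generic mirroring command may be all that's needed. A sketch using wget (the results URL is a placeholder):

    # recursive mirror with rewritten links and page prerequisites, staying below the results URL
    wget --mirror --convert-links --page-requisites --no-parent \
        "http://moss.stanford.edu/results/123456789/"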
2 votes, 2 answers

How should I download a specific file type from a folder (and ONLY its subfolders) using wget or httrack?

I'm trying to use HTTrack or Wget to download some .docx files from a website. I want to do this only for a folder and its subfolders. Ex: www.examplewebsite.com/doc (this goes down 5 more levels) What would be a good way to do this?
NoBlink • 21
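A sketch of the wget variant, assuming -A restricts what is kept to .docx files, --no-parent confines the crawl to /doc and below, and -l bounds the recursion depth (wget still fetches HTML pages temporarily in order to follow their links):

    # crawl up to 6 levels under /doc, keeping only .docx files
    wget -r -l 6 -A "docx" --no-parent "https://www.examplewebsite.com/doc/"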