Questions tagged [httrack]

HTTrack (Website copier)

HTTrack is a free and open source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer.[4][5] By default, HTTrack arranges the downloaded site by the original site's relative link-structure. The downloaded (or "mirrored") website can be browsed by opening a page of the site in a browser.

HTTrack can also update an existing mirrored site and resume interrupted downloads. HTTrack is configurable by options and by filters (include/exclude), and has an integrated help system. There is a basic command line version and two GUI versions (WinHTTrack and WebHTTrack); the command line version can be used in scripts and cron jobs.
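As a rough illustration of how the command line version and its include/exclude filters might be driven from a script, here is a minimal Python sketch. It only assembles an argument list and does not run anything. The helper name `build_httrack_command`, the example URL, and the output path are assumptions made for illustration; the `-O` output option and the `+pattern` / `-pattern` filter syntax follow the usage shown in the HTTrack documentation.

```python
import shlex
from typing import List, Sequence


def build_httrack_command(
    url: str,
    output_dir: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Return an argv list for the httrack command line tool (does not run it)."""
    # -O sets the directory where the mirror is written.
    command = ["httrack", url, "-O", output_dir]
    # Filters are written as +pattern (include) and -pattern (exclude).
    command += [f"+{pattern}" for pattern in include]
    command += [f"-{pattern}" for pattern in exclude]
    return command


if __name__ == "__main__":
    # Hypothetical site and paths, for illustration only.
    argv = build_httrack_command(
        "http://www.example.com/",
        "/tmp/example-mirror",
        include=["*.png"],
        exclude=["*.zip"],
    )
    # Print a shell-quoted form, e.g. for pasting into a script or crontab entry.
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would start the mirror, assuming `httrack` is installed and on the `PATH`. Building a list rather than a single string avoids shell quoting problems with patterns such as `*.png`.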

HTTrack uses a Web crawler to download a website. By default, some parts of the website may not be downloaded because of the robots exclusion protocol, unless that behavior is disabled in the program's options. HTTrack can follow links that are generated with basic JavaScript and inside Applets or Flash, but not complex links (generated using functions or expressions) or server-side image maps.
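To make the robots exclusion protocol concrete, the sketch below uses Python's standard `urllib.robotparser` module to check which URLs a crawler would be allowed to fetch under a given `robots.txt`. This illustrates the protocol in general, not HTTrack's own implementation; the `robots.txt` content, the `is_allowed` helper, the user agent string, and the URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/page.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "HTTrack", url))
```

A crawler that honors these rules would skip everything under `/private/`, which is one reason a mirror can end up missing parts of a site.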

References:

http://www.httrack.com/

http://en.wikipedia.org/wiki/HTTrack

61 questions

votes

2 answers

Download website links having specific elements around

I need to recursively mirror wallpaper images from a site that have specific markup around them, like:

Original Resolution: 4800x2700
Views:

wget httrack

asked Jul 30 '17 at 16:24

mike

votes

1 answer

HTTrack returns file not found

I downloaded a website with HTTrack using the following command: /usr/local/bin/httrack https://www.website.com -O /Users/mainuser/Desktop/website -n -j I then located the index.html file in the website folder and ran it. Chrome returns the message:…

html html-parsing httrack

asked Oct 09 '16 at 11:49

sanjihan

3,913
6
37
78

votes

1 answer

How to download a website including all files with links starting with a certain path

I'd like to build a static website based on the styling of a Wordpress template, Inovado. I downloaded the website using HTTrack (in Linux) with the following command: httrack http://inovado.hellominti.com The resulting index.html contains several…

wget httrack

asked Sep 23 '16 at 08:28

Kurt Peek

34,968
53
191
361

votes

1 answer

Remove Domain URL from downloaded website by HTTrack

I have downloaded a full website with HTTrack. But after downloading the site, all URLs contain the domain name of the site, like www.example.com/index.html instead of index.html. Is there any way to remove this part of the URL?

html css url directory httrack

asked Sep 10 '16 at 18:30

akib

votes

0 answers

What blocks the crawl of my website by HTTrack or Wget?

I am attempting to clone my website to show it offline for a presentation. However, I tried with both HTTrack and Wget, and both stop at the second level of the source tree. What could be the reason? This is the Wget cmd: wget -r…

wget httrack

asked May 30 '16 at 21:22

Baldráni

4,066
3
41
63

votes

1 answer

Node.js get HTTP_USER_AGENT and Block HTTrack

I want to block all bots (like HTTrack) on my website. Normally, I would use an .htaccess file to block bots via RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]. However, my server is running Node.js Express. How can I get HTTP_USER_AGENT and do a…

javascript node.js user-agent httrack

asked Apr 25 '16 at 16:30

Barış Saçıkara

votes

1 answer

HTTrack wait until page search completed

I'm trying to download with HTTrack the results of a search request at the URL here. Unfortunately, the download starts immediately and doesn't get the search result (as the page is still showing a wheel). Question: is it possible to force a pause…

http download web httrack

asked Apr 24 '16 at 19:32

Tom

1,315
3
22
41

votes

1 answer

Download .torrent from YTS

Is it possible to download all torrent files from the yts website? In HTTrack I get a mirror error, probably caused by the captcha that you need to enter before accessing the site. Is there a way to bypass this or use another method?

download torrent httrack

asked Apr 18 '16 at 09:04

dcf007

votes

1 answer

Using subprocess to run HTTrack from python in Windows

I'm in the process of writing a web scraping python script, and one of the things I'd like it to be able to do is have it take a snapshot of certain pages (all of the html, style sheets, and images necessary to view that particular page properly…

python windows subprocess httrack

asked Jan 13 '16 at 21:19

Empiromancer

3,258
1
15
41

votes

1 answer

Trying to mirror site that uses strapdown.js

There is a site that uses strapdown.js that I am trying to mirror using httrack or wget, but I fall short, because the site contains markdown and not HTML. Only strapdown converts the links to html links. Hence the client needs to interpret…

javascript linux unix wget httrack

asked Nov 26 '14 at 12:56

Buddy

votes

1 answer

httrack only downloads the index.html file

Usually when I download sites with HTTrack I get all the files: images, CSS, JS etc. Today, the program finished downloading in just 2 seconds and only grabbed the index.html file, with the CSS, IMG code etc inside still linking to external. I've already…

html web download httrack

asked Nov 22 '14 at 17:16

user3379220

votes

1 answer

How do I push the result of this complex command line grep statement to mysql database?

This code searches through website html files and extracts a list of domain names... httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo '[[:alnum:]-]+\.(com|net|org)' The result looks like…

mysql bash grep httrack

asked May 24 '14 at 22:31

Wyatt Jackson

-1

votes

1 answer

How to find the directory structure and the file names under a php website?

How do I get the directory structure and filenames under a PHP website I do not own? Not the code, just the structure and the filenames. I tried httrack, but since it's a PHP website, it doesn't work.

php web-scraping scrapy web-crawler httrack

asked Jun 04 '20 at 01:24

user12871659

-1

votes

1 answer

Different source code in inspect and in view-source code

While I was looking at the source code of a website, it showed me some random-looking JS code in the body block in view-source, like the following: