1

I need to monitor a list of websites through different ISPs to see if they are blocked. I have a separate machine for each ISP. I'm trying to write code to automatically check whether the websites are blocked. A few approaches came to mind, but none of them work, for different reasons:

ping: I thought I would ping the websites, but some websites block ICMP echo requests on their side.

GET request (or JavaScript image trick): I thought I could just GET the webpage, but that doesn't help because blocked sites still return a non-standard block page with a 200 status.

Lastly, I thought I could fetch a copy of the website on a non-blocked machine and compare it with the page fetched on the testing machines, but there are two problems: I don't know how to compare two pages (which elements would I compare?), and some websites are dynamic, so they return slightly different versions each time.

Any thoughts will be helpful.

Kiarash

3 Answers

1

I'm guessing that the block page is the same for a given ISP no matter which site you're trying to access. Instead of comparing the retrieved page to a "known true" page, what about comparing it to a "known false" page, i.e. that ISP's block page?

Downside: you would have to repeat this process for each ISP, since they will almost certainly have different "block" pages.
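As for how to compare two pages: one rough option is a plain text-similarity score against the saved block page. The sketch below uses Python's standard `difflib`; the sample HTML strings, the `looks_blocked` name, and the 0.9 threshold are all made up for illustration and would need tuning against a real block page.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two page bodies."""
    # autojunk=False keeps difflib from ignoring frequent characters
    # on longer inputs, which would skew the ratio for HTML.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_blocked(body: str, block_page: str, threshold: float = 0.9) -> bool:
    """Guess that a page is blocked if it is nearly identical to the block page.

    The threshold is an assumption, not a measured value.
    """
    return similarity(body, block_page) >= threshold


# Hypothetical sample data: a saved block page, a freshly fetched block page
# that differs only in a reference number, and an ordinary page.
isp_block_page = (
    "<html><body><h1>Access denied</h1>"
    "<p>This site is blocked by order 1234.</p></body></html>"
)
fetched_blocked = (
    "<html><body><h1>Access denied</h1>"
    "<p>This site is blocked by order 5678.</p></body></html>"
)
fetched_normal = (
    "<html><head><title>Example News</title></head>"
    "<body><p>Today's headlines...</p></body></html>"
)

print(looks_blocked(fetched_blocked, isp_block_page))
print(looks_blocked(fetched_normal, isp_block_page))
```

A fuzzy ratio rather than an exact match leaves room for small per-request differences in the block page, such as the blocked URL being echoed back.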

aapierce
  • They have different block pages, and I don't know how to find a block page easily. But even if we have a known false page per ISP, how do you compare two webpages? You can imagine it would be a bigger hassle to identify unique elements per block page. – Kiarash Feb 21 '14 at 20:02
  • If you're lucky, the "block" page might have a `` in it with a unique source (probably pointing to the ISP's domain name). Of course, all this is moot if you don't know what the ISP's block page looks like. – aapierce Feb 21 '14 at 20:22
0

Two Options:

Compare the size of the pages. If the two pages' sizes are fairly similar, the ISP is likely not blocking the site. If one page is extremely small compared to the other, chances are that ISP is blocking the site.

Grab elements from the pages, such as headers, titles, and button text, and compare them to each other. If at least x elements match (for some threshold x), the page is likely not blocked; if none match, the page is likely blocked.
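Both options can be sketched with the standard library alone. This is only an illustration: the set of tags to collect, the helper names, and the sample pages are assumptions, and real pages with nested or malformed markup may need a more forgiving parser.

```python
from html.parser import HTMLParser


class TextGrabber(HTMLParser):
    """Collect the text found inside title, heading, and button tags."""

    WANTED = {"title", "h1", "h2", "h3", "button"}

    def __init__(self):
        super().__init__()
        self._open = []      # wanted tags that are currently open
        self.found = set()   # (tag, text) pairs seen so far

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            self.found.add((self._open[-1], text))


def key_elements(html: str) -> set:
    """Return the set of (tag, text) pairs for the tags of interest."""
    parser = TextGrabber()
    parser.feed(html)
    parser.close()
    return parser.found


def size_ratio(a: str, b: str) -> float:
    """Option 1: ratio of the smaller page size to the larger (1.0 = same size)."""
    smaller, larger = sorted((len(a), len(b)))
    return smaller / larger if larger else 1.0


def element_matches(reference: str, candidate: str) -> int:
    """Option 2: count the key elements the two pages have in common."""
    return len(key_elements(reference) & key_elements(candidate))


# Hypothetical pages: the reference copy, a dynamic variant, and a block page.
reference = (
    "<html><head><title>Shop</title></head><body><h1>Welcome</h1>"
    "<p>Visitor 1041, here are today's offers and a long list of products.</p>"
    "<button>Buy</button></body></html>"
)
variant = reference.replace("Visitor 1041", "Visitor 2233")
blocked = "<html><body><h1>Blocked</h1></body></html>"

print(size_ratio(reference, variant), element_matches(reference, variant))
print(size_ratio(reference, blocked), element_matches(reference, blocked))
```

Comparing only titles, headings, and button text tends to be less sensitive to dynamic body content than comparing the full page, though a lightweight site may expose few such elements.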

Freelancer
  • This is good, except if the original website is lightweight. Then the difference would be small... – Kiarash Feb 21 '14 at 20:50
0

Store a template of the ISP's block page.

For each GET response body, check whether its contents match the block template.

If the firewall forces a redirect, you could check whether the IP/DNS name of the responding server is the firewall's.
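The redirect check might look roughly like this. It is a sketch under assumptions: `block.isp.example` stands in for whatever host the ISP's firewall actually redirects to, and the function names are made up. `urlopen` follows HTTP redirects by default, so `geturl()` reports where the request ended up.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Hypothetical host name(s) of the ISP's block page; fill in per ISP.
BLOCK_HOSTS = {"block.isp.example"}


def fetch_final_url(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL, following redirects, and return the final URL reached."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def landed_on_block_host(requested: str, final: str, block_hosts=BLOCK_HOSTS) -> bool:
    """Return True if the request was redirected to a known block-page host."""
    requested_host = (urlsplit(requested).hostname or "").lower()
    final_host = (urlsplit(final).hostname or "").lower()
    return final_host != requested_host and final_host in block_hosts


# Offline illustration with made-up URLs (no network access needed here).
print(landed_on_block_host(
    "http://news.example/article",
    "http://block.isp.example/denied?url=news.example",
))
print(landed_on_block_host(
    "http://news.example/article",
    "https://www.news.example/article",
))
```

On a test machine you would pass `fetch_final_url(site)` as the `final` argument. This only catches blocks implemented as HTTP redirects; a block page served in place under the original URL still needs the template comparison.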

Roger Barreto