
I am new to Java and I want to build a simple web crawler. How do I access the robots.txt file of a website in Java? I actually don't know much about robots.txt. Please help me out.

Toukir Naim
  • The robots.txt file is in a pretty standard location on every single website (since any given number of various search engines need to be able to find it). Accessing it is as simple as performing a get of [url]/robots.txt ;) – Mike McMahon Apr 10 '12 at 23:45
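As the comment says, the file always lives at the root of the site, so fetching it is a plain HTTP GET of [url]/robots.txt. A minimal sketch using the standard `java.net.URL` API (the class name, method names, and `example.com` are illustrative, not from the original post):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RobotsFetcher {

    // Build the robots.txt URL for a site: it always sits at the server root,
    // regardless of which page of the site you started from.
    public static URL robotsUrl(String siteUrl) throws Exception {
        return new URL(new URL(siteUrl), "/robots.txt");
    }

    // Fetch the file and return its text. Real crawlers should also handle
    // a 404 (no robots.txt means everything is allowed) and set a timeout.
    public static String fetch(String siteUrl) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                robotsUrl(siteUrl).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("https://example.com/"));
    }
}
```

Note that `new URL(base, "/robots.txt")` resolves against the host root, so any page URL of the site works as a starting point.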

1 Answer


You need to solve two tasks:

  1. use an HTTP library to fetch files over HTTP -- How to send HTTP request in java?
  2. write or use a parser for robots.txt files -- robots.txt parser java
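For the second task, a very reduced parser can be sketched like this. It is an assumption-laden simplification (no `Allow` lines, no wildcards, longest-match, or crawl-delay handling, which real parsers support); all class and method names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal robots.txt parser sketch: collects Disallow rules that apply
// to a given user-agent, then checks paths against those prefixes.
public class SimpleRobotsParser {

    public static List<String> disallowedPaths(String robotsTxt, String userAgent) {
        List<String> disallowed = new ArrayList<>();
        boolean applies = false;  // does the current record apply to our agent?
        boolean inAgentRun = false;  // was the previous line also a User-agent line?
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.split("#", 2)[0].trim();  // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean match = value.equals("*") || value.equalsIgnoreCase(userAgent);
                // consecutive User-agent lines form one group, so OR them together
                applies = inAgentRun ? (applies || match) : match;
                inAgentRun = true;
            } else {
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowed.add(value);
                }
                inAgentRun = false;
            }
        }
        return disallowed;
    }

    public static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        for (String prefix : disallowedPaths(robotsTxt, userAgent)) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

In a crawler you would fetch the file once per host, parse it, and consult `isAllowed` before every request to that host.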
j13r