
I want to use Jsoup to scrape the contents of a website, but I have to log in to the site first. When I browse to the main page, I get a dialog asking for a username and password. Since it is not an HTML form, I get the "not authorized" page as the response in Jsoup. I tried to find the URL using Firebug, but I guess the dialog appears before the other page components are loaded. As a result, I don't know which parameters I need to pass for the username and password fields, nor which service I need to post to.

This is a C#-based website. I have seen this authentication mechanism on several SharePoint sites. How should I handle this kind of login mechanism?

TwoA
  • Take a look here: http://stackoverflow.com/questions/32698446/login-on-website-using-jsoup. It might be useful. If you add the URL to your question, it will be easier to answer you. – TDG Dec 18 '15 at 13:06

1 Answer


It sounds like the page is using HTTP Basic authentication. This happens before any HTML is sent to the client, which is why you don't see it in Firebug.

You need to send the username and password in an HTTP header. Here is a link that shows how to do that: Jsoup connection with basic access authentication
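As a rough illustration of the header mentioned above, here is a minimal sketch that builds the value of an `Authorization` header for Basic authentication using only the Java standard library. The class name `BasicAuthHeader`, the method `of`, and the sample credentials are hypothetical and chosen only for this example. Encoding the credentials as UTF-8 is also an assumption, since some servers may expect a different charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Jsoup: builds a Basic authentication header value.
class BasicAuthHeader {

    // Returns "Basic " followed by the Base64 encoding of "username:password".
    // UTF-8 is assumed here; the server may expect another charset.
    static String of(String username, String password) {
        String credentials = username + ":" + password;
        byte[] bytes = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Placeholder credentials; replace with real ones.
        String header = of("user", "pass");
        System.out.println(header);

        // With Jsoup on the classpath, the value could then be attached to the request,
        // for example (the URL is a placeholder):
        //   Document doc = Jsoup.connect("http://example.com/")
        //                       .header("Authorization", header)
        //                       .get();
    }
}
```

This only covers Basic authentication. If the server actually negotiates a different scheme, sending this header would most likely still result in a "not authorized" response.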

Mikael Nitell
  • You can achieve this with `CookieContainer`, I think. – Zwan Dec 18 '15 at 13:10
  • Thanks. I read up on Basic authentication, but when I run my program it throws the following exception: `"main" java.io.IOException: Authentication failure`. I have double-checked the credentials and they are correct. Can this also depend on the network I am on? The site is not public, but I remember getting an HTML response when I was on a different network, although that response was also about not being authorized. – TwoA Dec 19 '15 at 07:30
  • Well, your answer helped me with some other sites that use the same authentication mechanism. Thanks. – TwoA Dec 19 '15 at 22:13