0

I am trying to create a program that tracks and manages social media followers. The idea is simple: use a URL object and a BufferedReader in Java to read a page's HTML, then filter the text so that only certain parts of the document are kept. For example, the program would load the Following page of a user's account, return each user on that list, and then check whether each of those accounts follows the user back.

The problem is that certain pages on social media sites, like twitter.com/username/following, are only accessible when logged in to Twitter; reading them with a URL object just returns the login page.

I'm not very experienced with web programming, but I was wondering whether there is a way to "log in" with a URL object in Java, or by some other method, so that I get the page I am actually trying to load and can extract strings/data from it.

Thank you for any help or resources you can provide.

  • It's much simpler to use API client libraries (SDKs) for interfacing with social platforms. – Jasper Huzen Jan 20 '20 at 00:33
  • Welcome to Stack Overflow! The question doesn't appear to include any attempt to solve the problem. Please edit the question to show what you've tried, and show a specific roadblock you're running into with [mcve]. For more information, please see [ask]. – Erty Seidohl Jan 20 '20 at 03:51

2 Answers

1

If a website allows logon using basic authentication, you can add the 'Authorization' header to your URL request.

The following answer already outlines how to add such a header to your request in Java: https://stackoverflow.com/a/5137446
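As a rough illustration of that header, here is a minimal sketch using only the JDK. The class and method names are made up for this example, and it only helps with sites that actually accept HTTP basic authentication.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, named only for this sketch.
class BasicAuthSketch {

    // Builds the value of the Authorization header: "Basic " + base64("user:password").
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Creates a connection object with the header set. Nothing is sent over the
    // network until the caller reads from the connection (e.g. getInputStream()).
    static HttpURLConnection openWithBasicAuth(URL url, String username, String password)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Authorization", basicAuthHeader(username, password));
        return connection;
    }
}
```

The returned connection's `getInputStream()` can then be wrapped in an `InputStreamReader` and a `BufferedReader`, just as with a plain `URL`.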

However, social media sites generally don't support this kind of authentication, and logging in with a person's credentials from a program might even be against their terms of service.

If you'd like to retrieve information about a user in your program, you should use the API that the social media site provides. This might be a Java library for retrieving information from the site, or documentation describing how to retrieve it, usually through a REST API hosted on their web server.

This kind of code doesn't need the user's login credentials. It will most likely use a standard called OAuth 2.0, in which the user "connects" their account to your service and your program receives an access token that authorizes it to retrieve information on their behalf.
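Once your program holds such a token, API calls typically carry it in an `Authorization: Bearer` header. The sketch below assumes Java 11+ (`java.net.http`) and that the token has already been obtained; the class name, the endpoint you pass in, and the token value are placeholders, not part of any real site's API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class, named only for this sketch.
class BearerTokenSketch {

    // Builds a GET request that presents an OAuth 2.0 access token as a bearer token.
    static HttpRequest authorizedGet(String endpoint, String accessToken) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the response body as text (usually JSON).
    static String fetch(String endpoint, String accessToken)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                authorizedGet(endpoint, accessToken),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

How the token is issued, and which endpoints exist, depends entirely on the platform, so the site's developer documentation is the place to check.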

Take a look at the Twitter developer's page: https://developer.twitter.com/en/docs/basics/getting-started

0

What you are attempting is called screen scraping, which is inherently error-prone because the structure of web pages changes often.

It is possible to navigate to URLs that require a login, but you would need to mimic exactly what the browser does in that case: following all redirects, storing hidden form variables, setting and re-sending cookies, and sending the user name and password at the right times.

You can see what the browser sends by looking at the Network panel in your browser's developer tools.
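To give an idea of the bookkeeping involved, here is a rough sketch of three of those pieces: keeping cookies between requests, collecting hidden form fields from the login page, and encoding a form body to POST back. All names are invented for this example, the regular expressions are deliberately simplistic (a real HTML parser would be more robust), and the actual field names and login flow differ from site to site.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this sketch.
class LoginFormSketch {

    private static final Pattern INPUT_TAG =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Makes URLConnection store cookies from responses and re-send them on later requests.
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Reads a double-quoted attribute value out of a single tag, or null if absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern
                .compile("\\s" + name + "\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE)
                .matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Collects name/value pairs of <input type="hidden"> fields (e.g. anti-forgery tokens).
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tags = INPUT_TAG.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            String name = attr(tag, "name");
            if ("hidden".equalsIgnoreCase(attr(tag, "type")) && name != null) {
                String value = attr(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }

    // Encodes fields as application/x-www-form-urlencoded, the usual login POST body.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

A typical flow would call `enableCookies()` once, GET the login page, add the user name and password to the map returned by `hiddenFields`, and POST the `formEncode` result to the form's action URL.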

I am assuming you are using your own Twitter user name and password to log in.

Sameer Naik