
When opening a connection, how can I find out the best URL format to use?

Many sites return different results based on whether the URL uses "www" and/or "https".

For example, here's a test that I wrote to see some of the different results:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test {

   public static void main(String[] args)
   {
      String baseURL = "google.com";

      // The four URL variants to compare
      String[] prefixes = { "http://", "http://www.", "https://", "https://www." };

      for (String prefix : prefixes)
      {
         String fullURL = prefix + baseURL;

         try
         {
            URLConnection connection = new URL(fullURL).openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");

            // try-with-resources closes the stream even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream())))
            {
               int lineCount = 0;

               while (in.readLine() != null)
               {
                  lineCount++;
               }

               System.out.println(fullURL + " = " + lineCount + " lines");
            }
         }
         catch (Exception ex)
         {
            System.out.println(fullURL + " throws an error");
         }
      }
   }
}

Here are the results of running it against four different websites:

http://stackoverflow.com = 4205 lines
http://www.stackoverflow.com = 4205 lines
https://stackoverflow.com = 4205 lines
https://www.stackoverflow.com = 2 lines

http://qvc.com = 2438 lines
http://www.qvc.com = 2438 lines
https://qvc.com throws an error
https://www.qvc.com = 0 lines

http://facebook.com = 0 lines
http://www.facebook.com = 0 lines
https://facebook.com = 25 lines
https://www.facebook.com = 25 lines

http://google.com = 6 lines
http://www.google.com = 6 lines
https://google.com = 343 lines
https://www.google.com = 343 lines

Given a bare domain like "google.com", what's the proper way to check which format I should use for that website?

Pikamander2

2 Answers


Check the HTTP response code. If you get a redirect (a 3xx status), you probably used the wrong format, and the Location header tells you which URL the server prefers. For example, http://www.stackoverflow.com will do a 301 redirect to just http://stackoverflow.com.
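A minimal sketch of that check in Java, assuming you want to compare the same four variants as the question. `CanonicalUrlProbe` and `candidateUrls` are hypothetical names. Redirect following is switched off with `setInstanceFollowRedirects(false)` so each variant reports its own status code and Location header rather than wherever the connection would end up.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

class CanonicalUrlProbe {

   // Same browser-style User-Agent as the question, since some sites vary their response by it
   static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

   // Hypothetical helper: the four URL variants compared in the question
   static List<String> candidateUrls(String baseDomain)
   {
      return Arrays.asList(
         "https://www." + baseDomain,
         "https://" + baseDomain,
         "http://www." + baseDomain,
         "http://" + baseDomain);
   }

   public static void main(String[] args)
   {
      String baseDomain = args.length > 0 ? args[0] : "google.com";

      for (String candidate : candidateUrls(baseDomain))
      {
         try
         {
            HttpURLConnection connection = (HttpURLConnection) new URL(candidate).openConnection();
            connection.setInstanceFollowRedirects(false); // report this hop only
            connection.setRequestProperty("User-Agent", USER_AGENT);

            int code = connection.getResponseCode();
            String location = connection.getHeaderField("Location"); // null if absent
            connection.disconnect();

            if (code == HttpURLConnection.HTTP_OK)
            {
               System.out.println(candidate + " answers directly with 200");
            }
            else
            {
               System.out.println(candidate + " -> " + code + (location != null ? " " + location : ""));
            }
         }
         catch (IOException ex) // covers unknown hosts and SSL handshake failures
         {
            System.out.println(candidate + " failed: " + ex);
         }
      }
   }
}
```

A variant that answers 200 without redirecting is a reasonable guess for the site's preferred format, but it's only a heuristic: a site can serve 200 on more than one variant.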

Marc B
  • Isn't there a way to tell the request to follow redirects? – OneCricketeer Sep 13 '16 at 21:52
  • Probably, but I don't do java, so no idea what option it'd be. – Marc B Sep 13 '16 at 21:53
  • `Isn't there a way to tell the request to follow redirects?` Cast the URLConnection to an [HttpURLConnection](https://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html) and call `setFollowRedirects(true);` – copeg Sep 13 '16 at 22:09
  • @copeg According to the [docs](https://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html#setFollowRedirects(boolean)), it's set to follow redirects by default, but it won't follow a redirect that switches protocols (for example, from http to https). See [this answer](http://stackoverflow.com/a/26046079/1741346) for more details. – Pikamander2 Sep 13 '16 at 22:32
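HttpURLConnection's automatic redirect handling does not follow a redirect that switches protocol (http to https or back), so it can't answer the question on its own. One option is to follow the Location headers yourself. This is a sketch under assumptions, not a library API: `RedirectFollower`, `resolve`, `finalUrl`, and the hop limit of 10 are hypothetical choices. It uses the `URL(URL context, String spec)` constructor so that relative Location values are resolved against the current URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

class RedirectFollower {

   // Resolves a Location header against the URL that produced it, so both absolute
   // ("https://www.google.com/") and relative ("/login") values work.
   static String resolve(String current, String location)
   {
      try
      {
         return new URL(new URL(current), location).toString();
      }
      catch (MalformedURLException ex)
      {
         throw new IllegalArgumentException("Bad redirect target: " + location, ex);
      }
   }

   // Follows redirects by hand, including cross-protocol hops that
   // HttpURLConnection won't follow automatically. Returns the last URL reached.
   static String finalUrl(String start, int maxHops) throws IOException
   {
      String current = start;

      for (int hop = 0; hop < maxHops; hop++)
      {
         HttpURLConnection connection = (HttpURLConnection) new URL(current).openConnection();
         connection.setInstanceFollowRedirects(false); // inspect each hop ourselves
         int code = connection.getResponseCode();
         String location = connection.getHeaderField("Location");
         connection.disconnect();

         // Treat any 3xx that carries a Location header as a redirect
         if (code < 300 || code >= 400 || location == null)
         {
            return current;
         }

         current = resolve(current, location);
      }

      throw new IOException("Too many redirects starting from " + start);
   }

   public static void main(String[] args) throws IOException
   {
      String start = args.length > 0 ? args[0] : "http://google.com";
      System.out.println(start + " ends at " + finalUrl(start, 10));
   }
}
```

You may want to add the question's User-Agent header inside `finalUrl`, since some of the sites tested above respond differently depending on it.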

After reading Marc B's answer, a few other Stack Overflow threads (which I linked in the comments on the question), and this guide, here's what I came up with:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class Test {

   public static void main(String[] args)
   {
      String baseURL = "google.com";

      try
      {
         URL url = new URL("http://" + baseURL);
         HttpURLConnection connection = (HttpURLConnection) url.openConnection();
         connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");

         // Sends the request and reads the status code
         int response = connection.getResponseCode();
         System.out.println("Response code: " + response);

         if (response == 301 || response == 302 || response == 303)
         {
            // The server points to a different URL in the Location header
            System.out.println("Redirect location: " + connection.getHeaderField("Location"));
         }
         else
         {
            // try-with-resources closes the stream when done
            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream())))
            {
               int lineCount = 0;

               while (in.readLine() != null)
               {
                  lineCount++;
               }

               System.out.println("http://" + baseURL + " = " + lineCount + " lines\n");
            }
         }
      }
      catch (Exception ex)
      {
         System.out.println("http://" + baseURL + " throws an error\n");
      }
   }
}

This outputs:

Response code: 302
Redirect location: https://www.google.com/?gws_rd=ssl

You can also use HttpURLConnection.HTTP_MOVED_PERM (301), HttpURLConnection.HTTP_MOVED_TEMP (302), and HttpURLConnection.HTTP_SEE_OTHER (303) instead of the numeric response codes. That's probably better practice, since the names make the intent clearer.
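Using the named constants, the redirect check from the code above could be moved into a small helper. This is a sketch: `RedirectCodes.isRedirect` is a hypothetical name, and it also treats 307 and 308 as redirects, which the original condition doesn't. HttpURLConnection has no named constants for those two, so they are written as literals.

```java
import java.net.HttpURLConnection;

class RedirectCodes {

   // True for the status codes that normally come with a Location header to follow
   static boolean isRedirect(int code)
   {
      switch (code)
      {
         case HttpURLConnection.HTTP_MOVED_PERM: // 301
         case HttpURLConnection.HTTP_MOVED_TEMP: // 302
         case HttpURLConnection.HTTP_SEE_OTHER:  // 303
         case 307:                               // Temporary Redirect
         case 308:                               // Permanent Redirect
            return true;
         default:
            return false;
      }
   }
}
```

In the answer's code, `if (RedirectCodes.isRedirect(response))` would then replace the `response == 301 || ...` comparison.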

Pikamander2