When opening a connection, how can I find out the best URL format to use?
Many sites return different results depending on whether the URL includes "www" and whether it uses "http" or "https".
For example, here's a test that I wrote to see some of the different results:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test
{
    private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36";

    public static void main(String[] args)
    {
        String baseURL = "google.com";
        // Try every combination of scheme and "www." prefix.
        countLines("http://" + baseURL);
        countLines("http://www." + baseURL);
        countLines("https://" + baseURL);
        countLines("https://www." + baseURL);
    }

    // Fetches the address and prints how many lines the response body contains.
    private static void countLines(String address)
    {
        try
        {
            URL url = new URL(address);
            URLConnection connection = url.openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            int lineCount = 0;
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream())))
            {
                while (in.readLine() != null)
                {
                    lineCount++;
                }
            }
            System.out.println(address + " = " + lineCount + " lines");
        }
        catch (Exception ex)
        {
            System.out.println(address + " throws an error");
        }
    }
}
Here are the results of running it against four different websites:
http://stackoverflow.com = 4205 lines
http://www.stackoverflow.com = 4205 lines
https://stackoverflow.com = 4205 lines
https://www.stackoverflow.com = 2 lines
http://qvc.com = 2438 lines
http://www.qvc.com = 2438 lines
https://qvc.com throws an error
https://www.qvc.com = 0 lines
http://facebook.com = 0 lines
http://www.facebook.com = 0 lines
https://facebook.com = 25 lines
https://www.facebook.com = 25 lines
http://google.com = 6 lines
http://www.google.com = 6 lines
https://google.com = 343 lines
https://www.google.com = 343 lines
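The line counts alone don't show why the variants differ. Below is a minimal sketch, assuming (untested) that redirects account for some of the differences. It sends one request per variant without following redirects and prints the status code plus any Location header. The class and method names (UrlProbe, candidates, describe, probe) and the 5-second timeouts are placeholders of mine, not part of any library.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class UrlProbe {
    // Builds the four scheme/host variants in the same order as the test above.
    static List<String> candidates(String baseURL) {
        List<String> urls = new ArrayList<>();
        for (String scheme : new String[] {"http://", "https://"}) {
            for (String prefix : new String[] {"", "www."}) {
                urls.add(scheme + prefix + baseURL);
            }
        }
        return urls;
    }

    // Formats one result line; location may be null when no redirect is sent.
    static String describe(String url, int status, String location) {
        String line = url + " -> " + status;
        if (status >= 300 && status < 400 && location != null) {
            line += " redirects to " + location;
        }
        return line;
    }

    // Sends a single request without following redirects and reports what came back.
    static String probe(String url) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setInstanceFollowRedirects(false);
            connection.setConnectTimeout(5000); // assumed timeout, in milliseconds
            connection.setReadTimeout(5000);    // assumed timeout, in milliseconds
            connection.setRequestProperty("User-Agent", "Mozilla/5.0");
            int status = connection.getResponseCode();
            String location = connection.getHeaderField("Location");
            connection.disconnect();
            return describe(url, status, location);
        } catch (IOException ex) {
            // Covers malformed URLs, DNS failures, TLS errors and timeouts.
            return url + " -> " + ex;
        }
    }

    public static void main(String[] args) {
        String baseURL = args.length > 0 ? args[0] : "google.com";
        for (String url : candidates(baseURL)) {
            System.out.println(probe(url));
        }
    }
}
```

If a variant answers with a 3xx status, the Location header shows which form the site itself prefers, which may be more informative than comparing line counts.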
Given a base URL, like "google.com", what's the proper way to determine which format I should use for that website?