106

Does anyone know if and how it is possible to search Google programmatically - especially if there is a Java API for it?

laser
Dan

8 Answers

139

Some facts:

  1. Google offers a public search webservice API which returns JSON: http://ajax.googleapis.com/ajax/services/search/web. Documentation here

  2. Java offers java.net.URL and java.net.URLConnection to fire and handle HTTP requests.

  3. In Java, JSON can be converted to a full-fledged Javabean object using any Java JSON API. One of the best is Google Gson.

Now do the math:

import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLEncoder;

import com.google.gson.Gson;

public static void main(String[] args) throws Exception {
    String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
    String search = "stackoverflow";
    String charset = "UTF-8";

    URL url = new URL(google + URLEncoder.encode(search, charset));

    // try-with-resources closes the response stream once parsing is done.
    try (Reader reader = new InputStreamReader(url.openStream(), charset)) {
        GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);

        // Show title and URL of 1st result.
        System.out.println(results.getResponseData().getResults().get(0).getTitle());
        System.out.println(results.getResponseData().getResults().get(0).getUrl());
    }
}

With this Javabean class representing the most important JSON data as returned by Google (it actually returns more data, but it's left up to you as an exercise to expand this Javabean code accordingly):

import java.util.List;

public class GoogleResults {

    private ResponseData responseData;
    public ResponseData getResponseData() { return responseData; }
    public void setResponseData(ResponseData responseData) { this.responseData = responseData; }
    public String toString() { return "ResponseData[" + responseData + "]"; }

    static class ResponseData {
        private List<Result> results;
        public List<Result> getResults() { return results; }
        public void setResults(List<Result> results) { this.results = results; }
        public String toString() { return "Results[" + results + "]"; }
    }

    static class Result {
        private String url;
        private String title;
        public String getUrl() { return url; }
        public String getTitle() { return title; }
        public void setUrl(String url) { this.url = url; }
        public void setTitle(String title) { this.title = title; }
        public String toString() { return "Result[url:" + url +",title:" + title + "]"; }
    }

}


Update: since November 2010 (2 months after the above answer), the public search webservice has been deprecated (the last day on which the service was offered was September 29, 2014). Your best bet is now to query http://www.google.com/search directly with an honest user agent and then parse the result using an HTML parser. If you omit the user agent, you get a 403 back. If you lie in the user agent and simulate a web browser (e.g. Chrome or Firefox), you get a much larger HTML response back, which is a waste of bandwidth and performance.

Here's a kickoff example using Jsoup as HTML parser:

String google = "http://www.google.com/search?q=";
String search = "stackoverflow";
String charset = "UTF-8";
String userAgent = "ExampleBot 1.0 (+http://example.com/bot)"; // Change this to your company's name and bot homepage!

Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select(".g>.r>a");

for (Element link : links) {
    String title = link.text();
    String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
    url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");
    
    if (!url.startsWith("http")) {
        continue; // Ads/news/etc.
    }
    
    System.out.println("Title: " + title);
    System.out.println("URL: " + url);
}
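
The substring call above assumes that `q` is the first query parameter of the redirect link. As a minimal sketch (the class and method names here are hypothetical, and it assumes Java 10+ for `URLDecoder.decode(String, Charset)`), the target URL can instead be looked up by parameter name, so the order of `q`, `sa` and `ei` does not matter:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of Jsoup or any Google API.
final class RedirectLinks {

    private RedirectLinks() {}

    // Returns the decoded value of the "q" query parameter, or empty if there is none.
    // URLDecoder.decode throws IllegalArgumentException on malformed percent escapes.
    static Optional<String> targetOf(String href) {
        int queryStart = href.indexOf('?');
        if (queryStart < 0) {
            return Optional.empty();
        }
        String query = href.substring(queryStart + 1);
        int fragmentStart = query.indexOf('#');
        if (fragmentStart >= 0) {
            query = query.substring(0, fragmentStart); // drop any #fragment
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("q")) {
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```

Inside the loop this could be used as `RedirectLinks.targetOf(link.absUrl("href"))`, skipping the link when the result is empty or does not start with "http".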
BalusC
  • Thanks so much - is this not breaking the license agreement as mentioned in answer above? Really appreciate the code! – Dan Sep 16 '10 at 14:48
  • Nope, also see [documentation](http://code.google.com/apis/ajaxsearch/documentation) with [a Java example](http://code.google.com/apis/ajaxsearch/documentation/#fonje_snippets_java). – BalusC Sep 16 '10 at 14:52
  • 11
    Please note that the Google Search API has been deprecated since November 2010 (2 months after the above answer was posted). End users are encouraged to move to the Google Custom Search API: https://developers.google.com/custom-search/v1/overview – BalusC Jun 15 '12 at 21:02
  • 2
    @BalusC Isn't Google's custom search only for searching inside a particular website rather than the entire web? – Pargat Jun 24 '12 at 19:55
  • To get the other 4 results you should use start=3: String google = "http://ajax.googleapis.com/ajax/services/search/web?start=3&v=1.0&q="; Does anybody know if there is a way to get more than 4 results? – Accollativo Oct 06 '13 at 20:55
  • But _sometimes_ it doesn't throw any **exception** and searches the text on Google, and sometimes it throws `NullPointerException`. Why? – Akhilesh Dhar Dubey Nov 24 '13 at 13:57
  • 1
    Also, what if you don't have a company name or a bot page?? – Mike Warren Oct 17 '14 at 22:17
  • 1
    In Scala val searchResults = Jsoup.connect(googleBase + URLEncoder.encode(searchQuery, charset)) .userAgent(userAgent) .get() .select(".g>.r>a"); – Vladimir Stazhilov Nov 27 '16 at 10:35
  • @AkhileshDubey bad request or timeout, you should first verify the connection and then query – Vladimir Stazhilov Nov 27 '16 at 10:35
15

To search Google through an API you should use Google Custom Search; scraping the web page is not allowed.

In Java you can use the CustomSearch API Client Library for Java.

The maven dependency is:

<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-customsearch</artifactId>
    <version>v1-rev57-1.23.0</version>
</dependency> 

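Example code searching with the Google CustomSearch API Client Library:

import java.io.IOException;
import java.security.GeneralSecurityException;

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.CustomsearchRequestInitializer;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;

public static void main(String[] args) throws GeneralSecurityException, IOException {

    String searchQuery = "test"; // The query to search
    String cx = "002845322276752338984:vxqzfa86nqc"; // Your search engine ID

    // Create the Customsearch client
    Customsearch cs = new Customsearch.Builder(GoogleNetHttpTransport.newTrustedTransport(), JacksonFactory.getDefaultInstance(), null)
                   .setApplicationName("MyApplication")
                   .setGoogleClientRequestInitializer(new CustomsearchRequestInitializer("your api key"))
                   .build();

    // Set search parameters
    Customsearch.Cse.List list = cs.cse().list(searchQuery).setCx(cx);

    // Execute search
    Search result = list.execute();
    if (result.getItems() != null) {
        for (Result ri : result.getItems()) {
            // Get title, link, body etc. from search
            System.out.println(ri.getTitle() + ", " + ri.getLink());
        }
    }

}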

As you can see, you will need to request an API key and set up your own search engine ID, cx.

Note that you can search the whole web by selecting "Search entire web" on the basic tab settings during setup of cx, but results will not be exactly the same as a normal browser Google search.

Currently (date of answer) you get 100 API calls per day for free; beyond that, Google likes to share your profit.
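
If you would rather not pull in the client library, the same API key and cx can be sent to the Custom Search JSON endpoint as plain query parameters. The following is only a sketch of building that request URL with the JDK (Java 10+); the class and method names are hypothetical, and `start` is assumed here to be the 1-based index of the first result to return:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only builds the URL and performs no network call.
final class CustomSearchUrl {

    private static final String ENDPOINT = "https://www.googleapis.com/customsearch/v1";

    private CustomSearchUrl() {}

    // Builds the request URI for one page of results.
    static URI build(String apiKey, String cx, String query, int start) {
        return URI.create(ENDPOINT
                + "?key=" + encode(apiKey)
                + "&cx=" + encode(cx)
                + "&q=" + encode(query)
                + "&start=" + start);
    }

    // Form-encodes a single parameter value (spaces become '+', ':' becomes %3A).
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The resulting URI can be fetched with any HTTP client, and the JSON response parsed with Gson or similar. The `items` array in the response should carry the same `title` and `link` values that the client library exposes through `getTitle()` and `getLink()`.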

Petter Friberg
12

Google's Terms of Service state:

5.3 You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.

So I guess the answer is no. Moreover, the SOAP API is no longer available.

Manuel Selva
3

Google's TOS were relaxed a bit in April 2014. They now state:

"Don’t misuse our Services. For example, don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide."

So the passage about "automated means" and scripts is gone now. It evidently is still not Google's desired way of accessing their services, but I think it is now formally open to interpretation what exactly an "interface" is and whether it makes any difference how exactly the returned HTML is processed (rendered or parsed). Anyhow, I have written a Java convenience library and it is up to you to decide whether to use it or not:

https://github.com/afedulov/google-web-search

Alex Fedulov
  • after hours researching for a solution written in java which really works, your solution seems to be the most viable way of doing this inside a java environment. Your code needs some adjustments by the way... – Digao Mar 19 '17 at 23:20
  • feel free to open an issue on github – Alex Fedulov Mar 20 '17 at 09:28
2

Indeed there is an API to search Google programmatically. The API is called Google Custom Search. To use this API, you will need a Google Developer API key and a cx key. A simple procedure for accessing Google search from a Java program is explained in my blog.

Now dead, here is the Wayback Machine link.

Ken Y-N
Sai Sunder
  • In your blog, on the part about the API key, you mentioned something about the server key, for programs that are written in Java. I am writing mine in Java, and was wanting to know if I should use a server key, and how would I use my API key in my program. Also, would I have to download any libraries? – Mike Warren Oct 17 '14 at 22:02
0

As an alternative to BalusC's answer, which has been deprecated and requires you to use proxies, you can use this package. Code sample:

Map<String, String> parameter = new HashMap<>();
parameter.put("q", "Coffee");
parameter.put("location", "Portland");
GoogleSearchResults serp = new GoogleSearchResults(parameter);

JsonObject data = serp.getJson();
JsonArray results = (JsonArray) data.get("organic_results");
JsonObject first_result = results.get(0).getAsJsonObject();
System.out.println("first coffee: " + first_result.get("title").getAsString());

Library on GitHub

Hartator
-1

In light of those TOS alterations last year we built an API that gives access to Google's search. It was for our own use only but after some requests we decided to open it up. We're planning to add additional search engines in the future!

Should anyone be looking for an easy way to implement / acquire search results you are free to sign up and give the REST API a try: https://searchapi.io

It returns JSON results and should be easy enough to implement with the detailed docs.

It's a shame that Bing and Yahoo are miles ahead of Google in this regard. Their APIs aren't cheap, but at least they are available.

Stan Smulders
-2

Just an alternative: searching Google and parsing the results can also be done in a generic way using any HTML parser, such as Jsoup in Java. The following link pointed to such an example.

Update: Link no longer works. Please look for any other example. https://www.codeforeach.com/java/example-how-to-search-google-using-java

Prashanth