130

How do I get an InputStream from a URL?

For example, I want to take the file at the URL wwww.somewebsite.com/a.txt and read it as an InputStream in Java, through a servlet.

I've tried

InputStream is = new FileInputStream("wwww.somewebsite.com/a.txt");

but what I got was an error:

java.io.FileNotFoundException
user207421
Whitebear
  • Why did you rollback the removal of the `servlets` tag? There is no `javax.servlet.*` API involved here. You would have exactly the same problem when doing so in a plain vanilla Java class with a `main()` method. – BalusC Aug 03 '11 at 20:14
  • Perhaps you should familiarize yourself with what a URL is: http://docs.oracle.com/javase/tutorial/networking/urls/definition.html – b1nary.atr0phy Jul 27 '13 at 04:18

6 Answers

244

Use java.net.URL#openStream() with a proper URL (including the protocol!). E.g.

InputStream input = new URL("http://www.somewebsite.com/a.txt").openStream();
// ...
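
A minimal sketch of how the returned stream might be consumed, assuming the resource is UTF-8 text. The class name UrlStreamExample and the helper readAll are hypothetical and only serve the illustration; the only call taken from the answer is URL#openStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UrlStreamExample {

    // Hypothetical helper: reads everything the URL serves and decodes it as UTF-8.
    static String readAll(URL url) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream input = url.openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: UrlStreamExample <url-including-protocol>");
            return;
        }
        // e.g. http://www.somewebsite.com/a.txt (placeholder host from the question)
        System.out.println(readAll(new URL(args[0])));
    }
}
```

The stream is read in chunks until read() returns -1, so the caller does not need to know the content length in advance.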

BalusC
  • Do you know if this makes a network request on each read of the InputStream or whether it reads the entire file at once so it doesn't have to make network requests on reads? – gsingh2011 Jan 05 '14 at 23:08
  • Calling this method on the UI thread in Android will raise an exception. Do it in a background thread. Use [Bolts-Android](https://github.com/BoltsFramework/Bolts-Android) – Behrouz.M Mar 06 '19 at 10:16
20

Try:

final InputStream is = new URL("http://wwww.somewebsite.com/a.txt").openStream();
whiskeysierra
12

(a) wwww.somewebsite.com/a.txt isn't a 'file URL'. It isn't a URL at all. If you put http:// on the front of it it would be an HTTP URL, which is clearly what you intend here.

(b) FileInputStream is for files, not URLs.

(c) The way to get an input stream from any URL is via URL.openStream(), or URL.openConnection().getInputStream(), which is equivalent, but you might have other reasons to get the URLConnection and configure it first.
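
As a rough sketch of why one might go through the URLConnection first: timeouts and request headers have to be set before the stream is opened. The class name, the helper open, and the "Accept" header value are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class UrlConnectionExample {

    // Hypothetical helper: configures the connection, then opens the stream.
    static InputStream open(URL url, int timeoutMillis) throws IOException {
        URLConnection connection = url.openConnection();
        // These setters must be called before the connection is established.
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        // Illustrative header only; it is sent with HTTP(S) requests.
        connection.setRequestProperty("Accept", "text/plain");
        // getInputStream() connects implicitly if connect() was not called.
        return connection.getInputStream();
    }
}
```

Without any configuration, url.openConnection().getInputStream() is what url.openStream() does as a shorthand.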

user207421
4

Your original code uses FileInputStream, which is for accessing file system hosted files.

The constructor you used will attempt to locate a file named a.txt in the wwww.somewebsite.com subfolder of the current working directory (the value of the system property user.dir). The name you provide is resolved to a file using the File class.

URL objects are the generic way to solve this. You can use URLs to access local files but also network hosted resources. The URL class supports the file:// protocol besides http:// or https:// so you're good to go.
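
To illustrate the point about file:// URLs, here is a small sketch in which a local file is read through the same URL.openStream() call that would be used for http:// or https://. The class name, the helper firstLine, and the temporary file are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileUrlExample {

    // Hypothetical helper: works for any protocol the URL class supports.
    static String firstLine(URL url) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway local file so the example does not need a network.
        Path local = Files.createTempFile("a", ".txt");
        Files.write(local, "hello from disk\n".getBytes(StandardCharsets.UTF_8));

        URL fileUrl = local.toUri().toURL();
        System.out.println(fileUrl.getProtocol()); // prints "file"
        System.out.println(firstLine(fileUrl));    // prints "hello from disk"

        Files.delete(local);
    }
}
```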

user207421
Cristian Botiza
2

Pure Java:

 urlToInputStream(url,httpHeaders);

I use this method with some success. It handles redirects, and you can pass a variable number of HTTP headers as a Map<String,String>. It also allows redirects from HTTP to HTTPS.

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends on
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to 
         * switch protocols (e.g. HTTP to HTTPS), which is <b>not</b> 
         * default behavior. See: https://stackoverflow.com/questions/1884230 
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            // Resolve the Location header against the current URL so that both
            // absolute and relative redirects work and the port is preserved.
            URL newUrl = new URL(url, redirectUrl);
            return urlToInputStream(newUrl, args);
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Full example call

private InputStream getInputStreamFromUrl(URL url, String user, String passwd) throws IOException {
        String encoded = Base64.getEncoder().encodeToString((user + ":" + passwd).getBytes(StandardCharsets.UTF_8));
        // Map is an interface and cannot be instantiated; use HashMap (java.util.HashMap).
        Map<String,String> httpHeaders = new HashMap<>();
        httpHeaders.put("Accept", "application/json");
        httpHeaders.put("User-Agent", "myApplication");
        httpHeaders.put("Authorization", "Basic " + encoded);
        return urlToInputStream(url,httpHeaders);
    }
jschnasse
  • `HttpURLConnection` will already follow redirects unless you tell it not to, which you haven't. – user207421 May 01 '18 at 01:42
  • I know OP didn't mention headers but I appreciate the succinct (well, considering it's Java) example. – chbrown May 02 '18 at 00:53
  • @EJP I added some explanation as an inline comment. I think I mainly introduced the redirect block for the case when an HTTP 301 redirects an HTTP address to an HTTPS address. Of course, this goes beyond the original question, but it is a common use case that is not handled by the default implementation. See: https://stackoverflow.com/questions/1884230/urlconnection-doesnt-follow-redirect – jschnasse May 02 '18 at 15:25
  • Your code works equally well without the redirect block, as `HttpURLConnection` already follows redirects by default, as I already stated. – user207421 Dec 14 '18 at 12:05
  • @user207421 This is partly correct. The redirect block is for protocol switches like http->https which is not supported by default. I tried to express that in the in-code comment. See https://stackoverflow.com/questions/1884230/urlconnection-doesnt-follow-redirect . – jschnasse Dec 14 '18 at 12:07
-1

Here is a full example which reads the contents of a given web page. The web page address is read from an HTML form. We use the standard InputStream classes, but it could be done more easily with the JSoup library.

<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>

</dependency>

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>1.6</version>
</dependency>  

These are the Maven dependencies. We use the Apache Commons Validator library to validate URL strings.

package com.zetcode.web;

import com.zetcode.service.WebPageReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "ReadWebPage", urlPatterns = {"/ReadWebPage"})
public class ReadWebpage extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        response.setContentType("text/plain;charset=UTF-8");

        String page = request.getParameter("webpage");

        String content = new WebPageReader().setWebPageName(page).getWebPageContent();

        ServletOutputStream os = response.getOutputStream();
        os.write(content.getBytes(StandardCharsets.UTF_8));
    }
}

The ReadWebPage servlet reads the contents of the given web page and sends it back to the client in plain text format. The task of reading the page is delegated to WebPageReader.

package com.zetcode.service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import org.apache.commons.validator.routines.UrlValidator;

public class WebPageReader {

    private String webpage;
    private String content;

    public WebPageReader setWebPageName(String name) {

        webpage = name;
        return this;
    }

    public String getWebPageContent() {

        try {

            boolean valid = validateUrl(webpage);

            if (!valid) {

                content = "Invalid URL; use http(s)://www.example.com format";
                return content;
            }

            URL url = new URL(webpage);

            try (InputStream is = url.openStream();
                    BufferedReader br = new BufferedReader(
                            new InputStreamReader(is, StandardCharsets.UTF_8))) {

                content = br.lines().collect(
                      Collectors.joining(System.lineSeparator()));
            }

        } catch (IOException ex) {

            content = String.format("Cannot read webpage %s", ex);
            Logger.getLogger(WebPageReader.class.getName()).log(Level.SEVERE, null, ex);
        }

        return content;
    }

    private boolean validateUrl(String webpage) {

        UrlValidator urlValidator = new UrlValidator();

        return urlValidator.isValid(webpage);
    }
}

WebPageReader validates the URL and reads the contents of the web page. It returns a string containing the HTML code of the page.

<!DOCTYPE html>
<html>
    <head>
        <title>Home page</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <form action="ReadWebPage">

            <label for="page">Enter a web page name:</label>
            <input  type="text" id="page" name="webpage">

            <button type="submit">Submit</button>

        </form>
    </body>
</html>

Finally, this is the home page containing the HTML form. This is taken from my tutorial about this topic.

Jan Bodnar