
I created a Java function to open a file in HDFS. The function uses only the WebHDFS REST API; I do not use any Hadoop dependencies in my code. My function works well:

public static void openFile()
    {
        System.out.println("main for testing the Hdfs WEB API");

        try {
                // new URL(...) can throw MalformedURLException (a subclass of
                // IOException), so it is created inside the try block
                URL url = new URL("http://URI/webhdfs/v1/PATH_TO_File?op=OPEN");
                HttpURLConnection con = (HttpURLConnection) url.openConnection();
                con.setRequestMethod("GET");
                con.setDoInput(true);
                InputStream in = con.getInputStream();
                int ch;
                while ((ch = in.read()) != -1)
                {
                    System.out.print((char) ch);
                }
                in.close();

            } catch (IOException e) {
                e.printStackTrace();
            }
    }

I'm now writing a new function to return a list of the files in HDFS. The second function is:

 public static void listFile()
        {
            System.out.println("main for testing the Hdfs WEB API");

            try {
                    URL url = new URL("http://URI/webhdfs/v1/PATH_TO_File?op=LISTSTATUS");
                    HttpURLConnection con = (HttpURLConnection) url.openConnection();
                    con.setRequestMethod("GET");
                    con.setDoInput(true);
                    InputStream in = con.getInputStream();

                    // this logs the stream object itself, not the response body
                    logger.info("list is '{}' ", url.openStream());

                } catch (IOException e) {
                    e.printStackTrace();
                }
        }

Could you please help me: how can I read the response from the stream with a Scanner and return the list of files in HDFS? Note that the URLs work well when I run them in the browser. Thanks in advance.

Isabelle
  • `url.openStream()` returns a stream. You need to read the stream to get the response using a scanner, for example - https://stackoverflow.com/a/2793153/2308683 – OneCricketeer Feb 23 '21 at 20:24
  • If you've made updates based on my comment, [edit] your question to include it – OneCricketeer Feb 24 '21 at 14:38
  • I literally linked you to the solution. Explain what problem you're having using `new Scanner(url.openStream())` or repeating `while((ch=in.read())!=-1)` from the first code block here – OneCricketeer Feb 24 '21 at 14:52
  • You need to actually parse the json response to get only the list of files ... https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#List_a_Directory None of the code you've shown here uses hadoop dependencies – OneCricketeer Feb 24 '21 at 15:30
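
For reference, a minimal sketch of the Scanner approach suggested in the comments above. "URI" and "PATH_TO_File" are the placeholders from the question, and the snippet assumes the surrounding method declares `throws IOException`:

import java.util.Scanner;

URL url = new URL("http://URI/webhdfs/v1/PATH_TO_File?op=LISTSTATUS");
try (Scanner scanner = new Scanner(url.openStream(), "UTF-8")) {
    scanner.useDelimiter("\\A"); // \A = start of input, so next() returns the whole body
    String response = scanner.hasNext() ? scanner.next() : "";
    System.out.println(response);
}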

1 Answer


You can use the exact same logic as in your first function, but this time use a StringBuilder to collect the full response, which you then need to parse with a JSON library.

InputStream in = con.getInputStream();
int ch;
StringBuilder sb = new StringBuilder();
while((ch=in.read())!=-1) {
    sb.append((char) ch);
}
String response = sb.toString();
// TODO: parse response string 
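
For example, here is a minimal sketch of that parsing step, assuming the org.json library is on the classpath. The shape of the LISTSTATUS response is documented in the WebHDFS page linked in the comments:

// LISTSTATUS returns {"FileStatuses":{"FileStatus":[{"pathSuffix":"...", ...}, ...]}}
JSONObject root = new JSONObject(response);
JSONArray statuses = root.getJSONObject("FileStatuses").getJSONArray("FileStatus");
List<String> files = new ArrayList<>();
for (int i = 0; i < statuses.length(); i++) {
    files.add(statuses.getJSONObject(i).getString("pathSuffix")); // file or directory name
}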

Note: libraries like Retrofit / Gson would make this more straightforward.
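
As a hypothetical illustration of that note, Gson can map the same response onto small POJO classes. The class names below are arbitrary, but the field names must mirror the JSON keys exactly:

// POJOs mirroring the LISTSTATUS JSON keys; Gson matches fields by name
class FileStatus { String pathSuffix; String type; }
class FileStatuses { java.util.List<FileStatus> FileStatus; }
class ListStatusResponse { FileStatuses FileStatuses; }

ListStatusResponse parsed = new com.google.gson.Gson().fromJson(response, ListStatusResponse.class);
for (FileStatus f : parsed.FileStatuses.FileStatus) {
    System.out.println(f.pathSuffix + " (" + f.type + ")");
}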

OneCricketeer