Ultimately, my goals are:
- Read from a URL (what this question is about)
- Save the retrieved [PDF] content to a BLOB field in a DB (already have that nailed down)
- Read from the BLOB field and attach that content to an email
- All without going to a filesystem
The goal with the following method is to get a byte[] that can be used downstream as an email attachment (to avoid writing to disk):
    public byte[] retrievePDF() {
        HttpClient httpClient = new HttpClient();
        GetMethod httpGet = new GetMethod("http://website/document.pdf");
        httpClient.executeMethod(httpGet);
        InputStream is = httpGet.getResponseBodyAsStream();
        byte[] byteArray = new byte[(int) httpGet.getResponseContentLength()];
        is.read(byteArray, 0, byteArray.length);
        return byteArray;
    }
For a particular PDF, the getResponseContentLength() method returns 101,689 as the length. The strange part is that if I set a breakpoint and inspect the byteArray variable, it has 101,689 elements; however, after byte #3744 the remaining bytes of the array are all zeroes (0). The resulting PDF is then not readable by a PDF reader such as Adobe Reader.
Why would that happen?
Retrieving this same PDF via browser and saving to disk, or using a method like the following (which I patterned after an answer to this StackOverflow post), results in a readable PDF:
    public void retrievePDF() {
        FileOutputStream fos = null;
        URL url;
        ReadableByteChannel rbc = null;
        url = new URL("http://website/document.pdf");
        DataSource urlDataSource = new URLDataSource(url);
        /* Open a connection, then set appropriate time-out values */
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(120000);
        conn.setReadTimeout(120000);
        rbc = Channels.newChannel(conn.getInputStream());
        String filePath = "C:\\temp\\";
        String fileName = "testing1234.pdf";
        String tempFileName = filePath + fileName;
        fos = new FileOutputStream(tempFileName);
        fos.getChannel().transferFrom(rbc, 0, 1 << 24);
        fos.flush();
        /* Clean-up everything */
        fos.close();
        rbc.close();
    }
For both approaches, the size of the resulting PDF is 101,689 bytes according to Right-click > Properties... in Windows.
Why would the byte array essentially "stop" part-way through?
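
One likely explanation, offered as a sketch and not a confirmed diagnosis of this particular server: InputStream.read(byte[], int, int) is documented to read up to len bytes and to return the number of bytes actually read, so a single call on a network stream can return after only the bytes that have arrived so far (here, apparently 3744). The rest of the pre-sized array would then keep its default zero values. Under that assumption, reading in a loop until read() returns -1 should yield the complete content. The class name StreamBytes and the method name readFully below are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of the original code): drains a stream into memory.
public class StreamBytes {

    /** Reads the stream until end-of-stream and returns everything that was read. */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() may deliver fewer bytes than requested; -1 signals end of stream.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

In the first method above, this would mean returning StreamBytes.readFully(is) in place of the single is.read(...) call, which also avoids relying on getResponseContentLength() to size the array. With Commons HttpClient 3.x, httpGet.getResponseBody() also returns the body as a byte[] directly.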