
I want to compare two files: one is on the file system and the other is downloaded from an HTTP URL.

We have tried comparing them as byte[] arrays (we used HTTPRequestBuilder by Apache), but the concern is that the files may be too large and may exhaust memory. Are there any good alternatives?

Ahmad
  • Do you want to compare the files merely to see if they're equal or get the actual differences between them? – hd1 Dec 11 '12 at 15:10

1 Answer


You can compare the contents of two InputStream objects by reading just a buffer at a time. You'll need to read more data from each stream as and when it "runs out", noting that when you call read you may not end up actually reading a full buffer.

The two streams are equal if each byte-by-byte comparison from the buffers is equal and the streams run out of data at the same time. I suspect the code may be slightly fiddly, but it shouldn't be too bad.
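
As a rough illustration of the buffer-at-a-time approach, here is a minimal sketch. It assumes Java 9 or later, where readNBytes keeps reading until the buffer is full or the stream ends, which sidesteps most of the short-read bookkeeping. The class name StreamCompare, the method name contentEquals and the 8192-byte buffer size are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class, named only for this sketch.
final class StreamCompare {
    // Arbitrary buffer size chosen for illustration.
    private static final int BUFFER_SIZE = 8192;

    // Returns true if both streams yield exactly the same sequence of bytes.
    static boolean contentEquals(InputStream x, InputStream y) throws IOException {
        byte[] xBuf = new byte[BUFFER_SIZE];
        byte[] yBuf = new byte[BUFFER_SIZE];
        while (true) {
            // readNBytes (Java 9+) blocks until the requested length is read
            // or the stream ends, so a short count can only mean end of stream.
            int xRead = x.readNBytes(xBuf, 0, BUFFER_SIZE);
            int yRead = y.readNBytes(yBuf, 0, BUFFER_SIZE);
            if (xRead != yRead || !Arrays.equals(xBuf, 0, xRead, yBuf, 0, yRead)) {
                return false; // different lengths or different content
            }
            if (xRead < BUFFER_SIZE) {
                return true; // both streams ended at the same point
            }
        }
    }
}
```

Only two fixed-size buffers are held in memory at any time, regardless of how large the files are.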

In fact, for simpler code, if you wrap each InputStream in a BufferedInputStream, you could probably just compare byte-by-byte (calling the parameterless read() method on each iteration) without losing too much performance:

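import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// read() can fail (for example if a connection drops), so declare IOException
public boolean equals(InputStream x, InputStream y) throws IOException
{
    // TODO: Only wrap them if they're not already buffered
    x = new BufferedInputStream(x);
    y = new BufferedInputStream(y);

    while (true)
    {
        // Each call returns the next byte (0-255), or -1 at end of stream
        int xValue = x.read();
        int yValue = y.read();
        if (xValue != yValue)
        {
            // Different bytes, or one stream ended before the other
            return false;
        }
        if (xValue == -1)
        {
            // Reached the end of both streams at the same time
            return true;
        }
    }
}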
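
For the situation in the question (a local file against a download), the same loop could be wired up roughly as follows. This is only a sketch: it uses the JDK's own URL.openStream() rather than the Apache client mentioned in the question, and the names FileUrlCompare, streamsEqual and sameContent are invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named only for this sketch.
final class FileUrlCompare {
    // Same byte-by-byte loop as above; callers are expected to pass buffered streams.
    static boolean streamsEqual(InputStream x, InputStream y) throws IOException {
        while (true) {
            int xValue = x.read();
            int yValue = y.read();
            if (xValue != yValue) {
                return false;
            }
            if (xValue == -1) {
                return true;
            }
        }
    }

    // Compares a local file with whatever the URL serves, without loading either fully.
    static boolean sameContent(Path localFile, URL remote) throws IOException {
        // try-with-resources closes both streams even if an exception is thrown
        try (InputStream local = new BufferedInputStream(Files.newInputStream(localFile));
             InputStream download = new BufferedInputStream(remote.openStream())) {
            return streamsEqual(local, download);
        }
    }
}
```

If the connection fails partway through, the read call throws an IOException, so the caller can catch it and decide whether to retry.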
Jon Skeet
  • Thanks Jon, but suppose "InputStream x" is coming from an HTTP call: is there any risk of losing the connection or anything like that? – Ahmad Dec 11 '12 at 13:21
  • @Ahmad: There's always the risk that the connection will drop, sure... and you'll get an exception. I can't see how you could avoid that. – Jon Skeet Dec 11 '12 at 13:22
  • @Jon: That was my concern. I think if we read the whole file into a byte array the risk will be lowest, and that's what I am doing. – Ahmad Dec 11 '12 at 15:01