1

I have a scenario similar to check-in and check-out of documents. When a user checks in a document, how can I determine whether it is a new check-in or a check-in of an existing document? Which file attributes can be used to tell the two apart? I do not want to use lastModifiedTime, size, or the name of the file. Please let me know. Thanks.

Shubhankar
  • Why not hash the file contents? That's the approach that version control systems such as Git take – codebox Aug 18 '15 at 07:29

2 Answers

1

When I had to do something like this, I used an MD5 hash (in Perl). I think this might help: How can I generate an MD5 hash?

Community
  • Thanks for the MD5 hash technique, but if some changes have been made, how do I identify that the user modified the same file? – Shubhankar Aug 18 '15 at 10:28
  • Do you want to find out "which user" modified it, or whether "the file with a different MD5" is the same file or a different file altogether? – Nauduri Venkata Ravi Rama Sast Aug 18 '15 at 10:56
  • Yes, both: which user, and whether "the file with a different MD5" is the same file or a different file altogether. My aim is this: I have to download a file, modify it, and upload the same file. While uploading, how do I check that I am uploading the same downloaded file with some modified content? – Shubhankar Aug 18 '15 at 11:02
  • Yes, the first thing that strikes me is that you might need something like an **inode**, but you mention **downloading a file**, so it might be a mixed environment. I think this can help: [http://stackoverflow.com/questions/7162164/does-windows-have-inode-numbers-like-linux](http://stackoverflow.com/questions/7162164/does-windows-have-inode-numbers-like-linux). – Nauduri Venkata Ravi Rama Sast Aug 18 '15 at 11:12
  • I didn't get anything relevant to my case from this link, but thanks for that. Please give an example that fits my scenario. – Shubhankar Aug 18 '15 at 11:20
  • If your file is downloaded onto a Unix/Linux system, it will have a unique **inode** number assigned to it, so you can tell whether the inode of the downloaded file and the uploaded file is the same. If you are downloading the file onto a Windows machine, a **fileid** will be generated for it instead, so you can read that and check it at upload time. These unique IDs persist even if the name changes... well, that's the idea anyway. Sorry, but working code will take some time. – Nauduri Venkata Ravi Rama Sast Aug 18 '15 at 11:30
  • Which field, and how do I get that field using Java? – Shubhankar Aug 18 '15 at 11:33
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87336/discussion-between-shubhankar-and-nauduri-venkata-ravi-rama-sast). – Shubhankar Aug 19 '15 at 05:06
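
As a rough illustration of the inode / file ID idea from the comments above, Java exposes a file identity object through `BasicFileAttributes.fileKey()`. This is only a sketch: the class name `FileKeyDemo` and the file names are made up for the example, and `fileKey()` may return `null` on platforms or file systems that do not provide such an identifier. Note also that this identity is local to one file system, so a copy downloaded to another machine gets its own key, and some editors save by writing a new file, which changes the key as well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class FileKeyDemo {
    // Returns the platform's file identity object (typically device + inode
    // on Unix-like systems), or null when the platform does not expose one.
    static Object fileKey(Path file) throws IOException {
        return Files.readAttributes(file, BasicFileAttributes.class).fileKey();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, created in a temporary directory.
        Path dir = Files.createTempDirectory("filekey-demo");
        Path original = Files.createFile(dir.resolve("report.txt"));
        Object before = fileKey(original);

        // Rename the file inside the same directory and read the key again.
        Path renamed = Files.move(original, dir.resolve("renamed.txt"));
        Object after = fileKey(renamed);

        System.out.println("before rename: " + before);
        System.out.println("after rename:  " + after);
        System.out.println("same file: " + (before != null && before.equals(after)));
    }
}
```

Because the key does not travel with the file across machines, it can help detect a rename on one disk but is unlikely to answer the download/modify/upload case on its own.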
0

An MD5 hash of the file's contents will help you with this. This is a working example of how to compute the MD5 value.

import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class Md5Example {
    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");

        // Feed the file to the digest in 1 KB chunks; try-with-resources closes the stream
        try (InputStream fis = new FileInputStream("c:\\loging.log")) {
            byte[] dataBytes = new byte[1024];
            int nread;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
        }
        byte[] mdbytes = md.digest();

        // Convert the bytes to hex format, method 1:
        // adding 0x100 forces three hex digits, and substring(1) keeps the last two
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mdbytes.length; i++) {
            sb.append(Integer.toString((mdbytes[i] & 0xff) + 0x100, 16).substring(1));
        }
        System.out.println("Digest(in hex format):: " + sb.toString());

        // Convert the bytes to hex format, method 2: left-pad single digits with '0'
        StringBuilder hexString = new StringBuilder();
        for (int i = 0; i < mdbytes.length; i++) {
            String hex = Integer.toHexString(0xff & mdbytes[i]);
            if (hex.length() == 1) hexString.append('0');
            hexString.append(hex);
        }
        System.out.println("Digest(in hex format):: " + hexString.toString());
    }
}

Output:

Digest(in hex format):: e72c504dc16c8fcd2fe8c74bb492affa
Digest(in hex format):: e72c504dc16c8fcd2fe8c74bb492affa

What you have to do is compare the old MD5 value with the new one; if they match, no changes have been made to the file.
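
The comparison step could look roughly like the sketch below. The class name `DigestCompare`, the helper names, and the temporary files are assumptions made for illustration; it reads each file fully into memory, which is fine for small files but not for very large ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestCompare {
    // Hashes the whole file with MD5 (reads the file fully into memory).
    static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
    }

    // True when both files produce the same MD5 digest.
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(md5(a), md5(b));
    }

    public static void main(String[] args) throws Exception {
        // Two temporary files stand in for the stored and the uploaded version.
        Path stored = Files.createTempFile("stored", ".txt");
        Path uploaded = Files.createTempFile("uploaded", ".txt");
        Files.write(stored, "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(uploaded, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(sameContent(stored, uploaded)); // true

        Files.write(uploaded, "hello, edited".getBytes(StandardCharsets.UTF_8));
        System.out.println(sameContent(stored, uploaded)); // false
    }
}
```

A matching digest only says the content is unchanged; it does not say which stored document an upload belongs to.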

IlGala
  • But if some changes have been made to the file, how do I identify that the user modified the same file? Because after the change, running MD5 on the same file gives me a different hex code – Shubhankar Aug 18 '15 at 10:34
  • I don't understand... If I have a file with MD5 value a1a1a1a1 and the user changes the file, then the MD5 value will change as well, for example to a2a2a2a2. Since the old MD5 value (a1a1a1a1) is different from the new MD5 value (a2a2a2a2), I know that someone made some changes – IlGala Aug 18 '15 at 10:43
  • Exactly correct. But if I pass some other file to the MD5 hashing instead of the file in which I made changes, how do I identify that the file I am passing is not the file in which I made changes? – Shubhankar Aug 18 '15 at 10:50
  • Well, you have to identify the files somehow... with a specific folder, the filename, or a reference in a database – IlGala Aug 18 '15 at 10:55
  • My aim is this: I have to download a file, modify it, and upload the same file. While uploading, how do I check that I am uploading the same downloaded file with some modified content? – Shubhankar Aug 18 '15 at 10:59
  • A list where you select which file you downloaded? – IlGala Aug 18 '15 at 11:01
  • Which user modified it, and whether "the file with a different MD5" is the same file or a different file altogether – Shubhankar Aug 18 '15 at 11:06
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87268/discussion-between-shubhankar-and-ilgala). – Shubhankar Aug 18 '15 at 12:14
  • Hi, please help me with this: which user modified the file, and whether "the file with a different MD5" is the same file or a different file altogether – Shubhankar Aug 19 '15 at 05:06
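
One way to read the "reference in a database" suggestion from the comments is sketched below. Everything here is hypothetical: the `CheckinRegistry` class, its method names, and the in-memory maps standing in for a database are assumptions, not anything confirmed in the thread. The idea is that the server hands out a document ID at first check-in, the client sends that ID back on upload, and the stored MD5 then tells whether the content changed. The user is assumed to be whoever is authenticated for the upload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class CheckinRegistry {
    enum Result { NEW_DOCUMENT, UNCHANGED, MODIFIED }

    // In-memory stand-ins for a database table keyed by document ID.
    private final Map<String, byte[]> digestById = new HashMap<>();
    private final Map<String, String> modifierById = new HashMap<>();

    private static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // not expected for the standard name "MD5"
        }
    }

    // First check-in: stores the digest and returns the ID the client must send back later.
    String register(String user, byte[] content) {
        String id = UUID.randomUUID().toString();
        digestById.put(id, md5(content));
        modifierById.put(id, user);
        return id;
    }

    // Later check-in: an unknown or missing ID means the upload is a new document.
    Result checkIn(String documentId, String user, byte[] content) {
        byte[] previous = documentId == null ? null : digestById.get(documentId);
        if (previous == null) {
            return Result.NEW_DOCUMENT;
        }
        byte[] current = md5(content);
        if (MessageDigest.isEqual(previous, current)) {
            return Result.UNCHANGED;
        }
        digestById.put(documentId, current);
        modifierById.put(documentId, user);
        return Result.MODIFIED;
    }

    String lastModifiedBy(String documentId) {
        return modifierById.get(documentId);
    }

    public static void main(String[] args) {
        CheckinRegistry registry = new CheckinRegistry();
        String id = registry.register("alice", "v1".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.checkIn(id, "bob", "v1".getBytes(StandardCharsets.UTF_8)));      // UNCHANGED
        System.out.println(registry.checkIn(id, "bob", "v2".getBytes(StandardCharsets.UTF_8)));      // MODIFIED
        System.out.println(registry.checkIn(null, "bob", "other".getBytes(StandardCharsets.UTF_8))); // NEW_DOCUMENT
        System.out.println(registry.lastModifiedBy(id));                                             // bob
    }
}
```

With this design the "same file or different file" question is answered by the ID the client sends, not by any attribute of the file itself, so it keeps working when the file is renamed or edited on another machine.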