How to match two paths pointing to the same file?

Question

I have two lists containing paths to a directory of music files, and I want to determine which files appear in both lists and which appear in only one. The problem is that the path format differs between the two lists.

Format example:

List1: file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3
List2: \\FILE\Musik\30 Seconds To Mars.mp3

How do I go about comparing these two file paths and matching them to the same source?

Try FileInfo.FullPath, but I'm not sure it will resolve ambiguities... — alxx, Jul 13 '12 at 06:35
@SwanandPurankar: as I understand, this question is about detecting equal file paths, not content. — alxx, Jul 13 '12 at 06:37
You are sure those example paths are realistic? Especially "List 2" would mean that you have a share "Musik" on a computer named "FILE" which happens to be the local computer. Although technically valid, it looks rather strange. List2 as "\\localhost\FILE\Musik\..." would make more sense. — Christian.K, Jul 13 '12 at 06:46
@Christian.K: I am quite sure. All files are located on an external server which I can access through both of these path formats. — Laudable Bauble, Jul 13 '12 at 07:04
If they are on an external server, then `file://servername/FILE/Musik/...` and `\\servername\Musik\...` or `\\servername\FILE\Musik\...` would have been more sensible, YMMV of course :-) — Christian.K, Jul 13 '12 at 07:07
possible duplicate of [Best way to determine if two path reference to same file in C#](http://stackoverflow.com/questions/410705/best-way-to-determine-if-two-path-reference-to-same-file-in-c-sharp) — nawfal, Dec 31 '13 at 13:06

score 1 · Answer 1 · answered Jul 13 '12 at 06:57

The answer depends on your notion of "same file". If you merely want to check whether two files have equal content, without necessarily being the very same file, you can compute a hash over each file's content and compare the hashes. If the hashes are equal (please use a strong hash, like SHA-256), you can be confident that the contents are equal too. You could of course also compare the files byte by byte.
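
A minimal sketch of the hash comparison, assuming both paths can be opened directly with `File.OpenRead`. The class and method names (`ContentComparer`, `HashFile`, `HaveSameContent`) are made up for illustration:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class ContentComparer
{
    // Returns the SHA-256 digest of the file's content.
    public static byte[] HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }

    // True when both files hash to the same digest, i.e. their content is
    // (with overwhelming probability) identical. This says nothing about
    // whether the two paths address the very same file.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        return HashFile(pathA).SequenceEqual(HashFile(pathB));
    }
}
```

Hashing reads every byte of both files, so for a large music library on a network share this can be slow compared with comparing paths.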

If you really want to figure that the two files are actually the same file, i.e. just addressed via different means (like file-URL or UNC path), you have a little more work to do.

First you need to find the true file system path behind each address, that is, the path behind the UNC path and/or the file URL (for a file URL, that is typically the path in the URL itself). In the case of UNC paths that are shares on a remote computer, you might not even be able to do so.

Also, even if you have figured out the local path somehow, you need to deal with the different redirection mechanisms for local paths (on Windows: junctions, reparse points, and links; on UNIX: symbolic or hard links). For example, a share could use a file system link as its source, while the file URL uses the true source path. To the casual observer they would still look like different files.

Having said all that, the "algorithm" would be something like this:

1. Figure out the source path for the URLs, UNC paths/shares, etc. that you have.
2. Figure out the local source path from those paths (considering links/junctions, subst.exe, etc.).
3. Normalize those paths, if necessary (e.g. a/b/../c is actually a/c).
4. Compare the resulting paths.
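
The last two steps (normalizing and comparing) could be sketched as follows. This assumes the earlier steps have already produced plain file system paths; resolving shares, links, and junctions is environment-specific and not shown. `PathMatcher`, `Normalize`, and `SamePath` are hypothetical names, and the case-insensitive comparison is an assumption that suits typical Windows file systems:

```csharp
using System;
using System.IO;

static class PathMatcher
{
    // Makes the path absolute and collapses "." and ".." segments.
    // Note: this does not resolve links, junctions, or shares.
    public static string Normalize(string localPath)
    {
        return Path.GetFullPath(localPath);
    }

    // Compares the normalized paths. Ignoring case is an assumption;
    // use StringComparison.Ordinal on case-sensitive file systems.
    public static bool SamePath(string a, string b)
    {
        return string.Equals(Normalize(a), Normalize(b),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

With this, `PathMatcher.SamePath(@"a\b\..\c", @"a\c")` would treat both arguments as the same path on Windows, since both normalize to the same absolute path.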

score 0 · Answer 2 · answered Jul 13 '12 at 06:40


I think the best way to do it is by temporarily converting one of the paths to the other one's format. I would suggest you change the first to match the second.

string List1 = "file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3";
string List2 = @"\\FILE\Musik\30 Seconds To Mars.mp3";

I would recommend using the Replace() method.

Get rid of "file://localhost":

var tempStr = List1.Replace("file://localhost", "");

Change all '%20' into spaces:
```
tempStr = tempStr.Replace("%20", " ");
```
Change all '/' into '\':
```
tempStr = tempStr.Replace("/", "\\");
```

Voilà! Two strings in matching format!
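
The replacement steps can be wrapped into one helper and applied to a whole list. This is only a sketch: `ListMatcher`, `ToUncPath`, and `InBoth` are hypothetical names, and it assumes every URL in the first list has the form `file://localhost//SERVER/share/...` as in the question. It decodes all percent-escapes with `Uri.UnescapeDataString` rather than only `%20`, and ignores case when comparing, which is an assumption that suits Windows shares:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ListMatcher
{
    // Converts a "file://localhost//SERVER/share/..." URL to "\\SERVER\share\...".
    // Assumes the prefix is followed by a UNC-style path; other inputs are
    // only unescaped and have their separators switched.
    public static string ToUncPath(string fileUrl)
    {
        const string prefix = "file://localhost";
        string path = fileUrl.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? fileUrl.Substring(prefix.Length)
            : fileUrl;

        // Decode percent-escapes, then switch to back-slashes.
        return Uri.UnescapeDataString(path).Replace('/', '\\');
    }

    // Returns the entries that appear in both lists, in UNC form.
    public static List<string> InBoth(IEnumerable<string> fileUrls,
                                      IEnumerable<string> uncPaths)
    {
        return fileUrls.Select(ToUncPath)
                       .Intersect(uncPaths, StringComparer.OrdinalIgnoreCase)
                       .ToList();
    }
}
```

As the comments on this answer point out, this still compares strings rather than files: it cannot tell whether the two formats really address the same location.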

answered by オスカー

This will only compare strings, not paths. How do you know you can get rid of `file://localhost`? Doing so implies that the `\\FILE\Musik` share is actually on the local computer. – Christian.K Jul 13 '12 at 06:44
I think you should look at how you are creating the lists and work out a reliable, consistent method. The way I see it, \\FILE is a server called FILE? At least according to the UNC path format. – Phil Jul 13 '12 at 07:00
Manipulating the strings directly is certainly a workable alternative; however, I had hoped that there would be a simpler and cleaner way to do it. – Laudable Bauble Jul 13 '12 at 07:06
@PhilCartmell: Yes, \\FILE is a server. List1 is basically a copy of the iTunes library.xml file, which uses the localhost format. The format in List2 is the default I get when I map out a directory through Directory.EnumerateFiles(), so I suppose I could do something different with List2. – Laudable Bauble Jul 13 '12 at 07:17
Hi - did you manage to test my code? Just wondering whether it worked/helped? – Phil Jul 16 '12 at 08:50
@PhilCartmell: Yep, I tested your code, and I have currently built my solution upon string and regex operations much like what you proposed. Thank you. I am still hoping to find a solution that does not involve direct string manipulation though. – Laudable Bauble Jul 17 '12 at 02:33

score 0 · Answer 3 · answered Jul 13 '12 at 06:57

Use Python. You can easily compare two files like this:

    >>> import filecmp
    >>> filecmp.cmp('file1.txt', 'file1.txt')
    True
    >>> filecmp.cmp('file1.txt', 'file2.txt')
    False

To open files given in the file:// syntax, use urllib (urllib.request in Python 3):

    >>> import urllib.request
    >>> file1 = urllib.request.urlopen('file://localhost/tmp/test')

For a normal file path, use the standard open():

    >>> file2 = open('/pathtofile', 'r')

score 0 · Answer 4 · answered Jul 13 '12 at 07:17

I agree completely with Christian: you should rethink how the lists are built. Still, the code below should get you going.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication5
{
    class Program
    {
        public static List<string> SanitiseList(List<string> list)
        {

            List<string> sanitisedList = new List<string>();

            foreach (string filename in list)
            {
                String sanitisedFilename = String.Empty;

                if (!String.IsNullOrEmpty(filename))
                {
                    sanitisedFilename = filename;

                    // get rid of the encoding
                    sanitisedFilename = Uri.UnescapeDataString(sanitisedFilename);

                    // first of all change all back-slashes to forward slashes
                    sanitisedFilename = sanitisedFilename.Replace(@"\", @"/");

                    // if the path now starts with two forward slashes (a UNC path), assume it's localhost
                    if (sanitisedFilename.StartsWith("//", StringComparison.Ordinal))
                    {
                        // remove these first double slashes and stick in localhost

                        sanitisedFilename = sanitisedFilename.TrimStart('/');
                        sanitisedFilename = "//localhost" + "/" + sanitisedFilename;
                    }

                    // replace the file:// scheme prefix with a leading double slash
                    sanitisedFilename = sanitisedFilename.Replace(@"file://", "//");

                    // collapse double forward-slashes (but not the first two);
                    // back-slashes were already converted above, so none remain
                    if (sanitisedFilename.Length > 2)
                    {
                        sanitisedFilename = sanitisedFilename.Substring(0, 2) + sanitisedFilename.Substring(2).Replace("//", "/");
                    }

                }

                if (!String.IsNullOrEmpty(sanitisedFilename))
                {
                    sanitisedList.Add(sanitisedFilename);
                }
            }

            return sanitisedList;
        }

        static void Main(string[] args)
        {

            List<string> listA = new List<string>();
            List<string> listB = new List<string>();

            listA.Add("file://localhost//FILE/Musik/BritneySpears.mp3");
            listA.Add("file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3");
            listB.Add("file://localhost//FILE/Musik/120%20Seconds%20To%20Mars.mp3");

            listB.Add(@"\\FILE\Musik\30 Seconds To Mars.mp3");
            listB.Add(@"\\FILE\Musik\5 Seconds To Mars.mp3");

            listA = SanitiseList(listA);
            listB = SanitiseList(listB);

            List<string> missingFromA = listB.Except(listA).ToList();
            List<string> missingFromB = listA.Except(listB).ToList();

        }
    }
}
