1

I want to know if there is a way to quickly convert a whole list of strings into one unique SHA512 hash string.

For now I use the method below to get a unique SHA512 hash, but it becomes slower and slower as the list holds more and more strings.

string hashDataList = string.Empty;

for (int i = 0; i < ListOfElement.Count; i++)
{
    if (i < ListOfElement.Count)
    {
        hashDataList += ListOfElement[i];
    }
}

hashDataList = MakeHash(HashDataList);

Console.WriteLine("Hash: "+hashDataList);

Edit:

Method used to make the hash:

    public static string MakeHash(string str)
    {
        using (var hash = SHA512.Create())
        {
            var bytes = Encoding.UTF8.GetBytes(str);
            var hashedInputBytes = hash.ComputeHash(bytes);

            var hashedInputStringBuilder = new StringBuilder(128);
            foreach (var b in hashedInputBytes)
                hashedInputStringBuilder.Append(b.ToString("X2"));

            str = hashedInputStringBuilder.ToString();
            hashedInputStringBuilder.Clear();
            GC.SuppressFinalize(bytes);
            GC.SuppressFinalize(hashedInputBytes);
            GC.SuppressFinalize(hashedInputStringBuilder);
            return str;
        }
    }
Heavenly
    What is the point of that inner `if` statement? It is never false. –  Dec 10 '18 at 14:36
  • 1
    Depends if `"A", "BC"` should be considered the same as `"AB", "C"`. – Alex K. Dec 10 '18 at 14:36
  • Where is the time being spent? You could try using a `StringBuilder` instead of using `+=` but without knowing what `MakeHash` does that's a pure guess. – D Stanley Dec 10 '18 at 14:37
  • Not enough information here. How much of the list changes between each time (since you say it becomes slower and slower I assume you execute this code multiple times). Have you measured your code using a profiler to figure out where the bottleneck is? If not, then go do that first, everything else is going to be based on assumptions. – Lasse V. Karlsen Dec 10 '18 at 14:37
  • You also have two very similar variables (`HashDataList` and `hashDataList`) which is confusing (or possibly a bug?) – D Stanley Dec 10 '18 at 14:38
  • The way you do it now is irreversible because you concatenate the strings without any delimiter. Deliberately so? You could also use `string.Join(",", ListOfElement)` to create a single comma-separated string from your list and hash that... – LocEngineer Dec 10 '18 at 14:40
  • 2
    `string hashDataList = MakeHash(string.Concat(ListOfElement));` – Rufus L Dec 10 '18 at 14:51
  • I have edited my post to show the hash method. – Heavenly Dec 10 '18 at 14:51
  • 1
    @Heavenly Why all the GC shenanigans? – D Stanley Dec 10 '18 at 14:55
  • How many calls are made to `MakeHash` within the service lifetime? It may be worth moving `hash` to be assigned in the static constructor and held as a field, so it only needs to be created once / is disposed with the app. That avoids some overhead per call, though means it sits in memory when not in use. – JohnLBevan Dec 10 '18 at 15:18
  • Each time new data is provided on the network, the hash needs to be made. Each host needs to have this unique hash to be listed on the network; once they are listed they can share the same data with other users. So it depends on the activity of the network. – Heavenly Dec 10 '18 at 16:45

2 Answers

3

Try this, using built-in SHA512:

StringBuilder sb = new StringBuilder();

foreach(string s in ListOfElement) 
{
    sb.Append(s);
}

using (var sha512 = new System.Security.Cryptography.SHA512CryptoServiceProvider())
{
    hashDataList = BitConverter.ToString(sha512.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString())))
                               .Replace("-", string.Empty)
                               .ToUpper();
}

Console.WriteLine("Hash: "+hashDataList);

Performance depends a lot on the MakeHash() implementation as well.
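
If building one big concatenated string is itself a cost, a variation is to feed each element into the hash incrementally, so the combined string is never allocated. This is a minimal sketch, assuming a framework where IncrementalHash is available (.NET Framework 4.6.2+ / .NET Core); it produces the same digest as hashing the concatenation, because the UTF-8 bytes are appended in the same order:

using (var hasher = System.Security.Cryptography.IncrementalHash.CreateHash(
           System.Security.Cryptography.HashAlgorithmName.SHA512))
{
    foreach (string s in ListOfElement)
    {
        // Append each element's bytes to the running hash state.
        hasher.AppendData(Encoding.UTF8.GetBytes(s));
    }

    // Finalize, then format as an uppercase hex string like the code above.
    hashDataList = BitConverter.ToString(hasher.GetHashAndReset()).Replace("-", string.Empty);
}

Console.WriteLine("Hash: " + hashDataList);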

farbiondriven
  • Thank you, I'm going to test that. I have edited my post to show the hash method. – Heavenly Dec 10 '18 at 14:52
  • @Heavenly - Note that while this is on the right track (and so I up-voted it), it does have a bug that makes it potentially non-unique. See the first part of my answer for details – Mark Adelsberger Dec 10 '18 at 15:05
  • Don't use Encoding.Default. Here it's especially bad because now the hash depends on the OS regional settings of the user. – ckuri Dec 10 '18 at 17:04
1

I think the problem might be a bit misstated here. First, from a performance standpoint:

Any method of hashing a list of strings will take longer as the number (and length) of the strings increases. The only way to avoid this would be to ignore some of the data in (at least some of) the strings, and then you lose the assurances that a hash should give you.

So you can try to make the whole thing faster, so that you can process more (and/or longer) strings in an acceptable time frame. Without knowing the performance characteristics of the hashing function, we can't say if that's possible; but as farbiondriven's answer suggests, about the only plausible strategy is to assemble a single string and hash that once.

The potential objection to this, I suppose, would be: does it affect the uniqueness of the hash? There are two factors to consider:

First, if you just concatenate all the strings together, then you would get the same output hash for

["element one and ", "element two"]

as for

["element one ", "and element two"]

because the concatenated data is the same. One way to correct this is to insert each string's length before the string (with a delimiter to mark the end of the length). For example, you could build

"16:element one and 11:element two"

for the first array above, and

"12:element one 15:and element two"

for the second.
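
A minimal sketch of that length-prefixing idea (a hypothetical helper, not code from the question; note it prefixes the character count, which is unambiguous as long as it is applied consistently):

public static string HashListUnambiguously(IReadOnlyList<string> elements)
{
    var sb = new StringBuilder();
    foreach (var s in elements)
        sb.Append(s.Length).Append(':').Append(s); // e.g. "16:element one and "

    using (var sha512 = SHA512.Create())
    {
        // ["AB", "C"] and ["A", "BC"] now produce different inputs, hence different hashes.
        var digest = sha512.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
        return BitConverter.ToString(digest).Replace("-", string.Empty);
    }
}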

The other possible concern (though it isn't really valid) could arise if the individual strings are never longer than a single SHA512 hash, but the total amount of data in the array is. In that case, your method (hashing each string and concatenating the results) might seem safer, because whenever you hash data that's longer than the actual hash, it's mathematically possible for a hash collision to occur. But as I say, this concern is not valid for at least one, and possibly two, reasons.

The biggest reason is: hash collisions in a 512-bit hash are ridiculously unlikely. Even though the math says it could happen, it is beyond safe to assume that it literally never will. If you're going to worry about a hash collision at that level, you might as well also worry about your data being spontaneously corrupted due to RAM errors that occur in just such a pattern as to avoid detection. At that level of improbability, you simply can't program around a vast number of catastrophic things that "could" (but won't) happen, and you really might as well count hash collisions among them.
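
For a sense of scale (rough back-of-envelope arithmetic, not figures from the answer), the birthday bound puts the probability of any collision among n random 512-bit hashes at roughly

P(collision) ≈ n(n − 1)/2 ÷ 2^512 ≈ n^2 / 2^513

so even after hashing 2^64 lists (more than 10^19 of them), the probability is on the order of 2^-385.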

The second reason is: if you're paranoid enough not to buy the first reason, then how can you be sure that hashing shorter strings guarantees uniqueness?

What concatenating a hash per string does do, if the individual strings are shorter than 512 bits, is make the result longer than the source data, which defeats the typical purpose of a hash. If that's acceptable, then you probably want an encryption algorithm instead of a hash.

Mark Adelsberger
  • I want to make the hash so that I accept only other hosts who provide the same hash; if they have the whole data then they are listed on my network, so they can also provide this data to other users. – Heavenly Dec 10 '18 at 15:28
  • @Heavenly - I would read up on how to properly use crypto for authentication, because this doesn't sound like it to me. – Mark Adelsberger Dec 11 '18 at 22:09
  • The problem with authentication is that if someone edits his program he can edit the data and share this edited data to spam users :/ – Heavenly Dec 12 '18 at 20:02