13

UPDATE

Following Mr Cheese's answer, it seems that the

public static string Join<T>(string separator, IEnumerable<T> values)

overload of string.Join gets its advantage from the use of the StringBuilderCache class.

Does anybody have any feedback on whether this statement is correct, or on the reasoning behind it?

Could I write my own,

public static string Join<T>(
    string separator,
    string prefix,
    string suffix,
    IEnumerable<T> values)

function that uses the StringBuilderCache class?


After submitting my answer to this question I got drawn into some analysis of which would be the best performing answer.

I wrote this code, in the Program class of a console application, to test my ideas.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

class Program
{
    static void Main()
    {
        const string delimiter = ",";
        const string prefix = "[";
        const string suffix = "]";
        const int iterations = 1000000;

        var sequence = Enumerable.Range(1, 10).ToList();

        Func<IEnumerable<int>, string, string, string, string>[] joiners =
            {
                Build,
                JoinFormat,
                JoinConcat
            };

        // Warmup
        foreach (var j in joiners)
        {
            Measure(j, sequence, delimiter, prefix, suffix, 5);
        }

        // Check
        foreach (var j in joiners)
        {
            Console.WriteLine(
                "{0} output:\"{1}\"",
                j.Method.Name,
                j(sequence, delimiter, prefix, suffix));
        }

        foreach (var result in joiners.Select(j => new
                {
                    j.Method.Name,
                    Ms = Measure(
                        j,
                        sequence,
                        delimiter,
                        prefix,
                        suffix,
                        iterations)
                }))
        {
            Console.WriteLine("{0} time = {1}ms", result.Name, result.Ms);
        }

        Console.ReadKey();
    }

    private static long Measure<T>(
        Func<IEnumerable<T>, string, string, string, string> func,
        ICollection<T> source,
        string delimiter,
        string prefix,
        string suffix,
        int iterations)
    {
        var stopwatch = new Stopwatch();

        stopwatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            func(source, delimiter, prefix, suffix);
        }

        stopwatch.Stop();

        return stopwatch.ElapsedMilliseconds;
    }

    private static string JoinFormat<T>(
        IEnumerable<T> source,
        string delimiter,
        string prefix,
        string suffix)
    {
        return string.Format(
            "{0}{1}{2}",
            prefix,
            string.Join(delimiter, source),
            suffix);
    }

    private static string JoinConcat<T>(
        IEnumerable<T> source,
        string delimiter,
        string prefix,
        string suffix)
    {
        return string.Concat(
            prefix,
            string.Join(delimiter, source),
            suffix);
    }

    private static string Build<T>(
        IEnumerable<T> source,
        string delimiter,
        string prefix,
        string suffix)
    {
        var builder = new StringBuilder();
        builder = builder.Append(prefix);

        using (var e = source.GetEnumerator())
        {
            if (e.MoveNext())
            {
                builder.Append(e.Current);
            }

            while (e.MoveNext())
            {
                builder.Append(delimiter);
                builder.Append(e.Current);
            }
        }

        builder.Append(suffix);
        return builder.ToString();
    }
}

Running the code from the command line, in release configuration and built with optimizations, I get output like this.

...

Build time = 1555ms

JoinFormat time = 1715ms

JoinConcat time = 1452ms

The only surprise here (to me) is that the Join-Format combination is the slowest. After considering this answer, it makes a little more sense: the output of string.Join is processed again by the StringBuilder inside string.Format, so this approach carries an inherent overhead.

After musing, I don't clearly understand how string.Join can be faster. I've read about its use of FastAllocateString() but I don't understand how the buffer can be accurately pre-allocated without calling .ToString() on every member of sequence. Why is the Join-Concat combination faster?

Once I understand that, would it be possible to write my own unsafe string Join function that takes the extra prefix and suffix parameters and outperforms the "safe" alternatives?

I've had several attempts and whilst they work, they are not faster.

Jodrell
  • Note that using `string.Concat` is equivalent to just using the `+` operator: `prefix + string.Join(delimiter, source) + suffix`. – Jon Skeet Nov 16 '12 at 15:33
  • @JonSkeet, that was borne out by my testing but I've omitted the code for brevity. – Jodrell Nov 16 '12 at 15:34
  • (Also, while it's *great* that you gave us very-nearly-complete code, if you'd included the class declaration and using directives, it would have been even better...) – Jon Skeet Nov 16 '12 at 15:35
  • 1
    I would also call the `func(source, delimiter, prefix, suffix);` once, before starting the stop watch. (to avoid JIT issues) – L.B Nov 16 '12 at 15:40
  • @JonSkeet edited as requested. The code is transcribed so apologies for any errata. – Jodrell Nov 16 '12 at 15:45
  • @L.B that is the purpose of the `// Warmup` loop in `Main()`. – Jodrell Nov 16 '12 at 15:47
  • I suggest instead of counting time, to count ticks. Sometimes windows throttles the processor and it will produce wrong results. Counting the ticks will reduce such effects. – John Alexiou Nov 17 '12 at 01:56
  • @ja72, does that make a difference to the outcome? After making the two small changes required to see, I can confirm that the relative performance of the methods is unaffected. (I think the stopwatch always counts ticks internally.) – Jodrell Nov 19 '12 at 09:01
  • @WouterH, thanks for the corrections – Jodrell Nov 19 '12 at 14:58
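
The point about ticks in the comments above can be checked with a small sketch. It assumes nothing beyond the documented `Stopwatch` members (`ElapsedTicks`, `Frequency`, `ElapsedMilliseconds`); the class name `TickDemo` and the loop body are made up for illustration.

```csharp
using System;
using System.Diagnostics;

class TickDemo
{
    static void Main()
    {
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < 1000000; i++)
        {
            // Arbitrary work to time; the result is deliberately discarded.
            string.Concat("[", i.ToString(), "]");
        }
        stopwatch.Stop();

        // ElapsedTicks is in Stopwatch units; Frequency is ticks per second.
        var derivedMs = stopwatch.ElapsedTicks * 1000.0 / Stopwatch.Frequency;

        Console.WriteLine("ticks               = {0}", stopwatch.ElapsedTicks);
        Console.WriteLine("derived ms          = {0:F3}", derivedMs);
        Console.WriteLine("ElapsedMilliseconds = {0}", stopwatch.ElapsedMilliseconds);
    }
}
```

The derived value and `ElapsedMilliseconds` come from the same counter, so switching to ticks gives finer resolution but should not change the relative ranking of the methods.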

4 Answers

4

To try and answer your original question, I think the answer lies in (the amazing) Reflector tool. Your collections are typed as IEnumerable<T>, so the compiler binds to the String.Join<T>(string, IEnumerable<T>) overload. Interestingly, this overload is remarkably similar to your Build function: it enumerates the collection and appends to a StringBuilder, which means it doesn't need to know the lengths of the strings in advance.

public static string Join<T>(string separator, IEnumerable<T> values)
{

    if (values == null)
    {
        throw new ArgumentNullException("values");
    }
    if (separator == null)
    {
        separator = Empty;
    }
    using (IEnumerator<T> enumerator = values.GetEnumerator())
    {
        if (!enumerator.MoveNext())
        {
            return Empty;
        }
        StringBuilder sb = StringBuilderCache.Acquire(0x10);
        if (enumerator.Current != null)
        {
            string str = enumerator.Current.ToString();
            if (str != null)
            {
                sb.Append(str);
            }
        }
        while (enumerator.MoveNext())
        {
            sb.Append(separator);
            if (enumerator.Current != null)
            {
                string str2 = enumerator.Current.ToString();
                if (str2 != null)
                {
                    sb.Append(str2);
                }
            }
        }
        return StringBuilderCache.GetStringAndRelease(sb);
    }
}

It acquires its StringBuilder from a cache (StringBuilderCache), which I don't fully understand, but that internal optimisation is probably why it's faster. As I'm working on a laptop, my earlier timings may have been skewed by power management state changes, so I've rerun the code outside the debugger with a 'BuildCheat' method included (it avoids the StringBuilder buffer capacity doubling). Its times are remarkably close to String.Join(IEnumerable).

Build time = 1264ms

JoinFormat = 1282ms

JoinConcat = 1108ms

BuildCheat = 1166ms

private static string BuildCheat<T>(
    IEnumerable<T> source,
    string delimiter,
    string prefix,
    string suffix)
{
    var builder = new StringBuilder(32);
    builder = builder.Append(prefix);

    using (var e = source.GetEnumerator())
    {
        if (e.MoveNext())
        {
            builder.Append(e.Current);
        }

        while (e.MoveNext())
        {
            builder.Append(delimiter);
            builder.Append(e.Current);
        }
    }

    builder.Append(suffix);
    return builder.ToString();
}
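
StringBuilderCache is internal to the framework, so user code cannot call it directly. The idea of reusing one builder per thread can still be sketched in user code. Everything named below (`BuilderPool`, `MaxCachedCapacity`, `Joiner.JoinWrapped`) is hypothetical, and the size limit is an arbitrary assumption; this is not the framework's implementation, only a rough illustration of the technique.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: keeps at most one reusable StringBuilder per thread.
static class BuilderPool
{
    // Assumed limit, so that one very large result is not kept alive forever.
    private const int MaxCachedCapacity = 512;

    [ThreadStatic]
    private static StringBuilder cached;

    public static StringBuilder Acquire(int capacity)
    {
        var sb = cached;
        if (sb != null && capacity <= sb.Capacity)
        {
            cached = null;   // hand the instance out exclusively
            sb.Clear();
            return sb;
        }

        return new StringBuilder(capacity);
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        var result = sb.ToString();
        if (sb.Capacity <= MaxCachedCapacity)
        {
            cached = sb;     // make it available to the next caller on this thread
        }

        return result;
    }
}

static class Joiner
{
    // Hypothetical Join variant with the extra prefix and suffix parameters.
    public static string JoinWrapped<T>(
        string separator,
        string prefix,
        string suffix,
        IEnumerable<T> values)
    {
        if (values == null)
        {
            throw new ArgumentNullException("values");
        }

        var sb = BuilderPool.Acquire(32);
        sb.Append(prefix);

        using (var e = values.GetEnumerator())
        {
            if (e.MoveNext())
            {
                sb.Append(e.Current);
                while (e.MoveNext())
                {
                    sb.Append(separator);
                    sb.Append(e.Current);
                }
            }
        }

        sb.Append(suffix);
        return BuilderPool.GetStringAndRelease(sb);
    }
}
```

Called as `Joiner.JoinWrapped(",", "[", "]", Enumerable.Range(1, 10))`, this avoids allocating a new builder on every call after the first. Whether it actually beats JoinConcat would need to be measured with the harness from the question.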

As for the final part of your question, where you mention FastAllocateString: as you can see, it's not called in the IEnumerable overload above. It's only called by the overload that works directly with a string array, and that one most definitely does loop through the array to sum the lengths before creating the final output.

public static unsafe string Join(string separator, string[] value, int startIndex, int count)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }
    if (startIndex < 0)
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_StartIndex"));
    }
    if (count < 0)
    {
        throw new ArgumentOutOfRangeException("count", Environment.GetResourceString("ArgumentOutOfRange_NegativeCount"));
    }
    if (startIndex > (value.Length - count))
    {
        throw new ArgumentOutOfRangeException("startIndex", Environment.GetResourceString("ArgumentOutOfRange_IndexCountBuffer"));
    }
    if (separator == null)
    {
        separator = Empty;
    }
    if (count == 0)
    {
        return Empty;
    }
    int length = 0;
    int num2 = (startIndex + count) - 1;
    for (int i = startIndex; i <= num2; i++)
    {
        if (value[i] != null)
        {
            length += value[i].Length;
        }
    }
    length += (count - 1) * separator.Length;
    if ((length < 0) || ((length + 1) < 0))
    {
        throw new OutOfMemoryException();
    }
    if (length == 0)
    {
        return Empty;
    }
    string str = FastAllocateString(length);
    fixed (char* chRef = &str.m_firstChar)
    {
        UnSafeCharBuffer buffer = new UnSafeCharBuffer(chRef, length);
        buffer.AppendString(value[startIndex]);
        for (int j = startIndex + 1; j <= num2; j++)
        {
            buffer.AppendString(separator);
            buffer.AppendString(value[j]);
        }
    }
    return str;
}

Just out of interest I changed your program to not use generics and made JoinFormat and JoinConcat accept a simple array of strings (I couldn't readily change Build since it uses an enumerator), so String.Join uses the other implementation above. The results are pretty impressive:

JoinFormat time = 386ms

JoinConcat time = 226ms

Perhaps you can find a solution that makes the best of fast string arrays whilst also using generic inputs...
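
One way to combine the two ideas, as a minimal sketch: convert each item to a string once, total the lengths (the same first pass the string[] overload performs), then fill a builder whose capacity is already exact. The names `TwoPassJoiner` and `JoinExact` are hypothetical. It uses only safe code, so unlike FastAllocateString the final `ToString()` still copies the characters once.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TwoPassJoiner
{
    // Hypothetical method: generic input, exact pre-sizing, prefix and suffix.
    public static string JoinExact<T>(
        IEnumerable<T> source,
        string delimiter,
        string prefix,
        string suffix)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        delimiter = delimiter ?? string.Empty;
        prefix = prefix ?? string.Empty;
        suffix = suffix ?? string.Empty;

        // Pass 1: call ToString() exactly once per item and total the lengths.
        var parts = new List<string>();
        var length = prefix.Length + suffix.Length;
        foreach (var item in source)
        {
            var text = item == null ? string.Empty : item.ToString() ?? string.Empty;
            parts.Add(text);
            length += text.Length;
        }

        if (parts.Count > 1)
        {
            length += (parts.Count - 1) * delimiter.Length;
        }

        // Pass 2: the builder starts at its final size, so it never has to grow.
        var sb = new StringBuilder(length);
        sb.Append(prefix);
        for (var i = 0; i < parts.Count; i++)
        {
            if (i > 0)
            {
                sb.Append(delimiter);
            }

            sb.Append(parts[i]);
        }

        sb.Append(suffix);
        return sb.ToString();
    }
}
```

The trade-off is an intermediate list of strings in exchange for a single right-sized buffer, so it is more likely to pay off for long sequences or long items than for the ten small integers in the benchmark.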

Mr Cheese
  • I had a go! I changed IEnumerable to IList to use an array with .Count and it scales better with longer inputs. Using inputs 1 to 10, JoinConcat=1175ms, BuildBetter=1187ms. Using inputs 1000000000 to 1000000010: JoinConcat=1539ms, BuildBetter=1427ms `private static string BuildBetter( IList source, string delimiter, string prefix, string suffix) { string[] values = new string[source.Count]; for (int i = 0; i < source.Count; i++) { values[i] = source[i].ToString(); } return string.Concat(prefix, string.Join(delimiter, values), suffix); }` – Mr Cheese Nov 16 '12 at 23:30
  • 2
    Just so you know, you don't need to use Reflector (or any decompiling tool) for the .NET framework source code. You can get the original source code used to compile a release here [http://referencesource.microsoft.com/netframework.aspx](http://referencesource.microsoft.com/netframework.aspx). It includes comments and original variable names, which can make it easier to read. (As a note, use Internet Explorer to download the source code. See [this answer](http://stackoverflow.com/a/12118869/721276)) – Christopher Currens May 31 '13 at 15:50
1

To provide some additional information, I ran the code above on my laptop (Core i7-2620M) using VS 2012, also to see whether anything has changed between frameworks 4.0 and 4.5. The first run is compiled against .NET Framework 4.0, the second against 4.5.

Framework 4.0

Build time = 1516ms

JoinFormat time = 1407ms

JoinConcat time = 1238ms

Framework 4.5

Build time = 1421ms

JoinFormat time = 1374ms

JoinConcat time = 1223ms

It's good to see that the new framework seems a bit faster but it's curious that I can't reproduce your original results with the slow performance of JoinFormat. Can you provide details on your build environment and hardware?

Mr Cheese
  • 1
    Interesting, it's .NET 4.0 on a virtual Xeon 5160. – Jodrell Nov 16 '12 at 16:38
  • As a slight aside to your original question the times for Build are a bit skewed due to the capacity doubling of the builder's internal buffer. It's initialised with no specific capacity so it defaults to 16 chars but outputs 21 chars which means it's incurred one doubling operation. It's no longer a generic solution but serves to remove this anomaly in this demo code, just initialise the builder as follows: `var builder = new StringBuilder(32);` This gives a time for BuildCheat which is roughly in the middle of JoinFormat and JoinConcat. – Mr Cheese Nov 16 '12 at 20:36
-1

Try using StringBuilder.AppendFormat in Build<T> method instead of StringBuilder.Append

Rui Jarimba
  • 3
    Where and why? I don't see how that would help. – Jon Skeet Nov 16 '12 at 15:39
  • Wouldn't it help to replace the 2 Append instructions inside `while (e.MoveNext())` with `builder.AppendFormat("{0}{1}", delimiter, e.Current);`? Inside method `private static string Build` – Rui Jarimba Nov 16 '12 at 16:51
  • I'd personally doubt it - the `AppendFormat` code would have to parse the format string on each iteration, don't forget. Try it though :) – Jon Skeet Nov 16 '12 at 16:56
  • @RuiJarimba, if you try that, using the code I've provided, you'll see it's slower. I get `2501ms` vs the original `1856ms`. It would be interesting if that varies for you. – Jodrell Nov 16 '12 at 17:01
-2

Easiest workaround (to add a prefix to each item in the joined string):

string[] SelectedValues = { "a", "b", "c" };
string separatedValues = string.Join("\n- ", SelectedValues);
separatedValues = "- " + separatedValues;

Output:
- a
- b
- c

You could also use a StringBuilder.

Machavity