I have recently discovered LINQ and find it very interesting to use. I currently have the following function, and I am not sure whether a LINQ version would be more efficient while still producing the same output.

Can you please tell me your opinion about this?

The function removes punctuation in a very simple manner, keeping only ASCII letters, line feeds, carriage returns, and spaces:

    private static byte[] FilterText(byte[] arr)
    {
        List<byte> filteredBytes = new List<byte>();
        int j = 0; //index for filteredArray

        for (int i = 0; i < arr.Length; i++)
        {
            if ((arr[i] >= 65 && arr[i] <= 90) || (arr[i] >= 97 && arr[i] <= 122) || arr[i] == 10 || arr[i] == 13 || arr[i] == 32)
            {
                filteredBytes.Insert(j, arr[i]);
                j++;
            }
        }

        //return the filtered content of the buffer
        return filteredBytes.ToArray();
    }

The LINQ alternative:

    private static byte [] FilterText2(byte[] arr)
    {
        var x = from a in arr
                where ((a >= 65 && a <= 90) || (a >= 97 && a <= 122) || a == 10 || a == 13 || a == 32)
                select a;

        return x.ToArray();
    }
Robert Levy
test
  • Why do you have `j`? Just use `Add` instead of `Insert` and you can dispense with that counter. – Kirk Woll May 17 '12 at 18:09
  • You should probably replace the `.Insert(j,` part with `.Add(` and remove the `j` counter altogether. – Douglas May 17 '12 at 18:09
  • Yes, thank you. I just modified the code and didn't even tidy that up! – test May 17 '12 at 18:10
  • Also, since most characters would be retained, you should replace `new List<byte>()` with `new List<byte>(arr.Length)`. This avoids having to reallocate the list's internal array as it grows. – Douglas May 17 '12 at 18:11
  • But would there be any way to remove the unused extra slots after I have added what I need to add? – test May 17 '12 at 18:12
  • You don't need to. The parameter passed to the `List<T>(int)` constructor only sets the capacity (how much it _can_ hold without resizing), not how much it actually holds. – Douglas May 17 '12 at 18:13
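
Putting the comments above together, the loop version might look roughly like the sketch below. It assumes the same accepted byte values as the question; the names `FilterSketch` and `FilterTextWithAdd` are made up for illustration and do not appear in the original code.

```csharp
using System.Collections.Generic;

public static class FilterSketch
{
    // Hypothetical name: same filter as FilterText, but using Add and a preset capacity.
    public static byte[] FilterTextWithAdd(byte[] arr)
    {
        // The capacity only reserves space; Count still starts at 0.
        var filteredBytes = new List<byte>(arr.Length);

        foreach (byte b in arr)
        {
            bool isLetter = (b >= 65 && b <= 90) || (b >= 97 && b <= 122);
            bool isWhitespace = b == 10 || b == 13 || b == 32;
            if (isLetter || isWhitespace)
            {
                filteredBytes.Add(b); // appends at the end, so no index counter is needed
            }
        }

        // ToArray copies only Count elements, so unused capacity is not returned.
        return filteredBytes.ToArray();
    }
}
```

The returned array has exactly as many elements as were kept, which is why nothing needs to be trimmed afterwards.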

6 Answers

LINQ is usually slightly less efficient than simple loops and procedural code, but the difference is typically small, and the conciseness and ease of reading usually make it worth converting simple projections and filtering to LINQ.

If the performance really matters, measure it and decide for yourself if the performance of the LINQ code is adequate.
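
One rough way to measure it is with `System.Diagnostics.Stopwatch`. The sketch below is only an illustration: the class `FilterTiming`, its method names, and the iteration count you pass in are assumptions, and the filter rule is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class FilterTiming
{
    // Same acceptance rule as in the question: A-Z, a-z, LF, CR, space.
    private static bool IsAccepted(byte b)
    {
        return (b >= 65 && b <= 90) || (b >= 97 && b <= 122) ||
               b == 10 || b == 13 || b == 32;
    }

    public static byte[] WithLoop(byte[] arr)
    {
        var kept = new List<byte>(arr.Length);
        foreach (byte b in arr)
        {
            if (IsAccepted(b)) kept.Add(b);
        }
        return kept.ToArray();
    }

    public static byte[] WithLinq(byte[] arr)
    {
        return arr.Where(IsAccepted).ToArray();
    }

    // Returns the elapsed milliseconds for `iterations` calls of `filter`.
    public static long Time(Func<byte[], byte[]> filter, byte[] input, int iterations)
    {
        filter(input); // warm-up call so JIT compilation is not measured

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            filter(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

For example, `FilterTiming.Time(FilterTiming.WithLoop, data, 1000)` and `FilterTiming.Time(FilterTiming.WithLinq, data, 1000)` can be compared on a buffer of realistic size. Run this in a Release build, since Debug timings can be misleading.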

Mark Byers

LINQ is great for keeping things simple. Performance-wise, it can become a real problem if you repeatedly convert intermediate results to lists, arrays, and so on:

myObjects.Where(...).ToList().Something().ToList().SomethingElse().ToList();

This is well known to be a performance killer; convert to a final list as late as possible.
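
As a small illustration of the difference, the sketch below runs the same pipeline twice. The class and method names (`LateMaterialization`, `Eager`, `Late`) and the even/square/greater-than-10 steps are invented for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LateMaterialization
{
    // Allocates two intermediate lists before the final one.
    public static List<int> Eager(IEnumerable<int> source)
    {
        return source.Where(n => n % 2 == 0).ToList()
                     .Select(n => n * n).ToList()
                     .Where(n => n > 10).ToList();
    }

    // Same result; the operators are chained lazily and only the
    // final ToList materializes a list.
    public static List<int> Late(IEnumerable<int> source)
    {
        return source.Where(n => n % 2 == 0)
                     .Select(n => n * n)
                     .Where(n => n > 10)
                     .ToList();
    }
}
```

Both methods return the same elements; the second simply avoids building collections that are thrown away immediately.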

ClemKeirua
  • +1: Good point. If you need to do further filtering/ordering/operations on the returned sequence, then it might be preferable to return it as an `Enumerable` without the `ToArray()`. – Douglas May 17 '12 at 18:17
  • @Douglas I think you mean `IEnumerable` (`Enumerable` is a static class), but otherwise agreed. – Adam Mihalcin May 17 '12 at 18:30

Screw performance, LINQ is awesome because of this:

private static bool IsAccepted(byte b)
{
    return (65 <= b && b <= 90) || 
           (97 <= b && b <= 122) || 
           b == 10 || b == 13 || b == 32;
}

arr.Where(IsAccepted).ToArray(); // equivalent to FilterText(arr)

In other words, you write the what, not the how. It is also about as fast (or slow) as the loop you presented: Where(..) is evaluated lazily inside ToArray(), which, iirc, internally fills a growable buffer (much like a List) and then copies it into an array.
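
The lazy evaluation can be made visible with a counter. This is only a sketch: `DeferredDemo`, `PredicateCalls`, `IsAcceptedCounted`, and `BuildQuery` are names invented here, and the predicate is the same rule as `IsAccepted` with a counter added.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Incremented on every predicate call so the evaluation point is observable.
    public static int PredicateCalls = 0;

    private static bool IsAcceptedCounted(byte b)
    {
        PredicateCalls++;
        return (65 <= b && b <= 90) ||
               (97 <= b && b <= 122) ||
               b == 10 || b == 13 || b == 32;
    }

    // Builds the query only; no element is inspected here.
    public static IEnumerable<byte> BuildQuery(byte[] arr)
    {
        return arr.Where(IsAcceptedCounted);
    }
}
```

After `var query = DeferredDemo.BuildQuery(arr);` the counter is still 0. It only rises once something such as `query.ToArray()` enumerates the sequence.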

And by the way, strings are Unicode in C#, so don't use this to do some simple string formatting (there are far nicer alternatives for that).

Matthew
M.Stramm

For the most part, I agree with @MarkByers. LINQ will be a little less efficient than procedural code. For in-memory queries (LINQ to Objects), the overhead generally comes from delegate invocations and iterator allocations; expression trees are only built and translated by `IQueryable` providers such as LINQ to SQL. Nevertheless, the readability and development-time improvements are worth the hit in 99% of cases. When you encounter a performance issue, benchmark, modify, and re-benchmark.

With that said, LINQ is pretty closely related to lambdas and anonymous delegates. These features are often talked about as if they were the same thing. There are cases where these constructs can be faster than procedural code. It looks like your example can be one of those cases. I would rewrite your code as follows:

private static byte [] FilterText2(byte[] arr) {

   return arr.Where( a=> (a >= 65 && a <= 90)  || 
                         (a >= 97 && a <= 122) || 
                          a == 10 || a == 13   || a == 32
                  ).ToArray();
}

Again, run some benchmarks for your specific scenario, as YMMV. A lot of ink has been spilled on which is faster and under what scenarios. Here is some of that ink:

Community
EBarr

Many LINQ statements are easily parallelizable. Just add AsParallel() to the beginning of a query. You can also add AsOrdered() if you want the original order to be preserved at the expense of some performance. For example, the following LINQ statement:

arr.Where(IsAccepted).ToArray();

can be written as:

arr.AsParallel().AsOrdered().Where(IsAccepted).ToArray();

You just have to make sure its overhead doesn't outweigh its benefits:

var queryA = from num in numberList.AsParallel()
             select ExpensiveFunction(num); //good for PLINQ

var queryB = from num in numberList.AsParallel()
             where num % 2 > 0
             select num; //not as good for PLINQ
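
To check that the ordered parallel query really returns the same bytes as the sequential one, something like the following sketch could be used. `ParallelFilter` and its method names are assumptions made for this example; `IsAccepted` is the rule from the earlier answer.

```csharp
using System.Linq;

public static class ParallelFilter
{
    private static bool IsAccepted(byte b)
    {
        return (65 <= b && b <= 90) ||
               (97 <= b && b <= 122) ||
               b == 10 || b == 13 || b == 32;
    }

    // Sequential LINQ version, used as the reference result.
    public static byte[] FilterSequential(byte[] arr)
    {
        return arr.Where(IsAccepted).ToArray();
    }

    // PLINQ version; AsOrdered keeps the output in source order.
    public static byte[] FilterParallel(byte[] arr)
    {
        return arr.AsParallel().AsOrdered().Where(IsAccepted).ToArray();
    }
}
```

For a predicate this cheap, the parallel version may well be slower than the sequential one, so it is worth timing both before switching.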
Risky Martin

Well-written imperative code will be more time- and space-efficient than well-written declarative code, because the declarative code must be translated into imperative code (unless you own a Prolog machine, which you probably don't, since you are asking about .NET :-) ).

But if you can solve a problem with LINQ in a simpler and more readable way than with loops, it's worth it. When you see something like

var actualPrices = allPrices
    .Where(price => price.ValidFrom <= today && price.ValidTo >= today)
    .Select(price => price.PriceInUSD)
    .ToList();

it's "one line", and what it does is obvious at first sight. Declaring a new collection, looping through the old one, writing an if, and adding items to the new collection is not. So it's a win unless you need to save every millisecond (which you probably don't, because you are using .NET and not C with embedded ASM). LINQ is also highly optimized, and there are several implementations (one for collections, one for XML, one for SQL, ...), so it is generally not much slower. There is no reason NOT to use it.

Some LINQ expressions can be easily parallelized using Parallel LINQ, almost "for free" (no extra code, but the parallelism overhead is still there, so account for it).

eMko