
For one of my projects, which is a kind of content aggregator, I'd like to introduce concurrency and, if possible, parallelism. At first glance this may seem pointless, because concurrency and parallelism take different approaches (concurrency via threads gives you immediate concurrency, whereas parallelism only provides a potential).

To explain my question better, let me summarize the problem set.

My project is a content aggregator (it aggregates feeds, podcasts and similar stuff), so it basically reads data from the web and parses it to extract the meaningful parts.

Right now I take a very simplistic, sequential approach. Let's say we have some number of feeds to parse:

foreach (var feed in feeds)
{
    var xml = ReadFromWeb(feed);   // blocks until the source is downloaded
    Parse(xml);                    // extract the meaningful data
}

With the sequential approach, the time it takes to parse and process all the feeds depends not only on the parser code but also on the time needed to fetch the XML source from the web, and we all know that fetching can take a variable amount of time (because of network conditions and similar issues).

So, to speed up the code, I can use worker threads, which introduces immediate concurrency:

[diagram: concurrency]

A defined number of worker threads can each take a feed and parse it concurrently, which should speed up the whole process, since the waits for data over the network overlap instead of adding up.
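Roughly, what I have in mind looks like the following untested sketch (Feed, ReadFromWeb and Parse are placeholders for my actual types and code; it assumes .NET 4 for BlockingCollection):

// Rough, untested sketch: a fixed number of worker threads pulling feeds from a
// shared queue. Feed, ReadFromWeb and Parse stand in for my real types and code.
var queue = new BlockingCollection<Feed>();
foreach (var feed in feeds)
    queue.Add(feed);
queue.CompleteAdding();               // no more feeds will be added

const int workerCount = 4;            // hand-picked number of workers
var workers = new List<Thread>();
for (int i = 0; i < workerCount; i++)
{
    var worker = new Thread(() =>
    {
        foreach (var feed in queue.GetConsumingEnumerable())
        {
            var xml = ReadFromWeb(feed);   // blocks on network I/O
            Parse(xml);                    // CPU-bound parsing
        }
    });
    worker.Start();
    workers.Add(worker);
}
workers.ForEach(w => w.Join());       // wait for every worker to finish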

This is all fine, except that my project's target audience mostly runs multi-core CPUs, because they're gamers.

I also want to utilize these cores while processing the content, so I started reading about potential parallelism: http://oreilly.com/catalog/0790145310262. I haven't finished the book yet and don't know whether it already covers this, but I'm a bit obsessed with the question and wanted to ask on Stack Overflow to get an overall idea.

The book describes potential parallelism like this: potential parallelism means that your program is written so that it runs faster when parallel hardware is available, and roughly the same as an equivalent sequential program when it's not.

So the real question is: while I'm using worker threads for concurrency, can I still make use of potential parallelism? (That is, run my feed parsers on worker threads and still have them distributed across CPU cores, if the CPU has multiple cores, of course.)

HuseyinUslu
  • I don't really see the distinction. Worker threads execute in parallel when multiple cores are available. – jalf Mar 05 '11 at 12:53
  • Is there a resource on this you can point me to, please? I want to read about this in detail to get a better understanding. – HuseyinUslu Mar 05 '11 at 13:12
  • I'm with jalf on this; if something is running in parallel then it _is_ occurring concurrently. – Grant Thomas Mar 05 '11 at 13:32
  • Let me re-phrase then: if I use Parallel Extensions' Parallel.ForEach and the code runs on a single-core CPU, will it run concurrently, or will it just act sequentially? If the latter, then that's not the solution for me, as I have to achieve concurrency even on single-core platforms. – HuseyinUslu Mar 05 '11 at 13:36

2 Answers

1

I think it's more useful to think about IO-bound work and CPU-bound work; threads can help with both.

For IO-bound work you are presumably waiting for external resources (in your case, feeds to be read). If you must wait on multiple external resources then it only makes sense to wait on them in parallel rather than wait on them one after the other. This is best done by spinning up threads which block on the IO.

For CPU-bound work you want to use all of your cores to maximize the throughput of completing that work. To do that, you should create a pool of worker threads roughly the same size as your number of cores and break up and distribute the work across them. [How you break up and distribute the work is itself an interesting problem.]

In practice, I find that most applications have both of these problems and it makes sense to use threads to solve both kinds of problems.
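As a rough, untested illustration (FetchXml, ParseFeed and feedUrls are placeholder names, not your actual code): one thread per download simply blocks on the network, and a small pool of parser threads, sized to the number of cores, consumes the results.

// Rough sketch: IO-bound fetching on dedicated threads, CPU-bound parsing on a
// pool sized to the machine's core count (assumes .NET 4 for BlockingCollection).
var pending = new BlockingCollection<string>();    // downloaded XML waiting to be parsed

// IO-bound side: one thread per feed, each simply blocking on the download.
var downloaders = new List<Thread>();
foreach (var url in feedUrls)
{
    var u = url;                                   // capture a copy for the closure
    var d = new Thread(() => pending.Add(FetchXml(u)));
    d.Start();
    downloaders.Add(d);
}

// CPU-bound side: roughly one parser thread per core.
var parsers = new List<Thread>();
for (int i = 0; i < Environment.ProcessorCount; i++)
{
    var p = new Thread(() =>
    {
        foreach (var xml in pending.GetConsumingEnumerable())
            ParseFeed(xml);
    });
    p.Start();
    parsers.Add(p);
}

downloaders.ForEach(t => t.Join());
pending.CompleteAdding();                          // no more downloads are coming
parsers.ForEach(t => t.Join());

In a real program you would probably cap the number of simultaneous downloads as well, but the split between "threads that wait" and "threads that compute" is the point.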

Alex Miller
  • It seems I was greatly mistaken: the operating system and its scheduler automatically distribute threads across all available cores, so my worker threads will already be running in parallel if there are multiple cores. The question, then, is why Parallel Extensions was introduced at all. From what I've read so far, on a single core, code that uses Parallel Extensions will just act sequentially? – HuseyinUslu Mar 05 '11 at 13:51
0

OK, it seems I was greatly confused by the book's description of potential parallelism. Thanks to the answers, I was able to figure things out.

From msdn: http://msdn.microsoft.com/en-us/library/dd460717(VS.100).aspx

The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework version 4. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.

So, basically, the TPL handles all the details of concurrency via threading for me, and it also exploits parallelism on multi-core machines when the hardware is there.
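For my own loop from the question, that boils down to something like this (a rough, untested sketch; ReadFromWeb and Parse stand in for my actual fetch/parse code):

// Untested sketch: let the TPL partition the feeds and scale to the available cores.
Parallel.ForEach(feeds, feed =>
{
    var xml = ReadFromWeb(feed);   // network I/O
    Parse(xml);                    // CPU-bound parsing
});

// If I want to cap (or raise) the number of concurrent workers myself,
// ParallelOptions lets me do that:
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
Parallel.ForEach(feeds, options, feed => Parse(ReadFromWeb(feed)));

On a multi-core machine the TPL spreads the iterations across the cores; how aggressively it overlaps the blocking downloads on a single core is up to its scheduler, which is exactly the kind of detail the quote above says it manages for me.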

HuseyinUslu