0

I wanted to use OpenMP around a for-loop that basically sums up the array elements. For comparison, I also sum up the array elements in a serial fashion. For large array-sizes, array_size>=5795, the sums are no longer equal.

This is the OpenMP-parallelized loop:

#pragma omp parallel for reduction(+:parallel_sum)
    for (i = 0; i < array_size; i++) {
        parallel_sum += array[i];
    }

And this is the full code:

#include<stdio.h>
#include<stdlib.h>
#include<omp.h>

int main() {
    float          *array, serial_sum, parallel_sum;
    size_t          array_size, i;

    serial_sum   = 0.;
    parallel_sum = 0.;

#pragma omp parallel
    printf( "number of threads used in parallized part: %d\n", omp_get_num_threads());

    printf("enter size of array\n");
    scanf("%zu", &array_size);

    // memory allocation
    array = (float *) malloc(sizeof(float) * array_size);

    // initialize array
    for (i = 0; i < array_size; i++)
        array[i] = i;

    // calculate the array sums, parallel first, then serial
#pragma omp parallel for reduction(+:parallel_sum)
    for (i = 0; i < array_size; i++) {
        parallel_sum += array[i];
    }

    for (i = 0; i < array_size; i++)
        serial_sum = serial_sum + array[i];

    if (serial_sum == parallel_sum)
        printf("\nserial and parallel sums are equal\n");
    else
        printf("\nserial and Parallel sums are NOT equal\n");

    // free memory
    free(array);

    return 0;
}

Since the sum can be easily calculated by hand using the sum-formula by Gauss, sum = (n^2+n)/2, where n corresponds to array_size, both sums are wrong... I guess I am having some memory issues then...?

Alf
  • 1,240
  • 1
  • 22
  • 39
  • 1
    Typical `float` types only have ~6 decimal digits of precision. Floating-point addition is not associative if some of the operations are inexact. – EOF Sep 21 '16 at 08:58
  • @EOF ohhhh, of course! That is slightly embarrassing... thanks :) Type that as an answer and I mark it as solution. – Alf Sep 21 '16 at 09:01
  • Depending on your compile options, your serial_sum may involve parallel batched sums, with different partitioning from your parallel reduction. – tim18 Sep 21 '16 at 10:28
  • @tim18 I was trying to keep the compiler options short, basically just `gcc -fopenmp -Wall` – Alf Sep 21 '16 at 10:43

1 Answers1

2

Since floating-point types have limited precision, addition is not associative (see Are floating point operations in C associative? and Is Floating point addition and multiplication associative?).

So a + (b + c) can evaluate unequal to (a + b) + c.

When parallelizing/vectorizing, the order operations can indeed be different than adding the numbers from lowest to highest sequentially.

If you printed out the computed values, you would likely find them quite close in this case, just not exactly equal.

Community
  • 1
  • 1
EOF
  • 5,857
  • 2
  • 23
  • 45
  • and an easy fix for the above mentioned code would be `double serial_sum, parallel_sum;` – Alf Sep 21 '16 at 09:07
  • 1
    @Alf: Well, you'd end up running into the same problem for higher numbers. Around ~`pow(10,8)` or so. – EOF Sep 21 '16 at 09:10