
I have two raw sound streams that I need to add together. For the purposes of this question, we can assume they are the same bitrate and bit depth (say 16 bit sample, 44.1khz sample rate).

Obviously if I just add them together I will overflow and underflow my 16 bit space. If I add them together and divide by two, then the volume of each is halved, which isn't correct sonically - if two people are speaking in a room, their voices don't become quieter by half, and a microphone can pick them both up without hitting the limiter.

  • So what's the correct method to add these sounds together in my software mixer?
  • Am I wrong and the correct method is to lower the volume of each by half?
  • Do I need to add a compressor/limiter or some other processing stage to get the volume and mixing effect I'm trying for?

-Adam

Adam Davis
  • Same question, but better answers: http://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping – sethcall Oct 24 '13 at 21:39
  • I was really disappointed with this. In real life, I always hear both signals *no matter what phase they are*. But simply adding the samples of two phase-inverted waves will result in *complete silence*. Not a mention of it... – Alba Mendez May 30 '14 at 20:56
  • @jmendeth Phase cancellation is real. Put two speakers right next to each other, and invert the phase from one (swap the wires). Your bass gets trashed. The reason you don't get complete cancellation is that your speakers aren't point sources and that you have two ears. – Roddy May 30 '14 at 21:11
  • I know, I know... still, when people hear "sound mixing" they don't expect two sounds to cancelate each other depending on the phase, resulting in silence. – Alba Mendez May 31 '14 at 07:44
  • And I don't want two instruments to have frequencies cancelled depending on "luck" for them to be phase-inverted. – Alba Mendez May 31 '14 at 07:49

20 Answers

32

You should add them together, but clip the result to the allowable range to prevent over/underflow.

If clipping occurs you will introduce distortion into the audio, but that's unavoidable. You can use your clipping code to detect this condition and report it to the user/operator (the equivalent of the red 'clip' light on a mixer...)

You could implement a more "proper" compressor/limiter, but without knowing your exact application, it's hard to say if it would be worth it.

If you're doing lots of audio processing, you might want to represent your audio levels as floating-point values, and only go back to the 16-bit space at the end of the process. High-end digital audio systems often work this way.
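A minimal sketch of the add-and-clip approach (the function name and the clip flag are mine, not from the answer): sum in a wider type, clamp to the 16-bit range, and report when clipping happened.

```c
#include <stdint.h>

/* Sum two 16-bit samples in 32 bits, hard-clip to the 16-bit range,
   and set *clipped so the caller can light a "clip" indicator. */
static int16_t mix_and_clip(int16_t a, int16_t b, int *clipped)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) { *clipped = 1; return INT16_MAX; }
    if (sum < INT16_MIN) { *clipped = 1; return INT16_MIN; }
    return (int16_t)sum;
}
```

The 32-bit intermediate is the important part: the clip decision is made on the true sum, never on a wrapped-around 16-bit value.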

Roddy
  • This answer is correct, but I embellish it with some notes on how to implement automatic level controls below (written before I had comment privileges). – podperson Apr 05 '12 at 17:26
  • @Kyberias That doesn't make sense; the first sentence literally explains exactly what to do. –  Sep 26 '14 at 04:40
  • The OP already knows what this answer suggests, and what the shortcoming of doing it is; from the question: "Obviously if I just add them together I will overflow and underflow my 16 bit space." @user1881400 – Ratul Sharker Sep 06 '19 at 07:30
28

I'd prefer to comment on one of the two highly ranked replies but owing to my meager reputation (I assume) I cannot.

The "ticked" answer: add together and clip is correct, but not if you want to avoid clipping.

The answer with the link starts with a workable voodoo algorithm for two positive signals in [0,1] but then applies some very faulty algebra to derive a completely incorrect algorithm for signed values and 8-bit values. The algorithm also does not scale to three or more inputs (the product of the signals will go down while the sum increases).

So: convert the input signals to float, scale them to [0,1] (e.g. a signed 16-bit value would become
float v = ( s + 32767.0 ) / 65536.0 (close enough...))
and then sum them.

To scale the input signals you should probably do some actual work rather than multiply by or subtract a voodoo value. I'd suggest keeping a running average volume and then if it starts to drift high (above 0.25 say) or low (below 0.01 say) start applying a scaling value based on the volume. This essentially becomes an automatic level implementation, and it scales with any number of inputs. Best of all, in most cases it won't mess with your signal at all.
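A sketch of the automatic level idea described above, using the answer's example thresholds (0.25 and 0.01); the smoothing constants and names are my own choices, not the answer's.

```c
/* Running-average auto-level: attenuate when the average level drifts
   above 0.25, boost when it drifts below 0.01 (the answer's example
   thresholds); the 0.999/0.001 smoothing constants are my own. */
typedef struct { double avg; double gain; } AutoLevel;

static double auto_level(AutoLevel *al, double mixed)
{
    double level = mixed < 0 ? -mixed : mixed;
    al->avg = 0.999 * al->avg + 0.001 * level;   /* slow running average */
    if (al->avg > 0.25)
        al->gain = 0.25 / al->avg;               /* drifting high: scale down */
    else if (al->avg < 0.01 && al->avg > 0.0)
        al->gain = 0.01 / al->avg;               /* drifting low: scale up */
    return mixed * al->gain;
}
```

As the answer notes, for signals sitting comfortably between the thresholds the gain stays at whatever it last was, so most of the time the signal passes through untouched.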

Miral Dhokiya
podperson
  • Thanks for the notes! This is worthy of an answer, I think, but you do now have 50 rep, so you should be able to comment on the site now. – Adam Davis Apr 05 '12 at 14:25
26

There is an article about mixing here. I'd be interested to know what others think about this.

Ben Dyer
  • It's interesting. Basically it does the addition, then applies a very simple 'compression' of the signal to avoid clipping. The problem is, this will significantly change the sample values even if there's no need to clip. For some applications (maybe telephony, games) this kind of approach would probably work pretty well. But for high-end audio processing it could be considered to be degrading the signal... – Roddy Nov 09 '11 at 17:17
  • This article is misleading (see my answer below). If you feed example values into his final formulae you get bad outputs (his algebra is bad). E.g. silence input gives you -1 output. In any event, it doesn't scale to more than two inputs and it's a voodoo algorithm with no basis in reality. – podperson Apr 05 '12 at 17:28
  • It's unwise to change the volume for every single sample. And the algorithm is not correct, because if you mix two channels carrying the same signal, the result should sound the same as each channel alone; that algorithm loses signal. – SuperLucky Jun 14 '14 at 06:54
  • That article is plain wrong as many have suggested. Please stop upvoting, you are only getting people misled. – Bill Kotsias May 12 '15 at 17:51
18

Most audio mixing applications will do their mixing with floating point numbers (32 bit is plenty good enough for mixing a small number of streams). Translate the 16 bit samples into floating point numbers with the range -1.0 to 1.0 representing full scale in the 16 bit world. Then sum the samples together - you now have plenty of headroom. Finally, if you end up with any samples whose value goes over full scale, you can either attenuate the whole signal or use hard limiting (clipping values to 1.0).

This will give much better sounding results than adding 16 bit samples together and letting them overflow. Here's a very simple code example showing how you might sum two 16 bit samples together:

short sample1 = ...;
short sample2 = ...;
float samplef1 = sample1 / 32768.0f;
float samplef2 = sample2 / 32768.0f;
float mixed = samplef1 + samplef2;
// reduce the volume a bit:
mixed *= 0.8f;
// hard clipping
if (mixed > 1.0f) mixed = 1.0f;
if (mixed < -1.0f) mixed = -1.0f;
short outputSample = (short)(mixed * 32767.0f);
Mark Heath
  • of course, but it will increase the chances of clipping, so adjust your volume accordingly – Mark Heath Feb 22 '18 at 09:31
  • Did this introduce white noise for you @MarkHeath? – Jeremy May 08 '18 at 21:42
  • By multiplying the mixed by 0.8... don't you just bring your noise level close to 'avarage'? If you multiply a negative value for mixed (say -0.5) by 0.8, it will get closer to 0, in other words, it will get HIGHER... so either you need convert to a 0+ range before multiplying, or the comments of 'reducing the volume a bit' is just not accurate. – Bram Vaessen Nov 21 '19 at 12:29
10

"Quieter by half" isn't quite correct. Because of the ear's logarithmic response, dividing the samples in half will make it 6 dB quieter - certainly noticeable, but not disastrous.

You might want to compromise by multiplying by 0.75. That will make it 3 dB quieter, but will lessen the chance of overflow and also lessen the distortion when it does happen.

Mark Ransom
  • 3 dB quieter is halving the power, so dividing the sample values by sqrt(2). That is multiplying by 0.707 (1/sqrt(2)) rather than 0.75. I do agree that a multiplication by 0.75 is easier to achieve with bit shifts, though. – Gauthier Oct 22 '12 at 12:56
  • @Gauthier, I was being approximate. – Mark Ransom Oct 22 '12 at 13:25
  • The 0.707 comes from 10^(-3/20) and has nothing to do with 1/sqrt(2) other than that the decimal expansion shares some leading digits (if you want to be precise:) – Joris Weimar Oct 28 '16 at 20:52
  • @JorisWeimar, he's absolutely correct that halving the power would require dividing by the square root of 2. It's convention to call that -3 db, even though it's technically -3.0103 db. Again, approximations. – Mark Ransom Oct 29 '16 at 15:45
  • @MarkRansom yes, i was arguing in your favor. it is an approximation. but it has nothing to do with sqrt(2). – Joris Weimar Oct 30 '16 at 13:31
  • But @JorisWeimar it has *everything* to do with sqrt(2)! It's the -3db figure that's an approximation to sqrt(2), not the other way around - I thought I made that clear. Power is proportional to the square of the voltage, so to cut the power in half requires cutting the voltage (signal) by sqrt(2). It's a complete coincidence that this is approximately -3 db, for the same reason that 2^10 (1024) is very close to 10^3 (1000). – Mark Ransom Oct 30 '16 at 14:50
  • we are talking about reducing the amplitude using a multiplication factor. i thought one converts between amplitude and dbfs by the formula dbfs = 20 * log(amplitude). or am i mistaken? – Joris Weimar Oct 30 '16 at 19:19
  • @JorisWeimar db is a measurement of a *ratio*, in the case of dbfs it's the ratio of *full scale amplitude* to the signal in question. Your formula is exactly correct if you take this into account, with the ratio being the multiplication factor. This is how I got the figure I quoted above: `20 * log(1/sqrt(2)) = -3.0103`. – Mark Ransom Nov 01 '16 at 02:30
8

I cannot believe that nobody has given the correct answer. Everyone is close, but still it's pure philosophy. The nearest, i.e. the best, was: (s1 + s2) - (s1 * s2). It's an excellent approach, especially for MCUs.

So, the algorithm goes:

  1. Find the volume at which you want the output sound to be. It can be the average or the maximum of one of the signals.
    factor = average(s1) (this assumes both signals are already OK, not overflowing 32767.0)
  2. Normalize both signals with this factor:
    s1 = (s1/max(s1))*factor
    s2 = (s2/max(s2))*factor
  3. Add them together and normalize the result with the same factor
    output = ((s1+s2)/max(s1+s2))*factor

Note that after step 1 you don't really need to go back to integers; you may work with floats in the -1.0 to 1.0 interval and convert back to integers at the end with the previously chosen power factor. I hope I haven't made a mistake; I'm in a hurry.

Miral Dhokiya
Dalen
  • This is wrong. E.g. consider s1 and s2 are both 0.5, s1+s2 => 1, max(s1, s2) is 0.5, so the output is 2. You've gone way past clipping and naively adding wouldn't have. Also, 0.25 and 0.25 produce the same result. – podperson Nov 09 '20 at 19:36
6

You can also buy yourself some headroom with an algorithm like y= 1.1x - 0.2x^3 for the curve, and with a cap on the top and bottom. I used this in Hexaphone when the player is playing multiple notes together (up to 6).

float waveshape_distort( float in ) {
  if(in <= -1.25f) {
    return -0.984375f;
  } else if(in >= 1.25f) {
    return 0.984375f;
  } else {
    return 1.1f * in - 0.2f * in * in * in;
  }
}

It's not bullet-proof - but will let you get up to 1.25 level, and smoothes the clip to a nice curve. Produces harmonic distortion, which sounds better than clipping and may be desirable in some circumstances.

Glenn Barnett
3

convert the samples to floating point values ranging from -1.0 to +1.0, then:

out = (s1 + s2) - (s1 * s2);
  • I'm going to have to puzzle that one out, I guess. It seems like it might be appropriate, but if the inputs are 1 and -1, the result is 1. Not sure if I want to break out laplace for this, but if you have any references of more information on why or how this works, I'd appreciate a head start, – Adam Davis Dec 08 '09 at 19:10
  • Note also that the article states input values between 0 and 1. – Gauthier Oct 22 '12 at 13:20
3

If you need to do this right, I would suggest looking at open source software mixer implementations, at least for the theory.

Some links:

Audacity

GStreamer

Actually you should probably be using a library.

krusty.ar
  • Audacity will just add the samples, resulting in a clip (if the samples are high). You have to manually adjust each track's gain to prevent clipping. – olafure Nov 19 '11 at 18:09
3

You're right about adding them together. You could always scan the sum of the two files for peak points, and scale the entire file down if they hit some kind of threshold (or if the average of it and its surrounding spots hit a threshold)

Jon Smock
  • I agree with you, but this isn't practical for a sound stream, because you can't peek ahead in the sound; maybe a windowed dynamic gain adjustment would do? – SuperLucky Jun 14 '14 at 07:01
2

I think that, so long as the streams are uncorrelated, you shouldn't have too much to worry about, you should be able to get by with clipping. If you're really concerned about distortion at the clip points, a soft limiter would probably work OK.

Tony Arkles
2

convert the samples to floating point values ranging from -1.0 to +1.0, then:

out = (s1 + s2) - (s1 * s2);

Will introduce heavy distortion when |s1 + s2| approaches 1.0 (at least it did when I tried it mixing simple sine waves). I have read this recommendation in several places, but in my humble opinion it is a useless approach.

What happens physically when waves 'mix' is that their amplitudes add, just as many posters here have already suggested. Either

  • clip (which distorts the result as well), or
  • sum your 16 bit values into a 32 bit number, and then divide by the number of your sources (that's what I would suggest, as it's the only way known to me that avoids distortion)
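The second option might look like this (a sketch; the function name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Sum N 16-bit sources into a 32-bit accumulator, then divide by the
   source count. The result can never clip, at the cost of attenuating
   each individual source. */
static int16_t mix_average(const int16_t *samples, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (int16_t)(sum / (int32_t)n);
}
```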
Michael Beer
1

Since your profile says you work in embedded systems, I will assume that floating point operations are not always an option.

> So what's the correct method to add these sounds together in my software mixer?

As you guessed, adding and clipping is the correct way to go if you do not want to lose volume on the sources. With int16_t samples, you need the sum to be int32_t, then limit it and convert back to int16_t.

> Am I wrong and the correct method is to lower the volume of each by half?

Yes. Halving of volume is somewhat subjective, but what you can see here and there is that halving the volume (loudness) is a decrease of about 10 dB (dividing the power by 10, or the sample values by 3.16). What you obviously mean, though, is lowering the sample values by half. That is a 6 dB decrease, a noticeable reduction, but not quite as much as halving the volume (the loudness table there is very useful).

With this 6 dB reduction you will avoid all clipping. But what happens when you want more input channels? For four channels, you would need to divide the input values by 4, that is lowering by 12 dB, thus going to less than half the loudness for each channel.

> Do I need to add a compressor/limiter or some other processing stage to get the volume and mixing effect I'm trying for?

You want to mix, not clip, and not lose loudness on the input signals. This is not possible, not without some kind of distortion.

As suggested by Mark Ransom, a solution to avoid clipping while not losing as much as 6 dB per channel is to hit somewhere in between "adding and clipping" and "averaging".

That is for two sources: adding, dividing by somewhere between 1 and 2 (reduce the range from [-65536, 65534] to something smaller), then limiting.

If you often clip with this solution and it sounds too harsh, then you might want to soften the limit knee with a compressor. This is a bit more complex, since you need to make the dividing factor dependent on the input power. Try the limiter alone first, and consider the compressor only if you are not happy with the result.
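A sketch of that middle ground, with an arbitrary divisor of 1.4 (the answer deliberately leaves the exact value between 1 and 2 open, and the function name is mine):

```c
#include <stdint.h>

/* Add, divide by a factor between 1 (plain adding) and 2 (averaging),
   then hard-limit whatever still overflows. The 1.4 is arbitrary. */
static int16_t mix_attenuate_limit(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    sum = (int32_t)(sum / 1.4);
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

Smaller divisors lose less loudness per channel but clip more often; a compressor, as suggested, softens the transition when that happens.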

Florent
Gauthier
1

I did it this way once: I used floats (samples between -1 and 1) and initialized an "autoGain" variable with a value of 1. I would add all the samples together (could also be more than 2), then multiply the outgoing signal by autoGain. If the absolute value of the sum before multiplication was higher than 1, I would assign 1/|sum| to autoGain. This effectively makes autoGain smaller than 1, say 0.7, and is equivalent to an operator quickly turning down the main volume as soon as he sees that the overall sound is getting too loud. Then, over an adjustable period of time, I would add to autoGain until it was finally back at 1 (our operator has recovered from the shock and is slowly cranking the volume back up :-)).
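A sketch of this scheme, assuming float samples; the struct, names, and recovery increment are my own choices, not the answer's:

```c
#include <math.h>

/* Duck instantly when the summed signal exceeds full scale, then
   recover slowly toward unity gain. */
typedef struct { float autoGain; } Ducker;

static float duck(Ducker *d, float sum)   /* sum of all input samples */
{
    float mag = fabsf(sum);
    if (mag > 1.0f) {
        d->autoGain = 1.0f / mag;          /* operator yanks the fader down */
    } else if (d->autoGain < 1.0f) {
        d->autoGain += 0.00001f;           /* ...then slowly cranks it back up */
        if (d->autoGain > 1.0f) d->autoGain = 1.0f;
    }
    return sum * d->autoGain;
}
```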

Andi
1
// #include <algorithm>
// short ileft, nleft; ...
// short iright, nright; ...

// Mix
float hiL = ileft + nleft;
float hiR = iright + nright;

// Clipping
short left = std::max(-32768.0f, std::min(hiL, 32767.0f));
short right = std::max(-32768.0f, std::min(hiR, 32767.0f));
Luka
1

I did the following thing:

MAX_VAL = Full 8 or 16 or whatever value
dst_val = your base audio sample
src_val = sample to add to base

Res = (((MAX_VAL - dst_val) * src_val) / MAX_VAL) + dst_val

Scale src by the remaining headroom of dst (normalized by MAX_VAL), then add dst. It will never clip, will never be less loud, and sounds absolutely natural.

Example:

250.5882 = (((255 - 180) * 240) / 255) + 180

And this sounds good :)
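The formula transcribed directly (for unsigned samples in [0, MAX_VAL]; mix_headroom is my name), which reproduces the example above:

```c
/* res = ((MAX_VAL - dst) * src) / MAX_VAL + dst
   src is scaled into dst's remaining headroom, so the result stays
   within [0, MAX_VAL] whenever both inputs do. */
static double mix_headroom(double dst, double src, double max_val)
{
    return ((max_val - dst) * src) / max_val + dst;
}
```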

  • Can you provide an explanation, using maybe four examples where each of dst and src are high value and low value so it's easy to understand what this algorithm is doing, and why? – Adam Davis Jul 18 '17 at 13:05
1

I found a new way to add samples such that they can never exceed a given range. The basic idea is to convert values in the range -1 to 1 into a range between approximately -Infinity and +Infinity, add everything together, and reverse the initial transformation. I came up with the following formulas for this:

f(x) = -x / (|x| - 1)

f'(x) = x / (|x| + 1)   (the inverse of f)

o = f'( Σ f(s) )

I tried it out and it does work, but for multiple loud sounds the resulting audio sounds worse than just adding the samples together and clipping every value which is too big. I used the following code to test this:

#include <math.h>
#include <stdio.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <sndfile.h>

// fabs wasn't accurate enough
long double ldabs(long double x){
  return x < 0 ? -x : x;
}

// -Inf<input<+Inf, -1<=output<=+1
long double infiniteToFinite( long double sample ){
  // if the input value was too big, we'll just map it to -1 or 1
  if( isinf(sample) )
    return sample < 0 ? -1. : 1.;
  long double ret = sample / ( ldabs(sample) + 1 );
  // Just in case of calculation errors
  if( isnan(ret) )
    ret = sample < 0 ? -1. : 1.;
  if( ret < -1. )
    ret = -1.;
  if( ret > 1. )
    ret = 1.;
  return ret;
}

// -1<=input<=+1, -Inf<output<+Inf
long double finiteToInfinite( long double sample ){
  // if out of range, clamp to 1 or -1
  if( sample > 1. )
    sample = 1.;
  if( sample < -1. )
    sample = -1.;
  long double res = -( sample / ( ldabs(sample) - 1. ) );
  // sample was too close to 1 or -1, return largest long double
  if( isinf(res) )
    return sample < 0 ? -LDBL_MAX : LDBL_MAX;
  return res;
}

// -1<input<1, -1<=output<=1 | Try to avoid input values too close to 1 or -1
long double addSamples( size_t count, long double sample[] ){
  long double sum = 0;
  while( count-- ){
    sum += finiteToInfinite( sample[count] );
    if( isinf(sum) )
      sum = sum < 0 ? -LDBL_MAX : LDBL_MAX;
  }
  return infiniteToFinite( sum );
}

#define BUFFER_LEN 256

int main( int argc, char* argv[] ){

  if( argc < 3 ){
    fprintf(stderr,"Usage: %s output.wav input1.wav [input2.wav...]\n",*argv);
    return 1;
  }

  {
    SNDFILE *outfile, *infiles[argc-2];
    SF_INFO sfinfo;
    SF_INFO sfinfo_tmp;

    memset( &sfinfo, 0, sizeof(sfinfo) );

    for( int i=0; i<argc-2; i++ ){
      memset( &sfinfo_tmp, 0, sizeof(sfinfo_tmp) );
      if(!( infiles[i] = sf_open( argv[i+2], SFM_READ, &sfinfo_tmp ) )){
        fprintf(stderr,"Could not open file: %s\n",argv[i+2]);
        puts(sf_strerror(0));
        goto cleanup;
      }
      printf("Sample rate %d, channel count %d\n",sfinfo_tmp.samplerate,sfinfo_tmp.channels);
      if( i ){
        if( sfinfo_tmp.samplerate != sfinfo.samplerate
         || sfinfo_tmp.channels != sfinfo.channels
        ){
          fprintf(stderr,"Mismatching sample rate or channel count\n");
          goto cleanup;
        }
      }else{
        sfinfo = sfinfo_tmp;
      }
      continue;
      cleanup: {
        while(i--)
          sf_close(infiles[i]);
        return 2;
      }
    }

    if(!( outfile = sf_open(argv[1], SFM_WRITE, &sfinfo) )){
      fprintf(stderr,"Could not open file: %s\n",argv[1]);
      puts(sf_strerror(0));
      for( int i=0; i<argc-2; i++ )
        sf_close(infiles[i]);
      return 3;
    }

    double inbuffer[argc-2][BUFFER_LEN];
    double outbuffer[BUFFER_LEN];

    size_t max_read;
    do {
      max_read = 0;
      memset(outbuffer,0,BUFFER_LEN*sizeof(double));
      for( int i=0; i<argc-2; i++ ){
        memset( inbuffer[i], 0, BUFFER_LEN*sizeof(double) );
        size_t read_count = sf_read_double( infiles[i], inbuffer[i], BUFFER_LEN );
        if( read_count > max_read )
          max_read = read_count;
      }
      long double insamples[argc-2];
      for( size_t j=0; j<max_read; j++ ){
        for( int i=0; i<argc-2; i++ )
          insamples[i] = inbuffer[i][j];
        outbuffer[j] = addSamples( argc-2, insamples );
      }
      sf_write_double( outfile, outbuffer, max_read );
    } while( max_read );

    sf_close(outfile);
    for( int i=0; i<argc-2; i++ )
      sf_close(infiles[i]);
  }

  return 0;
}
  • If I'm visualizing this correctly in head, all you're doing here is reducing precision while clipping anyway, which would explain why it sounds bad. Clamping to the expected range is exactly what clipping is. – Brad Nov 24 '17 at 17:20
0

Thank you everyone for sharing your ideas. Recently I've also been doing some work related to sound mixing, and I've experimented with this issue; maybe it will help you :).

Note that I'm using an 8 kHz sample rate and 16-bit samples (SInt16) in the iOS RemoteIO AudioUnit.

In my experiments the best result I found was something a bit different from all these answers, but the basis is the same (as Roddy suggested):

"You should add them together, but clip the result to the allowable range to prevent over/underflow."

But what is the best way to add them without overflow/underflow?

Key idea: You have two sound waves, say A & B, and the resultant wave C will be the superposition of the two. Under a limited bit range, samples of the superposition may overflow. So we find the maximum amount by which the superposed waveform crosses the upper limit, and the maximum amount by which it crosses the lower limit. Then we subtract the former from the upper portion of the superposed waveform and add the latter to the lower portion. VOILA ... you are done.

Steps:

  1. First traverse your data once to find the maximum overshoot above the upper limit and the maximum overshoot below the lower limit.
  2. Make another traversal over the audio data, subtracting the upper overshoot from the positive portion and adding the lower overshoot to the negative portion.

The following code shows the implementation.

static unsigned long upSideDownValue = 0;
static unsigned long downSideUpValue = 0;
#define SINT16_MIN -32768
#define SINT16_MAX 32767
SInt16* mixTwoVoice (SInt16* RecordedVoiceData, SInt16* RealTimeData, SInt16 *OutputData, unsigned int dataLength){

unsigned long tempDownUpSideValue = 0;
unsigned long tempUpSideDownValue = 0;
//calibrate maker loop
for(unsigned int i=0;i<dataLength ; i++)
{
    SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];

    if(SINT16_MIN < summedValue && summedValue < SINT16_MAX)
    {
        //the value is within range -- good boy
    }
    else
    {
       //nasty calibration needed
        unsigned long tempCalibrateValue;
        tempCalibrateValue = ABS(summedValue) + SINT16_MIN; // overshoot past the 16-bit range (off by one on the positive side)

        if(summedValue < 0)
        {
            //check the downside -- to calibrate
            if(tempDownUpSideValue < tempCalibrateValue)
                tempDownUpSideValue = tempCalibrateValue;
        }
        else
        {
            //check the upside ---- to calibrate
            if(tempUpSideDownValue < tempCalibrateValue)
                tempUpSideDownValue = tempCalibrateValue;
        }
    }
}

//here we need some function which will gradually set the value
downSideUpValue = tempDownUpSideValue;
upSideDownValue = tempUpSideDownValue;

//real mixer loop
for(unsigned int i=0;i<dataLength;i++)
{
    SInt32 summedValue = RecordedVoiceData[i] + RealTimeData[i];

    if(summedValue < 0)
    {
        OutputData[i] = summedValue + downSideUpValue;
    }
    else if(summedValue > 0)
    {
        OutputData[i] = summedValue - upSideDownValue;
    }
    else
    {
        OutputData[i] = summedValue;
    }
}

return OutputData;
}

It works fine for me; I later intend to gradually change the values of upSideDownValue & downSideUpValue to get a smoother output.

Ratul Sharker
0

This question is old, but here is a valid method, IMO:

  1. Convert both samples to power.
  2. Add them in the power domain.
  3. Normalize the result so the maximum value doesn't go over your limit.
  4. Convert back to amplitude.

You can do the first 2 steps together, but you will need the maximum and minimum in a second pass to normalize for steps 3 and 4.

I hope it helps someone.

-1

I'd say just add them together. If you're overflowing your 16 bit PCM space, then the sounds you're using are already incredibly loud to begin with and you should attenuate them. If that would cause them to be too soft by themselves, look for another way of increasing the overall volume output, such as an OS setting or turning the knob on your speakers.

Adam Rosenfield