
I generate sound samples at different frequencies (sine/saw/triangle generators) as arrays of double values in [-1...1] (1 = maximum amplitude). I'd like to combine all of these signals into one.

1) If I add the signals (combineWithNormalize) and then normalize the result back to [-1...1], the sound quality is good, but the signal is too quiet.

2) If I add them using linear (combineWithLinearDynaRangeCompression) or logarithmic (combineWithLnDynaRangeCompression) compression, the signal is louder, but the quality is terrible (a metallic sound). What am I doing wrong? I suppose I've missed a processing step. What are the generally accepted algorithms for adding audio signals from several source WAV files into a final file (which methods do synthesizers such as Yamaha's use for this purpose)?

Additional details:

- My generated audio (combining two samples: top - combineWithNormalize, bottom - combineWithLnDynaRangeCompression). The top signal is quiet but correct; the bottom is louder but sounds awful. audio samples

- Java code (draft, not optimized):

// add samples and linearly normalize the result to [-1, 1]
public static double[] combineWithNormalize(double[]... audio) {
    if (audio.length == 0) return null;
    if (audio.length == 1) return audio[0];

    int maxIdx = 0;
    // look for the longest sample
    for (double[] arr : audio)
        if (arr.length > maxIdx) maxIdx = arr.length;

    // pad shorter samples with zeros at the end
    for (int i = 0; i < audio.length; i++)
        if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);

    // add all samples into the result (and find the absolute max value)
    double[] result = new double[maxIdx];
    double normalizer = 1.0;
    for (int i = 0; i < maxIdx; i++) {
        for (int j = 0; j < audio.length; j++)
            result[i] += audio[j][i];
        double res = Math.abs(result[i]);
        if (res > normalizer)
            normalizer = res;
    }

    // normalize the result (only attenuates; never amplifies)
    double coeff = 1.0 / normalizer;
    if (normalizer != 1.0)
        for (int i = 0; i < maxIdx; i++)
            result[i] *= coeff;
    return result;
}

// add samples with linear compression (all samples must be in [-1..1])
public static double[] combineWithLinearDynaRangeCompression(double threshold, double[]... audio) {
    if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
    if (audio.length == 1) return audio[0];
    int maxIdx = 0;

    // look for the longest sample
    for (double[] arr : audio)
        if (arr.length > maxIdx) maxIdx = arr.length;

    // pad shorter samples with zeros at the end
    for (int i = 0; i < audio.length; i++)
        if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);

    double[] result = Arrays.copyOf(audio[0], maxIdx); // copy the first sample
    double linearCoeff = (1 - threshold) / (2 - threshold);

    // accumulate all samples per position, then compress
    for (int j = 0; j < maxIdx; j++) {
        double res = result[j];
        for (int i = 1; i < audio.length; i++)
            res += audio[i][j];
        double absRes = Math.abs(res);
        result[j] = (absRes <= threshold) ? res : Math.signum(res) * (threshold + linearCoeff * (absRes - threshold));
    }
    return result;
}

// add samples with log compression (all samples must be in [-1..1])
public static double[] combineWithLnDynaRangeCompression(double threshold, double[]... audio) {
    if (audio.length == 0 || threshold >= 1 || threshold < 0) return null;
    if (audio.length == 1) return audio[0];
    int maxIdx = 0;

    // look for the longest sample
    for (double[] arr : audio)
        if (arr.length > maxIdx) maxIdx = arr.length;

    // pad shorter samples with zeros at the end
    for (int i = 0; i < audio.length; i++)
        if (audio[i].length < maxIdx) audio[i] = Arrays.copyOf(audio[i], maxIdx);

    double[] result = Arrays.copyOf(audio[0], maxIdx); // copy the first sample
    double expCoeff = alphaT[(int) (threshold * 100)]; // parenthesized: the cast must apply after the multiply

    // accumulate all samples per position, then compress
    for (int j = 0; j < maxIdx; j++) {
        double res = result[j];
        for (int i = 1; i < audio.length; i++)
            res += audio[i][j];
        double absRes = Math.abs(res);
        result[j] = (absRes <= threshold) ? res :
                Math.signum(res) * (threshold + (1 - threshold) *
                    Math.log(1.0 + expCoeff * (absRes - threshold) / (2 - threshold)) / Math.log(1.0 + expCoeff));
    }
    return result;
}

// Solutions of the equation pow(1+x, 1/x) = exp((1-t)/(2-t)) for t = 0, 0.01, 0.02, ..., 0.99
final private static double[] alphaT =
        {
                2.51286, 2.54236, 2.57254, 2.60340, 2.63499, 2.66731, 2.70040, 2.73428, 2.76899, 2.80454,
                2.84098, 2.87833, 2.91663, 2.95592, 2.99622, 3.03758, 3.08005, 3.12366, 3.16845, 3.21449,
                3.26181, 3.31048, 3.36054, 3.41206, 3.46509, 3.51971, 3.57599, 3.63399, 3.69380, 3.75550,
                3.81918, 3.88493, 3.95285, 4.02305, 4.09563, 4.17073, 4.24846, 4.32896, 4.41238, 4.49888,
                4.58862, 4.68178, 4.77856, 4.87916, 4.98380, 5.09272, 5.20619, 5.32448, 5.44790, 5.57676,
                5.71144, 5.85231, 5.99980, 6.15437, 6.31651, 6.48678, 6.66578, 6.85417, 7.05269, 7.26213,
                7.48338, 7.71744, 7.96541, 8.22851, 8.50810, 8.80573, 9.12312, 9.46223, 9.82527, 10.21474,
                10.63353, 11.08492, 11.57270, 12.10126, 12.67570, 13.30200, 13.98717, 14.73956, 15.56907, 16.48767,
                17.50980, 18.65318, 19.93968, 21.39661, 23.05856, 24.96984, 27.18822, 29.79026, 32.87958, 36.59968,
                41.15485, 46.84550, 54.13115, 63.74946, 76.95930, 96.08797, 125.93570, 178.12403, 289.19889, 655.12084
        };

Thanks in advance.

MaratSR
  • I think this link covers every answer related to `Audio_Processing`, whether *Pre-Processing* or *Post-Processing*: [Android_Audio_Processing_Using_WebRTC](https://github.com/mail2chromium/Android-Audio-Processing-Using-WebRTC). You can also visit this reference: https://stackoverflow.com/a/58546599/10413749 – Muhammad Usman Apr 07 '20 at 08:47

1 Answer

I haven't tested your code, but I'll share some general tips:

"I generate sound samples at different frequency (sin/saw/triangle generators)"

So you have PCM samples in some byte array. Let's assume 16-bit: each short at [i] holds the amplitude of the sample at [i], where [i] is your position within the total sample count.

..."As an array of Double values"...

For your digital sound (PCM), you should be using floats. Is your input sound in 16-bit format? You could later convert the floats to 16-bit integer values (shorts).
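As a minimal sketch of that conversion step (the method name toPcm16 is just an illustration, not from the question's code), scaling and clamping doubles in [-1, 1] to 16-bit signed samples might look like:

```java
// Convert normalized samples in [-1, 1] to 16-bit signed PCM.
// toPcm16 is an illustrative name, not an existing API.
public static short[] toPcm16(double[] samples) {
    short[] pcm = new short[samples.length];
    for (int i = 0; i < samples.length; i++) {
        // clamp first so rounding never overflows the short range
        double s = Math.max(-1.0, Math.min(1.0, samples[i]));
        pcm[i] = (short) Math.round(s * Short.MAX_VALUE);
    }
    return pcm;
}
```

Clamping before scaling matters: a stray sample of 1.0001 would otherwise wrap around to a large negative short and produce an audible click.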

Also check this other Answer : https://stackoverflow.com/a/10325317/2057709

To the question...

"What is correct way to add audio samples into the one without clipping"

What's wrong with using + for adding?

final_sample[i] = ( sourceA[i] /2 ) + ( sourceB[i] /2 ); //divide by 2 to halve amplitudes

We divide by 2 to halve the amplitudes of each source. That way, even if each source hit a sample value (amplitude) of 1.0, during mixing each would contribute at most 0.5, so the mixed final_sample totals at most 1.0. Hopefully no clipping.
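Generalizing that idea to N sources of equal length (a rough sketch; mixByAveraging is a made-up name, and padding shorter inputs as in the question's code is left out), dividing each source by the input count guarantees the sum stays within [-1, 1]:

```java
// Mix several equal-length sources by averaging: each sample is divided
// by the source count, so the running sum can never leave [-1, 1].
// mixByAveraging is an illustrative name, not from the question's code.
public static double[] mixByAveraging(double[]... sources) {
    if (sources.length == 0) return new double[0];
    int length = sources[0].length;
    double[] mixed = new double[length];
    for (double[] src : sources)
        for (int i = 0; i < length; i++)
            mixed[i] += src[i] / sources.length;
    return mixed;
}
```

The trade-off is exactly what the question observes: the guarantee against clipping comes at the cost of overall loudness, which is why a gain stage often follows.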

"1) If I add (combineWithNormalize) and finally normalize to [-1...1] :
result: Quality of sound is good, but signal is too silent."

Try boosting the signal by multiplying the sample values, e.g. signal * 2.0 to double the volume.
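A minimal sketch of such a gain stage (applyGain is a hypothetical name, not an existing API), with a clamp so boosted samples cannot leave [-1, 1]:

```java
// Multiply every sample by `gain`, clamping the result to [-1, 1]
// so a large boost cannot push samples out of range.
// applyGain is an illustrative name, not an existing API.
public static double[] applyGain(double[] samples, double gain) {
    double[] out = new double[samples.length];
    for (int i = 0; i < samples.length; i++)
        out[i] = Math.max(-1.0, Math.min(1.0, samples[i] * gain));
    return out;
}
```

Note the clamp is hard clipping: if the boost actually drives samples past 1.0, it will distort, so pick a gain the mixed signal's peak can tolerate (or normalize by the measured peak, as combineWithNormalize in the question already does).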

PS: Check this article and these other Stack Exchange answers for ideas:

(1) blog: Mix Audio Samples on iOS (try same logic, code is easy to understand).

(2) SO: Modify volume gain on audio sample buffer.

(3) SO: Mixing PCM audio samples.

(4) SO: Algorithm To Mix Sound.

(5) DSP: Algorithm(s) to mix audio signals without clipping.

VC.One
  • Thank you for the answer. Perhaps I didn't state the problem correctly. The main problem: signals a and b have amplitude in [-1, 1], so a + b has a maximum level of [-2, 2]. If you use ( sourceA[i] /2 ) + ( sourceB[i] /2 ), we still need to normalize, and the volume of the summed signal will be quieter than the volume of signal a or signal b alone. – MaratSR May 28 '18 at 10:51