9

How can I remove the silence from the beginning and the end of wave files with PyDub?

I guess I should access segment by segment and check whether it's silent or not (but I'm not able to do it) :/

e.g. I have a wave file with silence at the beginning, end, or both (like below) and I want to remove the silence at the beginning and at the end of the file:

[Image: wave file with silence]

For example, I want to import it:

sound = AudioSegment.from_wav(inputfile)

Then I would loop over every sample of the sound to check whether it is silent, marking the last silent sample before the waveform starts (marker1) and the last sample before the waveform ends (marker2). I could then export a new sound file cut between the two markers:

newsound = sound[marker1:marker2]

newsound.export(outputfile, format="wav")
Community
DaniPaniz

2 Answers

35

I would advise iterating in chunks of at least 10 ms, both to make it a little faster (fewer iterations) and because individual samples don't really have a "loudness".

Sound is vibration, so at a minimum it would take 2 samples to detect whether there was actually any sound (and even that would only tell you about high frequencies).
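To make the point about chunk loudness concrete, here is a minimal pure-Python sketch of how a loudness figure in dBFS can be derived from the RMS of a chunk of samples. The helper name `chunk_dbfs` and the 16-bit full-scale value are assumptions for illustration; this is not pydub's own code, only the same general idea.

```python
import math

def chunk_dbfs(samples, max_amplitude=32768):
    """Loudness of a chunk of integer samples in dBFS (0 dBFS = full scale).

    Hypothetical helper: assumes signed 16-bit samples by default.
    """
    if not samples:
        return float("-inf")
    # Root mean square over the whole chunk, not a single sample
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / max_amplitude)

# 10 ms of audio at 44.1 kHz is 441 samples
silent_chunk = [0] * 441
tone_chunk = [int(16384 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(441)]

print(chunk_dbfs(silent_chunk))        # -inf
print(chunk_dbfs(tone_chunk) > -50.0)  # True
```

A single sample that happens to land on a zero crossing would look "silent" by this measure, which is why the measurement is taken over a chunk.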

Anyway… something like this could work:

from pydub import AudioSegment

def detect_leading_silence(sound, silence_threshold=-50.0, chunk_size=10):
    '''
    sound is a pydub.AudioSegment
    silence_threshold in dB
    chunk_size in ms

    iterate over chunks until you find the first one with sound
    '''
    trim_ms = 0 # ms

    assert chunk_size > 0 # to avoid infinite loop
    # check the bounds first so the loop stops at the end of an all-silent file
    while trim_ms < len(sound) and sound[trim_ms:trim_ms+chunk_size].dBFS < silence_threshold:
        trim_ms += chunk_size

    return trim_ms

sound = AudioSegment.from_file("/path/to/file.wav", format="wav")

start_trim = detect_leading_silence(sound)
end_trim = detect_leading_silence(sound.reverse())

duration = len(sound)    
trimmed_sound = sound[start_trim:duration-end_trim]
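The reverse trick above can be illustrated without any audio at all. The following sketch assumes the sound has already been reduced to a list of per-chunk loudness values in dBFS (the values and the helper name `leading_silent_chunks` are made up for illustration) and applies the same scan from both ends.

```python
def leading_silent_chunks(chunk_levels, silence_threshold=-50.0):
    """Count how many chunks at the start are quieter than the threshold."""
    count = 0
    while count < len(chunk_levels) and chunk_levels[count] < silence_threshold:
        count += 1
    return count

# Hypothetical per-chunk loudness values in dBFS
levels = [float("-inf"), -72.0, -18.5, -20.1, -65.0, -80.0, -90.0]

start = leading_silent_chunks(levels)        # silent chunks at the start
end = leading_silent_chunks(levels[::-1])    # same scan on the reversed list

print(levels[start:len(levels) - end])  # [-18.5, -20.1]
```

Scanning the reversed sequence means one function handles both ends, at the cost of building a reversed copy.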
Oleg Melnikov
Jiaaro
  • that's exactly what I was looking for, thanks! I didn't know that the attribute .dBFS contains the db of each chunk of sound, as well as that you can iterate over sounds this way... it's also compelling the implementation of a threshold and the size of the chunk (exactly what I was thinking of). – DaniPaniz Apr 10 '15 at 11:42
  • To contribute to this function I'll add a little hack that: 1) cuts the sound exactly at the point in which it departs from 0 dB (so that you don't miss any ms at the onset/offset of the sound) by discarding the threshold 2) a parameter that adds a given amount of silence at the beginning or at the end of the sound if wanted – DaniPaniz Apr 10 '15 at 11:42
  • @DaniPaniz seems you could just apply the silence buffer at the trimming step, like `sound[start_trim - buffer : duration - end_trim + buffer]` – Jiaaro Apr 10 '15 at 14:39
  • done here, thanks a lot Jiaaro, your help will be cited in my works :)
      for f in files:
          if f[-4:] == '.wav':
              fp = path_files + f
              end_trim = detect_leading_silence(sound.reverse())
              duration = len(sound)
              trimmed_sound = sound[:duration-end_trim+10]
              trimmed_sound.export(fp, format="wav")
    – DaniPaniz Apr 10 '15 at 16:21
  • Jiaaro, thanks for the code. I noted that the loop is infinite when the input file is all noise. So, added a few conditions to avoid it. – Oleg Melnikov Dec 29 '17 at 07:01
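On the silence buffer discussed in the comments above: when the buffer is larger than the detected trim, `start_trim - buffer` goes negative and the end bound can run past the file, so it may be worth clamping both. A minimal sketch, where `trim_bounds` and `buffer_ms` are hypothetical names rather than anything provided by pydub:

```python
def trim_bounds(duration, start_trim, end_trim, buffer_ms=0):
    """Return (start, end) in ms, keeping up to buffer_ms of silence per side."""
    start = max(0, start_trim - buffer_ms)                # never before the file start
    end = min(duration, duration - end_trim + buffer_ms)  # never past the file end
    if end <= start:
        # e.g. the whole file was below the silence threshold
        return 0, 0
    return start, end

print(trim_bounds(duration=3000, start_trim=5, end_trim=400, buffer_ms=10))  # (0, 2610)
```

The result could then be used as `sound[start:end]` in place of the unclamped slice.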
3

You can use this code:

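from pydub import AudioSegment
from pydub.silence import detect_nonsilent

def remove_sil(path_in, path_out, format="wav"):
    sound = AudioSegment.from_file(path_in, format=format)
    # [start_ms, end_ms] ranges above the threshold; dBFS is negative,
    # so 1.5x the file's average level is quieter than the average
    non_sil_times = detect_nonsilent(sound, min_silence_len=50, silence_thresh=sound.dBFS * 1.5)
    if len(non_sil_times) > 0:
        # merge ranges separated by less than 200 ms of silence
        non_sil_times_concat = [non_sil_times[0]]
        for t in non_sil_times[1:]:
            if t[0] - non_sil_times_concat[-1][-1] < 200:
                non_sil_times_concat[-1][-1] = t[1]
            else:
                non_sil_times_concat.append(t)
        # drop ranges of 350 ms or shorter
        non_sil_times = [t for t in non_sil_times_concat if t[1] - t[0] > 350]
    # the filter above can leave nothing, so check again before indexing
    if len(non_sil_times) > 0:
        # keep everything from the first remaining range to the last
        sound[non_sil_times[0][0]: non_sil_times[-1][1]].export(path_out, format='wav')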
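The merge-and-filter step in the function above can be tried on its own with plain lists. This sketch assumes the input is a list of `[start_ms, end_ms]` pairs like the ones `detect_nonsilent` returns; the helper name `merge_close_ranges` and the sample ranges are made up for illustration.

```python
def merge_close_ranges(ranges, max_gap=200, min_length=350):
    """Merge ranges separated by less than max_gap ms, then drop short ones."""
    merged = []
    for start, end in ranges:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = end           # extend the previous range
        else:
            merged.append([start, end])   # start a new range
    # keep only ranges strictly longer than min_length ms
    return [r for r in merged if r[1] - r[0] > min_length]

ranges = [[100, 300], [350, 900], [2000, 2100], [3000, 3600]]
print(merge_close_ranges(ranges))  # [[100, 900], [3000, 3600]]
```

With these numbers the trimmed file would run from 100 ms to 3600 ms, so the short 2000 to 2100 ms blip no longer influences where the cut happens.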
petezurich
Dan Erez