24

I'm aware of the following question: How to create a pydub AudioSegment using an numpy array?

My question is the exact opposite: if I have a pydub AudioSegment, how can I convert it to a numpy array?

I would like to apply scipy filters and similar tools to the audio, but the internal structure of the AudioSegment raw data is not clear to me.

J_Zar

3 Answers

16

Pydub has a facility for getting the audio data as an array of samples. It returns an array.array instance (not a numpy array), but you should be able to convert it to a numpy array relatively easily:

from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")

# this is an array
samples = sound.get_array_of_samples()

You may be able to create a numpy variant of the implementation though. That method is implemented pretty simply:

def get_array_of_samples(self):
    """
    returns the raw_data as an array of samples
    """
    return array.array(self.array_type, self._data)

Creating a new audio segment from a (modified?) array of samples is also possible:

new_sound = sound._spawn(samples)

The above is a little hacky, since _spawn was written for internal use within the AudioSegment class. It mainly just figures out what type of audio data you're passing (array of samples, list of samples, bytes, bytestring, etc.), so it's safe to use despite the underscore prefix.
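
As a rough sketch of the round trip described above, the snippet below converts an array.array of samples to numpy, changes it, and packs it back into an array.array with the same typecode. The sample values are made up for illustration, and 16-bit audio (typecode "h") is an assumption; the final `_spawn` call is left as a comment because it needs a real AudioSegment.

```python
import array

import numpy as np

# Stand-in for sound.get_array_of_samples(); the values are illustrative only.
samples = array.array("h", [1000, -2000, 30000, -30000])

# numpy reads the array through the buffer protocol, so "h" becomes int16.
as_np = np.array(samples)

# Example modification: halve the amplitude, staying in the original dtype.
quieter = (as_np // 2).astype(as_np.dtype)

# Pack the result back into an array.array with the original typecode.
back = array.array(samples.typecode, quieter.tobytes())

# With a real AudioSegment this could then be passed on:
# new_sound = sound._spawn(back)
```

Keeping the typecode and dtype unchanged matters here, because the new segment inherits the sample width of the original.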

Jiaaro
9

You can get an array.array from an AudioSegment then convert it to a numpy.ndarray:

from pydub import AudioSegment
import numpy as np
song = AudioSegment.from_mp3('song.mp3')
samples = song.get_array_of_samples()
samples = np.array(samples)
shao.lo
mdeff
  • The array won't be shaped / ordered as necessary for a scipy filter. After the above code block, you'll likely need: `samples = samples.reshape(song.channels, -1, order='F'); samples.shape # (<channels>, <samples per channel>)`. The `samples` waveform is then ready for filtering, FFT analysis, plotting, etc. (although you may want to cast it to float). – user2561747 Mar 19 '18 at 19:47
  • This comment is really helpful, combined with the answer ... solves my problem – Sachin Kumar Aug 10 '19 at 17:23
  • Is the code after the `;` in the comment necessary? – Chris P Mar 19 '21 at 17:14
  • @ChrisP No, it is not necessary; it is just there for explanation – Bastian Ebeling Apr 30 '21 at 07:24
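
To illustrate the reshaping discussed in the comments, here is a small sketch that splits interleaved samples into one row per channel. The helper name `deinterleave` and the sample values are hypothetical; with pydub, the inputs would presumably be `song.get_array_of_samples()` and `song.channels`.

```python
import array

import numpy as np


def deinterleave(samples, channels):
    """Turn [s1_ch1, s1_ch2, s2_ch1, s2_ch2, ...] into shape (channels, n)."""
    flat = np.array(samples)
    # Each row of the (n, channels) view is one frame; transpose to get channels first.
    return flat.reshape(-1, channels).T


# Illustrative stereo data: left channel 1, 2, 3 and right channel -1, -2, -3.
interleaved = array.array("h", [1, -1, 2, -2, 3, -3])
per_channel = deinterleave(interleaved, channels=2)

# The Fortran-order reshape from the comment gives the same layout.
fortran = np.array(interleaved).reshape(2, -1, order="F")
```

Both forms give the same result; the transpose version just makes the frame-by-frame layout a little more explicit.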
3

None of the existing answers is perfect: they miss reshaping and sample width. I have written this function, which converts the audio to the standard audio representation in numpy:

import numpy as np
import pydub

def pydub_to_np(audio: pydub.AudioSegment) -> (np.ndarray, int):
    """Converts pydub audio segment into float32 np array of shape [channels, duration_in_seconds*sample_rate],
    where each value is in range [-1.0, 1.0]. Returns tuple (audio_np_array, sample_rate)"""
    # get_array_of_samples returns the data in format:
    # [sample_1_channel_1, sample_1_channel_2, sample_2_channel_1, sample_2_channel_2, ....]
    # where samples are signed integers of sample_width bytes.
    # Dividing by 2**(8 * sample_width - 1), the magnitude of the most negative
    # sample, scales the values into [-1.0, 1.0).
    return np.array(audio.get_array_of_samples(), dtype=np.float32).reshape((-1, audio.channels)).T / (
            1 << (8 * audio.sample_width - 1)), audio.frame_rate
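
For going the other way after filtering, the following is a minimal sketch, not part of pydub. It assumes a float array of shape [channels, n] with values around [-1.0, 1.0], signed integer samples, and a full scale of 2**(8 * sample_width - 1), the largest magnitude a signed sample of that width can take. The function name `np_to_interleaved` and the width-to-dtype mapping are hypothetical.

```python
import numpy as np


def np_to_interleaved(audio_np: np.ndarray, sample_width: int) -> np.ndarray:
    """Convert a float array of shape [channels, n] to 1-D interleaved signed integers."""
    # Assumed mapping from sample width in bytes to a signed integer dtype.
    dtype = {1: np.int8, 2: np.int16, 4: np.int32}[sample_width]
    full_scale = 1 << (8 * sample_width - 1)
    # Scale to integer range, then round to the nearest integer value.
    scaled = np.rint(audio_np.astype(np.float64) * full_scale)
    # +1.0 would overflow by one step, so clip to the representable range.
    clipped = np.clip(scaled, -full_scale, full_scale - 1)
    # Transpose to [n, channels] and flatten so that frames are interleaved.
    return clipped.T.reshape(-1).astype(dtype)


# Illustrative stereo input: row 0 is channel 1, row 1 is channel 2.
floats = np.array([[0.5, -0.5, 1.0],
                   [0.25, -1.0, 0.0]])
ints = np_to_interleaved(floats, sample_width=2)
```

Since the first answer notes that `_spawn` accepts bytes, `ints.tobytes()` could presumably be handed to it to build a new segment, though that step is untested here.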

Piotr Dabkowski