Load audio from Google Storage using Google Colab (Python)

Question

I saved my audio file in Google Cloud Storage in WAV format, but when I try to load it from Google Colab, I cannot get it to work.

Below is the code I used to load the audio from Google Cloud Storage.

import numpy as np
import IPython.display as ipd
import librosa
import soundfile as sf
import io
from google.cloud import storage
import os

from google.colab import auth
auth.authenticate_user()


os.environ["GCLOUD_PROJECT"] = "fundpro" #project_id
BUCKET = 'parli-2020' #bucket_name
gcs = storage.Client()
bucket = gcs.get_bucket(BUCKET)
import speech_recognition as sr

for blob in bucket.list_blobs(prefix='speech/Transcribe'):

    filename = 'gs://parli-2020/' + blob.name
    X, sample_rate = librosa.core.load(filename)

But I get an error saying the file cannot be found: [Errno 2] No such file or directory

My question: how can I load (read) audio from Google Cloud Storage?

Does this answer your question? [how to load audio from Google Storage/ how to read audio from google storage](https://stackoverflow.com/questions/66916535/how-to-load-audio-from-google-storage-how-to-read-audio-from-google-storage) — Jon Nordby, Apr 06 '21 at 12:06
I have tried the suggestion, but I get an error that gs is not supported... Is there any way to keep the audio in WAV format? There is another processing step after reading the audio. – que23, Apr 07 '21 at 03:45

Crash0v3rrid3 · Answer 1 · 2021-04-05T09:02:11.153

0

Librosa reads files through the local filesystem, which does not understand gs:// paths. You can use TensorFlow's GFile implementation instead.

Something like this:

import numpy as np
import IPython.display as ipd
import librosa
import soundfile as sf
import io
from google.cloud import storage
import os
import speech_recognition as sr
import tensorflow.io.gfile as gf

# Authenticate the Colab session so the storage client can access the bucket
from google.colab import auth
auth.authenticate_user()

os.environ["GCLOUD_PROJECT"] = "fundpro" #project_id
BUCKET = 'parli-2020' #bucket_name
gcs = storage.Client()
bucket = gcs.get_bucket(BUCKET)

for blob in bucket.list_blobs(prefix='speech/Transcribe'):
    filename = 'gs://' + BUCKET + '/' + blob.name
    # Open in binary mode ('rb') so the WAV bytes are not decoded as text
    with gf.GFile(filename, 'rb') as fp:
        X, sample_rate = librosa.core.load(fp)
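An alternative that avoids TensorFlow altogether is to download each blob's bytes with the storage client and hand librosa an in-memory file. The following is only a rough sketch, not tested against the asker's bucket: `blob_to_buffer` and `load_audio_from_gcs` are hypothetical helper names, and it assumes authentication and the project ID are already set up as in the code above.

```python
import io


def blob_to_buffer(blob):
    """Download one blob into memory and return a seekable file-like object."""
    data = blob.download_as_bytes()  # raw bytes of the object, i.e. the WAV file itself
    return io.BytesIO(data)


def load_audio_from_gcs(bucket_name, prefix):
    """Yield (blob_name, samples, sample_rate) for each .wav blob under prefix."""
    # Imported here so blob_to_buffer can be used without these packages installed.
    import librosa
    from google.cloud import storage

    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if not blob.name.lower().endswith(".wav"):
            continue  # skip folder placeholders and non-audio objects
        # sr=None keeps the file's own sample rate instead of resampling
        samples, sample_rate = librosa.load(blob_to_buffer(blob), sr=None)
        yield blob.name, samples, sample_rate
```

With the names from the question this would be called as `for name, X, sample_rate in load_audio_from_gcs('parli-2020', 'speech/Transcribe'): ...`. The whole file is held in memory, so this suits short clips better than very large recordings.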

edited Apr 05 '21 at 09:02

answered Apr 05 '21 at 03:15

Crash0v3rrid3


Thank you for your response and help. However, I get this error: ModuleNotFoundError: No module named 'tensorflow.gfile'. Which TensorFlow version provides tensorflow.gfile? – que23 Apr 05 '21 at 03:52
Run pip install tensorflow-io. Then try again with the updated import statement. – Crash0v3rrid3 Apr 05 '21 at 03:57
The import works now, but I am facing another issue: AttributeError: module 'tensorflow.io.gfile' has no attribute 'Open'. – que23 Apr 05 '21 at 06:44
Try again with the above update. – Crash0v3rrid3 Apr 05 '21 at 07:08
Sorry to keep asking for help. I am getting a new error related to the file format (my audio is in WAV format): RuntimeError: Error opening : File contains data in an unknown format. Also, when I execute with gf.GFile(filename) as fp: X, sample_rate = librosa.core.load(fp), why does fp not return the file name? It returns this: "tensorflow.python.platform.gfile.GFile object at 0x7f884c60eed0" – que23 Apr 05 '21 at 08:37
Try opening the file as binary. – Crash0v3rrid3 Apr 05 '21 at 09:03
I need to keep the audio in WAV format, since I want to do audio analysis... In binary mode the file loads, but I cannot process the audio. – que23 Apr 07 '21 at 03:51
Can we convert binary to WAV format? – que23 Apr 07 '21 at 04:50
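On the last point: bytes read in binary mode are the WAV file itself, so no conversion should be needed. Writing them unchanged to a local .wav path gives a normal file that path-based tools can open. A minimal sketch using only the standard library follows; `save_wav_bytes` and `make_test_wav` are hypothetical names, the silent clip is just a stand-in for data downloaded from the bucket, and the `wave` module only reads uncompressed PCM WAV.

```python
import io
import wave


def save_wav_bytes(data, path):
    """Write WAV bytes to a local file and return (sample_rate, n_frames)."""
    with open(path, "wb") as f:
        f.write(data)  # the bytes are stored unchanged; nothing is re-encoded
    # Re-open the file as WAV to confirm the header is readable
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnframes()


def make_test_wav(sample_rate=16000, n_frames=1600):
    """Build a short silent mono 16-bit WAV in memory (stand-in for bucket data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()
```

Once the file is on the Colab disk, a call such as `librosa.load(path, sr=None)` can read it like any other local WAV file.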
