Questions tagged [joblib]

Joblib is a set of tools to provide lightweight pipelining in Python.

http://pythonhosted.org/joblib/

561 questions
68
votes
4 answers

What does the delayed() function do (when used with joblib in Python)

I've read through the documentation, but I don't understand what is meant by: The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax. I'm using it to iterate over the list I want to…
orrymr
  • 1,675
  • 3
  • 17
  • 26
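
The tuple-building behavior quoted in the excerpt above can be illustrated with a simplified stand-in. This is only a sketch, not joblib's source: `my_delayed` and `run_sequentially` are hypothetical names, and the real `Parallel` dispatches the captured calls to workers instead of looping over them.

```python
from math import sqrt


def my_delayed(function):
    """Hypothetical stand-in for joblib.delayed: capture a call without running it."""
    def capture(*args, **kwargs):
        # Return the (function, args, kwargs) tuple instead of calling the function.
        return function, args, kwargs
    return capture


def run_sequentially(tasks):
    """Hypothetical stand-in for Parallel: execute each captured call in order."""
    return [function(*args, **kwargs) for function, args, kwargs in tasks]


# The call syntax looks like sqrt(16), but nothing is computed yet.
task = my_delayed(sqrt)(16)

# The captured calls are only executed when the runner unpacks them.
results = run_sequentially(my_delayed(sqrt)(i ** 2) for i in range(4))
```

Here `task` is the tuple `(sqrt, (16,), {})`, which is what lets a runner decide where and when the call actually happens.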
57
votes
11 answers

ImportError: cannot import name 'joblib' from 'sklearn.externals'

I am trying to load my saved model from s3 using joblib import pandas as pd import numpy as np import json import subprocess import sqlalchemy from sklearn.externals import joblib ENV = 'dev' model_d2v = load_d2v('model_d2v_version_002', ENV) def…
Praneeth Sai
  • 721
  • 1
  • 4
  • 8
42
votes
1 answer

Out-of-core processing of sparse CSR arrays

How can one apply some function in parallel on chunks of a sparse CSR array saved on disk using Python? Sequentially this could be done e.g. by saving the CSR array with joblib.dump, opening it with joblib.load(.., mmap_mode="r") and processing the…
rth
  • 8,446
  • 4
  • 41
  • 70
42
votes
8 answers

Tracking progress of joblib.Parallel execution

Is there a simple way to track the overall progress of a joblib.Parallel execution? I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. However, to do that, whenever Parallel finishes a…
Cerin
  • 50,711
  • 81
  • 269
  • 459
41
votes
6 answers

How can we use tqdm in a parallel execution with joblib?

I want to run a function in parallel, and wait until all parallel nodes are done, using joblib. Like in the example: from math import sqrt from joblib import Parallel, delayed Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10)) But, I want…
Dror Hilman
  • 4,741
  • 7
  • 33
  • 50
23
votes
2 answers

Why is it important to protect the main loop when using joblib.Parallel?

The joblib docs contain the following warning: Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this: import…
Joe
  • 3,179
  • 2
  • 24
  • 41
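
The guarded layout that the warning above refers to might look like the following sketch. It assumes joblib is installed; `compute_roots` is a hypothetical helper name. On platforms that start workers by re-importing the main module, as Windows does, any unguarded top-level `Parallel` call would run again in every child process.

```python
from math import sqrt

from joblib import Parallel, delayed


def compute_roots(n):
    # Hypothetical helper: the parallel work lives in a function, so importing
    # this module does not start any workers by itself.
    return Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(n))


if __name__ == "__main__":
    # Only the process that runs this file as a script reaches this block;
    # child processes that re-import the module skip it.
    print(compute_roots(10))
```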
22
votes
2 answers

How to write to a shared variable in python joblib

The following code parallelizes a for-loop. import networkx as nx; import numpy as np; from joblib import Parallel, delayed; import multiprocessing; def core_func(repeat_index, G, numpy_arrary_2D): for u in G.nodes(): …
user3813057
  • 721
  • 1
  • 12
  • 27
19
votes
3 answers

How do I store a TfidfVectorizer for future use in scikit-learn?

I have a TfidfVectorizer that vectorizes a collection of articles, followed by feature selection. vectroizer = TfidfVectorizer() X_train = vectroizer.fit_transform(corpus) selector = SelectKBest(chi2, k = 5000 ) X_train_sel =…
user2161903
  • 497
  • 1
  • 6
  • 19
18
votes
4 answers

How to properly pickle sklearn pipeline when using custom transformer

I am trying to pickle a sklearn machine-learning model and load it in another project. The model is wrapped in a pipeline that does feature encoding, scaling, etc. The problem starts when I want to use self-written transformers in the pipeline for…
spiral
  • 191
  • 1
  • 6
18
votes
2 answers

How to save a scikit-learn pipeline with keras regressor inside to disk?

I have a scikit-learn pipeline with a KerasRegressor in it: estimators = [ ('standardize', StandardScaler()), ('mlp', KerasRegressor(build_fn=baseline_model, nb_epoch=5, batch_size=1000, verbose=1)) ] pipeline = Pipeline(estimators) After,…
Dror Hilman
  • 4,741
  • 7
  • 33
  • 50
17
votes
1 answer

Python, parallelization with joblib: Delayed with multiple arguments

I am using something similar to the following to parallelize a for loop over two matrices from joblib import Parallel, delayed import numpy def processInput(i,j): for k in range(len(i)): i[k] = 1 for t in range(len(b)): j[t]…
Francesco
  • 363
  • 3
  • 8
16
votes
2 answers

Removing cached files after a pytest run

I'm using a joblib.Memory to cache expensive computations when running tests with py.test. The code I'm using reduces to the following, from joblib import Memory memory = Memory(cachedir='/tmp/') @memory.cache def expensive_function(x): return…
rth
  • 8,446
  • 4
  • 41
  • 70
16
votes
2 answers

Multiprocessing backed parallel loops cannot be nested below threads

What is the reason for this issue in joblib? 'Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1' What should I do to avoid it? Actually I need to implement an XMLRPC server which runs heavy computation in…
Alex
  • 313
  • 3
  • 12
15
votes
3 answers

Can functions know if they are already multiprocessed in Python (joblib)

I have a function that uses multiprocessing (specifically joblib) to speed up a slow routine using multiple cores. It works great; no questions there. I have a test suite that uses multiprocessing (currently just the multiprocessing.Pool() system,…
Michael Scott Cuthbert
  • 2,611
  • 2
  • 18
  • 40
15
votes
2 answers

Writing a parallel loop

I am trying to run a parallel loop on a simple example. What am I doing wrong? from joblib import Parallel, delayed import multiprocessing def processInput(i): return i * i if __name__ == '__main__': # what are your inputs, and…
KMA
  • 183
  • 1
  • 1
  • 4