Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted "best guess" values. Because missing data can complicate analysis and can introduce missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (discarding every observation that has any missing value). Several imputation methods exist, including: imputing missing values with a single value, such as the mean, the median, or a specific value based on domain expertise; distance-based heuristics such as kNN; multiple imputation, which draws several plausible values for each missing entry and pools the analyses of the resulting completed data sets; and model-based methods such as Expectation Maximization (EM).
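A minimal pandas sketch of two of the simplest options described above: listwise deletion versus single-value (mean or median) imputation. The column names and values are hypothetical toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric columns with one missing entry each.
df = pd.DataFrame({
    "height": [170.0, np.nan, 182.0, 165.0],
    "weight": [68.0, 74.0, np.nan, 59.0],
})

# Listwise deletion: drop every row that has any missing value.
complete_cases = df.dropna()

# Single-value imputation: replace each missing entry with its column mean.
mean_imputed = df.fillna(df.mean())

# The same idea with the column median instead of the mean.
median_imputed = df.fillna(df.median())

print(complete_cases)
print(mean_imputed)
print(median_imputed)
```

Listwise deletion keeps only two of the four rows here, while both imputations keep all four at the cost of shrinking each column's variance.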

Suggested tag synonym: "missing-data"

671 questions

4 answers

Imputer on some Dataframe columns in Python

I am learning how to use Imputer in Python. This is my code: df=pd.DataFrame([["XXL", 8, "black", "class 1", 22], ["L", np.nan, "gray", "class 2", 20], ["XL", 10, "blue", "class 2", 19], ["M", np.nan, "orange", "class 1", 17], ["M", 11, "green",…

asked Jul 26 '16 at 07:59

Mauro Gentile
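A possible sketch for imputing only some columns, assuming a current scikit-learn in which sklearn.preprocessing.Imputer has been replaced by sklearn.impute.SimpleImputer. It uses only the first four rows shown in the excerpt, and the column names (size, amount, color, label, age) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# First four rows from the question's excerpt; column names are hypothetical.
df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17]],
    columns=["size", "amount", "color", "label", "age"],
)

# Impute only the selected numeric column(s); the string columns are untouched.
numeric_cols = ["amount"]
imputer = SimpleImputer(strategy="mean")
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

print(df)
```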

3 answers

R: replace NA with item from vector

I am trying to replace some missing values in my data with the average values from a similar group. My data looks like this:

     X  Y
  1  x  y
  2  x  y
  3  NA y
  4  x  y

And I want it to look like this:

     X  Y
  1  x  y
  2  x  y
  3  y  y
  4  x …

r replace missing-data imputation

asked Jul 13 '11 at 19:47

gregmacfarlane

0 answers

Use of statsmodels.imputation.mice

I am exploring the statsmodels.imputation.mice package to use for imputing missing values. I haven't seen any example of its usage, though, outside of http://www.statsmodels.org. From what I gather, one would create an instance of mice.MICEData and use…

statsmodels imputation

asked Sep 13 '17 at 22:45

David Makovoz

3 answers

How to transform some columns only with SimpleImputer or equivalent

I am taking my first steps with the scikit-learn library and find myself needing to backfill only some columns in my data frame. I have read the documentation carefully, but I still cannot figure out how to achieve this. To make this more specific, let's…

python pandas scikit-learn data-science imputation

asked Aug 13 '19 at 10:31

quiet-ranger
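SimpleImputer replaces missing values with a per-column statistic (such as the mean, the median, the most frequent value or a constant), so it does not propagate neighbouring values the way a backfill does. A pandas-only sketch, with hypothetical column names, that backfills a chosen subset of columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only columns "a" and "b" should be backfilled.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 2.0, 4.0],
    "c": [np.nan, 5.0, np.nan, 6.0],
})

cols_to_fill = ["a", "b"]
# bfill() copies the next valid value backwards; column "c" keeps its NaNs.
df[cols_to_fill] = df[cols_to_fill].bfill()
print(df)
```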

1 answer

Multiple Imputation of missing and censored data in R

I have a dataset with both missing-at-random (MAR) and censored data. The variables are correlated and I am trying to impute the missing data conditionally so that I can estimate the distribution parameters for a correlated multivariate normal…

r missing-data imputation

asked May 07 '17 at 03:02

chelsea

3 answers

Implementation of sklearn.impute.IterativeImputer

Consider the data below, which contains some NaN values:

   Column-1  Column-2  Column-3  Column-4  Column-5
0       NaN      15.0      63.0       8.0      40.0
1      60.0      51.0       NaN      54.0      31.0
2      15.0      17.0      55.0      80.0       NaN
3      54.0      43.0      70.0      16.0  …

python dataframe scikit-learn missing-data imputation

asked Jul 22 '19 at 21:52

k.ko3n
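A minimal sketch of IterativeImputer on the three complete rows visible in the excerpt (the fourth row is truncated, so it is left out). In recent scikit-learn releases the estimator is still marked experimental and has to be enabled with the extra import shown; with so few rows the imputed numbers are only illustrative.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# First three rows from the question's excerpt.
df = pd.DataFrame(
    [[np.nan, 15.0, 63.0, 8.0, 40.0],
     [60.0, 51.0, np.nan, 54.0, 31.0],
     [15.0, 17.0, 55.0, 80.0, np.nan]],
    columns=[f"Column-{i}" for i in range(1, 6)],
)

# Each column with NaNs is regressed on the other columns (BayesianRidge by default),
# cycling over the columns for up to max_iter rounds; observed values are kept as-is.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed.round(1))
```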

2 answers

Impute missing data with mean by group

I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be…

r loops missing-data imputation

asked Mar 25 '19 at 20:03

Jonatan Ottino
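The question is about R; for readers working in Python, a pandas sketch of the same group-mean idea (the data are hypothetical) could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a three-level group variable and a continuous value with gaps.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "C", "C"],
    "value": [1.0, np.nan, 3.0, 10.0, np.nan, 5.0, np.nan],
})

# transform("mean") returns each row's group mean aligned to the original index,
# so fillna() only replaces the missing rows.
df["value"] = df["value"].fillna(df.groupby("group")["value"].transform("mean"))
print(df)
```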

3 answers

Generate larger synthetic dataset based on a smaller dataset in Python

I have a dataset with 21000 rows (data samples) and 102 columns (features). I would like to generate a larger synthetic dataset based on the current one, say with 100000 rows, so that I can use it for machine learning. I've…

python machine-learning scikit-learn imputation

asked Mar 06 '19 at 16:04

JChat

1 answer

Differences between sklearn's SimpleImputer and Imputer

In Python's sklearn library there are two classes that do approximately the same thing: sklearn.preprocessing.Imputer and sklearn.impute.SimpleImputer. The only difference I found is the "constant" strategy in SimpleImputer. Is…

python machine-learning scikit-learn imputation

asked Dec 24 '18 at 11:15

MefAldemisov
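For context, sklearn.preprocessing.Imputer was deprecated in scikit-learn 0.20, when sklearn.impute.SimpleImputer was introduced, and removed in 0.22. A small sketch with made-up data comparing the "mean" strategy with the "constant" strategy mentioned in the question:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 4.0],
              [3.0, 6.0]])

# Column-wise mean imputation.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# "constant" fills every missing entry with fill_value.
X_const = SimpleImputer(strategy="constant", fill_value=-1.0).fit_transform(X)

print(X_mean)
print(X_const)
```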

1 answer

How to do forward filling for each group in pandas

I have a dataframe similar to the one below:

id  A    B    C    D  E
1   2    3    4    5  5
1   NaN  4    NaN  6  7
2   3    4    5    6  6
2   NaN  NaN  5    4  1

I want to impute the null values in columns A, B and C by forward filling, but within each group. That means, I want…

python pandas imputation

asked Dec 09 '18 at 21:10

HHH
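A pandas sketch using the table from the question: grouping by id before forward filling keeps values from leaking between groups, and columns D and E are left untouched.

```python
import numpy as np
import pandas as pd

# The table from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "A": [2, np.nan, 3, np.nan],
    "B": [3, 4, 4, np.nan],
    "C": [4, np.nan, 5, 5],
    "D": [5, 6, 6, 4],
    "E": [5, 7, 6, 1],
})

cols = ["A", "B", "C"]
# groupby(...).ffill() forward-fills within each id only; the result is index-aligned.
df[cols] = df.groupby("id")[cols].ffill()
print(df)
```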

2 answers

Is there a way to impute missing values in machine learning?

For personal knowledge, I've been trying out imputation methods other than the mean/median/mode. So far I have tried KNN, MICE and median imputation. I was told that imputation by a clustering method can also be done and my…

python machine-learning imputation

asked Apr 16 '18 at 10:06

uharsha33
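The question mentions KNN among the methods tried. As one concrete example of a distance-based method (not the clustering approach the question goes on to ask about), scikit-learn's KNNImputer, available since version 0.22, can be used like this on made-up data:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0],
              [10.0, 20.0]])

# Each missing entry becomes the mean of that feature over the n_neighbors closest
# rows, with distances computed only on features both rows have (nan_euclidean).
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the two nearest donors for the second row are the first and third rows, so its missing value becomes the mean of 2.0 and 6.0.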

3 answers

Can I use Train AND Test data for Imputation?

Interestingly, I see a lot of different answers about this, both on Stack Overflow and on other sites. While working on my training data set, I imputed missing values of a certain column using a decision tree model. So here's my question. Is it fair to…

python-2.7 data-science imputation

asked Oct 14 '17 at 20:28

Analysa
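A commonly recommended pattern is to fit the imputer on the training data only and then apply the same fitted imputer to the test data, so that no statistics from the test set leak into training. The question used a decision-tree imputer; this sketch swaps in SimpleImputer purely for brevity, and the data are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical single-feature train/test split.
X_train = np.array([[1.0], [2.0], [np.nan], [5.0]])
X_test = np.array([[np.nan], [10.0]])

imputer = SimpleImputer(strategy="mean")
imputer.fit(X_train)                        # statistics come from training data only
X_train_filled = imputer.transform(X_train)
X_test_filled = imputer.transform(X_test)   # test NaNs get the training mean

print(imputer.statistics_)
print(X_test_filled)
```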

3 answers

Error in "missforest" in R

I need help getting around the error below while performing data imputation in R using the "missForest" package.

> imputed<- missForest(dummy, maxiter = 10, ntree = 100, variablewise = TRUE,
+ decreasing = TRUE, verbose = TRUE,
+ …

r imputation

asked Sep 08 '17 at 22:33

Sandeep

4 answers

Python - SkLearn Imputer usage

I have the following question: I have a pandas dataframe in which missing values are marked by the string na. I want to run an Imputer on it to replace the missing values with the column mean. According to the sklearn documentation, the…

python scikit-learn imputation

asked Jul 01 '16 at 16:42

lte__
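One way to handle the string marker, sketched under the assumption that the affected columns are otherwise numeric (the data are hypothetical): convert "na" to a real NaN first, because the mean strategy needs numeric input, then use SimpleImputer, the current replacement for the old Imputer class.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical frame read as strings, with "na" marking missing values.
df = pd.DataFrame({"x": ["1", "na", "3"], "y": ["4", "5", "na"]})

# Turn the "na" marker into a real NaN and make the columns numeric.
numeric = df.replace("na", np.nan).astype(float)

imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(numeric), columns=df.columns)
print(filled)
```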

2 answers

Testing for missing values in R

I have a time series data set which has some missing values in it. I wish to impute the missing values, but I am unsure which method is most appropriate, e.g. linear, spline or stine from the imputeTS package. For the sake of completeness I wish…

r missing-data imputation imputets

asked Feb 07 '17 at 00:33

TheGoat
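imputeTS is an R package, so the following is only a pandas sketch of a common evaluation approach: hide values that are actually known, impute them with each candidate method, and compare the error on the hidden points. The series is hypothetical, and only linear interpolation and forward fill are compared here.

```python
import numpy as np
import pandas as pd

# Hypothetical complete series (a straight line, so linear interpolation is exact here).
truth = pd.Series(np.arange(10, dtype=float) * 2.0)

# Hide some known values to create a test set for the imputation methods.
hidden = [3, 6]
observed = truth.copy()
observed.iloc[hidden] = np.nan


def rmse(filled):
    """Root-mean-square error on the artificially hidden positions only."""
    err = filled.iloc[hidden] - truth.iloc[hidden]
    return float(np.sqrt((err ** 2).mean()))


scores = {
    "linear": rmse(observed.interpolate(method="linear")),
    "ffill": rmse(observed.ffill()),
}
print(scores)
```

On a real series, repeating the masking at several random positions gives a fairer comparison than a single split.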
