Questions tagged [missing-data]

For questions relating to missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may be unobserved, corrupted, or not yet observed. Handling such values requires additional annotation to mark which entries are missing, as well as methodological care when deciding how to impute them or use them in standard analyses. The problem is most acute in data-intensive contexts, such as large statistical analyses of databases.
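
As a rough illustration of annotation and imputation, here is a minimal pandas sketch. The table, its column names (`age`, `income`), and the choice of mean imputation are assumptions made for the example, not a recommended default:

```python
import numpy as np
import pandas as pd

# Hypothetical table: NaN marks cells that were not observed.
df = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0],
    "income": [52000.0, 61000.0, np.nan, np.nan],
})

# Annotation: a boolean mask records which cells are missing.
missing_mask = df.isna()

# Number of missing cells in each column.
missing_per_column = missing_mask.sum()

# One simple treatment: fill each column with the mean of its observed values.
# The mask is kept so later steps can still tell imputed cells from observed ones.
imputed = df.fillna(df.mean())
```

Keeping `missing_mask` alongside `imputed` preserves the information that a value was filled in, which many analyses need in order to treat imputed values differently from observed ones.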

Missing data occur in many fields, from survey data to industrial data. There are many possible missing data mechanisms (reasons why the data are missing). In survey data, for example, values might be missing due to drop-out, or because respondents ran out of time.

Rubin classified missing data into three types:

  1. missing completely at random;
  2. missing at random;
  3. missing not at random.

Note that some statistical analyses are valid only under certain of these classes.
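
The three classes can be illustrated with a small simulation. This is only a sketch under assumed conditions: the variables `age` and `income`, the drop probabilities, and the thresholds are all invented for the example. It deletes `income` values under each mechanism and compares the mean of the remaining values with the true mean:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical fully observed data: income depends on age.
age = rng.normal(40, 10, n)
income = 1000 * age + rng.normal(0, 5000, n)

# MCAR: every income value is dropped with the same probability.
mcar_mask = rng.random(n) < 0.3

# MAR: the drop probability depends only on an observed variable (age).
mar_mask = rng.random(n) < np.where(age > 40, 0.5, 0.1)

# MNAR: the drop probability depends on the value that goes missing (income).
mnar_mask = rng.random(n) < np.where(income > np.median(income), 0.5, 0.1)


def observed_mean(mask):
    """Mean of income over the rows that remain observed."""
    return pd.Series(income).mask(mask).mean()


true_mean = income.mean()
bias = {
    "MCAR": observed_mean(mcar_mask) - true_mean,
    "MAR": observed_mean(mar_mask) - true_mean,
    "MNAR": observed_mean(mnar_mask) - true_mean,
}
```

In this setup the mean of the remaining values stays close to the true mean under MCAR, while it is pulled downward under both MAR and MNAR. Under MAR the distortion can in principle be corrected by conditioning on `age`; under MNAR the observed data alone are not enough.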

2225 questions
0
votes
3 answers

What are the standard ways of filling missing values in Python?

I have a very limited dataset in which a variety of columns have missing values. I cannot drop the rows with missing values, as that would reduce the dataset's size drastically. Can anyone suggest a standard procedure for this?
0
votes
1 answer

How do I fill missing values in a column based on the values in another column?

My data set has columns labelled "City", "Zipcode", "Neighbourhood". I have all the values for Neighbourhood but some values for city and zipcode are missing. How do I match the "Neighbourhood" columns to the given values in "City" and "Zipcode"…
WhoDis
  • 1
0
votes
0 answers

Finding the NaN values for a summary report

List item ```def drag_mis(data): list = [] for val in data.values: if np.any(val) == None: list.append(val) return list.count(val)``` """ Need a summary report like a file attached…
0
votes
1 answer

How to handle missing values in linear regression?

I have a data frame with 60 variables and all variables have missing values in a way that none of the lines are complete: complete.cases(data) [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE…
Debutant
  • 165
  • 9
0
votes
1 answer

Soft-impute on the test set with fancyimpute

The Python package fancyimpute provides several data imputation methods. I have tried to use the soft-impute approach; however, soft-impute doesn't offer a transform method to be used on the test dataset. More precisely, Sklearn SimpleImputer (for…
Yasmin
  • 871
  • 3
  • 13
  • 32
0
votes
0 answers

Impute NaN values based on the mean by category

I am struggling with something that might seem simple: I want to replace NaN by the mean of the value for the Disease import numpy as np import pandas as pd df = pd.DataFrame({'Disease' : [1, 0, 0, 1, 1], 'Value1' : [3, 1, 2, 4, np.nan], …
0
votes
3 answers

SQL code to find if a series of lists do NOT contain a particular value

I have two tables Jobs +-----+------+ | Job | Name | +-----+------+ | 1 | Foo | | 2 | Bar | | 3 | Baz | | 4 | Qwe | +-----+------+ Job_Operations +-----+--------------+ | Job | Work_Center | +-----+--------------+ | 1 | SomeCenter …
0
votes
0 answers

Handling rows with 2 lines of data in Python

My DataFrame looks like this: there are some rows (example: 297) where the "Price" column has two values (Plugs and Quarts). I have filled the NaNs with the previous row since it belongs to the same Latin Name. However, I was thinking of…
Sca
  • 103
  • 7
0
votes
1 answer

missing in function applied to pandas dataframe column

I'm trying to apply a function to my 'age' and 'area' columns in order to get the results that I show in the column 'wanted'. Unfortunately this function gives me errors. I know that there are other methods in Pandas, like iloc, but I would like to…
progster
  • 557
  • 2
  • 10
  • 22
0
votes
0 answers

R: How do I impute missing values in a time-dependent manner based on a previous value?

I have a data frame that is time-series like in nature where an order determines the value of the $feed_type column. Unfortunately the order can change at a later $date and affect successive days. Because the order is only made once for the…
nlp
  • 81
  • 4
0
votes
2 answers

missing value conditions Pandas in a function

I would like a function where if the area column has missing values (like NULL in SQL) the result is 'A' in the target 'wanted' variable. I'm confused about use of None, isnull(), np.nan concepts in Python raw_data = {'area':…
progster
  • 557
  • 2
  • 10
  • 22
0
votes
2 answers

Create a variable based on values of two years from another variable in R

It looks simple, but I couldn't find the answer on-line. I have panel data with city characteristics over the years 1995-2015. For some variables, I just have data for the years 2000 and 2010. Therefore, I want to create new variables where I impute…
0
votes
1 answer

Highmaps get country name on click when country has no data

I have a Highmaps-map of the world, and display data for some countries. Getting a click handler for these countries is simple. (see also highmaps get country name on click event) However, I would like to be able to also detect clicks on countries…
RobAu
  • 17,042
  • 8
  • 69
  • 108
0
votes
0 answers

Python - Website Log In - Scraping CSRF Token failed

I am new here and new at programming. I am trying to log in the…
0
votes
0 answers

How to describe cases included in analyses in R?

I'm very new to R and pretty basic with analyses generally. I successfully ran a regression in R, but a lot of my data are missing. I'm fine with that because R just ignores the missing observations in the analyses and shows me the dfs in the…
Hmori
  • 35
  • 4