Highest Voted 'data-wrangling' Questions

5 votes, 4 answers

How to write an efficient wrapper for data wrangling that allows any wrapped part to be turned off when calling the wrapper

To streamline data wrangling, I wrote a wrapper function consisting of several "verb functions" that process the data, each performing one task. However, not all tasks apply to every dataset that passes through this process, and…

asked Mar 03 '21 at 13:35 by Emman (1,295 reputation)
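The excerpt does not say which language the wrapper is written in. As a minimal Python sketch of one common pattern (an assumption, not the asker's code), the verbs can live in an ordered registry of named steps, and the wrapper can take a `skip` argument naming the steps to turn off. The step functions and the fields `name`, `a` and `b` are hypothetical.

```python
# Hypothetical "verb" steps: each takes a list of row dicts and returns a new list.
def drop_missing(rows):
    """Remove rows that contain a None value."""
    return [r for r in rows if None not in r.values()]


def lowercase_names(rows):
    """Lower-case the hypothetical 'name' field."""
    return [{**r, "name": r["name"].lower()} for r in rows]


def add_total(rows):
    """Add a 'total' field equal to the hypothetical fields a + b."""
    return [{**r, "total": r["a"] + r["b"]} for r in rows]


# Registry of steps, run in insertion order (dicts preserve order in Python 3.7+).
STEPS = {
    "drop_missing": drop_missing,
    "lowercase_names": lowercase_names,
    "add_total": add_total,
}


def wrangle(rows, skip=()):
    """Run every registered step in order, except those named in `skip`."""
    skip = set(skip)
    unknown = skip - set(STEPS)
    if unknown:
        # Reject typos instead of silently running every step.
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    for name, step in STEPS.items():
        if name not in skip:
            rows = step(rows)
    return rows
```

For example, `wrangle(data, skip=["add_total"])` runs the first two verbs only. Keeping the verbs in one registry makes adding a step a one-line change, and validating `skip` against it catches misspelled step names.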

5 votes, 1 answer

Data manipulation in pandas: create a boolean column from the values in one column, then fill it with values from yet another column

OK, I've been trying this for too long; time to ask for help. I have a dataframe that looks a bit like this: person fruit quantity all_fruits 0 p1 grapes 2 [grapes, banana] 1 p1 banana 1 [grapes, banana] 2 p2…

python pandas function dataframe data-wrangling

asked Sep 03 '20 at 12:48 by Giovanna Fernandes (97 reputation)

4 votes, 3 answers

How to get Pandas df.merge() mismatch column name

Given the following data: data_df = pd.DataFrame({ "Reference": ("A", "A", "A", "B", "C", "C", "D", "E"), "Value1": ("U", "U", "U--","V", "W", "W--", "X", "Y"), "Value2": ("u", "u--", "u","v", "w", "w", "x", "y") }, index=[1, 2, 3,…

python pandas data-wrangling

asked May 09 '21 at 12:04 by Ricardo Sanchez (3,945 reputation)
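The excerpt is cut off before the second frame appears, so the lookup table `ref_df` below is hypothetical; only the columns and values of `data_df` come from the question (its truncated index is replaced by the default one). The sketch left-merges on `Reference`, gives the lookup columns a `_ref` suffix, and then lists, for each row, the columns whose values disagree.

```python
import pandas as pd

# Columns and values from the question; the default index replaces the truncated one.
data_df = pd.DataFrame({
    "Reference": ("A", "A", "A", "B", "C", "C", "D", "E"),
    "Value1": ("U", "U", "U--", "V", "W", "W--", "X", "Y"),
    "Value2": ("u", "u--", "u", "v", "w", "w", "x", "y"),
})

# Hypothetical lookup table: the expected values for each Reference key.
ref_df = pd.DataFrame({
    "Reference": ("A", "B", "C", "D", "E"),
    "Value1": ("U", "V", "W", "X", "Y"),
    "Value2": ("u", "v", "w", "x", "y"),
})


def mismatch_columns(data, ref, key, cols):
    """Left-merge on `key` and name the columns whose values differ from `ref`."""
    merged = data.merge(ref, on=key, how="left", suffixes=("", "_ref"))
    # A key missing from `ref` yields NaN in the *_ref columns, so it is reported too.
    merged["mismatch"] = [
        ", ".join(c for c in cols if merged.at[i, c] != merged.at[i, f"{c}_ref"])
        for i in merged.index
    ]
    return merged[[key, *cols, "mismatch"]]


result = mismatch_columns(data_df, ref_df, "Reference", ["Value1", "Value2"])
```

If the goal is only to find rows whose key has no partner at all, `merge(..., indicator=True)` adds a `_merge` column that marks them instead.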

4 votes, 4 answers

Top "n" rows of each group using dplyr -- with different number per group

I'll use the built-in chickwts data as an example. Here's the data; there are 5 feed types. > head(chickwts) weight feed 1 179 horsebean 2 160 horsebean 3 136 horsebean 4 227 horsebean 5 217 horsebean 6 168 horsebean >…

r dplyr data-cleaning data-wrangling

asked Dec 02 '20 at 05:28 by 876868587 (2,802 reputation)
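The question asks about dplyr. As a pandas analogue (not a dplyr answer), one way to keep a different number of top rows per group is to sort within groups, number the rows with `groupby().cumcount()`, and compare that rank against a per-group limit. Only the four horsebean weights come from the excerpt; the other rows and the `n_per_group` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for R's chickwts (first four weights from the excerpt).
df = pd.DataFrame({
    "feed": ["horsebean"] * 4 + ["linseed"] * 3 + ["soybean"] * 3,
    "weight": [179, 160, 136, 227, 309, 229, 181, 243, 230, 248],
})

# Hypothetical per-group limits: how many top rows to keep for each feed.
n_per_group = {"horsebean": 2, "linseed": 1, "soybean": 3}


def top_n_per_group(frame, group_col, value_col, n_map):
    """Keep the n_map[group] largest `value_col` rows within each group."""
    ordered = frame.sort_values([group_col, value_col], ascending=[True, False])
    # 0 for the largest value in each group, 1 for the next, and so on.
    rank = ordered.groupby(group_col).cumcount()
    # Groups absent from n_map get NaN, so `rank < limit` is False and they are dropped.
    limit = ordered[group_col].map(n_map)
    return ordered[rank < limit]


top = top_n_per_group(df, "feed", "weight", n_per_group)
```

Exactly n rows are returned per group, so rows tied at the cut-off are not all kept.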

4 votes, 3 answers

Check if values of one dataframe exist in another dataframe in exact order

I have one dataframe of data and multiple "reference" dataframes. I'm trying to automate checking whether the values of the dataframe match the values of the reference dataframes. Importantly, the values must also be in the same order as the values in the…

r dataframe data-wrangling

asked Jul 24 '20 at 17:43 by psychcoder (447 reputation)

4 votes, 1 answer

R: Changing column names in pivot_wider() -- suffix to prefix

I'm trying to figure out how to alter the way tidyr's pivot_wider() function creates new variable names in the resulting wide data sets. Specifically, I would like the "names_from" variable to be added as a prefix of the new variables rather…

r reshape tidyr data-wrangling

asked Jul 23 '20 at 19:40 by mkpcr (171 reputation)

3 votes, 3 answers

Create a new column based on the values and heading of another dataset

Say I have an original dataset whose values in the first column are from a to d in the alphabet df1: a x1 b x2 c x3 d x4 e x5 and then I have another dataset with multiple columns but whose entries reference the columns in the aforementioned…

r dataframe dplyr tidyr data-wrangling

asked May 01 '21 at 22:14 by user849541 (93 reputation)

3 votes, 3 answers

Tidy data with variable in intermittent rows

I have a datalogger that inserts a row with a timestamp every time the logger is turned on. The timestamp string is always in the same format, but there is an inconsistent number of readings per timestamp. How do I tidy the timestamp rows into a time…

r tidyverse tidyr data-wrangling

asked May 01 '21 at 20:03 by JMDR (109 reputation)

3 votes, 1 answer

Pivot_longer() in R without separator?

I am trying to transform a table using pivot_longer() in R. But the separation is not by any common symbol such as "_" or "." but rather by how the column name ends (either "B" or "T"). I tried to use a regular expression, without much success. Below…

r dplyr data-wrangling

asked Feb 15 '21 at 19:07 by Nick (93 reputation)
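In tidyr, the `names_pattern` argument of `pivot_longer()` takes a regular expression for exactly this case. As a pandas analogue (a sketch with hypothetical column names such as `heightB` and `massT`), the wide table can be melted and each former column name split with an end-anchored regular expression, so the trailing `B` or `T` is captured without any separator.

```python
import pandas as pd

# Hypothetical wide table: each measure has a "B" and a "T" column, no separator.
wide = pd.DataFrame({
    "id": [1, 2],
    "heightB": [10, 11],
    "heightT": [12, 13],
    "massB": [5, 6],
    "massT": [7, 8],
})

# Stack every non-id column into (column, value) pairs.
long = wide.melt(id_vars="id", var_name="column", value_name="value")

# Everything before the final B or T is the measure; the last character is the suffix.
# Named groups become the column names of the extracted frame.
parts = long["column"].str.extract(r"^(?P<measure>.*)(?P<suffix>[BT])$")
long = pd.concat([long.drop(columns="column"), parts], axis=1)
```

`pd.wide_to_long(wide, stubnames=["height", "mass"], i="id", j="suffix", suffix="[BT]")` is another route, since its `sep` parameter defaults to an empty string.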

3 votes, 1 answer

Calculating the ratio between the average engine life expectancies

I have a small R dataframe below containing cars made in Japan and in Mexico from 2006 to 2008. I need to calculate the ratio between the average engine life for the cars built in Japan and Mexico for each year. I am using dplyr and so far I have…

r dataframe dplyr data-wrangling

asked Oct 20 '20 at 22:03 by Kintaro (157 reputation)

3 votes, 1 answer

How to get a series from a pandas dataframe using a series of column names?

I have a pandas dataframe df with numeric data. I also have a series s with the same index as df and values consisting of df column labels, e.g. import pandas as pd df = pd.DataFrame( index=[0, 1, 2], columns=[0, 1, 2], data=[[1, 2, 3], [4,…

python pandas indexing slice data-wrangling

asked Aug 05 '20 at 16:49 by nijshar28 (93 reputation)
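pandas once offered `DataFrame.lookup` for this, but it was deprecated in 1.2 and removed in 2.0. A sketch of a common replacement converts the labels in `s` to column positions with `Index.get_indexer` and picks one cell per row with NumPy fancy indexing. The frame and labels below are hypothetical, since the excerpt's data is truncated.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a numeric frame plus a Series of column labels on the same index.
df = pd.DataFrame(
    {"a": [10, 40, 70], "b": [20, 50, 80], "c": [30, 60, 90]},
    index=["x", "y", "z"],
)
s = pd.Series(["c", "a", "b"], index=df.index)


def lookup(frame, labels):
    """Return frame.loc[i, labels[i]] for every row i, aligned to frame.index."""
    labels = labels.reindex(frame.index)
    rows = np.arange(len(frame))
    cols = frame.columns.get_indexer(labels)  # -1 marks a label that is not a column
    if (cols < 0).any():
        raise KeyError("some labels are not columns of the frame")
    return pd.Series(frame.to_numpy()[rows, cols], index=frame.index)


result = lookup(df, s)
```

Working on the NumPy array avoids a per-row Python loop, which matters on large frames.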

3 votes, 4 answers

Is there a way to build a pairwise data frame based on shared values in another data frame in R?

For example, DF1 is: Id1 Id2 1 10 2 10 3 7 4 7 5 10 and I want DF2: Id1 Id2 1 2 1 5 2 5 3 4 The data frame DF2 is a pairwise set of values from the Id1 column in DF1 that share a common value in Id2 of…

r data-wrangling

asked Jul 24 '20 at 01:52 by Jesse Leavitt (33 reputation)
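The question is about R. As a Python/pandas sketch of the same idea, each group of `Id1` values that share an `Id2` can be expanded into unordered pairs with `itertools.combinations`. `DF1` is taken from the question, and the result reproduces the `DF2` shown there.

```python
from itertools import combinations

import pandas as pd

# DF1 from the question.
df1 = pd.DataFrame({"Id1": [1, 2, 3, 4, 5], "Id2": [10, 10, 7, 7, 10]})

# For each Id2 value, emit every unordered pair of the Id1 values in that group.
pairs = [
    pair
    for _, ids in df1.groupby("Id2")["Id1"]
    for pair in combinations(sorted(ids), 2)
]

# Column names mirror the question's DF2; both columns hold Id1 values.
df2 = pd.DataFrame(sorted(pairs), columns=["Id1", "Id2"])
```

A self-join gives the same pairs: `m = df1.merge(df1, on="Id2")` followed by `m[m["Id1_x"] < m["Id1_y"]]`.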

3 votes, 1 answer

Turn column levels inside-out

I have a pandas DataFrame which looks like this (code to create it is at the bottom of the question): col_1 col_2 foo_1 foo_2 col_3 col_4 col_3 col_4 0 1 4 2 8 5 7 1 3 1 6 3 8 …

python pandas dataframe reshape data-wrangling

asked Jul 09 '20 at 15:31 by ignoring_gravity (3,911 reputation)

3 votes, 3 answers

How do I find the two lowest values across selected columns in each row of a pandas dataframe?

In calculating grades, I drop each student's two lowest homework scores. A sample dataframe is shown here: df=pd.DataFrame([[10, 9, 10, 5, 7], [8, 7, 9, 9, 4], [10, 10, 7, 0, 8], [5, 9, 7, 6, 3], [10, 5, 0, 8, 10], [8, 9, 10, 10,…

python pandas minimum data-wrangling

asked Jul 06 '20 at 19:06 by AJCaffarini (45 reputation)
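A sketch using NumPy: sorting each row of the selected homework columns puts the two lowest scores in the first two positions, so the remaining positions give the total after dropping them. The five complete rows from the excerpt are used as data; the column names `hw1` to `hw5` are hypothetical.

```python
import numpy as np
import pandas as pd

# The five complete rows from the excerpt; column names are hypothetical.
hw_cols = ["hw1", "hw2", "hw3", "hw4", "hw5"]
df = pd.DataFrame(
    [[10, 9, 10, 5, 7], [8, 7, 9, 9, 4], [10, 10, 7, 0, 8], [5, 9, 7, 6, 3], [10, 5, 0, 8, 10]],
    columns=hw_cols,
)

# Sort each row ascending: positions 0 and 1 hold the two lowest scores.
sorted_scores = np.sort(df[hw_cols].to_numpy(), axis=1)
df["lowest"] = sorted_scores[:, 0]
df["second_lowest"] = sorted_scores[:, 1]
df["total_after_drop"] = sorted_scores[:, 2:].sum(axis=1)
```

Restricting the sort to `hw_cols` leaves any other columns (for example exam scores) out of the calculation.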

3 votes, 3 answers

How to create a percentage column based on the values present in every third row?

I have a data frame containing weight values. I have to create a new column, the percentage change of weight, where the denominator takes the value of every third row. df <- data.frame(weight = c(30,30,109,30,309,10,20,20,14)) # expected…

r data-manipulation data-wrangling

asked Jun 28 '20 at 07:59 by Silent_bliss (307 reputation)
