Questions tagged [data-wrangling]

482 questions
2
votes
1 answer

probability of a row in one dataframe occurring in another dataframe

I have 2 dataframes df 1 (films sent to users): UserID Film 1 3 2 41 2 23 2 53 3 34 5 6 df 2 (films watched by users - subset of df 1): UserID Film 1 3 2 …
user341383
  • 139
  • 7
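
One possible reading of the question above is "what share of the films sent to each user were actually watched". A minimal pandas sketch under that assumption follows. The `sent` frame reuses the rows quoted in the excerpt; the `watched` rows are hypothetical, because the excerpt is truncated, and the names `sent`, `watched`, `overall`, and `per_user` are illustrative only.

```python
import pandas as pd

# Films sent to users (rows as quoted in the excerpt for df 1).
sent = pd.DataFrame({"UserID": [1, 2, 2, 2, 3, 5],
                     "Film": [3, 41, 23, 53, 34, 6]})

# Films watched: HYPOTHETICAL rows, since the excerpt's df 2 is cut off.
watched = pd.DataFrame({"UserID": [1, 2], "Film": [3, 23]})

# A left merge with indicator=True marks each sent row as "both"
# (also present in watched) or "left_only" (sent but not watched).
merged = sent.merge(watched.drop_duplicates(),
                    on=["UserID", "Film"], how="left", indicator=True)
merged["watched"] = merged["_merge"] == "both"

# Empirical share of sent rows that also appear in watched.
overall = merged["watched"].mean()
per_user = merged.groupby("UserID")["watched"].mean()
```

The mean of a boolean column is the fraction of `True` values, so `overall` is an empirical proportion rather than a modelled probability.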
2
votes
2 answers

How to round and apply min and max to all values in Pandas Dataframe

I'm struggling with how to clean up a dataframe. What I would like to do is truncate all items (i.e. floor()), and for any items below or over a min/max, replace with the min or max as applicable. E.g. for this dataframe: If my min and max are 1…
tendim
  • 302
  • 2
  • 8
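
The floor-then-bound step described in the question above could be sketched in pandas roughly as follows. The data and the bounds `LOWER` and `UPPER` are hypothetical placeholders, because the excerpt's own values are truncated; this is one possible approach, not necessarily the accepted answer.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL data and bounds, chosen only for illustration.
df = pd.DataFrame({"a": [0.4, 2.7, 9.9], "b": [-3.2, 5.5, 12.1]})
LOWER, UPPER = 1, 10

# np.floor rounds every value down (toward negative infinity);
# DataFrame.clip then replaces values outside [LOWER, UPPER] with the bound.
cleaned = np.floor(df).clip(lower=LOWER, upper=UPPER).astype(int)
```

Note that `np.floor` rounds negative numbers away from zero (-3.2 becomes -4). If truncation toward zero is wanted instead, `np.trunc` is the alternative.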
2
votes
3 answers

Count and filter data based on paired data/every two rows?

Trying to set up for a McNemar test, but I cannot code very well (using R). My data is paired, and it is 1000 pairs long, so I have a column specifying the pair number like c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4) A column specifying which member of the…
KVHelpMe
  • 61
  • 5
2
votes
1 answer

Generate a variable based on the most recent I/observation

My data is currently organized in Stata as follows: input str2 Country gdp_2015 gdp_2016 gdp_2017 imports_2016 imports_2017 exports_2016 "A" 11 12 13 5 6 8 5 "B" 11…
maldini425
  • 185
  • 7
2
votes
2 answers

Variable creation - Inferring age

I have a grouped dataframe; Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C') OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil') Odometer <- c(1000, 1000,…
Brad
  • 353
  • 2
  • 10
2
votes
1 answer

Combine objects in a json using javascript

Having a JSON in this format: [{ name: "A", country: "X", countryID: "02", value: 15 }, { name: "A", country: "Y", countryID: "01", value: 25 }, { name: "B", country: "X", countryID: "02", …
2
votes
2 answers

Fill column of dataframe using a list

I have the following dataframe: tibble( people = rep(c("person1", "person2", "person3"), each = 4), things = rep(c("thing1", "thing2", "thing3", "thing4"), times = 3), vals = 0) %>% group_by(people) %>% mutate(order =…
babylinguist
  • 308
  • 1
  • 11
2
votes
1 answer

Filling in multiple columns of missing data from another dataset

I have a data set that contains some missing values which can be completed by merging with another dataset. My example: This is the updated data set I am working with. DF1 Name Paper Book Mug soap computer tablet coffee…
JeffB
  • 83
  • 9
2
votes
1 answer

Collapse rows by group based on multiple conditions (time difference and factor) in R

I am looking to collapse rows of data by group based on specified time difference (i.e. 60 mins) between timestamps and/or until a particular condition is met within the data. Here is a mock data frame of what I am working…
2
votes
3 answers

R grouping data by numeric numbers in a column

I am trying to group data by numbers in a column. I have tried different versions of group_by, cut, group etc. but I have not been able to get it. I have a lot of data that looks like this: position variants 3 snv 5 snv 12 …
gdobbo
  • 23
  • 2
2
votes
4 answers

Get the sum for pair of rows

I have the following dataframe imported in R: product per1 per2 per3 A 10 20 30 B 23 14 21 C 26 95 81 Consider A:C as products listed in rows one after another and their corresponding sales values across…
Kathir
  • 21
  • 2
2
votes
1 answer

Is there a contiguous group labelling capability in Pandas

I have been puzzling over this problem for some time now. I was wondering if there is some Pandas-like way to get there. I have a simple DataFrame with two columns PivotHigh and PivotLow, representing high values and low values. I need to "connect…
Slappy
  • 3,878
  • 1
  • 25
  • 40
2
votes
3 answers

check if numbers in a column are ascending by a certain value (R dataframe)

I have a column of numbers (index) in a dataframe like the below. I am attempting to check if these numbers are in ascending order by the value of 1. For example, group B and C do not ascend by 1. While I can check by sight, my dataframe is…
psychcoder
  • 447
  • 1
  • 7
2
votes
3 answers

Across several columns, count instances of pairs

I want to count pairs across several columns. That is, with more than two columns, count the number of times particular value pairs occur in the same row. Say I asked some people whether they liked different kinds of food, and they could answer…
mvanaman
  • 85
  • 6
2
votes
3 answers

Calculate employee count by hour and day

I have employee IDs with their clock-in and clock-out times by day. I want to calculate employee count by hour by day. An Excel formula would work too. My sample data looks like this: Employee ID Day Clockin Clockout 1 Mon …
bp41
  • 163
  • 13