Questions tagged [data-wrangling]
482 questions
2
votes
1 answer
probability of a row in one dataframe occurring in another dataframe
I have 2 dataframes
df 1 (films sent to users):
UserID Film
1 3
2 41
2 23
2 53
3 34
5 6
df 2 (films watched by users - subset of df 1):
UserID Film
1 3
2 …
user341383
- 139
- 7
2
votes
2 answers
How to round and apply min and max to all values in Pandas Dataframe
I'm struggling with how to clean up a dataframe. What I would like to do is truncate all items (i.e. floor()), and for any items below or over a min/max, replace them with the min or max as applicable. E.g. for this dataframe:
If my min and max are 1…
tendim
- 302
- 2
- 8
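For the rounding question above, a minimal sketch of one possible approach in pandas. The column names, sample values, and the bounds `lo` and `hi` are hypothetical, since the excerpt is cut off before it states the real ones; the idea is to floor every value and then clip to the bounds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's own dataframe is not shown in the excerpt.
df = pd.DataFrame({"a": [0.2, 3.7, 12.9], "b": [-4.5, 5.0, 7.99]})

# Hypothetical bounds; the excerpt is truncated before the real min/max.
lo, hi = 1, 10

# Floor every value, then replace anything outside [lo, hi] with the nearest bound.
cleaned = np.floor(df).clip(lower=lo, upper=hi).astype(int)
print(cleaned)
```

`np.floor` rounds toward negative infinity, so negative values move further from zero before being clipped; `np.trunc` could be swapped in if truncation toward zero is what is wanted.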
2
votes
3 answers
Count and filter data based on paired data/every two rows?
Trying to set up for a McNemar test, but I cannot code very well (using R)
My data is paired, and it is 1000 pairs long, so I have a column specifying the pair number like
c(0 , 0 , 1, 1, 2, 2, 3, 3, 4, 4)
A column specifying which member of the…
KVHelpMe
- 61
- 5
2
votes
1 answer
Generate a variable based on the most recent I/observation
My data is currently organized in Stata as follows:
input str2 Country gdp_2015 gdp_2016 gdp_2017 imports_2016 imports_2017 exports_2016
"A" 11 12 13 5 6 8 5
"B" 11…
maldini425
- 185
- 7
2
votes
2 answers
Variable creation - Inferring age
I have a grouped dataframe:
Truck <- c('A','A','A','A','B','B','B','B','C','C','C','C')
OilChanged <- c('True','NewOil','False','False','False','False','False','False','True','NewOil','True','NewOil')
Odometer <- c(1000, 1000,…
Brad
- 353
- 2
- 10
2
votes
1 answer
Combine objects in a json using javascript
Having a JSON in this format:
[{
name: "A",
country: "X",
countryID: "02",
value: 15
},
{
name: "A",
country: "Y",
countryID: "01",
value: 25
},
{
name: "B",
country: "X",
countryID: "02",
…
console.log
- 93
- 8
2
votes
2 answers
Fill column of dataframe using a list
I have the following dataframe:
tibble(
people = rep(c("person1", "person2", "person3"), each = 4),
things = rep(c("thing1", "thing2", "thing3", "thing4"), times = 3),
vals = 0) %>%
group_by(people) %>%
mutate(order =…
babylinguist
- 308
- 1
- 11
2
votes
1 answer
Filling in multiple columns of missing data from another dataset
I have a data set that contains some missing values which can be completed by merging with another dataset. My example:
This is the updated data set I am working with.
DF1
Name Paper Book Mug soap computer tablet coffee…
JeffB
- 83
- 9
2
votes
1 answer
Collapse rows by group based on multiple conditions (time difference and factor) in R
I am looking to collapse rows of data by group based on a specified time difference (i.e. 60 mins) between timestamps and/or until a particular condition is met within the data. Here is a mock data frame of what I am working…
Robin Turkington
- 66
- 13
2
votes
3 answers
R grouping data by numeric numbers in a column
I am trying to group data by numbers in a column. I have tried different versions of
group_by, cut, group etc., but I have not been able to get it.
I have a lot of data that looks like this:
position variants
3 snv
5 snv
12 …
gdobbo
- 23
- 2
2
votes
4 answers
Get the sum for pair of rows
I have the following dataframe imported in R:
product per1 per2 per3
A 10 20 30
B 23 14 21
C 26 95 81
Consider A:C as products listed in rows one after another and their corresponding sales values across…
Kathir
- 21
- 2
2
votes
1 answer
Is there a contiguous group labelling capability in Pandas
I have been puzzling over this problem for some time now. I was wondering if there is some "Pandas"-like way to get there. I have a simple DataFrame with two columns, PivotHigh and PivotLow, representing high values and low values. I need to "connect…
Slappy
- 3,878
- 1
- 25
- 40
2
votes
3 answers
check if numbers in a column are ascending by a certain value (R dataframe)
I have a column of numbers (index) in a dataframe like the below. I am attempting to check if these numbers are in ascending order by the value of 1. For example, group B and C do not ascend by 1. While I can check by sight, my dataframe is…
psychcoder
- 447
- 1
- 7
2
votes
3 answers
Across several columns, count instances of pairs
I want to count pairs across several columns. That is, with more than two columns, count the number of times particular value pairs occur in the same row.
Say I asked some people whether they liked different kinds of food, and they could answer…
mvanaman
- 85
- 6
2
votes
3 answers
Calculate employee count by hour and day
I have employee id, their clock in, and clock out timings by day. I want to calculate employee count by hour by day. Excel formula would work too.
My sample data looks like this:
Employee ID Day Clockin Clockout
1 Mon …
bp41
- 163
- 13