Questions tagged [data-transform]

Data transformation is the process of converting data from one format or structure into another. It ranges from simple conversions, such as turning a comma-separated list into a line-break-separated list, to complex ones such as speech-to-text. The strategies and technologies used vary widely with the complexity, volume, format, and structure of the data being transformed.
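
As a minimal sketch of the simplest case named in the description (a comma-separated list turned into a line-break-separated list), the following Python snippet may help. The function name `commas_to_lines` and the decision to trim whitespace and drop empty items are assumptions made for illustration only.

```python
def commas_to_lines(text):
    """Convert a comma-separated list into a line-break-separated list."""
    # Split on commas and trim the whitespace around each item
    items = [item.strip() for item in text.split(",")]
    # Drop empty items (for example from a trailing comma) and join with newlines
    return "\n".join(item for item in items if item)


print(commas_to_lines("alpha, beta,gamma"))
```

Real inputs with quoted fields or embedded commas would need a proper CSV parser, such as Python's `csv` module, rather than a plain `split`.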

152 questions
0
votes
2 answers

Filter rows based on an ID column in R

I have a data frame with an ID column, Timepoint and status. Each ID has multiple timepoints and a status associated with each timepoint. I want to filter all the IDs that have the same status for all timepoints associated with the ID. How can I…
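
The question asks about R; the sketch below uses plain Python only to illustrate the grouping logic it describes, and it is one possible reading of the truncated excerpt. The sample rows and the function name `ids_with_constant_status` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (ID, timepoint, status)
rows = [
    (1, 1, "A"), (1, 2, "A"),
    (2, 1, "A"), (2, 2, "B"),
    (3, 1, "C"),
]


def ids_with_constant_status(rows):
    """Return the IDs whose status is identical across all their timepoints."""
    statuses = defaultdict(set)
    for id_, _timepoint, status in rows:
        statuses[id_].add(status)
    # An ID qualifies when exactly one distinct status was seen for it
    return sorted(id_ for id_, seen in statuses.items() if len(seen) == 1)


constant_ids = ids_with_constant_status(rows)
# Keep only the rows that belong to those IDs
filtered = [row for row in rows if row[0] in constant_ids]
print(filtered)
```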
0
votes
2 answers

Row wise comparison of a dataframe in R

I have a data frame with multiple data points corresponding to each ID. When the status value is different between 2 timepoints for an ID, I want to flag the first status change. How do I achieve that in R? Below is a sample…
Datamaniac
  • 47
  • 1
  • 5
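
The row-wise comparison described in this question can be sketched as follows. This is Python rather than the R the asker wants, shown only to make the logic concrete; the sample rows and the name `flag_first_change` are hypothetical, and the sketch assumes the rows for each ID already appear in timepoint order.

```python
def flag_first_change(rows):
    """Append a boolean to each (ID, timepoint, status) row that is True
    only on the first row where an ID's status differs from its previous one."""
    last_status = {}   # most recent status seen per ID
    changed = set()    # IDs whose first change has already been flagged
    flagged = []
    for id_, timepoint, status in rows:
        is_first_change = (
            id_ in last_status
            and status != last_status[id_]
            and id_ not in changed
        )
        if is_first_change:
            changed.add(id_)
        last_status[id_] = status
        flagged.append((id_, timepoint, status, is_first_change))
    return flagged


# Hypothetical sample rows: (ID, timepoint, status)
sample = [(1, 1, "A"), (1, 2, "B"), (1, 3, "A"), (2, 1, "A"), (2, 2, "A")]
for row in flag_first_change(sample):
    print(row)
```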
0
votes
1 answer

Transform two source DynamoDB tables into a new DynamoDB using AWS

So I have two source tables, let's call them table1 and table2, and the destination table, table3. Inside these tables there is information that needs to be extracted from columns of one table, columns of another table, and then combined to give…
0
votes
3 answers

Issues replacing value in R

I am trying to replace some values for a variable within my data set but I keep getting an unexpected value of 414 assigned instead of 9. I've been over the code a number of times but just cannot get it working. My code #replace tumor_size with…
0
votes
0 answers

How to remove level names in multi-index dataframe? How to create multi-level dataframe/csv which avoids merging columns every time?

I hope you're all having a great day! My post is related to this How to work with multilevel data using Python Pandas? post. Feel free to have a look. This is the snippet of the approved answer: https://stackoverflow.com/a/67413282/12364266 which…
Aztec-3x5
  • 67
  • 8
0
votes
1 answer

Conditional data transformation in Power BI

This question is about data transformation in Power BI. I have a text file with spaces as separators. Some rows (where the day in the date is less than 10) contain a double space before one field. It is always the third field. Tue May 4 13:57:50 BST 2021: 64 bytes…
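
The problem described here, a padded single-digit day producing a double space before the third field, can be illustrated outside Power BI with a short Python sketch. It only shows the normalization idea (collapse runs of spaces before splitting) and is not a Power BI solution. The function name `split_fields` and the second sample line with a two-digit day are assumptions.

```python
import re


def split_fields(line):
    """Split a space-separated line, treating any run of spaces as one separator."""
    # Collapse runs of spaces into a single space, then split on that space
    return re.sub(r" +", " ", line.strip()).split(" ")


# Day below 10: two spaces before the third field, as the question describes
padded = "Tue May  4 13:57:50 BST 2021: 64 bytes"
# Hypothetical line with a two-digit day and single spaces throughout
regular = "Tue May 14 13:57:50 BST 2021: 64 bytes"

print(split_fields(padded))
print(split_fields(regular))
```

After normalization both lines yield the same number of fields, so the columns line up regardless of the day value.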
0
votes
1 answer

Showing column names in rows, Power BI report

At the start I have this table. From this data I've created this report (the table on the left is based on the slicer placed on the right). Is there a way to show it like that in Power BI (instead of…
0
votes
2 answers

Transform data from columns to rows in Excel

I need to transform data that is in multiple rows and multiple columns into unique rows, but there are specific rules around what I need. An example of the current data format is below: The split should be based on the style, colour and unique upc…
0
votes
1 answer

Transform Data („Cost“) in Google Data Studio from „Text“ to „Number“ does not work

Situation: I am about to create a report in Google Data Studio based on data copied by someone else into a Google Sheet. This data includes the metric „cost“ („Gesamtkosten“). GDS identifies this as text/string. I want to transform it in GDS…
0
votes
1 answer

Group by and count with condition

I have a dataset that looks like this: id DateTx TypeTx Major_complaint Grade_major 1 01/02/2021 Reflexology Fatigue / exhaustion 3 1 01/03/2021 Reflexology Fatigue /…
TeoK
  • 43
  • 4
0
votes
1 answer

Generate grouped time series based on Open and Close date

I have a dataset with 3 columns, namely the ID and the open and close week. Some IDs do not have a close week yet, so their close week is NA. But all IDs have an open week. set.seed(1990) mydf <- tibble(id = as.vector(outer(letters, letters,…
Afiq Johari
  • 926
  • 1
  • 8
  • 18
0
votes
1 answer

Speeding up or alternatives to group by and lag in dplyr

I notice this operation is very time-consuming for a seemingly simple calculation. It probably accounts for more than 60% of the total time needed to complete the current R script. The actual data contains about 500,000 rows with about 100,000 unique ids…
Afiq Johari
  • 926
  • 1
  • 8
  • 18
0
votes
0 answers

How to effectively convert data for surface plots from table form into series of triplets

We need to draw lots of surface plots in certain proprietary software that accepts input in the form of triplets, i.e. $(x_1, y_1, z_1), (x_2, y_2, z_2), \dots$ The raw data come to us in the form of spreadsheet tables (depicted on the screenshot, please…
S. N.
  • 5
  • 5
0
votes
2 answers

Filling in missing dates with most recent data in data frame in R

I have a data frame with country, date, identifier, cumulative_identifier, cumulative_country. Country, date, and identifier are grouped. However, I have countries and identifiers with missing dates. These are countries that have not submitted…
aholtz
  • 47
  • 5
0
votes
0 answers

Pandas: multiple data columns into one column in Python

I have sales data for 5 companies for three months. The data is as follows: d = {'ID':[1001, 1002,1003,1004,1005], 'Company Name':['A','B','C','D','E'], 'Sales 4/1/2019':[33, 24,43,25,51], 'Sales 5/1/2019':[47,55,48,41,46], 'Sales…
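
One reading of this title is a wide-to-long reshape, where the per-month sales columns are stacked into a single column. In pandas this kind of reshape is typically done with `DataFrame.melt`; the sketch below uses plain Python so that it runs without dependencies. It uses only the two sales columns visible in the truncated excerpt, and the function name `wide_to_long` and the output keys `Date` and `Sales` are assumptions.

```python
# Only the columns visible in the excerpt; the third month is cut off in the source
d = {
    "ID": [1001, 1002, 1003, 1004, 1005],
    "Company Name": ["A", "B", "C", "D", "E"],
    "Sales 4/1/2019": [33, 24, 43, 25, 51],
    "Sales 5/1/2019": [47, 55, 48, 41, 46],
}


def wide_to_long(data, id_cols, prefix="Sales "):
    """Stack every column starting with `prefix` into (Date, Sales) rows."""
    value_cols = [col for col in data if col.startswith(prefix)]
    n_rows = len(data[id_cols[0]])
    long_rows = []
    for col in value_cols:
        for i in range(n_rows):
            # Carry the identifying columns over unchanged
            row = {c: data[c][i] for c in id_cols}
            # The part of the column name after the prefix becomes the date
            row["Date"] = col[len(prefix):]
            row["Sales"] = data[col][i]
            long_rows.append(row)
    return long_rows


long_rows = wide_to_long(d, ["ID", "Company Name"])
for row in long_rows[:3]:
    print(row)
```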