Subset a list of lists acording to names

Question

I'm struggling with a list of lists with structure (original data):

What I need from them is a list of lists with the structure (transformed data):

i.e. For each different row(name) in my sublists I need a new sublist. This sublist will have the name of this row, and the row(names) will be the names of the original sublists, and the data will be composed by columns a and e from the original sublist.

I really know I should post some code, I will try this one

original<-list(`a1` = structure(c("", "Culturas", "", "Algodão", "Soja", "Soja", "Modalidade de Emprego", "", "(Aplicação)", "Pós-emergência", "Dessecante", "Pós-emergência", "", "LMR (mg/kg)", "", "0,04", "4,0", "4,0", "Intervalo de", "", "Segurança", "(2)", "07 dias", "(1)"), .Dim = c(6L, 4L)),`a2` = structure(c("Culturas", "Cebola", "Feijão", "Soja", "Trigo", "Modalidade de Emprego (Aplicação)", "Pós-emergência", "Pós-emergência", "Pós-emergência", "Pós-emergência", "LMR (mg/kg)", "0,02", "0,02", "0,02", "0,02", "Intervalo de Segurança", "60 dias", "60 dias", "60 dias", "70 dias"), .Dim = 5:4))

original

$a1
     [,1]       [,2]                    [,3]          [,4]          
[1,] ""         "Modalidade de Emprego" ""            "Intervalo de"
[2,] "Culturas" ""                      "LMR (mg/kg)" ""            
[3,] ""         "(Aplicação)"           ""            "Segurança"   
[4,] "Algodão"  "Pós-emergência"        "0,04"        "(2)"         
[5,] "Soja"     "Dessecante"            "4,0"         "07 dias"     
[6,] "Soja"     "Pós-emergência"        "4,0"         "(1)"         

$a2
     [,1]       [,2]                                [,3]          [,4]                    
[1,] "Culturas" "Modalidade de Emprego (Aplicação)" "LMR (mg/kg)" "Intervalo de Segurança"
[2,] "Cebola"   "Pós-emergência"                    "0,02"        "60 dias"               
[3,] "Feijão"   "Pós-emergência"                    "0,02"        "60 dias"               
[4,] "Soja"     "Pós-emergência"                    "0,02"        "60 dias"               
[5,] "Trigo"    "Pós-emergência"                    "0,02"        "70 dias"

and the result

result<-list(`soja` = structure(c("a1", "a1","a2", "4,0", "4,0", "0,02"), .Dim = 3:2, .Dimnames = list( NULL, c("ATIVO", "LMR (mg/kg)"))))

result

$soja
     ATIVO LMR (mg/kg)
[1,] "a1"  "4,0"      
[2,] "a1"  "4,0"      
[3,] "a2"  "0,02"

Not clear from the examples, what you want. Is the `result` the expected output? The images and the examples have some difference in structure — akrun, Jan 04 '19 at 19:53
Seems like you had issues reading in your data - `original` is a list of matrices where the first row looks like it should actually be column names and string entries like `"0,02"` should probably be numerics like `0.02`. I'd recommend fixing your data reading process first before trying to transform the data. [How to make a list of data frames](https://stackoverflow.com/a/24376207/903061) is good background reading, and also see `?read.csv2` or the `dec` argument of whatever read function you are using to read the commas in as decimals. — Gregor Thomas, Jan 04 '19 at 20:18
@akrun exactly! `result` is just the expected output. My graphical and textual input data have indeed some differences, that's because my dataset is actually very big and I just tried to give a palatable example. — Tiago, Jan 06 '19 at 00:15
@Gregor Thank you for the feedback. I agree with you that my input data have some issues. Said rownames are actually not rownames, since it is a matrix, so they are col1, and consequentely I need rearrange my data, creating a new matrix for each unique string in col1. However, all data are textual, so no need for the comma x point issue. — Tiago, Jan 06 '19 at 00:32
I am referencing **column names**, not row names. In your picture, the column names are important. In your example input data, there are no column names. So *with that input*, the first step will be to annoyingly create the column names at this awkward point in the process, but in real life, the first step should be to fix the input process upstream so that the column names are correct. Once the column names are correct, the transformation that is the focus of this question is much easier. Thus I ask that you separate input issue from the transformation issue. — Gregor Thomas, Jan 06 '19 at 21:27
@ Gregor. I think now I understood. Is it easier to deal with colnames rather than column Indices? — Tiago, Jan 07 '19 at 19:30
It is both safer and easier to use names not indices. And it matches your picture. For example, look at the 3rd column of `original$a1`. It's 1st value is `""`. It's 2nd value is `"LMR (mg/kg)"`. It's 3rd value is `""`. Where are these values in `result`? It seems like the `""`s are bad imports, and the `"LMR(mg/kg)"` should be a column name, not a value. It is bad to fix the problem here, it should be fixed earlier. — Gregor Thomas, Jan 08 '19 at 03:22

Tiago · Answer 1 · 2019-01-08T22:40:22.210

names<-unique(unlist(simplify2array(lapply(original,`[`,,1))))
my.list3<-list()
i=1
m=1
n=1
for (i in 1:length(nomes)){
    nome<-names[i]
    my.list3[[nome]]<-matrix(NA,ncol=4)
    print(nome)
    for (n in 1:length((original))){
        for (m in 1:nrow(original[[n]])){
            if(nome==original[[n]][m,1]){
                vetor<-original[[n]][m,]
                vetor[vetor==nome]<-names(original)[[n]]
                my.list3[[i]]<-rbind(my.list3[[i]],vetor)
            }}}}
for (n in seq_along(my.list3)){
    ind <- apply(my.list3[[n]], 1, function(x) all(is.na(x)))
    my.list3[[n]] <- my.list3[[n]][ !ind, ]
    if (is.matrix(my.list3[[n]])){my.list3[[n]] <- my.list3[[n]][,c(1,3)]} 
    else{my.list3[[n]] <- my.list3[[n]][c(1,3)]}
    rownames(my.list3[[n]])<-c()
   if (is.matrix(my.list3[[n]])){colnames(my.list3[[n]])<-c("Ativo","LMR mg/kg")} else{names(my.list3[[n]])<-c("Ativo","LMR mg/kg")}}
my.list3

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

Some initial observations:

As already mentioned by Gregor, the data look corrupted, especially the column headers. So, I deliberately choose to fix the sample dataset.
The existence of identical column headers (after the fix) in both list elements suggest that the underlying data structure is a data.frame rather than a matrix. (Although technically stored as matrix with the first row containing the column headers).
As the data of each list element has identical structure (number, name, and type of columns) the data can be stored in a large data.frame object. This will make subsequent data manipulation and aggregation much easier than dealing with nested lists.

The code below converts the matrix of each list element into a data.table object and binds them into a united data.table. The names of the list elements is preserved in column ATIVO:

library(data.table)
library(magrittr)
united <- lapply(original, function(x) as.data.table(x[-1, ]) %>% setnames(x[1, ])) %>% 
  rbindlist(idcol = "ATIVO")

   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a1  Algodão                    Pós-emergência        0,04                    (2)
2:    a1     Soja                        Dessecante         4,0                07 dias
3:    a1     Soja                    Pós-emergência         4,0                    (1)
4:    a2   Cebola                    Pós-emergência        0,02                60 dias
5:    a2   Feijão                    Pós-emergência        0,02                60 dias
6:    a2     Soja                    Pós-emergência        0,02                60 dias
7:    a2    Trigo                    Pós-emergência        0,02                70 dias

From there, we can retrieve the requested data, e.g.,

united[(order(Culturas, ATIVO)), .(Culturas, ATIVO, `LMR (mg/kg)`)]

   Culturas ATIVO LMR (mg/kg)
1:  Algodão    a1        0,04
2:   Cebola    a2        0,02
3:   Feijão    a2        0,02
4:     Soja    a1         4,0
5:     Soja    a1         4,0
6:     Soja    a2        0,02
7:    Trigo    a2        0,02

Of course, the data can be split into pieces again:

split(united, by = "Culturas")

$Algodão
   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a1  Algodão                    Pós-emergência        0,04                    (2)

$Soja
   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a1     Soja                        Dessecante         4,0                07 dias
2:    a1     Soja                    Pós-emergência         4,0                    (1)
3:    a2     Soja                    Pós-emergência        0,02                60 dias

$Cebola
   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a2   Cebola                    Pós-emergência        0,02                60 dias

$Feijão
   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a2   Feijão                    Pós-emergência        0,02                60 dias

$Trigo
   ATIVO Culturas Modalidade de Emprego (Aplicação) LMR (mg/kg) Intervalo de Segurança
1:    a2    Trigo                    Pós-emergência        0,02                70 dias

or, to meet OP's expected result:

split(united, by = "Culturas") %>% 
  lapply(`[`, j = c("ATIVO", "LMR (mg/kg)")) # data.table syntax

$Algodão
   ATIVO LMR (mg/kg)
1:    a1        0,04

$Soja
   ATIVO LMR (mg/kg)
1:    a1         4,0
2:    a1         4,0
3:    a2        0,02

$Cebola
   ATIVO LMR (mg/kg)
1:    a2        0,02

$Feijão
   ATIVO LMR (mg/kg)
1:    a2        0,02

$Trigo
   ATIVO LMR (mg/kg)
1:    a2        0,02

For demonstration, I have refrained to filter for "Soja".

Fixed Data

original <-
list(a1 = structure(c("Culturas", "Algodão", "Soja", "Soja", 
"Modalidade de Emprego (Aplicação)", "Pós-emergência", "Dessecante", 
"Pós-emergência", "LMR (mg/kg)", "0,04", "4,0", "4,0", "Intervalo de Segurança", 
"(2)", "07 dias", "(1)"), .Dim = c(4L, 4L)), a2 = structure(c("Culturas", 
"Cebola", "Feijão", "Soja", "Trigo", "Modalidade de Emprego (Aplicação)", 
"Pós-emergência", "Pós-emergência", "Pós-emergência", "Pós-emergência", 
"LMR (mg/kg)", "0,02", "0,02", "0,02", "0,02", "Intervalo de Segurança", 
"60 dias", "60 dias", "60 dias", "70 dias"), .Dim = 5:4))

original

$a1
     [,1]       [,2]                                [,3]          [,4]                    
[1,] "Culturas" "Modalidade de Emprego (Aplicação)" "LMR (mg/kg)" "Intervalo de Segurança"
[2,] "Algodão"  "Pós-emergência"                    "0,04"        "(2)"                   
[3,] "Soja"     "Dessecante"                        "4,0"         "07 dias"               
[4,] "Soja"     "Pós-emergência"                    "4,0"         "(1)"                   

$a2
     [,1]       [,2]                                [,3]          [,4]                    
[1,] "Culturas" "Modalidade de Emprego (Aplicação)" "LMR (mg/kg)" "Intervalo de Segurança"
[2,] "Cebola"   "Pós-emergência"                    "0,02"        "60 dias"               
[3,] "Feijão"   "Pós-emergência"                    "0,02"        "60 dias"               
[4,] "Soja"     "Pós-emergência"                    "0,02"        "60 dias"               
[5,] "Trigo"    "Pós-emergência"                    "0,02"        "70 dias"

Subset a list of lists acording to names

2 Answers2

Fixed Data