
I checked the previous post on how to convert lapply output to a data frame, but it did not help me.

I have already asked two questions. I tried to give representative data, but it turned out not to be: the solution works on the example but not on the real data. This is my earlier question:

how to apply a function on every column of a data?

The problem that is driving me crazy, and that I cannot solve, is this.

I can do the following and it works perfectly on the data, but I would have to do it 1000 times and generate 1000 files:

s1 <- normalize(df[,1], ";")
Mn <- as.data.frame(process(s1))
write.table(Mn, file= "~/Desktop/outputs/output62.txt", quote = FALSE, sep="\t")
rm(Mn,s1)

but when I use

lapply(myS, process)

I get an error like:
  Error in data.frame(All_Fractions = c(161L, 153L, 218L, 2847L, 2565L,  : 
  arguments imply differing number of rows: 7, 5, 10

I have no choice but to post my real data in order to solve my problem.

I load the data like this:

df1 <- read.table("~/Desktop/df1.txt", sep="\t", header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table("~/Desktop/df2.txt", sep="\t", header=TRUE, stringsAsFactors=FALSE)

This is the code I have used so far

normalize <- function(x, delim) {
    x <- gsub(")", "", x, fixed=TRUE)
    x <- gsub("(", "", x, fixed=TRUE)
    idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]",delim), "", 
                                                                 as.character(x)))+1)
    names <- unlist(strsplit(as.character(x), delim))
    return(setNames(idx, names))
}

myS <- lapply(df1, normalize,";") 
lookup <- normalize(df2[,1], ",")

process <- function(s) {
    lookup_try <- lookup[names(s)]
    found <- which(!is.na(lookup_try))
    pos <- lookup_try[names(s)[found]]
    return(paste(pos, sep=""))
} 

For the output I tried this:

Mn <- as.data.frame(lapply(myS, process),FUN=as.data.frame)

which gives me this error:

Error in data.frame(Fraction_1 = c(393L, 674L, 79L, 2447L, 248L), Fraction_2 = c(2107L, : arguments imply differing number of rows: 5, 30, 51, 35

I also tried

Mn <- as.data.frame(lapply(myS, process)) 

which gives me the same error:

Error in data.frame(Fraction_1 = c(393L, 674L, 79L, 2447L, 248L), Fraction_2 = c(2107L, : arguments imply differing number of rows: 5, 30, 51, 35

If I keep the result as a list,

Mn <- lapply(myS, process)

I cannot save the output:

write.table(Mn, file= "~/Desktop/outputs/output.txt", quote = FALSE, sep="\t")

Error in data.frame(Fraction_1 = c(393L, 674L, 79L, 2447L, 248L), Fraction_2 = c(2107L, : arguments imply differing number of rows: 5, 30, 51, 35

nik
  • I see you put a lot of work into this, which is good. But it would help if you copied the errors you are getting in there too (instead of "gives me error"). It would help other people with a similar problem too since the search engine could find this post. – Mike Wise Feb 27 '16 at 12:59
  • Have you tried something like `do.call("rbind", YourList)` - which should work if the elements of your list have the same 1st dimension length (i.e. the rows are the same length). – Stephen Henderson Feb 27 '16 at 13:03
  • @Stephen Henderson where ? to be honest I have tried so many things that I even don't remember :-(( can you please let me know where exactly I should have applied it ? – nik Feb 27 '16 at 13:08
  • @Mike Wise thanks Mike, yes It is making me crazy, I did not sleep but it does not work ! I modified as you said – nik Feb 27 '16 at 13:08
  • How did you read `df1` and `df2` in? With `read.csv`? That should be in this question, but I don't see it. – Mike Wise Feb 27 '16 at 13:29

1 Answer


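You can only turn a list into a data.frame if all of the list elements (the future columns) have the same length, or lengths that recycle evenly into the longest one. That is clearly not the case here: the error reports lengths 5, 30, 51 and 35.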
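As a minimal sketch of that rule, the toy list below (hypothetical data, not the asker's files) reproduces the same kind of error and shows that equal-length elements convert without trouble. The names `lst`, `msg` and `ok` are made up for the illustration.

```r
# Toy list whose elements have different lengths (hypothetical data)
lst <- list(a = 1:3, b = 1:5)

# as.data.frame() cannot line these up as columns, so it raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(as.data.frame(lst), error = function(e) conditionMessage(e))
print(msg)

# With equal lengths the same call succeeds
ok <- as.data.frame(list(a = 1:3, b = 4:6))
print(dim(ok))
```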

If this is just about saving and restoring a list, try the save and load commands which are there to do this. Otherwise you might try appending elements to the individual columns (" " or NA perhaps) to make them all the same length.
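A small sketch of the save/load route, assuming the goal is only to store the list and get it back unchanged. The list contents and the temporary file path are invented for the example.

```r
# A ragged list standing in for the real result (hypothetical values)
Mn <- list(Fraction_1 = c("393", "674"),
           Fraction_2 = c("2107", "15", "88"))
original <- Mn

# save() writes the object under its own name to a binary file
path <- tempfile(fileext = ".RData")
save(Mn, file = path)

# Remove it, then load() recreates an object called Mn in the workspace
rm(Mn)
load(path)
str(Mn)
```

This keeps the unequal lengths intact, but the file is in R's binary format, so it is not a tab-separated text file that other programs can read.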

In the following code I pad all the columns with spaces to make them the same length, then you can write it out with no problem.

df1 <- read.csv("df1.txt", sep="\t", stringsAsFactors=FALSE)
df2 <- read.csv("df2.txt", sep="\t", stringsAsFactors=FALSE)  # lookup table, read from df2.txt

normalize <- function(x, delim) {
  x <- gsub(")", "", x, fixed=TRUE)
  x <- gsub("(", "", x, fixed=TRUE)
  idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]",delim), "", 
                                                  as.character(x)))+1)
  names <- unlist(strsplit(as.character(x), delim))
  return(setNames(idx, names))
}

myS <- lapply(df1, normalize,";") 
lookup <- normalize(df2[,1], ",")

process <- function(s) {
  lookup_try <- lookup[names(s)]
  found <- which(!is.na(lookup_try))
  pos <- lookup_try[names(s)[found]]
  return(paste(pos, sep=""))
} 
Mn <- lapply(myS, process)

# ------------ Start of the answer

# Pad the vectors with spaces to make them the same length
mxlen <- max(sapply(Mn, length))
Mnn <- lapply(Mn, function(x)(c(x, rep(" ", mxlen - length(x)))))

# Write it out
write.table(Mnn, file = "output.txt", quote = FALSE, sep = "\t")
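For comparison, here is a sketch of the NA-padding variant mentioned earlier, on a made-up list rather than the real `Mn`. It assumes that empty cells are acceptable in the output file; the names `toy`, `padded`, `out` and `path` are hypothetical.

```r
# Ragged list standing in for the real result (hypothetical values)
toy <- list(Fraction_1 = c("393", "674"),
            Fraction_2 = c("2107", "15", "88", "4"))

# Longest element decides the number of rows
mxlen <- max(lengths(toy))

# Assigning a larger length to a vector pads it with NA
padded <- lapply(toy, function(x) { length(x) <- mxlen; x })

# All elements now have the same length, so the conversion works
out <- as.data.frame(padded, stringsAsFactors = FALSE)

# Write NA as an empty field instead of the text "NA"
path <- tempfile(fileext = ".txt")
write.table(out, file = path, quote = FALSE, sep = "\t",
            na = "", row.names = FALSE)
```

Padding with NA rather than " " keeps missing cells distinguishable from real values if the file is read back into R.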
Mike Wise
  • I loved your answer, can you please add some explanation for Pad ? what exactly it does ? I want to learn it – nik Feb 27 '16 at 14:14
  • We want each list element (each of which becomes a data.frame column) to have `mxlen` elements. So we use `lapply` to apply a function to each list element. That function concatenates the list element (`x`) with a new vector built by `rep` consisting of a number of spaces. That number is `mxlen - length(x)`. – Mike Wise Feb 27 '16 at 14:21
  • mike sorry for many stupid question, do you know how to order data frame ? I used df – nik Feb 27 '16 at 14:33
  • If you have not read the first two chapters of this book, http://adv-r.had.co.nz/ - I highly recommend that you do so now. It will save a lot of time and pain. R is not like other languages, and that book explains how and why. The link is an online version, but you can buy a copy too. – Mike Wise Feb 27 '16 at 14:37
  • @ Mike Wise thank you so much. Actually I am notifying to just sort a column. I want to sort all columns and not a specific column descending order df – nik Feb 27 '16 at 14:40
  • That sorts the whole data frame on that column. Is that not what you want? – Mike Wise Feb 27 '16 at 14:41
  • no, each column independent from another. so sort all without any reference – nik Feb 27 '16 at 14:41
  • How can I really do programming like you? I am struggling with simple questions and I waste hour after hour with very little success. What would be the best way to go? – nik Feb 27 '16 at 14:54
  • I have asked a question which is making me crazy, I would like to know if you have any idea to solve it ? http://stackoverflow.com/questions/35707323/how-to-rearrange-an-order-of-matches-between-two-data-frames – nik Mar 01 '16 at 14:59
  • I am kind of slammed with work at the moment, so I can't really tackle it, much as I would like to (I have my own data to process). But I had a look at your question and it is too complex for an SO question. You need to break it down into simpler questions and solve them one at a time. For example you ask for four columns. That is essentially 4 questions. And the third and fourth columns seem complex. – Mike Wise Mar 01 '16 at 16:08
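
On the follow-up raised in the comments about sorting every column independently of the others, one possible sketch is shown below. It is not taken from the thread; the data frame and the names `df` and `sorted` are made up. Note that this breaks the row-wise correspondence between columns, which is what the comment asks for.

```r
# Small data frame with hypothetical values
df <- data.frame(a = c(3, 1, 2), b = c(10, 30, 20))

# Sort each column on its own, in descending order.
# na.last = TRUE keeps NA values (at the end) so every column keeps its length.
sorted <- df
sorted[] <- lapply(df, function(x) sort(x, decreasing = TRUE, na.last = TRUE))

print(sorted)
```

Assigning into `sorted[]` keeps the result a data.frame with the original column names, instead of the plain list that `lapply` returns.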