10

I want to replace the NA values in a matrix obtained with:

read.table(…)

Those values should be the mean of the corresponding row.

For example, the following row of the table:

1 2 1 NA 2 1 1 2

would become

1 2 1 1.43 2 1 1 2

Thank you.

Delphine
  • 2
    Why would you want to do this *row-wise*? Just checking you aren't mixing up variables with objects/samples. Usually one does this column-wise, computing the mean for each variable and using that to replace `NA` within the variable. – Gavin Simpson Aug 02 '11 at 21:21
  • Also, `read.table()` returns a data.frame. Are you talking about a data frame or a proper matrix? – Gavin Simpson Aug 02 '11 at 21:21
  • @GavinSimpson One reason for this would be in questionnaire data with repeated questions for use in a measurement. The means of the other questions would be used to substitute missing data. – Irwin Dec 12 '13 at 04:52
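
For comparison, the column-wise imputation mentioned in the comments above (each NA replaced by the mean of its variable) might look like the sketch below. The matrix `vars` and its values are made up for illustration and do not come from the question.

```r
# Hypothetical data: rows are samples, columns are variables.
vars <- matrix(c(1, 2, NA,
                 3, NA, 6,
                 5, 4, 9), nrow = 3, byrow = TRUE)

# Positions of the missing cells as (row, col) pairs.
na_pos <- which(is.na(vars), arr.ind = TRUE)

# Look up the column mean for each missing cell (second index = column).
vars[na_pos] <- colMeans(vars, na.rm = TRUE)[na_pos[, 2]]
```

The only difference from a row-wise fill is that the means are taken per column and looked up by the column index rather than the row index.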

3 Answers

28

Here's some sample data.

m <- matrix(1:16, nrow=4)
m[c(1,4,6,11,16)] <- NA

And here's how I'd fill in missings with the row means.

k <- which(is.na(m), arr.ind=TRUE)
m[k] <- rowMeans(m, na.rm=TRUE)[k[,1]]

Your data will be in a data.frame; you'll have to convert to a matrix first using as.matrix. You may or may not want to leave it in that format; to convert back use as.data.frame.
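
Put together, the round trip described above could be sketched as follows. The `read.table(text = ...)` call and its second row are stand-ins invented for illustration, since the question does not show the actual file; the first row is the one from the question.

```r
# Hypothetical input standing in for the read.table(...) call in the question.
df <- read.table(text = "1 2 1 NA 2 1 1 2
3 NA 3 3 3 3 3 3")

m <- as.matrix(df)                          # data.frame -> numeric matrix
k <- which(is.na(m), arr.ind = TRUE)        # (row, col) index of every NA
m[k] <- rowMeans(m, na.rm = TRUE)[k[, 1]]   # row mean, looked up by each NA's row

filled <- as.data.frame(m)                  # back to a data.frame, if needed
round(filled[1, 4], 2)                      # 1.43
```

Indexing `rowMeans(...)` by `k[, 1]` yields one mean per missing cell, in the same order as the rows of `k`, so the matrix assignment `m[k] <- ...` lines up cell by cell.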

Aaron left Stack Overflow
5
x[is.na(x)] <- mean(x, na.rm=TRUE)  # for vectors or for a matrix as a whole

# for a row-oriented solution
t(apply(x, 1, function(xv) {
  xv[is.na(xv)] <- mean(xv, na.rm=TRUE)
  xv
}))
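
A small sketch of why the `t()` is needed: `apply` over rows returns each processed row as a column of its result, so the output comes back transposed. The matrix and the helper name `fill_row` below are made up for illustration.

```r
# Hypothetical 2 x 3 matrix with one NA per row.
x <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 2, byrow = TRUE)

# Replace the NAs in a single row with that row's mean.
fill_row <- function(xv) {
  xv[is.na(xv)] <- mean(xv, na.rm = TRUE)
  xv
}

by_row <- apply(x, 1, fill_row)  # 3 x 2: each filled row comes back as a column
filled <- t(by_row)              # 2 x 3: original orientation restored
```

Here `filled[1, 2]` is 2 (the mean of 1 and 3) and `filled[2, 3]` is 4.5 (the mean of 4 and 5).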
IRTFM
1
a = c(NA, 1, 2, 3, 10)
a[which(is.na(a)==TRUE)] = mean(a,na.rm = T)
user702846
  • 3
    This should work, but it's unnecessarily complicated. is.na(a) returns a vector of Booleans, so the == TRUE is redundant. `which` is not necessary either, since you can index vectors either by a vector of length <= `length(a)` or by a vector of length `length(a)` containing TRUEs and FALSEs (or 0/1's which get coerced to TRUE/FALSE). Finally, avoid using T and F for TRUE and FALSE, since they can get overwritten. – Ari B. Friedman Aug 02 '11 at 20:32
  • I considered more, the training aspect :d – user702846 Aug 02 '11 at 20:37
  • For a matrix, same problem, takes the mean of everything and replaces. – Brandon Bertelsen Aug 02 '11 at 20:38
  • @BrandonBertelsen: Read the question again, and you're right. Aaron's got the solution using rowMeans. – Ari B. Friedman Aug 02 '11 at 20:45
  • @user702846: Don't mean to discourage you though! Keep at it. – Ari B. Friedman Aug 02 '11 at 20:45
  • The discouraging part is seeing "the same" answer posted 15 minutes later by another user who also picks up some points! – user702846 Aug 02 '11 at 20:53
  • @user702846 DWin edited his answer so as to answer the original question, which is why he picked up the votes. On the other hand, your answer inspired this question, which has turned out interesting http://stackoverflow.com/questions/6918657/whats-the-use-of-which – Ari B. Friedman Aug 02 '11 at 22:08
  • Thank you for your answer. However I get the following error using this code : "Error in `[ – Delphine Aug 03 '11 at 07:26
  • I guess you are working with a data frame; maybe you need to convert each row into a vector first, using as.vector. – user702846 Aug 03 '11 at 09:51
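
The simplification suggested in Ari B. Friedman's comment above (no `which`, no `== TRUE`, and `TRUE` spelled out) might look like this for the same vector:

```r
a <- c(NA, 1, 2, 3, 10)

# is.na(a) is already a logical vector, so it can index a directly.
# TRUE is written in full because T is an ordinary variable that can be reassigned.
a[is.na(a)] <- mean(a, na.rm = TRUE)
```

As the comments note, this still fills with the overall mean, so on a matrix it does not give the row-wise result the question asks for.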