
I have a problem removing zeros from my data. I'm working with a ln(x) model, so zeros cause problems, since ln(0) is undefined. My teacher told me to fix the problem with this code:

amMx_data <- extract.ages(mMx_data, iAgeMin:iAgeMax, combine.upper=FALSE)

But I'm new to R, so I can't get it to work. I don't know how to link it to my data file. The file is called mort.txt, but where do I refer to it, and which directory does R use?

Richie Cotton
  • Welcome to SO. You have lots of questions here, and we can't reproduce the problem. What is this `extract.ages` function? Is it something your teacher provided you with? What do you want to do with the zeros: pretend they are missing, remove that row entirely, or substitute a small positive value? If you are having problems reading in your data, that's a whole different question. Start by reading the help page `?read.table`. Look at `?setwd` for determining which directory R is using. And read this: http://stackoverflow.com/q/5963269/134830 – Richie Cotton Feb 07 '14 at 12:11
  • Okay, the zeros should become the average of the previous and the next number. They should not be removed but changed instead. The extract function is from one of his examples. I have no problem reading in the data sets; the problem is locating the zeros and changing them. – user3283628 Feb 07 '14 at 12:25
  • What if you have two or more consecutive zeroes? What if the first or last value is zero? What do you want to happen then? It would be very useful if you edited your question to include some sample data, and please show us what you have tried already. (Since this is a homework question, it is important that we don't just give you an answer; you need to work on this too.) – Richie Cotton Feb 07 '14 at 13:21
  • In general the posted code should take care of that problem. My question is not the assignment itself; I have to estimate a model, and this is one of the problems I hit before I can explore the features of the demography package in R. – user3283628 Feb 07 '14 at 20:20

2 Answers


I suppose that if you use a logarithm, you need to replace the zeros in your data with NA. First of all, open an R console (type R in the terminal window) and type:

mydata <- read.table('/full/path/to/data/mort.txt')

After that you can look at your data and perform further data transformations.
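For the working-directory part of the question, here is a minimal sketch. It writes a small made-up table to a temporary folder so that it runs anywhere; the file contents and the column names `age` and `mx` are assumptions for illustration, not the layout of the real mort.txt.

```r
# Hypothetical stand-in for mort.txt: the real file's columns are unknown.
demo_dir <- tempdir()
demo_file <- file.path(demo_dir, "mort.txt")
write.table(
  data.frame(age = 0:4, mx = c(0.005, 0, 0.0003, 0.0002, 0)),
  demo_file,
  row.names = FALSE
)

getwd()                    # the directory R resolves relative paths against
old_wd <- setwd(demo_dir)  # change it; setwd() returns the previous directory
file.exists("mort.txt")    # check that the relative path now resolves

mydata <- read.table("mort.txt", header = TRUE)
str(mydata)                # inspect column names and types

setwd(old_wd)              # restore the original working directory
```

With your own file, either pass the full path to `read.table()` as shown earlier, or call `setwd()` on the folder that contains mort.txt and then use the bare file name.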

To replace the zeros with a value of your choice, take a look here: Fastest way to replace NAs in a large data.table
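The zero-to-NA step itself might look roughly like this. It is only a sketch: the data frame is made up, and `mx` is a hypothetical name for whichever column holds the values you take the logarithm of.

```r
# Made-up data; 'mx' is an assumed column name, not taken from mort.txt.
mydata <- data.frame(age = 0:4, mx = c(0.005, 0, 0.0003, 0.0002, 0))

log(mydata$mx)                   # zeros give -Inf, which breaks model fitting

mydata$mx[mydata$mx == 0] <- NA  # mark the zeros as missing
log_mx <- log(mydata$mx)         # NA stays NA, everything else is finite
log_mx
```

Whether NA is acceptable depends on the model function you use afterwards; many R fitting functions drop rows with missing values by default, which is not the same as replacing them.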

annndrey

Some sample data to make your problem reproducible:

n <- 100
ages <- ifelse(
  runif(n) > 0.25,
  sample(50, n, replace = TRUE),
  0
)
##  [1] 39 26  0 26  8 48 30 47 46 48 15 26  4 43  3 12 47  2  4 10  8  4  0 35 21
## [26] 34  2  4  9 15  0  0  0 27  0 35 11 24 20 35 27  0  0 16 33 18 34  2  1 31
## [51]  0 13  0 49 16 45 43 43 38 44 22 30 39  0 12  3  3 34 21 40  7 26  0  2 23
## [76]  0 46 50 24 33 32  0  8 26 40 12  0 28 35 33 30 20 14 47 10  4 31  0  4 42

First, replace the zeroes with NA.

ages[ages == 0] <- NA
##  [1] 39 26 NA 26  8 48 30 47 46 48 15 26  4 43  3 12 47  2  4 10  8  4 NA 35 21
## [26] 34  2  4  9 15 NA NA NA 27 NA 35 11 24 20 35 27 NA NA 16 33 18 34  2  1 31
## [51] NA 13 NA 49 16 45 43 43 38 44 22 30 39 NA 12  3  3 34 21 40  7 26 NA  2 23
## [76] NA 46 50 24 33 32 NA  8 26 40 12 NA 28 35 33 30 20 14 47 10  4 31 NA  4 42

Then you can use an interpolation function to replace the missing values. There are lots of such functions in R. Here I've used one from the pracma package. This has several different interpolation algorithms for you to experiment with.

library(pracma)
interp1(seq_along(ages), ages, method = "constant")
##  [1] 39 26 26 26  8 48 30 47 46 48 15 26  4 43  3 12 47  2  4 10  8  4  4 35 21
## [26] 34  2  4  9 15 15 15 15 27 27 35 11 24 20 35 27 27 27 16 33 18 34  2  1 31
## [51] 31 13 13 49 16 45 43 43 38 44 22 30 39 39 12  3  3 34 21 40  7 26 26  2 23
## [76] 23 46 50 24 33 32 32  8 26 40 12 12 28 35 33 30 20 14 47 10  4 31 31  4 42

interp1(seq_along(ages), ages, method = "linear")
##  [1] 39.00000 26.00000 26.00000 26.00000  8.00000 48.00000 30.00000 47.00000
##  [9] 46.00000 48.00000 15.00000 26.00000  4.00000 43.00000  3.00000 12.00000
## [17] 47.00000  2.00000  4.00000 10.00000  8.00000  4.00000 19.50000 35.00000
## [25] 21.00000 34.00000  2.00000  4.00000  9.00000 15.00000 18.00000 21.00000
## [33] 24.00000 27.00000 31.00000 35.00000 11.00000 24.00000 20.00000 35.00000
## [41] 27.00000 23.33333 19.66667 16.00000 33.00000 18.00000 34.00000  2.00000
## [49]  1.00000 31.00000 22.00000 13.00000 31.00000 49.00000 16.00000 45.00000
## [57] 43.00000 43.00000 38.00000 44.00000 22.00000 30.00000 39.00000 25.50000
## [65] 12.00000  3.00000  3.00000 34.00000 21.00000 40.00000  7.00000 26.00000
## [73] 14.00000  2.00000 23.00000 34.50000 46.00000 50.00000 24.00000 33.00000
## [81] 32.00000 20.00000  8.00000 26.00000 40.00000 12.00000 20.00000 28.00000
## [89] 35.00000 33.00000 30.00000 20.00000 14.00000 47.00000 10.00000  4.00000
## [97] 31.00000 17.50000  4.00000 42.00000
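The rule asked for in the comments (a zero becomes the average of the previous and the next number) is what linear interpolation gives for an isolated gap. As a sketch of the same idea without an extra package, base R's `approx()` can be used instead of pracma; the short vector below is made up, and `rule = 2` is my own choice for a missing first or last value (it carries the nearest observed value outward), not something the question specifies.

```r
x <- c(39, 26, 0, 26, 8, 0, 0, 20, 0)
x[x == 0] <- NA  # treat zeros as missing

# approx() ignores the NA pairs and interpolates at every position.
filled <- approx(
  seq_along(x), x,
  xout = seq_along(x),
  method = "linear",
  rule = 2  # assumption: extend the nearest value at the ends
)$y
filled
## [1] 39 26 26 26  8 12 16 20 20
```

Runs of consecutive zeros are filled along the straight line between the surrounding values, so the two-zero gap between 8 and 20 becomes 12 and 16.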
Richie Cotton