I am trying to calculate the average duration (a continuous variable) of UFO sightings for each shape (a categorical variable). Essentially, what is the average sighting length for each UFO shape?

I tried:

    a <- aggregate(duration..seconds. ~ shape, data=alien, FUN=mean, na.rm=TRUE)
    barplot(a$duration..seconds., names.arg=a$shape)

and got:

    no non-missing arguments to min; returning Inf
    no non-missing arguments to max; returning -Inf
    Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values

I realize that I need to alter my data somehow. I would like to simply remove every row with missing data (i.e., where we know the shape but the duration is missing, or vice versa), but I don't quite know how to do this.

Thanks for your help!

PS: the column name "duration..seconds." is correct; that is how it came over from the Excel file.

    shape       duration..seconds.
    us  changing    3600    NA  4/27/2004   29.8830556  
    us  changing    300     NA  12/16/2005  29.38421    
    us  changing    3600    NA  1/21/2008   53.2    
    us  changing    900     NA  1/17/2004   28.9783333  
    ca  changing    1200    NA  1/22/2004   21.4180556  
    us  changing    3600    NA  4/27/2007   36.595  

There are 80,000 logged UFO sightings, which is why I am trying to average them, and there are 29 different shapes.

1 Answer


Data

    df <- read.table(text="
    country shape  duration_seconds dummy1 date dummy2
    us  changing    3600    NA  4/27/2004   29.8830556  
    us  changing    300     NA  12/16/2005  29.38421    
    us  changing    3600    NA  1/21/2008   53.2    
    us  changing    900     NA  1/17/2004   28.9783333  
    ca  changing    1200    NA  1/22/2004   21.4180556  
    us  changing    3600    NA  4/27/2007   36.595  
    ", header = TRUE, stringsAsFactors = FALSE)

You can fix the column names in your own data with:

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")

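Using the dplyr package (with `na.rm = TRUE` so that missing durations do not turn a group's mean into `NA`):

    library(dplyr)
    df %>% 
      group_by(shape) %>%
      summarize(mean_duration_seconds = mean(duration_seconds, na.rm = TRUE))

    #   shape    mean_duration_seconds
    #   <chr>                    <dbl>
    # 1 changing                 2200.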
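The question also asks how to drop rows where either the shape or the duration is missing. A minimal dplyr sketch of that step follows; the `ufo` data frame and its values are made up for illustration, and only the column names follow the renamed columns above.

```r
library(dplyr)

# Hypothetical sample: one row lacks a shape, one lacks a duration.
ufo <- data.frame(
  shape = c("changing", "changing", "circle", NA, "circle"),
  duration_seconds = c(3600, 300, 120, 60, NA),
  stringsAsFactors = FALSE
)

shape_means <- ufo %>%
  # Keep only rows where both the shape and the duration are known.
  filter(!is.na(shape), !is.na(duration_seconds)) %>%
  group_by(shape) %>%
  summarize(
    mean_duration_seconds = mean(duration_seconds),
    n_sightings = n()  # how many complete rows went into each mean
  )

shape_means
```

Reporting the row count next to each mean makes it easier to spot shapes whose average rests on only a handful of sightings.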

And using the original code:

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
    a <- aggregate(duration_seconds ~ shape, data=df, FUN=mean, na.rm=TRUE)
    barplot(a$duration_seconds, names.arg=a$shape)

    a
    #   shape    duration_seconds
    # 1 changing             2200
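The `need finite 'ylim' values` error is what `barplot()` reports when every bar height is `NA`. One possible cause, which is an assumption here because the question does not show `str(alien)`, is that the duration column was imported as text, so `mean()` returns `NA` for every shape. The sketch below uses a made-up `alien` sample with the question's column names: it coerces the duration to numeric, drops incomplete rows with `complete.cases()`, and then aggregates and plots.

```r
# Hypothetical sample with the question's column names and a few bad rows.
alien <- data.frame(
  shape = c("changing", "changing", "circle", NA, "circle", ""),
  duration..seconds. = c("3600", "300", "120", "60", "unknown", "45"),
  stringsAsFactors = FALSE
)

# Coerce duration to numeric; entries that are not numbers become NA.
alien$duration..seconds. <- suppressWarnings(as.numeric(alien$duration..seconds.))

# Treat empty shape strings as missing too.
alien$shape[alien$shape %in% ""] <- NA

# Keep only rows where both shape and duration are present.
clean <- alien[complete.cases(alien[, c("shape", "duration..seconds.")]), ]

a <- aggregate(duration..seconds. ~ shape, data = clean, FUN = mean)

# Sort bars by mean duration; las = 2 rotates labels so many shapes stay readable.
a <- a[order(a$duration..seconds., decreasing = TRUE), ]
barplot(a$duration..seconds., names.arg = a$shape, las = 2)
```

Running `str(alien)` on the real data before the coercion step should show whether the column is actually character or factor. If it is a factor, convert with `as.numeric(as.character(...))` rather than `as.numeric(...)` alone, which would return the internal level codes.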
Andrew Lavers