
I am new to R programming and I want to read a text file in R.

One of the columns, let's say column 7, is numeric, and each number represents an ID. I want R to read the numbers as if they were strings and count the number of times each ID appears in the file (so that I can later assign each ID its frequency for further use). I have tried:

mydata<-read.table("filename.txt")
ID=mydata[7]
freq=table(ID)

This works but it takes the IDs as numbers. Now I have tried

freq=table(as.character(ID))

But then it treats the whole ID column as a single string, and from

summary(freq)

I get

Number of cases in table: 1 
Number of factors: 1 
Julius Vainora
user2115322

3 Answers


When you read the data from the text file into a data frame, you can specify the type of each column with the colClasses argument. Below is an example using a file I have on my computer:

> head(read.csv("R/Data/ZipcodeCount.csv"))
    X zipcode stateabb countyno  countyname
1   1     401       NY      119 WESTCHESTER
2 391     501       NY      103     SUFFOLK
3 392     544       NY      103     SUFFOLK
4 393     601       PR        1    ADJUNTAS
5 630     602       PR        3      AGUADA
6 957     603       PR        5   AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
    X zipcode stateabb countyno  countyname
1   1   00401       NY      119 WESTCHESTER
2 391   00501       NY      103     SUFFOLK
3 392   00544       NY      103     SUFFOLK
4 393   00601       PR      001    ADJUNTAS
5 630   00602       PR      003      AGUADA
6 957   00603       PR      005   AGUADILLA

> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame':   53424 obs. of  5 variables:
 $ X         : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
 $ zipcode   : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
 $ stateabb  : Factor w/ 60 levels "","  ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
 $ countyno  : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
 $ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))

00401 00501 00544 00601 00602 00603 
    1     1     1     1     1     2 

As you can see, R is no longer treating the zip codes as numbers but as factors. In your case you need to specify the class of the first 6 columns and then choose factor for the seventh. So if the first 6 columns are numeric, it should be something like colClasses = c(rep("numeric",6),"factor").
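
To make the suggestion above concrete, here is a minimal sketch on a made-up 7-column file. The file contents, the temporary path, and the new freq column are assumptions for illustration only. It uses "character" rather than "factor" for column 7, since the question asks for strings and a character vector can be used directly to look up counts by name.

```r
# Hypothetical 7-column file, written to a temporary path for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5 6 007",
             "1 2 3 4 5 6 042",
             "1 2 3 4 5 6 007"), tmp)

# Columns 1-6 are read as numeric, column 7 as character (never parsed as a number).
mydata <- read.table(tmp, colClasses = c(rep("numeric", 6), "character"))

# Count how often each ID occurs; the table is labelled by the ID strings.
freq <- table(mydata[, 7])
print(freq)

# Attach each ID's frequency to its row by looking the ID up by name.
mydata$freq <- as.vector(freq[mydata[, 7]])
print(mydata)
```

Because column 7 is read as text, leading zeros such as "007" are kept, which would be lost if the column were read as numeric.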

tepedizzle

Without the as.character, your table should work correctly (i.e. freq <- table(ID)). Quoting from ?table, your input can be:

one or more objects which can be interpreted as factors (including character strings), or a list (or data frame) whose components can be so interpreted. (For as.table and as.data.frame, arguments passed to specific methods.)
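
As a small illustration of the quoted documentation, the sketch below calls table() on a numeric vector of made-up IDs (the values are assumptions, not data from the question). The counts come back labelled with character strings, so they can be looked up by string even though the input was numeric.

```r
# Hypothetical numeric IDs standing in for column 7 of the file.
ID <- c(101, 7, 101, 33, 7, 101)

# table() interprets the numbers as a factor and counts each distinct value.
freq <- table(ID)
print(freq)

# The labels of the result are character strings.
print(names(freq))

# Look up the count for one ID by its string label.
print(freq[["101"]])
```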

RJ-

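I think you missed the comma when indexing your data frame. mydata[7] returns a one-column data frame, whereas mydata[,7] returns the column itself as a vector, which is what as.character needs here.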

mydata<-read.table("filename.txt")
ID=mydata[,7]  #added comma
freq=table(as.character(ID))
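
The difference the comma makes can be checked on a small made-up data frame (the column values and names below are assumptions for illustration). Without the comma the result is still a data frame, and as.character() collapses its single column into one string, which matches the "Number of cases in table: 1" output in the question.

```r
# Hypothetical stand-in for the file: six filler columns plus an ID column in position 7.
mydata <- data.frame(matrix(0, nrow = 4, ncol = 6), ID = c(101, 7, 101, 33))

col_df  <- mydata[7]     # no comma: a one-column data frame
col_vec <- mydata[, 7]   # with comma: a plain numeric vector

print(class(col_df))     # "data.frame"
print(class(col_vec))    # "numeric"

# as.character() on the data frame yields a single string for the whole column.
print(length(as.character(col_df)))

# as.character() on the vector yields one string per ID, so table() counts them.
freq <- table(as.character(col_vec))
print(freq)
```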
kith