
I've got a list of datasets, and I want to make a few changes to these datasets using R.

First, if the variable "mac_sector" exists, I want to rename it to "sector". *Edit: It always says mac_sector not found, even though it is in at least one of the datasets. Also, if something is not found by if(exists()), does the script just continue with the rest, or does it terminate?
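A minimal sketch of the check, assuming a toy data frame `d` (a hypothetical stand-in for one dataset). `exists()` looks up R objects rather than columns, and an unquoted `mac_sector` is evaluated as a variable, which is most likely where the "not found" error comes from. Column presence can instead be tested against `names()`. A condition that is merely FALSE skips its block and the script carries on; an error, by contrast, stops a sourced script.

```r
# Toy data frame standing in for one dataset (hypothetical values)
d <- data.frame(Country = c("GER", "BELG"),
                mac_sector = c("F", "L"),
                Variable1 = c(1, 2))

# Column presence is a test on the column names, not on R objects
has_mac <- "mac_sector" %in% names(d)

if (has_mac) {
  # Rename in base R by replacing the matching name
  names(d)[names(d) == "mac_sector"] <- "sector"
}

# A FALSE condition only skips the block; the lines after it still run
if ("no_such_column" %in% names(d)) {
  stop("never reached")
}
reached_end <- TRUE
```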

Second, if there is no variable called "mac_sector" or "sector", I want to create a new column called "sector" with "total" as the value in every row.
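This second step could look roughly like the sketch below, again on a hypothetical toy data frame `d`. Assigning a single string to a new column recycles it to every row.

```r
# Toy dataset with neither "mac_sector" nor "sector" (hypothetical values)
d <- data.frame(Country = c("GER", "BELG"),
                Variable1 = c(1, 2))

# Add the column only when neither name is present
if (!any(c("mac_sector", "sector") %in% names(d))) {
  d$sector <- "total"  # the single value is recycled to all rows
}
```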

Lastly, I want to rearrange the columns so that the variable "sector" is the 3rd column in each dataset.
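One way to do the reordering in base R, as a sketch: `move_to_position()` is a hypothetical helper written for this illustration, not a function from any package. It removes the column from the name vector and re-inserts it at the requested position with `append()`.

```r
# Hypothetical helper: move column `col` of data frame `d` to position `pos`
move_to_position <- function(d, col, pos) {
  others <- setdiff(names(d), col)
  # append() inserts `col` after the first (pos - 1) remaining names;
  # if pos is beyond the end, the column simply goes last
  new_order <- append(others, col, after = pos - 1)
  d[, new_order, drop = FALSE]
}

d <- data.frame(Country = c("GER", "BELG"),
                Variable1 = c(1, 2),
                Variable2 = c(3, 4),
                sector = c("total", "total"))

d <- move_to_position(d, "sector", 3)
```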

I wrote the script (some parts are not even in R language) below, but obviously it's not working, so I'm hoping that some of you may be able to help me with this.

I also want to save these changes to the respective datasets, but I have no idea how to go about that in this particular case. (I know of the save() command, but I feel like it wouldn't work here.)
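On saving: `save()` writes R's own binary format, so it would not produce `.dta` files. A hedged sketch of writing each list element back to a file named after it follows. `save_all()` is a hypothetical helper; its default writer is `save.dta13(data, file)` from the readstata13 package, and the default is only evaluated when no other writer is supplied.

```r
# Hypothetical helper: write every element of a named list to "<name>.dta"
# inside `dir`. `writer` is any function taking (data, file).
save_all <- function(datasets, dir = ".", writer = readstata13::save.dta13) {
  paths <- file.path(dir, paste0(names(datasets), ".dta"))
  for (i in seq_along(datasets)) {
    writer(datasets[[i]], paths[i])
  }
  invisible(paths)  # return the file paths without printing them
}

# Intended use on the list from the script, overwriting the original files
# (assumption: a backup exists, since this replaces them):
# save_all(df, dir = "C:\\Users\\files")
```

Passing the writer as an argument keeps the loop independent of the file format, so the same helper can be tried with a harmless writer before touching the real files.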

    library(readstata13)  # provides read.dta13()

    setwd("C:\\Users\\files")
    mylist = list.files(pattern="\\.dta$")

    # Loop through all of the datasets in C:\\Users\\files
    # Reading the datasets into R
    df <- lapply(mylist, read.dta13)

    # Naming the list elements to match the files for convenience
    names(df) <- gsub("\\.dta$", "", mylist)

    # If column mac_sector exists, rename to sector
    if(exists(mac_sector, df)){
      df <- rename(df, c(mac_sector="sector"))
    }

    # If column variable with pattern("sector") does not exist, create variable sector=total
    if(does not exist(pattern="sector")){
      sector <- c("total")
      df$sector <- sector
    }

    # rearrange variable, sector must be placed 3rd
    df <- arrange.vars(df, c("sector" = 3))
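The script above applies each step to `df`, which is a list of data frames rather than a single data frame. A minimal sketch of doing the three steps per dataset and mapping that over the list with `lapply()` is shown here. `fix_sector()` and the toy list `datasets` are hypothetical, and the sketch assumes no dataset contains both "mac_sector" and "sector".

```r
# Hypothetical per-dataset function: rename, or add, then reorder
fix_sector <- function(d) {
  if ("mac_sector" %in% names(d)) {
    names(d)[names(d) == "mac_sector"] <- "sector"
  } else if (!"sector" %in% names(d)) {
    d$sector <- "total"
  }
  # Put "sector" third (or last, if there are fewer than two other columns)
  others <- setdiff(names(d), "sector")
  new_order <- append(others, "sector", after = min(2, length(others)))
  d[, new_order, drop = FALSE]
}

# Toy list mirroring the three layouts described in the question
datasets <- list(
  a = data.frame(Country = c("GER", "BELG"), mac_sector = c("F", "L"),
                 Variable1 = 1:2, Variable2 = 3:4),
  b = data.frame(Country = c("GER", "BELG"),
                 Variable1 = 1:2, Variable2 = 3:4),
  c = data.frame(Country = c("GER", "BELG"), sector = c("M", "K"),
                 Variable1 = 1:2)
)

# lapply() keeps the list names, so each result still maps to its file
datasets <- lapply(datasets, fix_sector)
```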

Edit: I want all datasets to look like this (and some already do):

    Country | sector | Variable1 | Variable2 | Variable3 | ....
    GER     | M      | value     | value     | value     | ....
    BELG    | K      | value     | value     | value     | ....
    and so on.

Now some of them look like this:

    Country | mac_sector | Variable1 | Variable2 | Variable3 | ....
    GER     | F          | value     | value     | value     | ....
    BELG    | L          | value     | value     | value     | ....

In which case I want to rename mac_sector to sector.

They can also look like this:

    Country | Variable1 | Variable2 | Variable3 | ....
    GER     | value     | value     | value     | ....
    BELG    | value     | value     | value     | ....

In which case I want to add a variable sector = total:

    Country | sector | Variable1 | Variable2 | Variable3 | ....
    GER     | total  | value     | value     | value     | ....
    BELG    | total  | value     | value     | value     | ....

*Note that Variable1, Variable2, Variable3 and so on do not represent the same thing across datasets.

DuEllier
  • Can you include a data snippet and your desired output, e.g. using `dput(your_data)`? This makes it much easier to help you. – Roman Jul 08 '16 at 12:03
  • See gregor's answer to [this post](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) for storing your data.frames into a list. This is the preferred method for working with similar data in R. – lmo Jul 08 '16 at 12:03
  • @Jimbou The datasets are really big, so `dput` didn't really work that well. But I made an edit describing what they look like and what I want to happen to them! Hope this makes it clearer; if not, please say so! – DuEllier Jul 08 '16 at 13:04
  • Try using `head()` first and then `dput` that; you'll get your sample data. – ArunK Jul 08 '16 at 13:14
  • There are still hundreds of observations. Anyway, it comes down to what I described in the edit to my post. – DuEllier Jul 08 '16 at 15:00
