
I am using R to do some data pre-processing, and here is the problem I am faced with: I read the data in using read.csv(filename, header=TRUE), and the spaces in variable names become "." — for example, a variable named Full Code becomes Full.Code in the resulting dataframe. After processing, I export the results with write.xlsx(filename), but the variable names are still the changed ones. How can I address this problem?

Besides, in the output .xlsx file, the first column becomes row indices (i.e., 1 to N), which is not what I am expecting.

Henrik
zeno tsang

4 Answers


If you set check.names=FALSE in read.csv when you read the data in, then the names will not be changed and you will not need to edit them before writing the data back out. This of course means that you would need to quote the column names (back quotes in some cases) or refer to the columns by location rather than name while editing.
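A minimal sketch of what that looks like (the "Full Code" column name is taken from the question; the inline CSV is made up for illustration):

```r
# check.names = FALSE keeps the original header, spaces and all
df <- read.csv(text = "Full Code,Value\nA1,10\nB2,20",
               check.names = FALSE)

names(df)       # the header survives as "Full Code", "Value"
df$`Full Code`  # non-syntactic names need back quotes...
df[[1]]         # ...or refer to the column by position
```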

Greg Snow
  • I think using `check.names=FALSE` is a better choice than editing the column names, although re-reading the data means extra work for me. All in all, thanks for your answer! That really helps me a lot. – zeno tsang Jun 19 '13 at 09:45

To get spaces back in the names, do this (right before you export - R does let you have spaces in variable names, but it's a pain):

# A simple regular expression to replace dots with spaces
# This might have unintended consequences, so be sure to check the results
names(yourdata) <- gsub(x = names(yourdata),
                        pattern = "\\.",
                        replacement = " ")

To drop the first-column index, just add row.names = FALSE to your write.xlsx(). That's a common argument for functions that write out data in tabular format (write.csv() has it, too).
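For example, with base R's write.csv() (which takes the same argument; xlsx::write.xlsx is assumed from the question, and note that openxlsx spells the option rowNames instead):

```r
# row.names = FALSE drops the 1..N index column from the output file
yourdata <- data.frame(`Full Code` = c("A1", "B2"), Value = c(10, 20),
                       check.names = FALSE)
write.csv(yourdata, "results.csv", row.names = FALSE)
readLines("results.csv")[1]  # header line only, no leading index column
```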

Matt Parker
  • Thanks a lot! The `gsub()` function works fine. But I still have a question: why should we use two _escapes_ rather than simply using `pattern="\."`? – zeno tsang Jun 18 '13 at 01:00
  • @zenotsang Good question - I'm not sure why R requires that. – Matt Parker Jun 18 '13 at 15:01
  • 2
    The first escape is used up when the R parser parses the string, this leaves a '\' for the regular expression. If you bypass the regular parser then you can get away with a single backslash (but that is probably more work than it is worth). – Greg Snow Jun 18 '13 at 15:41
  • I see. Obviously following the rules in R is better :) Thanks a lot! – zeno tsang Jun 19 '13 at 09:41

Here's a function (sorry, I know it could be refactored) that makes nice column names even if there are multiple consecutive dots and trailing dots:

makeColNamesUserFriendly <- function(ds) {
  # FIXME: Repetitive.

  # Convert any number of consecutive dots to a single space.
  names(ds) <- gsub(x = names(ds),
                    pattern = "(\\.)+",
                    replacement = " ")

  # Drop the trailing spaces.
  names(ds) <- gsub(x = names(ds),
                    pattern = "( )+$",
                    replacement = "")
  ds
}

Example usage:

ds <- makeColNamesUserFriendly(ds)
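The two passes can also be collapsed into one expression: strip any trailing run of dots first, then squash each remaining run into a single space. This is a hedged refactor of the function above; makeColNamesUserFriendly2 is just an illustrative name:

```r
makeColNamesUserFriendly2 <- function(ds) {
  # drop trailing dots, then turn each remaining run of dots into one space
  names(ds) <- gsub("\\.+", " ", gsub("\\.+$", "", names(ds)))
  ds
}

ds <- data.frame(a..b = 1, c.d.. = 2)
names(makeColNamesUserFriendly2(ds))  # "a b" "c d"
```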
Marcin Bilski

Just to add to the answers already provided, here is another way of replacing the "." (or any other kind of punctuation) in column names, using a regex with the stringr package:

require("stringr")
colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")

For example try:

data <- data.frame(variable.x = 1:10, variable.y = 21:30, variable.z = "const")

colnames(data) <- str_replace_all(colnames(data), "[:punct:]", " ")

and

colnames(data)

will give you

[1] "variable x" "variable y" "variable z"
TimTeaFan