
I have a short .ado file to standardize string variables from several sources. The program takes one string variable, renames it with a suffix, then replaces the original variable with the standardized name.

But syntax doesn't seem to parse the comma after the string variable correctly. That is, instead of passing country, the program passes country, (with the trailing comma) and throws an error.

Is the space between the variable name and the comma required? Or am I misunderstanding how I should be using syntax and varlist?

clear
set obs 10
generate country = "United States of America"

* runs fine without `suffix()` options
country_names country
list

/*
. list

     +------------------------------------------+
     |                country_0         country |
     |------------------------------------------|
  1. | United States of America   United States |
  2. | United States of America   United States |
  3. | United States of America   United States |
  4. | United States of America   United States |
  5. | United States of America   United States |
     |------------------------------------------|
  6. | United States of America   United States |
  7. | United States of America   United States |
  8. | United States of America   United States |
  9. | United States of America   United States |
 10. | United States of America   United States |
     +------------------------------------------+
*/

* but gives error with `suffix()` 
country_names country, suffix(_orig)

/*
. country_names country, suffix(_orig)
, invalid name
r(198);
*/

* `set trace on` reveals comma passed as part of `varlist`
set trace on
country_names country, suffix(_orig)

/*
. country_names country, suffix(_orig)
  ---------------------------------------------------------------------------------------------- begin country_names ---
  - version 11.2
  - syntax varname(string) [, Suffix(string) ]
  - quietly {
  - if "`suffix'" == "" local suffix "_0"
  = if "_orig" == "" local suffix "_0"
  - rename `1' `1'`suffix'
  = rename country, country,_orig
, invalid name
    generate `1' = proper(`1'`suffix')
    replace `1' = "United States" if inlist(`1', "United States Of America")
    local name: variable label `1'`suffix'
    label variable `1' "`name'"
    label variable `1'`suffix' "`name' (orig)"
    }
  ------------------------------------------------------------------------------------------------ end country_names ---
r(198);
*/

* if I leave space before comma, then program works
country_names country , suffix(_orig)
list

/*
. list

     +----------------------------------------------------------+
     |                country_0    country_orig         country |
     |----------------------------------------------------------|
  1. | United States of America   United States   United States |
  2. | United States of America   United States   United States |
  3. | United States of America   United States   United States |
  4. | United States of America   United States   United States |
  5. | United States of America   United States   United States |
     |----------------------------------------------------------|
  6. | United States of America   United States   United States |
  7. | United States of America   United States   United States |
  8. | United States of America   United States   United States |
  9. | United States of America   United States   United States |
 10. | United States of America   United States   United States |
     +----------------------------------------------------------+
*/

Here's the .ado file.

*! 0.1 Richard Herron 2/11/2014

/* use to standardize country names across several data sources */

program country_names
    version 11.2
    syntax varname(string) [, Suffix(string) ]

    quietly {

        /* default suffix */
        if "`suffix'" == "" local suffix "_0"

        /* save original as new variable w/ suffix */
        rename `1' `1'`suffix'

        /* first standardize capitalization */
        generate `1' = proper(`1'`suffix')

        /* -if- picks bad names from several sources */
        replace `1' = "United States" ///
            if inlist(`1', "United States Of America")

        /* fix labels */ 
        local name: variable label `1'`suffix'
        label variable `1' "`name'"
        label variable `1'`suffix' "`name' (orig)"

    }

    end
Richard Herron

1 Answer


Your syntax statement yields a local macro varlist if and only if you present a valid variable name.

The problem with your program is that you don't use that local.

Instead, you use the local macro with name 1. Regardless of syntax, by default local macro 0 is the whole of a command line typed after a command name and local macros 1, 2 and so forth are the first, second, and so forth "tokens" in the command line.

The important detail here is that Stata splits the command line into tokens on spaces. Therefore, in your example, token 1 is country, (including the comma), and so (given that suffix is _orig)

rename `1' `1'`suffix'

is interpreted as

rename country, country,_orig 

If you set trace on I predict that you will see this line, which is what throws you out. As far as rename is concerned what follows country (a valid variable name) is the comma, which is not a valid variable name.
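To see the difference between the positional macros and the result of syntax, a small demonstration can help. The following is a minimal sketch, not part of the original program: show_tokens is a hypothetical name, and the vertical bars are only there to make the macro contents visible.

```stata
* hypothetical demo program: compare positional macros with -syntax- results
capture program drop show_tokens
program show_tokens
    version 11.2

    * `0' is everything typed after the command name;
    * `1', `2', ... are that line split on spaces
    display "0 is       |`0'|"
    display "1 is       |`1'|"
    display "2 is       |`2'|"

    * -syntax- parses `0' and fills `varlist' and `suffix';
    * it leaves `1', `2', ... as they were
    syntax varname(string) [, Suffix(string) ]
    display "varlist is |`varlist'|"
    display "suffix is  |`suffix'|"
end

clear
set obs 1
generate country = "United States of America"

* no space before the comma, as in the failing call
show_tokens country, suffix(_orig)
```

With the call above, local macro 1 should show the trailing comma (as in the trace in the question), while varlist should hold only the variable name.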

The short summary is sweet: You are using references to 1 wherever you should be using varlist.
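As a sketch of that change, here is the program from the question with every reference to local macro 1 replaced by varlist. Nothing else is altered; the capture program drop line is an addition that only matters when the code is run from a do-file rather than saved as country_names.ado.

```stata
*! sketch: country_names using `varlist' instead of `1'
capture program drop country_names
program country_names
    version 11.2
    syntax varname(string) [, Suffix(string) ]

    quietly {

        /* default suffix */
        if "`suffix'" == "" local suffix "_0"

        /* save original as new variable w/ suffix */
        rename `varlist' `varlist'`suffix'

        /* first standardize capitalization */
        generate `varlist' = proper(`varlist'`suffix')

        /* -if- picks bad names from several sources */
        replace `varlist' = "United States" ///
            if inlist(`varlist', "United States Of America")

        /* fix labels */
        local name : variable label `varlist'`suffix'
        label variable `varlist' "`name'"
        label variable `varlist'`suffix' "`name' (orig)"

    }
end

* usage: no space is needed before the comma
clear
set obs 10
generate country = "United States of America"
country_names country, suffix(_orig)
list
```

Because varlist is filled by syntax after the command line has been parsed, it should contain only the variable name whether or not a space precedes the comma.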

Note: although your syntax statement does specify varname, a single variable name, what you type as a variable name is still placed in local macro varlist.

Note: you could work round this problem by always putting a space after the variable name, but I strongly recommend against that.

Note: syntax is thus blameless here.

Nick Cox
  • Thanks for the explanation. Or I can add `tokenize \`varlist'`, correct? Is the following interpretation correct? The `varlist` is `tokenize`d already, but on spaces rather than valid variable names so that `\`1'` is `country,`. But if I `tokenize` the `varlist` (again) it removes the trailing comma. – Richard Herron Feb 11 '14 at 19:37
  • You're welcome. If you apply `tokenize` within the program to `varlist` then local macro `1` will contain the one and only variable name you specified. But why do that? You can just use `varlist` which contains that name already. Doing otherwise means an unnecessary statement (and strange program style). – Nick Cox Feb 11 '14 at 19:54
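
The point made in the comments can be checked with a short sketch. The program name show_tokenize is hypothetical and the code is for illustration only: after tokenize is applied to varlist, local macro 1 and varlist should hold the same name, which is why the extra statement buys nothing.

```stata
* hypothetical demo: -tokenize- on `varlist' just reproduces `varlist'
capture program drop show_tokenize
program show_tokenize
    version 11.2
    syntax varname(string) [, Suffix(string) ]

    * re-split the already-parsed variable name into `1', `2', ...
    tokenize `varlist'

    display "1 is       |`1'|"
    display "varlist is |`varlist'|"

    * stops with an error if the two ever differ
    assert "`1'" == "`varlist'"
end

clear
set obs 1
generate country = "United States of America"
show_tokenize country, suffix(_orig)
```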