
I have a Python script that cleans up and performs basic statistical calculations on a large panel dataset (2,000,000+ observations).

I find that some of these tasks are better suited to Stata, so I wrote a do-file with the necessary commands. I now want to run that .do file from within my Python code. How would I go about calling a .do file from Python?

Brian Tompsett - 汤莱恩
svenkatesh
  • How do you normally run such files? Do you use a command line interface? If so, what do you enter? – wnnmaw Jan 21 '14 at 16:28
  • I usually run do files by opening stata, and typing do .do into the command line. However, I have many do files to process, and it would be easier if I could consolidate and run them from the Python file. – svenkatesh Jan 21 '14 at 16:32
  • What platform are you running on? – wnnmaw Jan 21 '14 at 17:09
  • Have a look at Andrew's comments under my question http://stackoverflow.com/questions/18532440/in-sublime-text-3-can-i-send-a-selection-of-a-do-file-to-stata - perhaps of some help. – radek Jan 21 '14 at 23:02
  • @wnnmaw I use Windows 7 at work and Mac OS X 10.8 at home. @radek Thanks! That was a very helpful question to look at. – svenkatesh Jan 22 '14 at 15:38

3 Answers


I think @user229552 points in the correct direction: Python's subprocess module can be used. Below is an example that works for me on Linux.

Suppose you have a Python file called pydo.py with the following:

import subprocess

## Do some processing in Python

## Set do-file information
dofile = "/home/roberto/Desktop/pyexample3.do"
cmd = ["stata", "do", dofile, "mpg", "weight", "foreign"]

## Run do-file
subprocess.call(cmd) 

and a Stata do-file named pyexample3.do, with the following:

clear all
set more off

local y `1'
local x1 `2'
local x2 `3'

display `"first parameter: `y'"'
display `"second parameter: `x1'"'
display `"third parameter: `x2'"'

sysuse auto
regress `y' `x1' `x2'

exit, STATA clear

Then executing pydo.py in a Terminal window works as expected.

You could also define a Python function and use that:

import subprocess

## Define a Python function to launch a do-file
def dostata(dofile, *params):
    ## Launch a do-file, given the full path to the do-file
    ## and any number of parameters to pass to it.
    cmd = ["stata", "do", dofile]
    cmd.extend(params)
    return subprocess.call(cmd)

## Do some processing in Python

## Run a do-file
dostata("/home/roberto/Desktop/pyexample3.do", "mpg", "weight", "foreign")

The complete call from a Terminal, with results:

roberto@roberto-mint ~/Desktop
$ python pydo.py

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   12.1   Copyright 1985-2011 StataCorp LP
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)


Notes:
      1.  Command line editing enabled

. do /home/roberto/Desktop/pyexample3.do mpg weight foreign 

. clear all

. set more off

. 
. local y `1'

. local x1 `2'

. local x2 `3'

. 
. display `"first parameter: `y'"'
first parameter: mpg

. display `"second parameter: `x1'"'
second parameter: weight

. display `"third parameter: `x2'"'
third parameter: foreign

. 
. sysuse auto
(1978 Automobile Data)

. regress `y' `x1' `x2'

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   69.75
       Model |   1619.2877     2  809.643849           Prob > F      =  0.0000
    Residual |  824.171761    71   11.608053           R-squared     =  0.6627
-------------+------------------------------           Adj R-squared =  0.6532
       Total |  2443.45946    73  33.4720474           Root MSE      =  3.4071

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065879   .0006371   -10.34   0.000    -.0078583   -.0053175
     foreign |  -1.650029   1.075994    -1.53   0.130      -3.7955    .4954422
       _cons |    41.6797   2.165547    19.25   0.000     37.36172    45.99768
------------------------------------------------------------------------------

. 
. exit, STATA clear
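
Since the question mentions having many do-files to process, the same idea can be wrapped in a loop. The following is only a sketch: the function name `run_dofiles` and its `executable` and `runner` parameters are made up for illustration, and it assumes that a non-zero return code from the Stata process signals a failure, which is not verified here.

```python
import subprocess


def run_dofiles(dofiles, params=(), executable="stata", runner=subprocess.call):
    """Run each do-file in order; stop at the first non-zero return code.

    `executable` and `runner` are illustrative parameters. `runner`
    defaults to subprocess.call and can be swapped out to exercise the
    logic on a machine without Stata.
    """
    results = []
    for dofile in dofiles:
        # Same command shape as above: stata do <dofile> <params...>
        cmd = [executable, "do", dofile, *params]
        code = runner(cmd)
        results.append((dofile, code))
        if code != 0:
            break  # skip the remaining do-files after a failure
    return results
```

Calling `run_dofiles(["/path/to/first.do", "/path/to/second.do"], params=("mpg", "weight"))` would return a list of `(dofile, return_code)` pairs, one for each do-file that was actually launched.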

Sources:

http://www.reddmetrics.com/2011/07/15/calling-stata-from-python.html

http://docs.python.org/2/library/subprocess.html

http://www.stata.com/support/faqs/unix/batch-mode/

A different route for using Python and Stata together can be found at

http://ideas.repec.org/c/boc/bocode/s457688.html

http://www.stata.com/statalist/archive/2013-08/msg01304.html

Roberto Ferrer
  • Thanks, this was very helpful. However, I'm running into another error when I implement this solution in Windows. I get the following message: `WindowsError: [Error 2] The system cannot find the file specified` – svenkatesh Jan 22 '14 at 19:55
  • Ah, I figured out what I'm doing wrong - first, because I'm running from Windows XP, I should be passing the `shell=True` argument to `subprocess.call()`. Second, my Python code was not in the same working directory as my copy of Stata. Once I fixed these two issues, everything ran smoothly. Thanks so much for your help. – svenkatesh Jan 22 '14 at 20:06

This answer extends @Roberto Ferrer's answer, solving a few issues I ran into.

Stata in system path

For Python to launch Stata by name, Stata must be correctly set up in the system path (on Windows at least). For me, this was not set up automatically when installing Stata, and I found the simplest correction was to put in the full path (which for me was "C:\Program Files (x86)\Stata12\Stata-64"), i.e.:

cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "do", dofile]
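
One way to avoid guessing is to check where the executable is before building the command. This is a minimal sketch, assuming Python 3.3 or later for `shutil.which`; the helper name `find_stata` and the candidate names passed to it are hypothetical. When passing a full path on Windows, it is probably safest to include the file extension, since how `shutil.which` handles extensions for full paths differs between Python versions.

```python
import shutil


def find_stata(candidates):
    """Return the first candidate that resolves to an executable.

    Each candidate may be a bare name (looked up on the system path)
    or a full path. Raises FileNotFoundError if none can be found.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    raise FileNotFoundError(
        "no Stata executable found among: %r" % (list(candidates),)
    )
```

The result can then be used as the first element of `cmd`, for example `cmd = [find_stata(["stata", r"C:\Program Files (x86)\Stata12\Stata-64.exe"]), "do", dofile]`, where the `.exe` name is an assumption about the install.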

How to quietly run the code in the background

It is possible to run the code quietly in the background (i.e. without opening the Stata window each time) by adding the /e option, i.e.

cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]

Log file storage location

Finally, if you are running quietly in the background, Stata will want to save log files, and it does so in the working directory of the process that launched it. This varies depending on where the code is run from; for me, since I was executing Python from Notepad++, it tried to save the log files in C:\Program Files (x86)\Notepad++, which Stata did not have write access to. This can be changed by specifying the working directory when the subprocess is called.

These modifications to Roberto Ferrer's code lead to:

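import subprocess

def dostata(dofile, *params):
    ## Full path to the Stata executable; /e runs it quietly in the background
    cmd = [r"C:\Program Files (x86)\Stata12\Stata-64", "/e", "do", dofile]
    cmd.extend(params)
    ## cwd sets the directory where Stata saves its log files
    return subprocess.call(cmd, cwd=r'C:\location_to_save_log_files')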
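
To see the effect of `cwd` without Stata installed, here is a small sketch that uses a child Python process as a stand-in: it writes a file through a relative path, much as a background Stata run writes its log. The names `run_in_dir`, `demo`, and `example.log` are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(cmd, workdir):
    """Run cmd with workdir as its working directory; return the exit code."""
    os.makedirs(workdir, exist_ok=True)  # make sure the directory exists
    return subprocess.call(cmd, cwd=workdir)


def demo():
    # Stand-in for Stata: a child process that writes a file using a
    # relative path, so the file lands in the child's working directory.
    child = [sys.executable, "-c", "open('example.log', 'w').write('done')"]
    with tempfile.TemporaryDirectory() as tmp:
        code = run_in_dir(child, tmp)
        log_written = os.path.exists(os.path.join(tmp, "example.log"))
    return code, log_written
```

Here `demo()` reports the child's exit code and whether the file ended up in the directory passed as `cwd`, regardless of where the parent script was started from.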
kyrenia
  • Spent so long trying to figure out that `"stata"` would not work, and instead I needed to define the full path to the Stata executable as you did. – ALollz Oct 29 '18 at 20:52

If you're running this in a command-line setting, you should be able to call Stata from the command line from python (I don't know how to invoke a shell command from within Python, but it shouldn't be too hard, see here: Calling an external command in Python). To run Stata from the command line (aka batch mode), see here: http://www.stata.com/support/faqs/unix/batch-mode/
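
A rough sketch of how the two pieces might fit together, assuming a Unix system where the executable is named `stata` and is on the path, and using the `-b` batch option described in the linked FAQ. The function names are hypothetical, and passing extra parameters after the do-file name in batch mode is an assumption that has not been tested here.

```python
import subprocess


def batch_command(dofile, *params, executable="stata"):
    """Build the argument list for a batch run: stata -b do <dofile> <params...>."""
    return [executable, "-b", "do", dofile, *params]


def run_batch(dofile, *params, executable="stata"):
    # Blocks until the Stata process exits and returns its exit code.
    return subprocess.call(batch_command(dofile, *params, executable=executable))
```

In batch mode the output should go to a log file in the current working directory instead of the terminal, so the results would need to be read from there afterwards.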

Community
maxliving