0
    sect<-c("Healthcare","Basic Materials","Utilities","Financial Services","Technology","Consumer" 
    "Defensive","Industrials","Communication Services","Energy","Real Estate","Consumer 
    Cyclical","NULL")

    mcap<-c("3 - Large","2 - Mid","1 - Small")

    df_total = data.frame()
    start <- as.Date("01-01-14",format="%d-%m-%y")
    end   <- as.Date("18-03-20",format="%d-%m-%y")
    theDate <- start

    while (theDate <= end){
      for (value1 in sect){
        for (value2 in mcap){
            date=theDate
            sector<-value1
            marketcap1<-value2
            newquery("Select * from table where date='%s' and sector='%s' and marketcap='%s'",date,sector,marketcap1)
   topdemo <- sqlQuery(dbhandle,newquery)
   df=data.frame(topdemo)
   df_total <- rbind(df_total,df)

     }
    }
   theDate <- theDate + 1 
   }

Instead of the "Select" query, in my program Im doing a few SQL computations. I need this code to run from 2014 to 2020 but it is taking a lot of time to execute it. Is there any way to decrease the execution time? The database has many stock prices for each market-cap and sector.

Theguy
  • 33
  • 7
  • Yes. Avoid `select *`. Only select the columns you need. – AlwaysLearning Mar 19 '20 at 10:31
  • Im actually doing a lot of computations with the data. The "Select command" was given for a reference. The program basically goes through each date from 2014, market cap and sector and calculates few things. How to reduce the time? – Theguy Mar 19 '20 at 10:40

1 Answers1

1

Run one query instead of all the loops:

select *
from table
where sector in ('Healthcare', 'Basic Materials', 'Utilities',
                 'Financial Services', 'Technology', 'Consumer' 
                 'Defensive', 'Industrials', 'Communication Services', 'Energy', 'Real Estate', 'Consumer Cyclical', 'NULL'
                 ) and
        marketcap in ('3 - Large', '2 - Mid', '1 - Small') and
        date between '2014-01-01 and '2020-03-18';

There is a lot of overhead in running lots of small queries, one is typically better.

That said, you seem to be moving lots of data. I wonder if all that data movement is necessary.

It is odd that you are looping through thousands of dates, but not including the date in the query.

Gordon Linoff
  • 1,122,135
  • 50
  • 484
  • 624
  • I have included the date in the where clause. I need to go through each and every day to calculate certain things. I ran the program for 45 mins. It executed only till April of 2014. – Theguy Mar 19 '20 at 10:56
  • @Theguy . . . Run a single query to load the data into a dataframe. If you then need to cycle through the data, do that in R. Note that if the operations can be expressed in SQL, it is probably better to do the calculation in the database. – Gordon Linoff Mar 19 '20 at 10:59