I have to manually create a new directory (right-click, create new folder) in which to save my 48 weather files. My setup: I am getting weather data for 4 cities from the web, using wunderground.com.

Specifically, from this URL: https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE5&year=2018&month=2&graphspan=month&format=1

I am tasked with grabbing the data from that website, cleaning it up, and then saving it.

In this problem I must access data from the internet 48 times, because there are 4 stations (cities) and 12 months. So I think I must create a file for each station and month that holds all of its recorded temperature data, and then store all of those files in one directory. I created a function for this:

import urllib

def accessData(ID, Month):
    # Build the monthly-history URL for one station and one month of 2017
    url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=" + str(ID) + "&year=2017&month=" + str(Month) + "&graphspan=month&format=1"

    # urllib.urlopen is Python 2; on Python 3 use urllib.request.urlopen
    infile = urllib.urlopen(url)
    readline = infile.readlines()
    infile.close()
    return readline

Now I was given a separate file called stations.csv, which contains the data:

KCASANFR131,37.778,-122.408

KDCWASHI48,38.913,-77.031

IBRITISH359,49.256,-123.245

KNYNEWYO639,40.755,-74.007

I know that, for example, KCASANFR131 is the station ID, but what is 37.778,-122.408? I am unsure what that represents.

Also, should I create a list that stores the station IDs and then use it in my nested loop, or is there a way to read those IDs from the csv file itself?

Now that I have the function, it makes sense to create a nested loop outside of the function and call the function inside it. The station IDs will be the outer loop and the 12 months will be the inner loop.

After it completes one full iteration it should return a list containing the results of that web request.

Here is the code for that (I removed the header with an if-statement),

if I were to store that ID data in a list:

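stationID = ["KCASANFR131", "KDCWASHI48", "IBRITISH359", "KNYNEWYO639"]

for station in stationID:        # outer loop: the 4 stations
    for i in range(1, 13):       # inner loop: the 12 months
        data = accessData(station, i)
        # the hard-coded "0" is wrong for months 10-12; see the zfill question below
        filename = "{}-0{}-2017.csv".format(station, i)
        outfile = open(filename, 'w')
        row_count = len(data)

        for j in range(2, row_count):
            # skip the <br> separator lines and blank lines
            if data[j] != '<br>\n' and data[j] != '\n':
                outfile.write(data[j])

        outfile.close()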

Now what I am trying to do is keep that data separate from my scripts. So I want to save each file into a pre-made data directory, using a file name of the form shown in this example: KDCWASHI48-04-2017.csv

Also, I need leading zeros, so the month should always be 2 characters long: January will be 01 and December will be 12. How does one do that using the str.zfill(2) method?

ebeneditos
Ma_
  • I'd say the `49.256,-123.245` are latitude/longitude, the station's position in the world – Antry Apr 11 '18 at 14:58
  • @Antry Oh that makes sense. – Ma_ Apr 11 '18 at 14:59
  • If this is your code indentation, your `if(data[j] != '<br>\n' and data[j] != 'n'):` statement has nothing affecting it because `outfile.write(data[j])` is on the same indentation – Antry Apr 11 '18 at 15:01
  • @Antry Yes, first one resolves to a street address in San Francisco. Given the ID KCASANFR131, that makes sense – shmee Apr 11 '18 at 15:02
  • @Antry how might I use that data, say if I were putting it into the outer loop? Should I include the long and lat data ? It doesn't appear in the url. – Ma_ Apr 11 '18 at 15:02
  • I don't think you really need to use the lat/long, it's in the name to be precise, and you have it in case you need it, or for some flavor info somewhere – Antry Apr 11 '18 at 15:03
  • @Antry corrected it – Ma_ Apr 11 '18 at 15:04
  • @Antry do you think I should call the csv file for stations.csv, or does it make the most sense to save that data inside a list? – Ma_ Apr 11 '18 at 15:04
  • Usually, you want to minimize I/O actions (because they are very costly), so unless you only need 1 and are never going to need to call it again, I'd advise you to populate your data structure with the needed data from your document – Antry Apr 11 '18 at 15:05
  • @Antry how do I grab the data that is in the file stations.csv when looping? – Ma_ Apr 11 '18 at 15:17
  • I'm not quite sure what you mean by 'grab', but let's go with `parsing`, for which I'd advise you to look into the [Python csv module](https://docs.python.org/3/library/csv.html). If you do `parse` your data, I'd advise you to do it once at the start and not each time you need it; otherwise you'll end up reopening files that you could have read earlier. – Antry Apr 11 '18 at 15:20
  • I see that your data provider has a yearly graphspan also, instead of calling 12 times ? But that'd force you to split it up inside your logic – Antry Apr 11 '18 at 15:24
  • @Antry yeah that's what I think, I might just use the split function and keep the 0'th index each time. – Ma_ Apr 11 '18 at 15:27
  • It really depends on what you want to accomplish, having to deal with DateTime's can sometimes be frustrating too, It can really depend on your familiarity with the tools. If making multiple calls is not a performance hindrance for you, then you can stay on the 12 calls. Also I don't know if it is easier to manage missing data if it is already split by month or if you received an incomplete year. I don't really know if you're looking into error management at all ? Or is this clearly not your priority – Antry Apr 11 '18 at 15:31
  • @Antry Performance is not a priority, nor is error management. Would this work as far as getting data from the stations.csv file: fileobj = open('stations.csv', 'r')? So now it is reading the data inside of the csv file. I think now I would be able to loop over each 0'th index using .split and .format in that csv file. – Ma_ Apr 11 '18 at 15:36

2 Answers

You've got most of it covered.

"1".zfill(2)

will result in 01 as you probably already guessed.
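As a minimal sketch of how that fits into the file name: the helper name `make_filename` and its default year are my own assumptions for illustration, and the name pattern is taken from the example in the question.

```python
def make_filename(station_id, month, year=2017):
    # str.zfill(2) pads the month on the left with zeros up to a width of 2
    month_str = str(month).zfill(2)
    return "{}-{}-{}.csv".format(station_id, month_str, year)

print(make_filename("KDCWASHI48", 4))   # KDCWASHI48-04-2017.csv
print(make_filename("KDCWASHI48", 12))  # KDCWASHI48-12-2017.csv
```

Note that zfill is a string method, so the month number has to go through str() first. The format specifier "{:02d}".format(4) gives the same "04" directly from an integer.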

In the line

outfile = open(filename, 'w')

it would be good to give the specific path of your data directory.

import os
outfile = open(os.path.join(data_dir_path, month_path, filename), 'w')

Have a look at the post "How can I create a directory if it does not exist?" if you want to know how to create the directory when it doesn't exist yet.
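Putting the two pieces together could look roughly like the sketch below. The helper name `save_lines`, the temporary demo directory, and the sample data line are placeholders I made up for illustration, and `exist_ok=True` assumes Python 3.2 or newer.

```python
import os
import tempfile

def save_lines(data_dir, filename, lines):
    # Create the data directory if it is missing (exist_ok needs Python 3.2+)
    os.makedirs(data_dir, exist_ok=True)
    # Join directory and file name in a platform-independent way
    path = os.path.join(data_dir, filename)
    with open(path, "w") as outfile:
        outfile.writelines(lines)
    return path

# Demo in a temporary location; data_dir would normally be your own folder.
# The data line is a made-up placeholder, not real weather data.
demo_dir = os.path.join(tempfile.mkdtemp(), "data")
saved = save_lines(demo_dir, "KDCWASHI48-04-2017.csv", ["2017-04-01,55.1\n"])
print(saved)
```

Opening the file with `with` also closes it for you, so no separate outfile.close() call is needed.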

I hope this solves all the issues, and Antry mentioned correctly that 49.256,-123.245 are Latitude/Longitude.

Kenstars
  • Constructing file paths as `data_dir_path + month_path + filename` is a bad idea. Whenever possible `os.path.join()` should be used. – zwer Apr 11 '18 at 15:06
  • Thanks. So I just need to create a directory on my desktop, call it, database and that would be my data_dir_path? what is month_path representing? – Ma_ Apr 11 '18 at 15:07
  • Thanks zwer, modified accordingly. – Kenstars Apr 11 '18 at 15:09
  • @Ma_ The `month_path` is 'january' or 'february' or even 'march_data' or anything you want that represents your monthly segments ^^ – Antry Apr 11 '18 at 15:09
  • @Ma_, added it so that you can bucket your datasets, by each month as well , It would help in the long term when trying to sort through the data. – Kenstars Apr 11 '18 at 15:10
  • @Kenstars I think the way it was before was the way the person wanted it to be: `infile = urllib.urlopen(path)`, `data = infile.read()`, `outfile = open('weatherdata.txt', 'w')`, `outfile.write(data)`, `outfile.close()`, `infile.close()` – Ma_ Apr 11 '18 at 15:12
  • alright, then you can have it without monthwise bucketing of the data, by setting the path as you want it. Thank you for clarifying. – Kenstars Apr 11 '18 at 15:14
  • @Kenstars so when it's in the inner loop it should be openfile = open(database, i, filename, 'w'). How does it know to go to the folder I just made on my desktop? – Ma_ Apr 11 '18 at 15:16
  • @Ma_ the filename should be specified as path + actual_filename, like for example if file is in directory data and filename is weatherdata.txt then your filename variable would be "data/weatherdata.txt" – Kenstars Apr 11 '18 at 15:19
  • @Kenstars it is a folder named "database" on my desktop, would that be: openfile = open('/path/to/Desktop', database', i, 'w')? – Ma_ Apr 11 '18 at 15:25

Structure your Code

Here are a couple guidelines to help you on your adventure.

When developing a program, you usually want to design it like a factory: elegant individual modules inside a chain of command, instead of rigid, absolute code with no maintainability, scalability, or abstraction of its concepts.

Think of Lego: components and little bricks, each with its own type and task.

Data-Structure

Start with your data structures. They are the foundation of what you are working on and they are simple to make; you only need to know what data you are working with.

class Station:
    def __init__(self, _id, _lat, _long):
        self.id = _id
        self.lat = _lat
        self.long = _long

Here we have created a class that can hold the information from stations.csv. This lets us manipulate Python objects instead of only iterating through lists (even if we ultimately will, we don't have to deal with them directly when we need the data). It also makes your code a lot clearer and easier to read, especially for people other than you.

Factories

Progress on to adding functionality to your classes through methods.

We'll start off by creating a factory class, which we could call Utility, that would be responsible for importing the data and populating your classes, for example. By doing this, we separate our information layer from our logic layer, so you never have to do the parsing inside your main program's loop (I am not literally talking about __main__).

Let's start with parsing our Station information from our CSV:

import csv

stations = []

with open('stations.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        # csv.reader already splits each line into a list of fields
        stations.append(Station(row[0], row[1], row[2]))

(This does not take into account the possibility that you have your column names on your first row and assumes they are not.)

Now we have populated our list of stations with Station python objects, which we can use to access the data directly as such:

print(stations[0].id)
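If you only want the station IDs, a list comprehension over the objects is enough. Below is a self-contained sketch: `load_stations` is a hypothetical helper name, the `float()` conversion is my own addition, and `io.StringIO` stands in for the real stations.csv so the example runs without the file.

```python
import csv
import io

class Station:
    def __init__(self, _id, _lat, _long):
        self.id = _id
        self.lat = float(_lat)    # assumption: store coordinates as numbers
        self.long = float(_long)

def load_stations(fileobj):
    # csv.reader yields each row as a list of strings, already split on commas;
    # "if row" skips any blank lines in the file
    return [Station(*row) for row in csv.reader(fileobj) if row]

# Stand-in for open('stations.csv'); contents copied from the question
sample = io.StringIO(
    "KCASANFR131,37.778,-122.408\n"
    "KDCWASHI48,38.913,-77.031\n"
    "IBRITISH359,49.256,-123.245\n"
    "KNYNEWYO639,40.755,-74.007\n"
)
stations = load_stations(sample)
station_ids = [s.id for s in stations]  # just the IDs, ready for the outer loop
print(station_ids)
```

With the real file you would pass the object returned by open('stations.csv', 'r') instead of `sample`.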

[...] To be continued; if requested. (kinda high on work)

Antry
  • when I apply this and add a print statement at the end it prints out everything, how do I have it isolated to just the station ids? – Ma_ Apr 11 '18 at 16:13
  • @Ma_ I have edited my answer, I hope it might help, I'd advise reading it all from the start. I have added the forgotten split. – Antry Apr 11 '18 at 16:20
  • I created a different version. I hope this isn't rude, but could you look at what I have thus far: https://codeshare.io/5RldlL – Ma_ Apr 11 '18 at 17:13
  • Also, not sure how to add the leading zero? I know it just means that the month should always be 2 characters long. Like January will look like 01 while December will be 12. But how do I use the str.zfill(2) method to add the leading zero to this? – Ma_ Apr 11 '18 at 17:15
  • but when I go to run it says 'Finished' in Atom IDE, but nothing has shown up. I am guessing it is running but is not populating anywhere. So I have to send all those temperature data lines to a directory folder I made on my desktop – Ma_ Apr 11 '18 at 17:23
  • @Ma_ No, no, you're encouraged to search, not just copying is a good thing. Let me take a quick look – Antry Apr 11 '18 at 17:37
  • @Ma_ for `zfill` it's a function with return value, so if you are using `mystring = "4"` you save the value back `mystring = mystring.zfill(2)` now your string is `mystring == "04"` – Antry Apr 11 '18 at 17:55
  • so here would that be: filename = "{}-{}.zfill(2)-2017.csv".format(vals[0], j) – Ma_ Apr 11 '18 at 20:04