I wanted to correlate temperature with household energy usage in Denmark. To do this, I needed reliable temperature measurements from Zealand, but getting them has turned out to be very hard.
- I logically began my journey at the Danish Meteorological Institute (DMI), but I could not find raw data that I could use. There was a lot of very useful information everywhere, but no historical data that I could download as a CSV file (or in any other format).
- European Climate Assessment and Dataset (ECA&D): This was my first successful hit, where I was able to acquire daily temperature means from a station called LANDBOHOJSKOLEN. This was a good beginning, but I was looking for measurements that are taken every hour.
- Weather Underground (WU): My search for a finer granularity (hourly means instead of daily means) took me to Weather Underground. They seem to have temperature data measured at 30-minute intervals, which was really good for my purposes. But they round the temperature values to the nearest degree, which might introduce some error in the correlation. In any case, I used the “comma separated file” functionality (here is a map of stations) to download temperature data, automating the process with a Python script (I might have made it a bit too complicated, but it works).

    import datetime
    import random
    import time
    import urllib.request

    # CSV view of the daily history page for Roskilde Airport (EKRK);
    # the three placeholders are year, month and day.
    WEATHER_PLACE = ("http://www.wunderground.com/history/airport/EKRK/%s/%s/%s/"
                     "DailyHistory.html?req_city=NA&req_state=NA&req_statename=NA&format=1")
    WEATHER_UNITS = "http://www.wunderground.com/cgi-bin/findweather/getForecast?setunits=english"
    MAX_WAIT_TIME = 1  # in seconds


    def dwGenerateDateDict(D1, D2):
        """Download the raw CSV text for each day, keyed by date."""
        # Initialize dateDict with the numdays dates ending at D2
        numdays = abs(D1 - D2).days
        dateList = [D2 - datetime.timedelta(days=x) for x in range(0, numdays)]
        dateDict = dict(zip(dateList, [""] * len(dateList)))

        while len(dateList) > 0:
            print("Days left: " + str(len(dateList)))

            # Wait for a random amount of seconds
            secs = random.randint(0, MAX_WAIT_TIME)
            print("Waiting for " + str(secs) + " seconds.")
            time.sleep(secs)

            # Select a random date and remove it from the list
            ranInd = random.randint(0, len(dateList) - 1)
            rd = dateList.pop(ranInd)

            # Get the csv weather for the date
            urllib.request.urlopen(WEATHER_UNITS)
            urlRes = urllib.request.urlopen(WEATHER_PLACE % (rd.year, rd.month, rd.day))

            # Read the response into the dictionary
            raw = urlRes.read()
            dateDict[rd] = raw.decode("utf-8")
            print("Date : " + rd.strftime("%Y%m%d")
                  + ". Read : " + str(len(raw)) + " Bytes")

        return dateDict


    def dwGenerateCSVFile(dateDict, filename):
        """Append the downloaded days to filename in date order."""
        with open(filename, 'a') as fd:
            for date in sorted(dateDict):
                datestr = str(date)
                datedata = dateDict[date]
                timetype = datedata[1:datedata.find(",")]  # CET or CEST

                # Remove the first line (the header), including its "<br />\n"
                datedata = datedata[datedata.find("<br />\n") + 7:]

                # Move the last separator to the front
                datedata = datedata[-7:] + datedata[:-7]

                # Start every row with the date and the time type
                datedata = datedata.replace("<br />\n", "\n" + datestr + "," + timetype + ",")
                fd.write(datedata)


    if __name__ == "__main__":
        D1 = datetime.date(2011, 9, 27)
        D2 = datetime.date(2013, 9, 23)
        DD = dwGenerateDateDict(D1, D2)
        dwGenerateCSVFile(DD, "UnderGroundWeatherRoskildeAirport.csv")

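Since what I am really after is hourly means, the 30-minute readings eventually have to be averaged per clock hour. Here is a minimal sketch of that step, assuming the readings have already been parsed into (timestamp, temperature) pairs. The function name `hourly_means` and the sample values are made up for illustration; they are not part of the download script or of the Weather Underground data.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Average (timestamp, temperature) pairs into one mean per clock hour.

    `readings` is a hypothetical, already-parsed input: an iterable of
    (datetime, float) pairs. Returns a dict ordered by hour.
    """
    buckets = defaultdict(list)
    for timestamp, temperature in readings:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(temperature)
    return {hour: sum(values) / len(values)
            for hour, values in sorted(buckets.items())}


# Made-up sample values, for illustration only.
sample = [
    (datetime(2013, 9, 23, 0, 20), 12.0),
    (datetime(2013, 9, 23, 0, 50), 11.0),
    (datetime(2013, 9, 23, 1, 20), 10.0),
]

for hour, mean in hourly_means(sample).items():
    print(hour.isoformat(), mean)
```

Grouping by the truncated timestamp keeps the sketch independent of how many readings an hour happens to contain, so a missing half-hour reading just means that hour is averaged over fewer values.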
I’m going to start trying different weather APIs and see if any of them can give me the historical information that I need with the granularity level that I want. More to come…