Getting Reliable Temperature Readings

I wanted to correlate temperature with household energy usage in Denmark. To do this, I needed reliable temperature measurements from Zealand, but getting them has turned out to be very hard.

  1. I logically began my journey at the Danish Meteorological Institute (DMI), but I could not find raw data that I could use. There was a lot of very useful information, but no historical data that I could download as a CSV file (or in any other format).
  2. European Climate Assessment and Dataset (ECA&D): This was my first successful hit; I was able to acquire daily temperature means from a station called LANDBOHOJSKOLEN. This was a good beginning, but I’m looking for measurements that are taken every hour.
  3. Weather Underground (WU): My search for a finer granularity (hourly means instead of daily means) took me to Weather Underground. They seem to have temperature data measured at 30-minute intervals, which is really good for my purposes. But they round the temperature values to the nearest degree, which might introduce some error in the correlation. In any case, I used the “comma separated file” functionality (the site also has a map of stations) to download the temperature data, automating the process with a Python script (I might have made it a bit too complicated, but it works).
# Written for Python 2 (uses urllib2).
import urllib2
import datetime
import random
import time

# URL template for one day of history at airport station EKRK;
# the three %s are filled in with year, month and day.
WEATHER_PLACE = "http://www.wunderground.com/history/airport/EKRK/%s/%s/%s/" \
    + "DailyHistory.html?req_city=NA&req_state=NA&req_statename=NA&format=1"
WEATHER_UNITS = "http://www.wunderground.com/cgi-bin/findweather/getForecast?setunits=english"
MAX_WAIT_TIME = 1  # in seconds

def dwGenerateDateDict(D1, D2):
    """Download the raw history page for each day and return {date: text}."""

    # Initialize dateDict with every day from D2 back to D1, inclusive
    # (assumes D2 is the later date).
    numdays = abs(D1 - D2).days
    dateList = [D2 - datetime.timedelta(days=x) for x in range(0, numdays + 1)]
    dateDict = dict(zip(dateList, [""] * len(dateList)))

    while len(dateList) > 0:
        print("Days left: " + str(len(dateList)))

        # Wait for a random number of seconds
        secs = random.randint(0, MAX_WAIT_TIME)
        print("Waiting for " + str(secs) + " seconds.")
        time.sleep(secs)

        # Select a random date and remove it from the pending list
        ranInd = random.randint(0, len(dateList) - 1)
        rd = dateList.pop(ranInd)

        # Get the csv weather for the date
        urllib2.urlopen(WEATHER_UNITS)
        urlRes = urllib2.urlopen(WEATHER_PLACE % (rd.year, rd.month, rd.day))

        # Read result into dictionary.
        dateDict[rd] = urlRes.read()
        print("Date : " + rd.strftime("%Y%m%d")
              + ". Read : " + str(len(dateDict[rd])) + " Bytes")

    return dateDict

def dwGenerateCSVFile(dateDict, filename):
    """Append the downloaded days to filename, oldest first."""

    fd = open(filename, 'a')
    for date in sorted(dateDict.keys()):
        datestr = str(date)
        datedata = dateDict[date]
        timetype = datedata[1:datedata.find(",")]  # CET or CEST
        # Remove first line (the header); 7 == len("<br />\n")
        datedata = datedata[datedata.find("<br />\n") + 7:]
        # Last separator to front
        datedata = datedata[-7:] + datedata[:-7]
        # Turn each separator into a newline followed by date and time zone
        datedata = datedata.replace("<br />\n",
                                    "\n" + datestr + "," + timetype + ",")
        fd.write(datedata)
    fd.close()

if __name__ == "__main__":
    D1 = datetime.date(2011, 9, 27)
    D2 = datetime.date(2013, 9, 23)
    DD = dwGenerateDateDict(D1, D2)
    dwGenerateCSVFile(DD, "UnderGroundWeatherRoskildeAirport.csv")
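
Since what I ultimately want is hourly means, a minimal sketch of that averaging step could look like the following. It assumes the readings have already been parsed into (timestamp, temperature) pairs; the function name hourly_means and the sample values are made up for illustration and do not come from the Weather Underground file.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Average (timestamp, temperature) pairs into one mean per clock hour."""
    buckets = defaultdict(list)
    for stamp, temp in readings:
        # Truncate the timestamp to the start of its hour.
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(temp)
    # float() keeps the division exact on Python 2 as well.
    return {hour: sum(temps) / float(len(temps))
            for hour, temps in buckets.items()}


# Made-up half-hourly readings, already rounded to whole degrees.
sample = [
    (datetime(2013, 9, 23, 10, 20), 12),
    (datetime(2013, 9, 23, 10, 50), 13),
    (datetime(2013, 9, 23, 11, 20), 14),
]
means = hourly_means(sample)
for hour in sorted(means):
    print(hour.strftime("%Y-%m-%d %H:00") + " " + str(means[hour]))
```

Because each reading is rounded to the nearest degree, it is off by at most half a degree, and so is any mean computed from such readings.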

I’m going to start trying different weather APIs and see if any of them can give me the historical information that I need with the granularity level that I want.  More to come…

About joelgranados

I'm fascinated with how technology and science impact our reality and am drawn to leverage them in order to increase the potential of human activity.