Getting Forecasts

The Open Data Portal of Deutscher Wetterdienst (DWD) provides detailed forecasts for Germany and all other regions of the world in human-readable and machine-readable form. The machine-readable service is called MOSMIX. In this project we

  • collect information on how to use MOSMIX,

  • automatically download newly published MOSMIX data,

  • convert MOSMIX files to CSV files.

This project relies heavily on the techniques presented in Accessing Data.

Investigating and Understanding MOSMIX

DWD’s open data portal is quite complex. Before we start downloading forecast data we have to find out where the data is located and which format it uses.

Task: Read about MOSMIX at DWD’s MOSMIX info page. Follow relevant links and answer the following questions:

  • What are the differences between MOSMIX S and MOSMIX L?

  • What’s the URL of the most recent MOSMIX L file for station ‘Zwickau’?

  • What standard file formats are used for MOSMIX files (KMZ files)?

  • How long are MOSMIX files available at DWD’s open data portal?

Solution:

# your answers

An Archive of Forecasts

MOSMIX data older than two days gets removed from DWD’s open data portal. To analyze the quality of forecasts (that is, to compare them to real observations) we have to keep them in a local archive. To this end, we would have to visit DWD’s open data portal once a day, look for new MOSMIX files, download them, and add them to our local archive. With Python we can automate this job.

Task: Write a function get_available_mosmix_files which scrapes a list of URLs of all currently available MOSMIX L files for a selected station from DWD’s open data portal. Arguments:

  • station ID (string).

Return value:

  • URLs (list of strings).
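As a starting point, the link-extraction step can be tried out offline. The following is only a sketch under stated assumptions: the sample listing, the base URL on example.org and the names LinkCollector and extract_kmz_urls are hypothetical, and the standard library's html.parser is used merely to keep the example free of dependencies (the techniques from Accessing Data work just as well). The real directory URL and page layout still have to be looked up at DWD’s open data portal.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical directory listing, NOT copied from DWD's portal.
SAMPLE_LISTING = '''
<html><body>
<a href="../">../</a>
<a href="forecast_A.kmz">forecast_A.kmz</a>
<a href="forecast_B.kmz">forecast_B.kmz</a>
<a href="readme.txt">readme.txt</a>
</body></html>
'''


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def extract_kmz_urls(listing_html, base_url):
    """Return absolute URLs of all links ending in .kmz, in page order."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [urljoin(base_url, href)
            for href in parser.hrefs
            if href.endswith('.kmz')]


# Hypothetical base URL, used for illustration only.
for url in extract_kmz_urls(SAMPLE_LISTING,
                            'https://example.org/forecasts/station_X/'):
    print(url)
```

In a real get_available_mosmix_files the listing would come from an HTTP request whose URL is built from the station ID.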

Solution:

# your solution

Now it’s time to download the files. Maybe we already downloaded some of them yesterday. So we should have a look in our archive directory first to avoid downloading more files than necessary.

Task: Write a function download_files which downloads all new files from a list of URLs. Arguments:

  • URLs (list of strings),

  • archive path (string).

Return value:

  • names of new files (list of strings).

Hints:

  • To check whether a file already exists, have a look at os.path.isfile.

  • Read and write in binary mode because KMZ files aren’t text files.
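The two hints can be combined as in the following sketch. It is not the intended solution, only an illustration for a single URL: the names fetch_bytes and download_if_new are hypothetical, urllib.request is just one of several ways to fetch a file, and the demonstration at the end replaces the network request by a stand-in function so that it runs offline.

```python
import os
import tempfile
import urllib.request


def fetch_bytes(url):
    """Return the raw body of url as bytes (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_if_new(url, archive_path, fetch=fetch_bytes):
    """Store url in archive_path unless a file of that name exists.

    Returns the file name if a new file was written, else None.
    """
    file_name = url.rsplit('/', 1)[-1]
    target = os.path.join(archive_path, file_name)
    if os.path.isfile(target):
        return None  # already archived, nothing to download
    data = fetch(url)
    # 'wb' writes the bytes unchanged; text mode would corrupt KMZ data.
    with open(target, 'wb') as f:
        f.write(data)
    return file_name


# Offline demonstration with a stand-in for the network request.
def fake_fetch(url):
    return b'not a real KMZ file'


with tempfile.TemporaryDirectory() as archive:
    url = 'https://example.org/forecasts/toy.kmz'  # hypothetical URL
    print(download_if_new(url, archive, fetch=fake_fetch))  # first run
    print(download_if_new(url, archive, fetch=fake_fetch))  # second run
```

A download_files function could call such a helper for each URL and collect the file names that are not None.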

Solution:

# your solution

KMZ to CSV

Now that we have MOSMIX files in our local storage we should convert them to CSV files. Each row shall contain all weather parameters for one point in time. The first column is the time stamp; the remaining columns hold the weather parameters contained in the MOSMIX file.
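To make the intended layout concrete, here is a small sketch that writes such a table with Python's csv module. The function name write_forecast_csv, the column title timestamp, the two parameter names and all values are made up for illustration; real MOSMIX files contain many more parameters.

```python
import csv
import io


def write_forecast_csv(timestamps, params, file):
    """Write one row per time stamp: time stamp first, then parameters.

    params maps a parameter name to a list of values, one per time stamp.
    """
    writer = csv.writer(file, lineterminator='\n')
    writer.writerow(['timestamp'] + list(params))
    for i, ts in enumerate(timestamps):
        writer.writerow([ts] + [values[i] for values in params.values()])


# Toy data, not taken from a real forecast.
timestamps = ['2024-01-01T00:00:00Z', '2024-01-01T01:00:00Z']
params = {'TTT': [271.15, 270.65], 'FF': [3.1, 2.6]}

buffer = io.StringIO()
write_forecast_csv(timestamps, params, buffer)
print(buffer.getvalue())
# timestamp,TTT,FF
# 2024-01-01T00:00:00Z,271.15,3.1
# 2024-01-01T01:00:00Z,270.65,2.6
```

Writing to a file object instead of a path keeps the function easy to test; for real output, pass a file opened with open(path, 'w', newline='').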

Task: Write a function kmz_to_csv for converting a list of KMZ files to CSV files. Arguments:

  • archive path (string),

  • list of file names (list of strings).

No return value.

Hint: MOSMIX files use an XML feature known as namespaces. Consequently, tag names contain colons, which confuses Beautiful Soup’s standard HTML parser (which also parses simple XML files). To get MOSMIX files parsed correctly, install the lxml module and provide a second argument 'xml' to Beautiful Soup’s constructor. This tells Beautiful Soup to use a dedicated XML parser, which by default is lxml.
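Namespaced tags can also be explored without Beautiful Soup. The following sketch takes a different route than the hint and uses only the standard library (zipfile and xml.etree.ElementTree), relying on the fact that a KMZ file is a ZIP container around a KML (XML) document. The namespace URI, the prefix ex, all tag and attribute names and the helper functions are hypothetical; the real names have to be read off an actual MOSMIX file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical namespace, for illustration only.
NS = {'ex': 'https://example.org/forecast'}

TOY_KML = '''<?xml version="1.0" encoding="UTF-8"?>
<ex:Root xmlns:ex="https://example.org/forecast">
  <ex:TimeStep>2024-01-01T00:00:00Z</ex:TimeStep>
  <ex:TimeStep>2024-01-01T01:00:00Z</ex:TimeStep>
  <ex:Forecast ex:name="TTT"><ex:value>271.15 270.65</ex:value></ex:Forecast>
</ex:Root>
'''


def make_toy_kmz():
    """Build a tiny KMZ-like ZIP archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        zf.writestr('toy.kml', TOY_KML)
    return buffer.getvalue()


def read_toy_kmz(kmz_bytes):
    """Return (timestamps, params) parsed from the toy archive."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as zf:
        kml_name = zf.namelist()[0]  # the archive holds one XML file
        root = ET.fromstring(zf.read(kml_name))
    # The prefix 'ex:' is resolved via the NS mapping passed to findall.
    timestamps = [el.text for el in root.findall('ex:TimeStep', NS)]
    params = {}
    for forecast in root.findall('ex:Forecast', NS):
        # Namespaced attributes are stored under '{namespace-uri}name'.
        name = forecast.get('{' + NS['ex'] + '}name')
        values = forecast.find('ex:value', NS).text.split()
        params[name] = [float(v) for v in values]
    return timestamps, params


timestamps, params = read_toy_kmz(make_toy_kmz())
print(timestamps)
print(params)
```

With Beautiful Soup and the 'xml' parser the same tags are reached by name instead of by a prefix mapping, so either approach can feed the CSV conversion.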

Solution:

# your solution

Automatic Daily Download

To collect forecasts over a longer period of time we have to run the developed code once per day. We could implement a loop and use time.sleep to make Python wait one day before continuing with the next run. The better (simpler and more efficient) solution is to tell the operating system to run the Python program each day at a fixed time.

On Linux and macOS there is cron (and anacron) for scheduling tasks. On Windows there is the Task Scheduler.

Task: Find out the details about scheduling a daily task on your system. Then make a Python script file from your code above and let it run once per day.

Solution:

# your steps to schedule a task