Basic Usage

The goal of the task-geo project is to download and generate datasets from a collection of data sources.

This notebook shows how to list, load and call these data sources.

Selecting a Data Source

The first thing to do is to select the data source that we want to use.

We can see the list of all the available data sources by using the task_geo.data_sources.list_data_sources function.

[1]:
from task_geo.data_sources import list_data_sources

data_sources = list_data_sources()
data_sources
[1]:
['noaa_api', 'cds', 'us_census', 'nyt']
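The notebook does not show how list_data_sources finds its entries. A common way to build this kind of listing is a name-to-function registry. The sketch below assumes that design. The names DATA_SOURCES, register and the *_stub loaders are hypothetical and do not describe task-geo's internals.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps each data source name to its loader function.
# An assumed design for illustration, not task-geo's actual internals.
DATA_SOURCES: Dict[str, Callable] = {}


def register(name: str) -> Callable[[Callable], Callable]:
    """Decorator that stores the decorated function under `name`."""
    def decorator(func: Callable) -> Callable:
        DATA_SOURCES[name] = func
        return func
    return decorator


# Stub loaders standing in for the real data sources.
@register('noaa_api')
def noaa_api_stub(countries, start_date, end_date, metrics=None):
    return None


@register('cds')
def cds_stub(*args, **kwargs):
    return None


@register('us_census')
def us_census_stub(*args, **kwargs):
    return None


@register('nyt')
def nyt_stub(*args, **kwargs):
    return None


def list_data_sources() -> List[str]:
    """Return the registered names in registration order (dicts keep insertion order)."""
    return list(DATA_SOURCES)


print(list_data_sources())  # ['noaa_api', 'cds', 'us_census', 'nyt']
```

Because Python dicts preserve insertion order, the listing comes out in the order the loaders were registered. This would explain an unsorted list like the one above, if the real package happens to use a similar registry.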

For this example, we will use the noaa_api data source.

[10]:
data_source_name = 'noaa_api'

Loading a Data Source

Once we have selected the data source that we want to use, we can load it with the task_geo.data_sources.get_data_source function.

[11]:
from task_geo.data_sources import get_data_source

noaa_api = get_data_source(data_source_name)
noaa_api
[11]:
<function task_geo.data_sources.noaa.noaa_api(countries, start_date, end_date, metrics=None)>

This will return a function that we can directly use to download data.
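get_data_source returns an ordinary Python function, so standard tools such as functools.partial can pre-bind arguments that stay fixed across calls. The sketch below uses a hypothetical stub, noaa_api_stub, which has the same signature as noaa_api. This lets it run without network access. The stub echoes its arguments instead of downloading anything.

```python
from datetime import datetime
from functools import partial


def noaa_api_stub(countries, start_date, end_date, metrics=None):
    """Hypothetical stand-in with noaa_api's signature; echoes its arguments."""
    return {
        'countries': countries,
        'start_date': start_date,
        'end_date': end_date,
        'metrics': metrics,
    }


# Pre-bind the country list and metrics; only the dates vary between calls.
france_temps = partial(noaa_api_stub, ['FR'], metrics=['TMIN', 'TMAX'])

request = france_temps(datetime(2020, 1, 1), datetime(2020, 1, 15))
print(request['countries'], request['metrics'])  # ['FR'] ['TMIN', 'TMAX']
```

With the real function, the equivalent would be partial(noaa_api, ['FR'], metrics=['TMIN', 'TMAX']).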

To see what the function does and which arguments it expects, we can use Python's built-in help function, which prints its full documentation.

[12]:
help(noaa_api)
Help on function noaa_api in module task_geo.data_sources.noaa:

noaa_api(countries, start_date, end_date, metrics=None)
    NOAA API Data Source.

    Arguments:
        countries(list[str]):
            List of country names in FIPS format.
        start_date(datetime):
            Start date for the data.
        end_date(datetime):
            End date for the date. (Optional, if not present will be set to the current day.)
        metrics(list[str]): Optional.List of metrics to retrieve,valid values are:
            TMIN: Minimum temperature.
            TMAX: Maximum temperature.
            TAVG: Average of temperature.
            SNOW: Snowfall (mm).
            SNWD: Snow depth (mm).

    Example:
    >>> from datetime import datetime
    >>> countries = ['FR']
    >>> start_date = datetime(2020, 1, 1)
    >>> end_date = datetime(2020, 1, 15)
    >>> noaa_api(countries, start_date, end_date)
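The help function is meant for reading. To check in code which arguments are required, use the standard library's inspect.signature, which reports each parameter and its default value. The sketch below applies it to a hypothetical stub with the signature printed above. Passing the real noaa_api should work the same way.

```python
import inspect


def noaa_api_stub(countries, start_date, end_date, metrics=None):
    """Hypothetical stand-in mirroring noaa_api's signature."""


def split_parameters(func):
    """Return (required, optional) parameter names of `func`."""
    params = inspect.signature(func).parameters.values()
    required = [p.name for p in params if p.default is inspect.Parameter.empty]
    optional = [p.name for p in params if p.default is not inspect.Parameter.empty]
    return required, optional


required, optional = split_parameters(noaa_api_stub)
print('required:', required)  # required: ['countries', 'start_date', 'end_date']
print('optional:', optional)  # optional: ['metrics']
```

Going by the printed signature, end_date has no default value. It must therefore be passed at call time, even though the docstring describes it as optional.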

Using the Data Source

Every data source returned by get_data_source is a plain function that can be called directly.

For example, let’s use the noaa_api function that we just loaded to get data from NOAA stations in France between 2020-01-01 and 2020-01-15.

[14]:
from datetime import datetime

data = noaa_api(
    ['FR'],
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2020, 1, 15)
)
2020-04-03 20:01:18,466 - INFO - ftp_connector - Connecting to NOAA FTP server.
2020-04-03 20:03:00,211 - INFO - api_connector - Requesting data for FR
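The log timestamps above show about 100 seconds between connecting to the FTP server and requesting data. Given that delay, it can be worth validating arguments locally before a download. The helper below, prepare_noaa_args, is hypothetical. It checks metrics against the codes listed in the docstring and fills in the current date when no end date is given, mirroring what the docstring says noaa_api does. It does not call task-geo itself.

```python
from datetime import datetime

# Valid metric codes, as listed in the noaa_api docstring.
VALID_METRICS = {'TMIN', 'TMAX', 'TAVG', 'SNOW', 'SNWD'}


def prepare_noaa_args(countries, start_date, end_date=None, metrics=None):
    """Hypothetical helper: validate arguments before calling noaa_api."""
    if not countries:
        raise ValueError('at least one FIPS country code is required')
    if end_date is None:
        # Per the docstring, a missing end date means the current day.
        end_date = datetime.now()
    if start_date > end_date:
        raise ValueError('start_date must not be after end_date')
    if metrics is not None:
        unknown = set(metrics) - VALID_METRICS
        if unknown:
            raise ValueError(f'unknown metrics: {sorted(unknown)}')
    return countries, start_date, end_date, metrics


args = prepare_noaa_args(['FR'], datetime(2020, 1, 1), datetime(2020, 1, 15),
                         ['TMAX', 'TMIN'])
# data = noaa_api(*args)  # the real call, as in the cell above
```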
[15]:
data.head()
[15]:
   latitude  longitude  elevation  country               name        date      station  tmax  tmin
0   48.0689    -1.7339       36.0   France  RENNES-ST JACQUES  2020-01-01  FR000007130  10.4   4.8
1   48.0689    -1.7339       36.0   France  RENNES-ST JACQUES  2020-01-02  FR000007130  11.0   7.8
2   48.0689    -1.7339       36.0   France  RENNES-ST JACQUES  2020-01-03  FR000007130  13.1   NaN
3   48.0689    -1.7339       36.0   France  RENNES-ST JACQUES  2020-01-04  FR000007130  10.4   1.4
4   48.0689    -1.7339       36.0   France  RENNES-ST JACQUES  2020-01-05  FR000007130   9.5   3.0
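The returned data supports .head() and prints like a pandas DataFrame, so the usual pandas operations should apply. As an illustration, the sketch below rebuilds the five rows shown above, keeping only the date, station, tmax and tmin columns, and computes the daily temperature range. Row 2 has no tmin, so its range is NaN. dropna removes such rows when a complete record is needed.

```python
import pandas as pd

# The five rows from data.head() above, restricted to a few columns.
data = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                            '2020-01-04', '2020-01-05']),
    'station': ['FR000007130'] * 5,
    'tmax': [10.4, 11.0, 13.1, 10.4, 9.5],
    'tmin': [4.8, 7.8, float('nan'), 1.4, 3.0],
})

# Daily temperature range; a NaN in tmin propagates to the result.
data['trange'] = data['tmax'] - data['tmin']

# Keep only the days on which both extremes were recorded.
complete = data.dropna(subset=['tmin'])

# Per-station means; pandas skips NaN values by default.
summary = data.groupby('station')[['tmax', 'tmin']].mean()
print(summary)
```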