Basic Usage
The goal of the task-geo project is to download and generate datasets from a collection of data sources.
In this notebook we show how to use them.
Selecting a Data Source
The first thing to do is to select the data source that we want to use.
We can list all the available data sources with the task_geo.data_sources.list_data_sources function.
[1]:
from task_geo.data_sources import list_data_sources
data_sources = list_data_sources()
data_sources
[1]:
['noaa_api', 'cds', 'us_census', 'nyt']
For this example we will be using the noaa_api data source.
[10]:
data_source_name = 'noaa_api'
Loading a Data Source
Once we have selected the data source that we want to use, we can load it with the task_geo.data_sources.get_data_source function.
[11]:
from task_geo.data_sources import get_data_source
noaa_api = get_data_source(data_source_name)
noaa_api
[11]:
<function task_geo.data_sources.noaa.noaa_api(countries, start_date, end_date, metrics=None)>
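To illustrate what such a lookup involves, here is a minimal sketch that assumes sources are stored in a name-to-function registry and reports unknown names clearly. REGISTRY, get_source and noaa_stub are hypothetical names; task_geo's real get_data_source may be implemented differently.

```python
from typing import Callable, Dict


def noaa_stub(countries, start_date, end_date, metrics=None):
    """Hypothetical stand-in with the same signature as noaa_api above."""
    return {'countries': countries, 'start': start_date, 'end': end_date}


# Hypothetical registry; only one entry is needed for the illustration.
REGISTRY: Dict[str, Callable] = {'noaa_api': noaa_stub}


def get_source(name: str) -> Callable:
    """Return the function registered under name, or raise a helpful error."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(
            f'Unknown data source {name!r}; available: {sorted(REGISTRY)}'
        ) from None


source = get_source('noaa_api')
print(source.__name__)  # noaa_stub
```

Raising ValueError with the list of valid names, instead of letting a bare KeyError escape, tells the caller immediately which names would have worked.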
get_data_source returns the data source itself, a function that we can call directly to download data.
To see what it does and which arguments it expects, we can use Python's built-in help function, which prints its documentation.
[12]:
help(noaa_api)
Help on function noaa_api in module task_geo.data_sources.noaa:

noaa_api(countries, start_date, end_date, metrics=None)
    NOAA API Data Source.
    Arguments:
        countries(list[str]):
            List of country names in FIPS format.
        start_date(datetime):
            Start date for the data.
        end_date(datetime):
            End date for the date. (Optional, if not present will be set to the current day.)
        metrics(list[str]): Optional.List of metrics to retrieve,valid values are:
            TMIN: Minimum temperature.
            TMAX: Maximum temperature.
            TAVG: Average of temperature.
            SNOW: Snowfall (mm).
            SNWD: Snow depth (mm).
    Example:
        >>> from datetime import datetime
        >>> countries = ['FR']
        >>> start_date = datetime(2020, 1, 1)
        >>> end_date = datetime(2020, 1, 15)
        >>> noaa_api(countries, start_date, end_date)
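When only the call signature is needed, the standard library's inspect.signature returns it in a form that a script can work with. The sketch runs it on a stub that copies the signature printed above (noaa_api_stub and describe_parameters are hypothetical helpers). With task_geo installed, the loaded noaa_api function could be passed in instead.

```python
import inspect


def noaa_api_stub(countries, start_date, end_date, metrics=None):
    """Hypothetical stand-in that copies the signature printed by help()."""


def describe_parameters(func):
    """Map each parameter name to 'required' or to its default value."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            described[name] = 'required'
        else:
            described[name] = f'default={param.default!r}'
    return described


print(describe_parameters(noaa_api_stub))
# {'countries': 'required', 'start_date': 'required', 'end_date': 'required', 'metrics': 'default=None'}
```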
Using the Data Source
Every data source returned by get_data_source is a function that can be called directly.
For example, let’s use the noaa_api function that we just loaded to get data from NOAA stations in France between 2020-01-01 and 2020-01-15.
[14]:
from datetime import datetime
data = noaa_api(
['FR'],
start_date=datetime(2020, 1, 1),
end_date=datetime(2020, 1, 15)
)
2020-04-03 20:01:18,466 - INFO - ftp_connector - Connecting to NOAA FTP server.
2020-04-03 20:03:00,211 - INFO - api_connector - Requesting data for FR
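The call above uses the default metrics, and the log timestamps show more than a minute passing between connecting to the FTP server and requesting the data, so it can pay to validate arguments before starting a download. A minimal sketch, assuming the metric codes listed in the docstring are the complete set; check_noaa_args and VALID_METRICS are hypothetical names, not part of task_geo.

```python
from datetime import datetime

# Metric codes listed in the noaa_api docstring (assumed to be the full set).
VALID_METRICS = {'TMIN', 'TMAX', 'TAVG', 'SNOW', 'SNWD'}


def check_noaa_args(start_date, end_date, metrics=None):
    """Hypothetical pre-flight check to run before a slow download.

    Raises ValueError for an inverted date range or unknown metric codes.
    """
    if end_date < start_date:
        raise ValueError(
            f'end_date {end_date:%Y-%m-%d} is before start_date {start_date:%Y-%m-%d}'
        )
    unknown = sorted(set(metrics or []) - VALID_METRICS)
    if unknown:
        raise ValueError(f'Unknown metrics {unknown}; valid: {sorted(VALID_METRICS)}')


# Passes silently: valid range and valid metric codes.
check_noaa_args(datetime(2020, 1, 1), datetime(2020, 1, 15), ['TMAX', 'TMIN'])
```

If the check passes, the same arguments could then be handed on, for instance as noaa_api(['FR'], start_date, end_date, metrics=['TMAX', 'TMIN']).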
[15]:
data.head()
[15]:
| | latitude | longitude | elevation | country | name | date | station | tmax | tmin |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 48.0689 | -1.7339 | 36.0 | France | RENNES-ST JACQUES | 2020-01-01 | FR000007130 | 10.4 | 4.8 |
| 1 | 48.0689 | -1.7339 | 36.0 | France | RENNES-ST JACQUES | 2020-01-02 | FR000007130 | 11.0 | 7.8 |
| 2 | 48.0689 | -1.7339 | 36.0 | France | RENNES-ST JACQUES | 2020-01-03 | FR000007130 | 13.1 | NaN |
| 3 | 48.0689 | -1.7339 | 36.0 | France | RENNES-ST JACQUES | 2020-01-04 | FR000007130 | 10.4 | 1.4 |
| 4 | 48.0689 | -1.7339 | 36.0 | France | RENNES-ST JACQUES | 2020-01-05 | FR000007130 | 9.5 | 3.0 |
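Assuming the returned object is a pandas DataFrame, as the data.head() call suggests, the usual pandas tools apply to it. The sketch below rebuilds the five rows shown (a subset of the columns) so that it runs without downloading anything. It then derives a daily temperature range and counts missing values, such as the NaN in the tmin column on 2020-01-03. The sample and trange names are hypothetical additions, not something noaa_api returns.

```python
import pandas as pd

# The five rows shown above, rebuilt by hand so the example runs offline.
sample = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05'],
    'station': ['FR000007130'] * 5,
    'tmax': [10.4, 11.0, 13.1, 10.4, 9.5],
    'tmin': [4.8, 7.8, None, 1.4, 3.0],  # None becomes NaN in a float column
})

# Parse the date strings so the frame can be indexed or resampled by time.
sample['date'] = pd.to_datetime(sample['date'])

# Daily temperature range; a missing tmin yields NaN rather than an error.
sample['trange'] = sample['tmax'] - sample['tmin']

# Count missing readings per column before aggregating.
print(sample[['tmax', 'tmin']].isna().sum())

# mean() skips NaN by default, so the incomplete day is ignored here.
mean_range = sample['trange'].mean()
print(mean_range)
```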