ESIIL API Project - Get started with open reproducible science!¶
What is Open Reproducible Science? Open reproducible science is research built on open communication, open collaboration, and reproducibility. Open communication means sharing the data and processes used in the research. Open collaboration means working with other researchers so that knowledge continues to advance and science continues to evolve. Reproducibility means that fellow researchers can follow the process you used and arrive at the same results.
Example: GitHub. GitHub helps researchers track the steps taken in data exploration and analysis. With version control on GitHub, fellow researchers can track changes made to data sets and results in repositories, which enables them to reproduce the work.
What is a Machine-Readable Name? A machine-readable name is a name formatted so that computer systems can process it easily. In data management and organization, machine-readable names typically follow conventions or standards that keep them consistent and interoperable across software platforms and databases.
Machine-readable names avoid special characters, spaces, and other formatting elements that can cause problems when the names are processed or parsed programmatically. Instead, they use underscores, hyphens, camel case, or similar conventions to separate words. For example, this notebook was renamed from "Get Started with Open Reproducible Science!.ipynb" to "getting_started_with_open_reproducible_science.ipynb" so that the file name contains no spaces or special characters.
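The renaming convention described above can be sketched in a few lines of Python. The function name `machine_readable_name` is hypothetical and was chosen for this illustration; it only lowercases the name and swaps runs of spaces and special characters for underscores, so it does not reword the title the way the manual rename did.

```python
import re
from pathlib import Path


def machine_readable_name(filename):
    """Return a lowercase, underscore-separated version of a file name."""
    path = Path(filename)
    # Lowercase the stem, then replace each run of characters that are
    # not letters or digits with a single underscore
    stem = re.sub(r'[^a-z0-9]+', '_', path.stem.lower()).strip('_')
    # Keep the file extension, also lowercased
    return stem + path.suffix.lower()


print(machine_readable_name('Get Started with Open Reproducible Science!.ipynb'))
```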
Here are some suggestions for creating readable, well-documented scientific workflows that are easier to reproduce¶
Things to do to write clean code:
- Go through the code and look for errors.
- Label each step with a comment.
- Run your code often to confirm that it contains no errors.
Advantages of doing these things:
- You can remove and add lines of code until the code runs properly.
- Those who read your code can follow what each step does.
- You catch errors as you go instead of waiting until you have a lot of code written.
Getting Started with the Project¶
This project focuses on utilizing the National Centers for Environmental Information (NCEI) Access Data Service, which offers a RESTful application programming interface (API). This API allows users to access and subset data by applying a specific set of parameters to the version 1 (v1) URL. More information can be found here: https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation
The Global Historical Climatology Network - Daily (GHCNd) gathers daily weather observations from around the world. It is managed by NOAA, an agency of the U.S. government. The data include daily precipitation and daily maximum, minimum, and average temperature, and the observations come from land-based weather stations. The request in this notebook sets units=standard, so temperatures arrive in degrees Fahrenheit and are converted to degrees Celsius later on.
Citation: Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCNd) [ed.], Version 3. Available online at https://www.ncdc.noaa.gov/ghcn-daily. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ [accessed 20/04/2024].
Programming Part Begins¶
# Import required libraries
import matplotlib.pyplot as plt
import pandas as pd
# Work with vector data
import geopandas as gpd
# Save maps and plots to files
import holoviews as hv
# Create interactive maps and plots
import hvplot.pandas
# Search for locations by name - this might take a moment
from osmnx import features as osm
The next cells download daily summaries for the Murtala Mohammed station, Nigeria, from 6 February 1973 to 13 April 2024, to analyze precipitation and temperature trends. The data come from the API of the National Centers for Environmental Information (NCEI), a part of the National Oceanic and Atmospheric Administration (NOAA).
# Build the NCEI API request URL
ncei_api_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TAVG,TMIN,TMAX,PRCP'
    '&stations=NIM00065201'
    '&startDate=1973-02-06'
    '&endDate=2024-04-13'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
ncei_api_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMIN,TMAX,PRCP&stations=NIM00065201&startDate=1973-02-06&endDate=2024-04-13&includeStationName=true&includeStationLocation=1&units=standard'
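As an alternative to concatenating string pieces by hand, the same URL can be assembled from a dictionary of parameters with the standard library. This is a minimal sketch: the variable names `base_url`, `params`, and `ncei_api_url_from_params` are made up for this example, and the parameter values are copied from the request above.

```python
from urllib.parse import urlencode

base_url = 'https://www.ncei.noaa.gov/access/services/data/v1'

# One entry per query parameter used in the request above
params = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TAVG,TMIN,TMAX,PRCP',
    'stations': 'NIM00065201',
    'startDate': '1973-02-06',
    'endDate': '2024-04-13',
    'includeStationName': 'true',
    'includeStationLocation': '1',
    'units': 'standard',
}

# safe=',' leaves the commas in dataTypes unescaped,
# so the result matches the hand-written URL
ncei_api_url_from_params = base_url + '?' + urlencode(params, safe=',')
print(ncei_api_url_from_params)
```

Keeping the parameters in a dictionary makes it easier to change the station or the date range without editing the URL text itself.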
# Download the data into the Murtala DataFrame
murtala_df = pd.read_csv(
    ncei_api_url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])
# Display Dataframe Data
murtala_df
DATE | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | PRCP | TAVG | TMAX | TMIN |
---|---|---|---|---|---|---|---|---|---|
1973-02-06 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 88 | NaN | NaN |
1973-05-18 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 80 | NaN | NaN |
1973-05-30 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 81 | NaN | NaN |
1973-06-03 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 82 | NaN | NaN |
1973-08-26 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 81 | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-04-09 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 86 | 94.0 | NaN |
2024-04-10 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 86 | 97.0 | 80.0 |
2024-04-11 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 87 | 97.0 | 79.0 |
2024-04-12 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 86 | NaN | NaN |
2024-04-13 | NIM00065201 | MURTALA MUHAMMED, NI | 6.577 | 3.321 | 41.1 | NaN | 88 | 95.0 | NaN |
7563 rows × 9 columns
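To see what `index_col='DATE'` and `parse_dates=True` do without calling the API, the sketch below reads a small hand-made CSV from memory. The text in `sample_csv` is an assumption about the layout of the response (only a few columns, with values copied from the last rows of the table above), not the actual file returned by NCEI.

```python
import io

import pandas as pd

# Hand-made sample; the real response has more columns and rows
sample_csv = io.StringIO(
    'STATION,DATE,PRCP,TAVG,TMAX,TMIN\n'
    'NIM00065201,2024-04-09,,86,94,\n'
    'NIM00065201,2024-04-10,,86,97,80\n'
    'NIM00065201,2024-04-11,,87,97,79\n'
)

# DATE becomes the index and is parsed into datetime values;
# empty fields are read as NaN
sample_df = pd.read_csv(sample_csv, index_col='DATE', parse_dates=True)
print(sample_df.index.dtype)
print(sample_df)
```

A datetime index is what later allows selecting rows by year and resampling to annual values.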
# Check that the data was imported into a pandas DataFrame
type(murtala_df)
pandas.core.frame.DataFrame
# Clean up the dataframe
murtala_df = murtala_df[['PRCP', 'TAVG', 'TMIN', 'TMAX']]
murtala_df
DATE | PRCP | TAVG | TMIN | TMAX |
---|---|---|---|---|
1973-02-06 | NaN | 88 | NaN | NaN |
1973-05-18 | NaN | 80 | NaN | NaN |
1973-05-30 | NaN | 81 | NaN | NaN |
1973-06-03 | NaN | 82 | NaN | NaN |
1973-08-26 | NaN | 81 | NaN | NaN |
... | ... | ... | ... | ... |
2024-04-09 | NaN | 86 | NaN | 94.0 |
2024-04-10 | NaN | 86 | 80.0 | 97.0 |
2024-04-11 | NaN | 87 | 79.0 | 97.0 |
2024-04-12 | NaN | 86 | NaN | NaN |
2024-04-13 | NaN | 88 | NaN | 95.0 |
7563 rows × 4 columns
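Many cells in the table are NaN, so it helps to know how many real observations each column holds before plotting. The sketch below counts them on a toy DataFrame whose values are copied from three of the last rows shown above; `toy_df` and `observation_counts` are names invented for this example. On the full data the same call would be `murtala_df.notna().sum()`.

```python
import numpy as np
import pandas as pd

# Toy values copied from the rows for 2024-04-09 to 2024-04-11
toy_df = pd.DataFrame(
    {'PRCP': [np.nan, np.nan, np.nan],
     'TAVG': [86, 86, 87],
     'TMIN': [np.nan, 80.0, 79.0],
     'TMAX': [94.0, 97.0, 97.0]},
    index=pd.to_datetime(['2024-04-09', '2024-04-10', '2024-04-11']))

# Number of days with an actual observation in each column
observation_counts = toy_df.notna().sum()
print(observation_counts)
```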
Plotting the Precipitation (PRCP), Average Temperature (TAVG), Minimum Temperature (TMIN), and Maximum Temperature (TMAX) columns vs Time to explore the data¶
# Plot your data frame
murtala_df.plot()
<Axes: xlabel='DATE'>
Plotting only the Precipitation column (PRCP) vs Time to explore the data¶
# Plot the PRCP data using .plot
murtala_df.plot(
    y='PRCP',
    title='DAILY PRECIPITATION IN MURTALA MOHAMMED',
    xlabel='Year of record',
    ylabel='Precipitation (inches)')
<Axes: title={'center': 'DAILY PRECIPITATION IN MURTALA MOHAMMED'}, xlabel='Year of record', ylabel='Precipitation (inches)'>
Plotting only the Average Temperature (TAVG) vs Time to explore the data¶
# Plot the TAVG data using .plot
murtala_df.plot(
    y='TAVG',
    title='AVERAGE TEMPERATURE IN MURTALA MOHAMMED',
    xlabel='Year of record',
    ylabel='Average Temperature (°F)')
<Axes: title={'center': 'AVERAGE TEMPERATURE IN MURTALA MOHAMMED'}, xlabel='Year of record', ylabel='Average Temperature (°F)'>
Converting Temperature to Celsius
# Convert temperature from Fahrenheit to Celsius
# Take an explicit copy first so the new column is not set on a slice of the original DataFrame
murtala_df = murtala_df.copy()
murtala_df['TCel'] = (murtala_df['TAVG'] - 32) * 5 / 9
# Display the DataFrame with the TCel column
murtala_df
DATE | PRCP | TAVG | TMIN | TMAX | TCel |
---|---|---|---|---|---|
1973-02-06 | NaN | 88 | NaN | NaN | 31.111111 |
1973-05-18 | NaN | 80 | NaN | NaN | 26.666667 |
1973-05-30 | NaN | 81 | NaN | NaN | 27.222222 |
1973-06-03 | NaN | 82 | NaN | NaN | 27.777778 |
1973-08-26 | NaN | 81 | NaN | NaN | 27.222222 |
... | ... | ... | ... | ... | ... |
2024-04-09 | NaN | 86 | NaN | 94.0 | 30.000000 |
2024-04-10 | NaN | 86 | 80.0 | 97.0 | 30.000000 |
2024-04-11 | NaN | 87 | 79.0 | 97.0 | 30.555556 |
2024-04-12 | NaN | 86 | NaN | NaN | 30.000000 |
2024-04-13 | NaN | 88 | NaN | 95.0 | 31.111111 |
7563 rows × 5 columns
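Adding a column to a DataFrame that was itself created by selecting columns from another DataFrame can make pandas raise a SettingWithCopyWarning. A minimal sketch of one way around it is shown below, using a toy DataFrame (`raw_df` and `temps_df` are names invented for this example) and the same conversion formula, C = (F - 32) × 5 / 9.

```python
import pandas as pd

# Toy data with TAVG values in degrees Fahrenheit, copied from the table above
raw_df = pd.DataFrame(
    {'STATION': ['NIM00065201'] * 3, 'TAVG': [88, 80, 86]},
    index=pd.to_datetime(['1973-02-06', '1973-05-18', '2024-04-09']))

# .copy() makes the column subset an independent DataFrame,
# so adding a column to it is unambiguous
temps_df = raw_df[['TAVG']].copy()

# Convert Fahrenheit to Celsius
temps_df['TCel'] = (temps_df['TAVG'] - 32) * 5 / 9
print(temps_df.round(2))
```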
Next, Subsetting and Resampling
# Subset the data to the years 1990 through 2024
murtala_1990_2024_df = murtala_df.loc['1990':'2024']
murtala_1990_2024_df
DATE | PRCP | TAVG | TMIN | TMAX | TCel |
---|---|---|---|---|---|
1991-04-13 | NaN | 84 | NaN | NaN | 28.888889 |
1991-10-04 | NaN | 79 | NaN | NaN | 26.111111 |
1994-02-03 | NaN | 83 | NaN | NaN | 28.333333 |
1994-02-24 | NaN | 83 | NaN | NaN | 28.333333 |
1994-04-15 | 0.16 | 80 | NaN | NaN | 26.666667 |
... | ... | ... | ... | ... | ... |
2024-04-09 | NaN | 86 | NaN | 94.0 | 30.000000 |
2024-04-10 | NaN | 86 | 80.0 | 97.0 | 30.000000 |
2024-04-11 | NaN | 87 | 79.0 | 97.0 | 30.555556 |
2024-04-12 | NaN | 86 | NaN | NaN | 30.000000 |
2024-04-13 | NaN | 88 | NaN | 95.0 | 31.111111 |
7475 rows × 5 columns
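Selecting rows with `.loc['1990':'2024']` works because the index holds dates: pandas reads each string as a whole year and keeps every row from the start of 1990 to the end of 2024. The sketch below shows this on a toy DataFrame with dates and TAVG values copied from the tables above; `toy_df` and `subset_df` are names invented for this example, and the index is assumed to be sorted by date.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {'TAVG': [81, 84, 80, 88]},
    index=pd.to_datetime(
        ['1973-08-26', '1991-04-13', '1994-04-15', '2024-04-13']))

# Both endpoints are inclusive: any date in 1990 through 2024 is kept
subset_df = toy_df.loc['1990':'2024']
print(subset_df)
```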
Getting into Action: Calculating Annual Statistics
# Resample the data to look at yearly mean values
murtala_annual_mean_df = murtala_1990_2024_df.resample('YS').mean()
murtala_annual_mean_df
DATE | PRCP | TAVG | TMIN | TMAX | TCel |
---|---|---|---|---|---|
1991-01-01 | NaN | 81.500000 | NaN | NaN | 27.500000 |
1992-01-01 | NaN | NaN | NaN | NaN | NaN |
1993-01-01 | NaN | NaN | NaN | NaN | NaN |
1994-01-01 | 0.160000 | 82.000000 | NaN | NaN | 27.777778 |
1995-01-01 | NaN | NaN | NaN | NaN | NaN |
1996-01-01 | NaN | NaN | NaN | NaN | NaN |
1997-01-01 | NaN | NaN | NaN | NaN | NaN |
1998-01-01 | 0.434444 | 82.079365 | 74.800000 | 89.000000 | 27.821869 |
1999-01-01 | 0.180000 | 80.918644 | 74.660944 | 87.390756 | 27.177024 |
2000-01-01 | 0.600000 | 82.130841 | 75.945205 | 89.471429 | 27.850467 |
2001-01-01 | 0.355000 | 80.306513 | 74.822485 | 87.628272 | 26.836952 |
2002-01-01 | 0.287826 | 80.096774 | 74.654545 | 87.828125 | 26.720430 |
2003-01-01 | 0.000000 | 80.042373 | 74.825000 | 86.714286 | 26.690207 |
2004-01-01 | 0.515000 | 80.978378 | 73.367347 | 86.566038 | 27.210210 |
2005-01-01 | 1.528947 | 82.657534 | 75.000000 | 87.800000 | 28.143075 |
2006-01-01 | 1.132500 | 82.964413 | 72.446809 | 88.773913 | 28.313563 |
2007-01-01 | 0.327222 | 81.189189 | 71.200000 | 90.605839 | 27.327327 |
2008-01-01 | 0.426786 | 79.699153 | 73.976190 | 88.909091 | 26.499529 |
2009-01-01 | 1.027674 | 80.545961 | 74.951807 | 88.940367 | 26.969978 |
2010-01-01 | 0.424286 | 81.064246 | 76.197802 | 90.109023 | 27.257914 |
2011-01-01 | 0.559200 | 80.134247 | 74.276190 | 89.977941 | 26.741248 |
2012-01-01 | 0.366170 | 79.814085 | 74.222222 | 88.806084 | 26.563380 |
2013-01-01 | 0.371667 | 80.023952 | 74.917160 | 88.718750 | 26.679973 |
2014-01-01 | 0.663846 | 80.516014 | 74.470588 | 88.398496 | 26.953341 |
2015-01-01 | 0.747059 | 80.939799 | 74.524064 | 89.641221 | 27.188777 |
2016-01-01 | 0.528000 | 81.918033 | 75.329502 | 90.454183 | 27.732240 |
2017-01-01 | 0.216136 | 81.048110 | 75.036585 | 89.375635 | 27.248950 |
2018-01-01 | 0.485000 | 81.589385 | 75.085470 | 89.979839 | 27.549659 |
2019-01-01 | 0.364203 | 81.428177 | 75.109804 | 89.265116 | 27.460098 |
2020-01-01 | 0.264894 | 81.691667 | 74.704167 | 90.634259 | 27.606481 |
2021-01-01 | 0.257750 | 82.073239 | 75.386555 | 90.264706 | 27.818466 |
2022-01-01 | 0.346774 | 81.512465 | 74.736220 | 89.918182 | 27.506925 |
2023-01-01 | 0.362600 | 82.098315 | 75.263158 | 90.820961 | 27.832397 |
2024-01-01 | 0.235000 | 85.140000 | 77.537037 | 95.564516 | 29.522222 |
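The rows of NaN for years such as 1992 and 1993 come from how `resample('YS')` works: it creates one bin per calendar year between the first and last date, and a year with no observations has nothing to average. The sketch below reproduces this on a toy DataFrame using the 1991 and 1994 TAVG values shown in the subset table; `toy_df` and `annual_mean` are names invented for this example.

```python
import pandas as pd

toy_df = pd.DataFrame(
    {'TAVG': [84, 79, 83, 83, 80]},
    index=pd.to_datetime(
        ['1991-04-13', '1991-10-04',
         '1994-02-03', '1994-02-24', '1994-04-15']))

# 'YS' labels each yearly bin with the first day of the year
annual_mean = toy_df.resample('YS').mean()
print(annual_mean)
```

The 1991 and 1994 means match the TAVG column in the annual table above, and the empty years in between appear as NaN.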
Plotting the Resampled Data
# Plot daily and mean annual temperature side by side
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4)) # Create a figure with two subplots
# Plot the daily temperature values
murtala_df.plot(
    y='TCel',
    title='Murtala Mohammed Daily Average Temperature',
    xlabel='Date',
    ylabel='Temperature (°C)',
    ax=axes[0] # This tells the plot to use the first subplot
)
# Plot the resampled mean annual values
murtala_annual_mean_df.plot(
    y='TCel',
    title='Murtala Mohammed Mean Annual Temperature',
    xlabel='Date',
    ylabel='Temperature (°C)',
    ax=axes[1] # This tells the plot to use the second subplot
)
fig.tight_layout() # Adjusts plot to ensure everything fits without overlap
plt.show() # Display the plots
Display Interactive Map of Location¶
# Search for Murtala Mohammed International Airport
mmia_gdf = osm.features_from_address(
    'Murtala Mohammed International Airport, Ikeja, Lagos, Nigeria',
    {'operator': ['Nigerian Ministry of Aviation']})
mmia_gdf
element_type | osmid | geometry | nodes | aerodrome | aerodrome:type | aeroway | closest_town | ele | iata | icao | name | name:en | name_1 | operator | ref | source | wikidata | wikipedia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
way | 370666112 | POLYGON ((3.30331 6.55708, 3.30309 6.55765, 3.... | [3743576805, 5325733536, 7249137481, 532573353... | international | international;public | aerodrome | Lagos | 40 | LOS | DNMM | Murtala Mohammed International Airport | Murtala Mohammed International Airport | Murtala Muhammed | Nigerian Ministry of Aviation | LOS | wikipedia | Q1043631 | en:Murtala Mohammed International Airport |
mmia_gdf.plot()
<Axes: >
# Plot the airport boundary
mmia_map = mmia_gdf.hvplot(
    # Give the map a descriptive title
    title="Murtala Mohammed International Airport, Ikeja, Lagos, Nigeria",
    # Add a basemap
    geo=True, tiles='EsriImagery',
    # Change the colors
    fill_color='white', fill_alpha=0.2,
    line_color='skyblue', line_width=5,
    # Change the image size
    frame_width=400, frame_height=400)
# Save the map as a file to put on the web
hv.save(mmia_map, 'mmia11.html')
# Display the map
mmia_map
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p1131', ...)
Discuss Results from Plots¶
The Murtala Mohammed station, Nigeria shows a sharp rise in mean temperature in 2024¶
From 1998 to 2023, the annual mean temperature at the Murtala Mohammed station stayed between about 26.5 and 28.3 degrees Celsius. The means for 2020 to 2023 (27.5 to 27.8 degrees Celsius) sit inside that range, while the 2024 mean rises to about 29.5 degrees Celsius, the highest value in the record. The 2024 value covers only 1 January to 13 April, so it is not directly comparable with the full-year means. The rise is consistent with global warming, but a longer and more complete record is needed before attributing it to that cause.
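The request ends on 2024-04-13, so the 2024 annual mean is built from only part of a year. One way to guard against comparing partial years with full ones is to count the observations behind each annual mean and keep only well-covered years. The sketch below uses synthetic daily values, not station data, and the threshold `MIN_DAYS` is a hypothetical choice for this example; the real record has gaps, so a suitable threshold would need to be chosen with the data in view.

```python
import pandas as pd

# Synthetic daily values for illustration only; not station data
dates = pd.date_range('2022-01-01', '2024-04-13', freq='D')
daily = pd.Series(80.0, index=dates, name='TAVG')

# Annual mean together with the number of days behind it
yearly = daily.resample('YS').agg(['mean', 'count'])

# Hypothetical threshold: keep years with at least this many observations
MIN_DAYS = 300
complete_years = yearly[yearly['count'] >= MIN_DAYS]
print(yearly)
print(complete_years)
```

In this toy series 2024 has 104 days and is dropped, while 2022 and 2023 are kept.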
Converting into an HTML file to link with my GitHub bio page¶
# Convert .ipynb file to .html file
!jupyter nbconvert new_notebook_for_reproducible_science.ipynb --to html
[NbConvertApp] Converting notebook new_notebook_for_reproducible_science.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 4 image(s).
[NbConvertApp] Writing 525657 bytes to new_notebook_for_reproducible_science.html