ESIIL API Project - Get started with open reproducible science!

  1. What is Open Reproducible Science? Open reproducible science is research built on open communication, open collaboration, and reproducibility. Open communication means sharing the data and processes used in the research. Open collaboration means working with other researchers so that knowledge continues to advance and science continues to evolve. Reproducibility means that fellow researchers can follow the process you used and arrive at the same results.

  2. Example: GitHub. GitHub helps researchers track the steps taken during data exploration and analysis. With version control, fellow researchers can track the changes made to data sets and results in a repository, which helps them reproduce the analysis.

What is a Machine-Readable Name? A machine-readable name is a name formatted so that computer systems can process it easily. In data management and organization, machine-readable names typically follow conventions or standards that keep them consistent and interoperable across software platforms and databases.

Machine-readable names usually avoid special characters, spaces, and other formatting elements that can cause problems when the names are processed or parsed programmatically. Instead, they separate words with underscores, hyphens, camel case, or a similar convention. For example, this notebook was renamed from "Get Started with Open Reproducible Science!.ipynb" to "getting_started_with_open_reproducible_science.ipynb" so that the name is easier for software to handle.
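The renaming convention described above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard tool: the function name to_machine_readable is hypothetical, and it assumes a snake_case convention (lowercase letters and digits separated by underscores).

```python
import re
from pathlib import Path


def to_machine_readable(filename: str) -> str:
    """Return a lowercase, underscore-separated version of a file name."""
    path = Path(filename)
    # Lowercase the name (without its extension) for consistency
    stem = path.stem.lower()
    # Replace every run of characters other than a-z and 0-9 with one underscore
    stem = re.sub(r'[^a-z0-9]+', '_', stem).strip('_')
    # Put the (lowercased) extension back
    return stem + path.suffix.lower()


new_name = to_machine_readable('Get Started with Open Reproducible Science!.ipynb')
print(new_name)
```

The function only cleans the existing words, so it keeps "get" where the manually renamed notebook uses "getting"; wording changes like that still have to be made by hand.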

Here are some suggestions for creating readable, well-documented scientific workflows that are easier to reproduce

Things to do to write clean code:

  1. Go through the code and look for errors.
  2. Make sure you label your processes with a comment.
  3. Make sure you run your code to confirm if there are errors in it.

Advantages of doing these things:

  1. You are able to remove and add lines of code to ensure the code runs properly.
  2. You are able to ensure that those who read your code understand it perfectly.
  3. You are able to confirm in the present that your code contains no errors instead of waiting till you have a lot of code written already.
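As a small illustration of these habits, the sketch below labels each step with a comment and runs quick checks immediately. The function name fahrenheit_to_celsius is chosen for illustration only; the formula is the standard Fahrenheit-to-Celsius conversion.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    # Subtract the freezing-point offset, then rescale the size of a degree
    return (temp_f - 32) * 5 / 9


# Run quick checks right away instead of waiting until much more code exists
assert fahrenheit_to_celsius(32) == 0      # freezing point of water
assert fahrenheit_to_celsius(212) == 100   # boiling point of water
print('All checks passed')
```

Checks like these fail loudly as soon as an error is introduced, which is the third advantage listed above.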

Getting Started with the Project

This project focuses on utilizing the National Centers for Environmental Information (NCEI) Access Data Service, which offers a RESTful application programming interface (API). This API allows users to access and subset data by applying a specific set of parameters to the version 1 (v1) URL. More information can be found here: https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation
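To make the idea of "applying parameters to the v1 URL" concrete, here is a minimal sketch that assembles a request URL from a dictionary of parameters. The helper name build_ncei_url is hypothetical; the parameter names and example values are the ones used in this notebook's own request, and the URL is only built here, not fetched.

```python
from urllib.parse import urlencode

NCEI_BASE_URL = 'https://www.ncei.noaa.gov/access/services/data/v1'


def build_ncei_url(station, start_date, end_date, data_types):
    """Assemble a daily-summaries request URL (hypothetical helper)."""
    params = {
        'dataset': 'daily-summaries',
        'dataTypes': ','.join(data_types),
        'stations': station,
        'startDate': start_date,
        'endDate': end_date,
        'includeStationName': 'true',
        'includeStationLocation': '1',
        'units': 'standard',
    }
    # safe=',' keeps the commas between data types unescaped
    return NCEI_BASE_URL + '?' + urlencode(params, safe=',')


example_url = build_ncei_url(
    'NIM00065201', '1973-02-06', '2024-04-13',
    ['TAVG', 'TMIN', 'TMAX', 'PRCP'])
print(example_url)
```

Keeping the parameters in a dictionary makes it easier to change the station or date range without editing a long string by hand.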

The Global Historical Climatology Network - Daily (GHCNd) gathers daily weather observations from around the world. It is managed by NOAA, an agency of the U.S. government. The data include variables such as daily maximum and minimum temperature and precipitation, recorded at land-based weather stations. Temperatures in the archive are stored in Celsius-based units, but the request in this notebook uses units=standard, so the temperatures it returns are in degrees Fahrenheit and are converted to Celsius later on.

Citation: Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCNd) [ed.], Version 3. Available online at https://www.ncdc.noaa.gov/ghcn-daily. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ [accessed 20/04/2024].

Programming Part Begins

In [1]:
# Import required libraries
import matplotlib.pyplot as plt
import pandas as pd

# Work with vector data
import geopandas as gpd

# Save maps and plots to files
import holoviews as hv
# Create interactive maps and plots
import hvplot.pandas

# Search for locations by name - this might take a moment
from osmnx import features as osm

This section downloads daily summaries for the Murtala Mohammed station in Nigeria, from February 1973 to April 2024, to analyze precipitation and temperature trends. The data come from the API of the National Centers for Environmental Information (NCEI), which is part of the National Oceanic and Atmospheric Administration (NOAA).

In [2]:
# Build the NCEI API request URL for the climate data
ncei_api_url = (
'https://www.ncei.noaa.gov/access/services/data/v1'
'?dataset=daily-summaries'
'&dataTypes=TAVG,TMIN,TMAX,PRCP&stations=NIM00065201'
'&startDate=1973-02-06'
'&endDate=2024-04-13'
'&includeStationName=true'
'&includeStationLocation=1'
'&units=standard')
ncei_api_url
Out[2]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMIN,TMAX,PRCP&stations=NIM00065201&startDate=1973-02-06&endDate=2024-04-13&includeStationName=true&includeStationLocation=1&units=standard'
In [3]:
# Download data into Murtala data frame
murtala_df = pd.read_csv(
  ncei_api_url,
  index_col='DATE',
  parse_dates=True,
  na_values=['NaN'])

# Display Dataframe Data
murtala_df
Out[3]:
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TAVG TMAX TMIN
DATE
1973-02-06 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 88 NaN NaN
1973-05-18 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 80 NaN NaN
1973-05-30 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 81 NaN NaN
1973-06-03 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 82 NaN NaN
1973-08-26 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 81 NaN NaN
... ... ... ... ... ... ... ... ... ...
2024-04-09 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 86 94.0 NaN
2024-04-10 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 86 97.0 80.0
2024-04-11 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 87 97.0 79.0
2024-04-12 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 86 NaN NaN
2024-04-13 NIM00065201 MURTALA MUHAMMED, NI 6.577 3.321 41.1 NaN 88 95.0 NaN

7563 rows × 9 columns

In [4]:
# Check that the data was imported into a pandas DataFrame
type(murtala_df)
Out[4]:
pandas.core.frame.DataFrame
In [5]:
# Clean up the dataframe
murtala_df = murtala_df[['PRCP', 'TAVG','TMIN','TMAX']]
murtala_df
Out[5]:
PRCP TAVG TMIN TMAX
DATE
1973-02-06 NaN 88 NaN NaN
1973-05-18 NaN 80 NaN NaN
1973-05-30 NaN 81 NaN NaN
1973-06-03 NaN 82 NaN NaN
1973-08-26 NaN 81 NaN NaN
... ... ... ... ...
2024-04-09 NaN 86 NaN 94.0
2024-04-10 NaN 86 80.0 97.0
2024-04-11 NaN 87 79.0 97.0
2024-04-12 NaN 86 NaN NaN
2024-04-13 NaN 88 NaN 95.0

7563 rows × 4 columns
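The preview above shows many NaN entries, so it can help to measure how much of each column is missing before plotting. The sketch below uses a small hypothetical DataFrame (sample_df, with made-up rows shaped like the station data) so that it runs without a download; the same two lines at the end could be applied to murtala_df.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same columns as the cleaned station data
sample_df = pd.DataFrame(
    {'PRCP': [np.nan, np.nan, 0.16, np.nan],
     'TAVG': [88, 80, 81, 82],
     'TMIN': [np.nan, 80.0, 79.0, np.nan],
     'TMAX': [np.nan, 97.0, 97.0, 95.0]},
    index=pd.to_datetime(
        ['2024-04-09', '2024-04-10', '2024-04-11', '2024-04-12']))
sample_df.index.name = 'DATE'

# Fraction of missing values in each column (0 = complete, 1 = all missing)
missing_share = sample_df.isna().mean()
print(missing_share)
```

Columns with a high share of missing values, such as precipitation in this sample, deserve extra caution when interpreting plots and averages.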

Plotting Precipitation (PRCP), Average Temperature (TAVG), Minimum Temperature (TMIN), and Maximum Temperature (TMAX) vs Time to explore the data

In [6]:
# Plot your data frame 
murtala_df.plot()
Out[6]:
<Axes: xlabel='DATE'>
[Figure: line plot of PRCP, TAVG, TMIN, and TMAX against DATE]

Plotting only Precipitation (PRCP) vs Time to explore the data

In [7]:
# Plot the PRCP data using .plot
murtala_df.plot(
    y='PRCP',
    title='DAILY PRECIPITATION IN MURTALA MOHAMMED',
    xlabel='Year of record',
    ylabel='Precipitation')
Out[7]:
<Axes: title={'center': 'DAILY PRECIPITATION IN MURTALA MOHAMMED'}, xlabel='Year of record', ylabel='Precipitation'>
[Figure: line plot titled 'DAILY PRECIPITATION IN MURTALA MOHAMMED', precipitation against year of record]

Plotting only Average Temperature (TAVG) vs Time to explore the data

In [8]:
# Plot the TAVG data using .plot
murtala_df.plot(
    y='TAVG',
    title='AVERAGE TEMPERATURE IN MURTALA MOHAMMED',
    xlabel='Year of record',
    ylabel='Average Temperature')
Out[8]:
<Axes: title={'center': 'AVERAGE TEMPERATURE IN MURTALA MOHAMMED'}, xlabel='Year of record', ylabel='Average Temperature'>
[Figure: line plot titled 'AVERAGE TEMPERATURE IN MURTALA MOHAMMED', average temperature against year of record]

Converting Temperature to Celsius

In [9]:
# Convert temperature from Fahrenheit to Celsius
# Work on an explicit copy so that pandas does not raise SettingWithCopyWarning
murtala_df = murtala_df.copy()
murtala_df['TCel'] = (murtala_df['TAVG'] - 32) * 5 / 9

# Display the DataFrame with the new TCel column
murtala_df
Out[9]:
PRCP TAVG TMIN TMAX TCel
DATE
1973-02-06 NaN 88 NaN NaN 31.111111
1973-05-18 NaN 80 NaN NaN 26.666667
1973-05-30 NaN 81 NaN NaN 27.222222
1973-06-03 NaN 82 NaN NaN 27.777778
1973-08-26 NaN 81 NaN NaN 27.222222
... ... ... ... ... ...
2024-04-09 NaN 86 NaN 94.0 30.000000
2024-04-10 NaN 86 80.0 97.0 30.000000
2024-04-11 NaN 87 79.0 97.0 30.555556
2024-04-12 NaN 86 NaN NaN 30.000000
2024-04-13 NaN 88 NaN 95.0 31.111111

7563 rows × 5 columns
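When a new column is added to a DataFrame that was itself selected from another DataFrame, pandas may raise SettingWithCopyWarning. One common way to avoid this, sketched below on a small hypothetical DataFrame (raw_df and temps_df are illustrative names), is to take an explicit copy at the point where the columns are selected.

```python
import pandas as pd

# Hypothetical rows shaped like the downloaded station data
raw_df = pd.DataFrame(
    {'STATION': ['NIM00065201'] * 3, 'TAVG': [88, 80, 86]},
    index=pd.to_datetime(['1973-02-06', '1973-05-18', '2024-04-09']))

# .copy() makes the column subset an independent DataFrame
temps_df = raw_df[['TAVG']].copy()

# Adding a column now modifies temps_df only, with no warning
temps_df['TCel'] = (temps_df['TAVG'] - 32) * 5 / 9
print(temps_df)
```

The original DataFrame is left untouched, which also makes it easier to re-run later cells without side effects.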

Next, Subsetting and Resampling

In [10]:
# Subset the data to the years 1990 through 2024
murtala_1990_2024_df = murtala_df.loc['1990':'2024']
murtala_1990_2024_df
Out[10]:
PRCP TAVG TMIN TMAX TCel
DATE
1991-04-13 NaN 84 NaN NaN 28.888889
1991-10-04 NaN 79 NaN NaN 26.111111
1994-02-03 NaN 83 NaN NaN 28.333333
1994-02-24 NaN 83 NaN NaN 28.333333
1994-04-15 0.16 80 NaN NaN 26.666667
... ... ... ... ... ...
2024-04-09 NaN 86 NaN 94.0 30.000000
2024-04-10 NaN 86 80.0 97.0 30.000000
2024-04-11 NaN 87 79.0 97.0 30.555556
2024-04-12 NaN 86 NaN NaN 30.000000
2024-04-13 NaN 88 NaN 95.0 31.111111

7475 rows × 5 columns

Getting into Action: Calculating Annual Statistics

In [11]:
# Resample the data to look at yearly mean values
murtala_annual_mean_df = murtala_1990_2024_df.resample('YS').mean()
murtala_annual_mean_df
Out[11]:
PRCP TAVG TMIN TMAX TCel
DATE
1991-01-01 NaN 81.500000 NaN NaN 27.500000
1992-01-01 NaN NaN NaN NaN NaN
1993-01-01 NaN NaN NaN NaN NaN
1994-01-01 0.160000 82.000000 NaN NaN 27.777778
1995-01-01 NaN NaN NaN NaN NaN
1996-01-01 NaN NaN NaN NaN NaN
1997-01-01 NaN NaN NaN NaN NaN
1998-01-01 0.434444 82.079365 74.800000 89.000000 27.821869
1999-01-01 0.180000 80.918644 74.660944 87.390756 27.177024
2000-01-01 0.600000 82.130841 75.945205 89.471429 27.850467
2001-01-01 0.355000 80.306513 74.822485 87.628272 26.836952
2002-01-01 0.287826 80.096774 74.654545 87.828125 26.720430
2003-01-01 0.000000 80.042373 74.825000 86.714286 26.690207
2004-01-01 0.515000 80.978378 73.367347 86.566038 27.210210
2005-01-01 1.528947 82.657534 75.000000 87.800000 28.143075
2006-01-01 1.132500 82.964413 72.446809 88.773913 28.313563
2007-01-01 0.327222 81.189189 71.200000 90.605839 27.327327
2008-01-01 0.426786 79.699153 73.976190 88.909091 26.499529
2009-01-01 1.027674 80.545961 74.951807 88.940367 26.969978
2010-01-01 0.424286 81.064246 76.197802 90.109023 27.257914
2011-01-01 0.559200 80.134247 74.276190 89.977941 26.741248
2012-01-01 0.366170 79.814085 74.222222 88.806084 26.563380
2013-01-01 0.371667 80.023952 74.917160 88.718750 26.679973
2014-01-01 0.663846 80.516014 74.470588 88.398496 26.953341
2015-01-01 0.747059 80.939799 74.524064 89.641221 27.188777
2016-01-01 0.528000 81.918033 75.329502 90.454183 27.732240
2017-01-01 0.216136 81.048110 75.036585 89.375635 27.248950
2018-01-01 0.485000 81.589385 75.085470 89.979839 27.549659
2019-01-01 0.364203 81.428177 75.109804 89.265116 27.460098
2020-01-01 0.264894 81.691667 74.704167 90.634259 27.606481
2021-01-01 0.257750 82.073239 75.386555 90.264706 27.818466
2022-01-01 0.346774 81.512465 74.736220 89.918182 27.506925
2023-01-01 0.362600 82.098315 75.263158 90.820961 27.832397
2024-01-01 0.235000 85.140000 77.537037 95.564516 29.522222
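Some of the annual means above rest on very few observations (for example, the 1991 rows in the subset). A minimal sketch of one way to flag this is to compute the number of observations alongside the mean and mask years below a threshold. The data and the threshold min_obs here are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical daily Celsius values: two in 1991, three in 1998
dates = pd.to_datetime(
    ['1991-04-13', '1991-10-04', '1998-01-01', '1998-06-01', '1998-12-01'])
daily = pd.Series([28.9, 26.1, 27.0, 28.0, 29.0], index=dates, name='TCel')

# Yearly mean together with the number of observations behind it
annual = daily.resample('YS').agg(['mean', 'count'])

# Hide means that rest on fewer than min_obs observations (assumed threshold)
min_obs = 3
annual['mean_masked'] = annual['mean'].where(annual['count'] >= min_obs)
print(annual)
```

For real daily data a much higher threshold would be appropriate; the right value depends on how complete a year needs to be for the analysis.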

Plotting the Resampled Data

In [12]:
# Plot daily and mean annual temperature side by side using .plot

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))  # Create a figure with two subplots

# Plot the daily temperature values
murtala_df.plot(
    y='TCel',
    title='Daily Average Temperature in Murtala Mohammed',
    xlabel='Date',
    ylabel='Temperature (°C)',
    ax=axes[0]  # This tells the plot to use the first subplot
)

# Plot the mean annual temperature values
murtala_annual_mean_df.plot(
    y='TCel',
    title='Mean Annual Temperature in Murtala Mohammed',
    xlabel='Date',
    ylabel='Temperature (°C)',
    ax=axes[1]  # This tells the plot to use the second subplot
)

fig.tight_layout()  # Adjusts plot to ensure everything fits without overlap
plt.show()  # Display the plots
[Figure: two side-by-side line plots of TCel against date, daily values on the left and annual means on the right]

Display Interactive Map of Location

In [13]:
# Search for Murtala Mohammed International Airport
mmia_gdf = osm.features_from_address(
    'Murtala Mohammed International Airport, Ikeja, Lagos, Nigeria',
    {'operator': ['Nigerian Ministry of Aviation']})
mmia_gdf
Out[13]:
geometry nodes aerodrome aerodrome:type aeroway closest_town ele iata icao name name:en name_1 operator ref source wikidata wikipedia
element_type osmid
way 370666112 POLYGON ((3.30331 6.55708, 3.30309 6.55765, 3.... [3743576805, 5325733536, 7249137481, 532573353... international international;public aerodrome Lagos 40 LOS DNMM Murtala Mohammed International Airport Murtala Mohammed International Airport Murtala Muhammed Nigerian Ministry of Aviation LOS wikipedia Q1043631 en:Murtala Mohammed International Airport
In [14]:
mmia_gdf.plot()
Out[14]:
<Axes: >
[Figure: static plot of the airport boundary polygon]
In [16]:
# Plot the airport boundary
mmia_map = mmia_gdf.hvplot(
    # Give the map a descriptive title
    title="Murtala Mohammed International Airport, Ikeja, Lagos, Nigeria",
    # Add a basemap
    geo=True, tiles='EsriImagery',
    # Change the colors
    fill_color='white', fill_alpha=0.2,
    line_color='skyblue', line_width=5,
    # Change the image size
    frame_width=400, frame_height=400)

# Save the map as a file to put on the web
hv.save(mmia_map, 'mmia11.html')

# Display the map
mmia_map
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p1131', ...)
Out[16]:

Discuss Results from Plots

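The 2024 mean temperature at Murtala Mohammed, Nigeria is the highest in the record, but it covers only part of the year

From 1998 to 2023, the years with reasonably complete records, the annual mean temperature at the Murtala Mohammed station stayed between about 26.5 and 28.3 degrees Celsius. The value for 2024 is about 29.5 degrees Celsius, the highest in the table. However, the 2024 data end on 13 April, so that value is a partial-year mean and is not directly comparable with the full-year means; it may be biased upward if January to April is warmer than the rest of the year. The means for 2020 to 2023 (about 27.5 to 27.8 degrees Celsius) sit within the earlier range. A warming trend would be consistent with global warming, but these annual means alone do not establish one.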
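Because the 2024 record ends on 13 April, one way to compare like with like is to restrict every year to the same calendar window (1 January to 13 April) before averaging. The sketch below shows the idea on a small hypothetical series; the values are invented for illustration and are not the station record.

```python
import pandas as pd

# Hypothetical daily Celsius values, for illustration only
dates = pd.to_datetime(
    ['2023-02-01', '2023-03-01', '2023-08-01', '2024-02-01', '2024-03-01'])
daily = pd.Series([30.0, 30.0, 26.0, 30.0, 31.0], index=dates, name='TCel')

# Keep only observations on or before 13 April of each year
in_window = (daily.index.month < 4) | (
    (daily.index.month == 4) & (daily.index.day <= 13))
window_daily = daily[in_window]

# Mean per year over the matching window, and over all available days
window_means = window_daily.groupby(window_daily.index.year).mean()
full_year_means = daily.groupby(daily.index.year).mean()
print(window_means)
print(full_year_means)
```

In this made-up sample the full-year mean for 2023 is lower than its January to April mean, which shows how a partial year can look warmer than a complete one without any change in climate.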

Converting the notebook to an HTML file to link from my GitHub bio page

In [ ]:
# Convert .ipynb file to .html file
!jupyter nbconvert new_notebook_for_reproducible_science.ipynb --to html
[NbConvertApp] Converting notebook new_notebook_for_reproducible_science.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 4 image(s).
[NbConvertApp] Writing 525657 bytes to new_notebook_for_reproducible_science.html