415 words

Introduction

The Irish government has made available several datasets relating to the COVID-19 pandemic. These are available at the Ireland’s COVID-19 Data Hub.

This example looks at how you can download and produce visualisations of the daily case numbers using a Python script. The output from the script is a HTML file (covid-19-ie-cases.html) containing a table of cases with calculated 7 and 14 day averages over the last 14 days and a plot of those values.

Libraries

The following libraries are used in the script:

  • requests - allows sending HTTP requests through Python, which will be used to obtain the dataset from the internet.
  • pandas - open-source Python library that provides powerful data structures and data analysis tools to deal with datasets.
  • matplotlib.pyplot - used for data visualisation.
  • matplotlib.dates - used for date formatting on data visualisations.
  • jinja2 - web template engine for the Python.
import requests
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import jinja2

Define Constants

# Constants
POPULATION = 4757976 #2016 Census
DAYS = 14
DATA_URL = 'https://opendata.arcgis.com/datasets/d8eb52d56273413b84b0187a4e9117be_0.csv'
DATA_FILE = 'covid-19-ie-cases.csv'
HTML_FILE = "covid-19-ie-cases.html"
PNG_FILE = 'covid-19-ie-cases.png'
TEMPLATE_FILE = 'template.html'

Fetch Data

Download the latest version of the CSV file and load it into a Pandas DataFrame.

# Get latest data in csv format
data_content = requests.get(DATA_URL).content
csv_file = open(DATA_FILE, 'wb')
csv_file.write(data_content)
csv_file.close()
df = pd.read_csv(DATA_FILE)

Calculations

Calculate 7 and 14 day averages and numbers per 100,000 population.

df['Num7DayAverage'] = df['ConfirmedCovidCases'].rolling(7).mean()
df['Num14DayAverage'] = df['ConfirmedCovidCases'].rolling(14).mean()
df['Num7DayPer100K'] = df['ConfirmedCovidCases'].rolling(7).sum() / POPULATION * 100000
df['Num14DayPer100K'] = df['ConfirmedCovidCases'].rolling(14).sum() / POPULATION * 100000

df['Is7DayAverageRising'] = df['Num7DayAverage'].pct_change() > 0
df['Is14DayAverageRising'] = df['Num14DayAverage'].pct_change() > 0
df['Is7DayPer100KRising'] = df['Num7DayPer100K'].pct_change() > 0
df['Is14DayPer100KRising'] = df['Num14DayPer100K'].pct_change() > 0

Format Data

# Format columns
df['Date'] = pd.to_datetime(df['Date']).dt.date
df['Num7DayAverage'] = df['Num7DayAverage'].round(0).astype(pd.Int64Dtype())
df['Num14DayAverage'] = df['Num14DayAverage'].round(0).astype(pd.Int64Dtype())
df['Num7DayPer100K'] = df['Num7DayPer100K'].round(0).astype(pd.Int64Dtype())
df['Num14DayPer100K'] = df['Num14DayPer100K'].round(0).astype(pd.Int64Dtype())

Filter Data

# Filter data for table
df = df.tail(DAYS)

Generate HTML file

jinja2 uses template.html to format the data from the DataFrame using Bootstrap. It also creates a placeholder for the plot that will be created later in the script.

# Load template
templateLoader = jinja2.FileSystemLoader(searchpath="./")
templateEnv = jinja2.Environment(loader=templateLoader)
template = templateEnv.get_template(TEMPLATE_FILE)

# Convert DataFrame to dictionary
rows = (
    df
    .to_dict(orient='records')
)[:DAYS]

# Write HTML file
file = open(HTML_FILE, "w")
file.write (template.render(days=DAYS, rows=rows))
file.close()

Plot Data

Plot data and save as a .png file that is included in the HTML template.

ax = plt.gca()
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
plt.ylabel('Cases/100K')
df.plot(kind='line',x='Date',y='Num7DayPer100K',ax=ax)
df.plot(kind='line',x='Date',y='Num14DayPer100K', ax=ax)
ax.legend(["7 Day Moving Average", "14 Day Moving Average"])
plt.savefig(PNG_FILE)

Full code for this example available on GitHub here.