{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NYC measles cases by age (2018 - 2019)\n", "\n", "* This Jupyter/Python notebook creates a bar chart of the 2018-2019 NYC measles outbreak cases by age group.\n", "* This notebook is part of the [Visualizing the 2019 Measles Outbreak](https://carlos-afonso.github.io/measles/) open-source GitHub project.\n", "* [Carlos Afonso](https://www.linkedin.com/in/carlos-afonso-w/), November 6, 2019." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "import matplotlib.pyplot as plt\n", "import os\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read and show the data\n", "\n", "The data was manually collected from the [NYC Health Measles webpage](https://www1.nyc.gov/site/doh/health/health-topics/measles.page) and saved as a CSV file. This manual approach was used because the data is small." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
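<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr>\n", " <th></th>\n", " <th>Start Date</th>\n", " <th>End Date</th>\n", " <th>Under 1 year</th>\n", " <th>1 to 4 years</th>\n", " <th>5 to 17 years</th>\n", " <th>18 years and over</th>\n", " <th>Total</th>\n", " </tr>\n", " </thead>\n",
" <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2018-09-01</td>\n", " <td>2019-08-19</td>\n", " <td>102</td>\n", " <td>277</td>\n", " <td>146</td>\n", " <td>124</td>\n", " <td>649</td>\n", " </tr>\n", " </tbody>\n", "</table>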
" ], "text/plain": [ " Start Date End Date Under 1 year 1 to 4 years 5 to 17 years \\\n", "0 2018-09-01 2019-08-19 102 277 146 \n", "\n", " 18 years and over Total \n", "0 124 649 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set (relative) path to the CSV data file\n", "data_file = os.path.join('..', 'data', 'nyc-health', 'final', 'nyc-measles-cases-by-age.csv')\n", "\n", "# Import data from the CSV file as a pandas dataframe\n", "df = pd.read_csv(data_file)\n", "\n", "# Show the data\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract context information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to show the start and end dates in the plot, to provide context. We show only the month and year of each date, for consistency with the project's other data visualizations, especially the \"NYC new case by month\"." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Sep 2018', 'Aug 2019']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Notes about the lambda function below:\n", "# - 1. strptime transforms the raw date string to a datetime object\n", "# - 2. strftime transforms the datetime object to a nicely formatted date string\n", "[start_month, end_month] = map(\n", "    lambda x: datetime.strptime(x, '%Y-%m-%d').strftime('%b %Y'),\n", "    df.iloc[0, :2]\n", ")\n", "\n", "# Show the nicely formatted date strings\n", "[start_month, end_month]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also want to show the total number of cases in the plot, to provide context." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "649" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the total number of cases (last column of the first row)\n", "total_cases = df.iloc[0, -1]\n", "\n", "# Check if there is a problem with the data where the reported total\n", "# does not match the sum of the number of cases for each age group\n", "if total_cases != df.iloc[0, 2:-1].sum():\n", "    print('WARNING: cases for each age group do NOT add up to the reported total!')\n", "\n", "# Show the total cases\n", "total_cases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract the data to plot" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Under 1 year 102\n", "1 to 4 years 277\n", "5 to 17 years 146\n", "18 years and over 124\n", "Name: 0, dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the age-group counts: the columns between the two dates and the total,\n", "# taken from the same (first) row used for the total above\n", "data_to_plot = df.iloc[0, 2:-1]\n", "\n", "# Show the data to plot\n", "data_to_plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create default bar chart" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Create a figure and draw the bar chart with the pandas default settings\n", "default_fig = plt.figure()\n", "ax = data_to_plot.plot.bar()\n", "plt.title('NYC measles cases by age group')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save default bar chart" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Set image file path/name (without file extension)\n", "img_file = os.path.join('..', 'images', 'nyc-measles-cases-by-age-bar-chart-default')\n", "\n", "# Save as PNG image\n", "default_fig.savefig(img_file + '.png', bbox_inches='tight', dpi=200)\n", "\n", "# Save as SVG image\n", "default_fig.savefig(img_file + '.svg', bbox_inches='tight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create improved bar chart\n", "\n", "We want the bar chart to be clear and to contain the necessary context.\n", "\n", "To contextualize the bar chart we:\n", "* use a title that explicitly says what the bar chart represents;\n", "* add text annotations that provide information about:\n", "  * the start and end dates,\n", "  * the total number of cases during that period, and\n", "  * the data and image sources.\n", "\n", "To make the bar chart as clear as possible we:\n", "* use a horizontal bar chart because its horizontal age-group labels are easier to read than the rotated labels of a vertical one;\n", "* explicitly show the number and percentage of cases for each age group;\n", "* use a large enough font to make all labels easy to read;\n", "* remove unnecessary elements (x-axis ticks and values, y-axis ticks, and plot box)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Define font parameters\n", "fn = 'Arial' # font name\n", "fsb = 18 # font size base\n", "\n", "# Create figure\n", "fig = plt.figure()\n", "\n", "# Add figure title\n", "plt.title('NYC measles cases by age group', fontname=fn, fontsize=(fsb + 6))\n", "\n", "# Create the horizontal bar chart\n", "ax = data_to_plot.plot.barh(alpha=0.3, color='red', width=0.8)\n", "\n", "# Invert the y-axis so the age groups read from top to bottom in data order\n", "ax.invert_yaxis()\n", "\n", "# Remove the x-axis ticks and values\n", "ax.get_xaxis().set_ticks([])\n", "\n", "# Remove the y-axis ticks only (keep the labels)\n", "ax.yaxis.set_ticks_position('none')\n", "\n", "# Set the y-axis labels font properties\n", "ax.set_yticklabels(data_to_plot.keys(), fontname=fn, fontsize=fsb)\n", "\n", "# Create labels in front of the bars showing the number and percentage of cases.\n", "# Note: we round the percentages to the nearest integer.\n", "for bar in ax.patches:\n", "    count = bar.get_width()\n", "    percent = int(round(100 * count / total_cases))\n", "    label = str(count) + ' (' + str(percent) + '%)'\n", "    ax.text(count + 5, bar.get_y() + 0.5, label, fontname=fn, fontsize=fsb)\n", "\n", "# Remove the axes box\n", "plt.box(False)\n", "\n", "# Add note about the total cases\n", "text = str(total_cases) + ' total confirmed cases from ' + start_month + ' to ' + end_month\n", "fig.text(0.5, 0.0, text, fontname=fn, fontsize=(fsb - 2), horizontalalignment='center')\n", "\n", "# Add note about the end of the outbreak\n", "text = 'Community transmission was declared over on Sep 3, 2019'\n", "fig.text(0.5, -0.1, text, fontname=fn, fontsize=(fsb - 2), horizontalalignment='center')\n", "\n", "# Add note about the Data and Image sources\n", "sources = 'Data: NYC Health, Image: carlos-afonso.github.io/measles'\n", "fig.text(0.5, -0.2, sources, fontname='Lucida Console', fontsize=(fsb - 4), horizontalalignment='center')\n", "\n", "# Show figure\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save improved bar chart" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Set image file path/name (without file extension)\n", "img_file = os.path.join('..', 'images', 'nyc-measles-cases-by-age-bar-chart')\n", "\n", "# Save as PNG image\n", "fig.savefig(img_file + '.png', bbox_inches='tight', dpi=200)\n", "\n", "# Save as SVG image\n", "fig.savefig(img_file + '.svg', bbox_inches='tight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Export notebook as HTML" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Export this notebook as a static HTML page\n", "os.system('jupyter nbconvert --to html nyc-measles-cases-by-age-final.ipynb')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }