NYC measles cases by neighborhood (2018 - 2019)¶

This Jupyter/Python notebook creates a geo/map visualization of 2018-2019 NYC measles outbreak cases by neighborhood.
This notebook is part of the Visualizing the 2019 Measles Outbreak open-source GitHub project.
Carlos Afonso, November 6, 2019

Import libraries¶

from bokeh.io import export_png, export_svgs
from bokeh.models import ColumnDataSource, LabelSet, Range1d, Title
from bokeh.plotting import figure, output_notebook, show
from bokeh.tile_providers import get_provider, Vendors

from datetime import datetime

import math
import os
import pandas as pd
import re

Notes about additional bokeh.io dependencies:

In order to use the export() functions, users have to install some additional dependencies. These dependencies can be installed via conda:

conda install selenium phantomjs pillow

Note that the minimum compatible version of PhantomJS is 2.1.
Reference: https://bokeh.pydata.org/en/latest/docs/user_guide/export.html

Read all data¶

This section reads all the necessary data, namelly the measles by neighborhood data and the neighborhood coordinates data.

The measles by neighborhood data has been manually collected from the NYC Health Measles webpage and saved in a CSV file.

The neighborhood coordinates data was collected with simple google queries (e.g., "Williamsburg, NY coordinates") and saved in a CSV file.

# Define data files paths
measles_data_file = os.path.join('..', 'data', 'nyc-health', 'final', 'nyc-measles-confirmed-cases-by-neighborhood.csv')
coordinates_data_file = os.path.join('..', 'data', 'nyc-neighborhoods-coordinates.csv')

# Read measles data
if os.path.exists(measles_data_file):    
    measles_df = pd.read_csv(measles_data_file)
    print('Done: read the measles data')
else:
    print('ERROR: measles data file not found:', measles_data_file)

# Read coordinates data
if os.path.exists(coordinates_data_file):    
    coordinates_df = pd.read_csv(coordinates_data_file)
    print('Done: read the coordinates data')
else:
    print('ERROR: coordinates data file not found:', measles_data_file)

Done: read the measles data
Done: read the coordinates data

Check measles data¶

The measles data is shown below. These are the details about the two data columns:

neighborhood: neighborhood name
confirmed cases (start_date to end_date): number of confirmed measles cases in that neighborhood during the outbreak (from start_date to update_date)

Note that some of the data entries/rows are for a set of neighborhoods (e.g., "Brighton Beach/Coney Island"), because this is how NYC Health reports the data.

Also, note that the last row is a TOTAL count.

# Show all the measles data
measles_df

Check coordinates data¶

The neighborhood coordinates data is shown below, and it contains three simple columns: neighborhood name, latitute and longitude.

Note that some of the data entries/rows are for a set of neighborhoods (e.g., "Brighton Beach/Coney Island"). This is because NYC Health reports the measles data using those neighborhood sets. The latitude/longitude of a set of neighborhoods is the average of the individual neighborhoods latitudes/longitudes.

# Show the first rows of the coordinates data
coordinates_df.head()

Extract reference dates¶

Extract reference dates from the original column names, and transform them into nicelly formatted strings.

measles_original_cols = measles_df.columns

measles_original_cols

Index(['neighborhood', 'confirmed cases (2018-09-01 to 2019-08-19)'], dtype='object')

[start_month, end_month] = list(map(
    lambda x: datetime.strptime(x, '%Y-%m-%d').strftime('%b %Y'), 
    re.findall(r'\d{4}-\d{2}-\d{2}', measles_original_cols[1])
))

[start_month, end_month]

['Sep 2018', 'Aug 2019']

Rename measles data columns¶

Three of the columns names contain useful date information which we will want to shown in the visualizations, to provide context. However, that makes those column names very long/verbose. So, now that we already saved the original column names, let's rename them with shorter names, to make it easier to reference them.

measles_new_cols = ['neighborhood', 'cases']

measles_df.columns = measles_new_cols

measles_df.head()

Extract total cases¶

# Extract the reported total cases
total_cases = measles_df[measles_df['neighborhood'] == 'TOTAL']['cases'].values[0]

# Show the total cases
total_cases

649

# Confirm the total cases is correct
measles_df[measles_df['neighborhood'] != 'TOTAL']['cases'].sum()

649

Merge measles and coordinates data¶

# Merge the measles and coordinates data into a single dataframe
df = pd.merge(measles_df, coordinates_df, on='neighborhood')

# Show the data
df

Add Mercator coordinates¶

The cell below defines functions to transfrom latitude and longitude to the corresponding Mercator X and Y coordinates.

Note: the functions below were inspired by the code and comments in this post: https://gis.stackexchange.com/questions/156035/calculating-mercator-coordinates-from-lat-lon

# Function to calculate the Mercator X coordinate for a given longitude
def mercator_x(lon):
    r_major = 6378137.0
    x = r_major * math.radians(lon)
    return x

# Function to calculate the Mercator Y coordinate for a given latitude and longitude
def mercator_y(lat, lon):
    x = mercator_x(lon)
    scale = x/lon
    y = 180.0/math.pi * math.log(math.tan(math.pi/4.0 + lat * (math.pi/180.0)/2.0)) * scale
    return y

# Function to calculate the Mercator X and Y coordinates for a given latitude and longitude
def mercator_x_y(lat, lon):
    x = mercator_x(lon)
    y = mercator_y(lat, lon)
    return [x, y]

#def mercator_x_y(lat, lon):
#    r_major = 6378137.0
#    x = r_major * math.radians(lon)
#    scale = x/lon
#    y = 180.0/math.pi * math.log(math.tan(math.pi/4.0 + lat * (math.pi/180.0)/2.0)) * scale
#    return [x, y]

Test the mercator_x_y function for the NYC coordinates:

# NYC Mercator X and Y coordinates range:
# nyc_mercator_x_range = (-8242000, -8210000)
# nyc_mercator_y_range = (4965000, 4990000)

# NYC latitude and longitude
nyc_lat = 40.7128
nyc_lon = -74.0060

mercator_x_y(nyc_lat, nyc_lon)

[-8238310.235647004, 4970071.579142425]

Compute the Mercator coordinates for all neighborhoods:

df['mercator_x'] = df['longitude'].apply(mercator_x)
df['mercator_y'] = df.apply(lambda r: mercator_y(r['latitude'], r['longitude']), axis=1)

# Show the beginning of the dataframe to see the new columns (mercator_x and mercator_y)
df.head()

Create map¶

# Function to define the bubble size
def cases_radius(cases):
    if cases < 10:
        return 25
    elif cases < 100:
        return 35
    elif cases < 200:
        return 45
    else:
        return 55

# Calculate the bubble size for all neighborhoods:
df['cases_radius'] = df['cases'].apply(cases_radius)

# Function to define the x-axis offset for the label showing the number of cases
# Used to put the case number labels in the center of the bubbles
def cases_label_x_offset(cases):
    if cases < 10:
        return -6
    elif cases < 100:
        return -12
    elif cases < 200:
        return -14
    else:
        return -16

# Function to define the y-axis offset for the label showing the number of cases
# Used to put the case number labels in the center of the bubbles
def cases_label_y_offset(cases):
    if cases < 10:
        return -8
    elif cases < 100:
        return -10
    elif cases < 200:
        return -10
    else:
        return -10

# Calculate the 'cases_label_x_offset' and 'cases_label_y_offset' for all neighborhoods:
df['cases_label_x_offset'] = df['cases'].apply(cases_label_x_offset)
df['cases_label_y_offset'] = df['cases'].apply(cases_label_y_offset)

# Create new dataframe columns for the x and y offsets for the labels with neighborhood names
df['neighborhood_label_x_offset'] = [0] * len(df)
df['neighborhood_label_y_offset'] = [0] * len(df)

# Manually adjust the 'neighborhood_label_x_offset' and 'neighborhood_label_y_offset' for all neighborhoods
# Place the neighborhood labels in a "nice" position, so that they don't overlap with each other or with the bubbles
cols = ['neighborhood_label_x_offset', 'neighborhood_label_y_offset']
df.loc[df['neighborhood'] == 'Bensonhurst',                 cols] = [-80, -30]
df.loc[df['neighborhood'] == 'Borough Park',                cols] = [-125, -10]
df.loc[df['neighborhood'] == 'Brighton Beach/Coney Island', cols] = [-115, -35]
df.loc[df['neighborhood'] == 'Chelsea/Clinton',             cols] = [-15, 15]
df.loc[df['neighborhood'] == 'Crown Heights',               cols] = [15, -10]
df.loc[df['neighborhood'] == 'Far Rockaway',                cols] = [-80, -30]
df.loc[df['neighborhood'] == 'Flatbush',                    cols] = [15, -10]
df.loc[df['neighborhood'] == 'Flushing',                    cols] = [15, -10]
df.loc[df['neighborhood'] == 'Jamaica',                     cols] = [15, -10]
df.loc[df['neighborhood'] == 'Midwood/Marine Park',         cols] = [-10, -30]
df.loc[df['neighborhood'] == 'Port Richmond',               cols] = [-45, -30]
df.loc[df['neighborhood'] == 'Red Hook',                    cols] = [-85, -10]
df.loc[df['neighborhood'] == 'Sunset Park',                 cols] = [-110, -10]
df.loc[df['neighborhood'] == 'West Queens',                 cols] = [-40, -30]
df.loc[df['neighborhood'] == 'Williamsburg',                cols] = [30, -10]
df.loc[df['neighborhood'] == 'Willowbrook',                 cols] = [-25, -30]

output_notebook()

#tile_provider = get_provider(Vendors.CARTODBPOSITRON)
tile_provider = get_provider(Vendors.CARTODBPOSITRON_RETINA)

# Create figure
p = figure(
    x_axis_type = 'mercator',
    y_axis_type = 'mercator',
    height = 650,
    width = 650
)

# Define figure title
title = Title(
    text = 'NYC measles cases by neighborhood',
    align = 'center',
    text_font_size = '18pt',
    text_font_style = 'normal'
)

# Define contextual notes to add to the figure
note_1 = Title(
    text = str(total_cases) + ' total confirmed cases from ' + start_month + ' to ' + end_month,
    align = 'center',
    text_font_size = '12pt',
    text_font_style = 'normal'
)
note_2 = Title(
    text = 'Community transmission was declared over on Sep 3, 2019',
    align = 'center',
    text_font_size = '12pt',
    text_font_style = 'normal'
)

# Define credits to add to the figure
credits = Title(
    text = 'Data: NYC Health, Map: OpenStreetMap/CartoDB, Image: carlos-afonso.github.io/measles',
    align = 'center',
    text_font = 'consolas',
    text_font_size = '10pt',
    text_font_style = 'normal'
)

# Add the title, notes, and credits to the figure
p.add_layout(title, 'above')
p.add_layout(note_1, 'below')
p.add_layout(note_2, 'below')
p.add_layout(Title(text=' ', text_font_size='0pt'), 'below') # Adds some empty space
p.add_layout(credits, 'below')

# Add tile provider
p.add_tile(tile_provider)

# Make axis invisible
p.axis.visible = False

# Define the dataframe (df) as the data source
source = ColumnDataSource(df)

# Add the bubbles / circles
p.circle(
    x = 'mercator_x',
    y = 'mercator_y',
    alpha = 0.3,
    fill_color = 'red',
    line_color = 'red',
    size = 'cases_radius',
    source = source
)

# Add labels with the number of cases
p.add_layout(LabelSet(
    x = 'mercator_x',
    y = 'mercator_y',
    text = 'cases',
    #level = 'glyph',
    x_offset = 'cases_label_x_offset',
    y_offset = 'cases_label_y_offset',
    source = source,
    render_mode = 'canvas',
    #text_color = 'black',
    text_font_size = '12pt'
))

# Add labels with the neighborhood names
p.add_layout(LabelSet(
    x = 'mercator_x',
    y = 'mercator_y',
    text = 'neighborhood',
    #level = 'glyph',
    x_offset = 'neighborhood_label_x_offset',
    y_offset = 'neighborhood_label_y_offset',
    source = source,
    render_mode = 'canvas',
    #text_color = 'black',
    text_font_size = '12pt'
))

# Remove the toolbar, to remove plot interactivity
# In this case it's better to have just a static image, to keep the focus on the area of interest
p.toolbar_location = None

show(p)

Save map¶

Note: The export_png below produces a relativelly low resolution image. To get a higher resolution PNG image we can use a manual method: open the SVG version on a browser, zoom in/out as needed, take a screenshot, and then crop the schreenshot.

# Define output file name
output_file = os.path.join('..', 'images', 'nyc-measles-cases-by-neighborhood-map-final')

# Save as PNG
export_png(p, output_file + '.png')

# Save as SVG
p.output_backend = 'svg'
export_svgs(p, output_file + '.svg')

['..\\images\\nyc-measles-cases-by-neighborhood-map-final.svg']

Export notebook as HTML¶

# Export this notebook as a static HTML page
os.system('jupyter nbconvert --to html nyc-measles-cases-by-neighborhood-final.ipynb')

0

	neighborhood	latitude	longitude
0	Bensonhurst	40.6139	-73.9922
1	Borough Park	40.6350	-73.9921
2	Brighton Beach	40.5781	-73.9597
3	Coney Island	40.5755	-73.9707
4	Brighton Beach/Coney Island	40.5768	-73.9652

	neighborhood	cases	latitude	longitude
0	Bensonhurst	3	40.6139	-73.9922
1	Borough Park	121	40.6350	-73.9921
2	Brighton Beach/Coney Island	4	40.5768	-73.9652
3	Chelsea/Clinton	1	40.7552	-73.9966
4	Crown Heights	8	40.6694	-73.9422
5	Far Rockaway	1	40.5999	-73.7448
6	Flatbush	1	40.6415	-73.9594
7	Flushing	3	40.7675	-73.8331
8	Jamaica	2	40.7027	-73.7890
9	Midwood/Marine Park	5	40.6159	-73.9466
10	Port Richmond	3	40.6355	-74.1255
11	Red Hook	1	40.6734	-74.0083
12	Sunset Park	16	40.6527	-74.0093
13	West Queens	1	40.7443	-73.8882
14	Williamsburg	473	40.7081	-73.9571
15	Willowbrook	6	40.6032	-74.1385

	neighborhood	cases	latitude	longitude	mercator_x	mercator_y
0	Bensonhurst	3	40.6139	-73.9922	-8.236774e+06	4.955558e+06
1	Borough Park	121	40.6350	-73.9921	-8.236763e+06	4.958652e+06
2	Brighton Beach/Coney Island	4	40.5768	-73.9652	-8.233768e+06	4.950119e+06
3	Chelsea/Clinton	1	40.7552	-73.9966	-8.237264e+06	4.976300e+06
4	Crown Heights	8	40.6694	-73.9422	-8.231208e+06	4.963700e+06