Visualizing UFO Sightings with Plotly.py
Motivation
Presenting information to clients or colleagues, especially those unfamiliar with the data, can be a complex task. Maintaining viewer attention, creating relatable information and conveying meaning are all important goals when presenting data. How can we reach those goals?
Project
Let’s learn how to visualize the frequency of UFO sightings in the US using Plotly.py’s choropleth map implementation.
All code and datasets are available on GitHub here.
Plotly.py
Plotly.py is a highly extensible visualization library for Python, built on top of Plotly.js. It can be used to build bar charts and 3D graphs, and it also provides functionality for geographic mapping.
Datasets
In this project we will use the UFO Sightings in the US dataset found on Kaggle. This set represents UFO sightings across the United States by ‘location’, ‘date’ and object ‘shape’. We will mostly be concerned with the field ‘location’ in this article.
We will also need a definition of the geographic data, specifically the boundaries of the states themselves. Luckily a standard format for describing this is readily available in the form of GeoJSON. The link gives a larger overview of this format, but suffice it to say that it provides a convenient JSON-based way to describe geographic information. We can download and use the GeoJSON for U.S. states located here.
Open and explore the GeoJSON with the following…
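import json
# Load the state boundaries; the with-block closes the file automatically
with open('gz_2010_us_040_00_5m.json') as f:
    states = json.load(f)
# Inspect the first feature (one state)
states['features'][0]
As you can see here, each feature in the JSON object holds the name, FIPS ID and geometry of one state. This information will be used when building the map in Plotly.py.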
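To make that structure concrete, here is a minimal sketch of a single GeoJSON feature. The property names (`STATE`, `NAME`) and the square geometry are assumptions made up for illustration; inspect `states['features'][0]` to see the keys your file actually uses.

```python
import json

# Hypothetical, heavily simplified feature. Real state geometries contain
# far more coordinates, and the property names in your file may differ.
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"STATE": "01", "NAME": "Alabama"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-88.0, 30.0], [-85.0, 30.0], [-85.0, 35.0],
                         [-88.0, 35.0], [-88.0, 30.0]]]
      }
    }
  ]
}
""")

feature = sample_geojson["features"][0]
print(sorted(feature.keys()))        # top-level keys of one feature
print(feature["properties"])         # attributes such as name and FIPS code
print(feature["geometry"]["type"])   # shape type, e.g. Polygon or MultiPolygon
```

The `properties` object carries the attributes we join on later, while `geometry` holds the outline that gets drawn.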
Data Cleaning
We need to remove null rows, keep only U.S. records and strip trailing spaces. Finally we will drop a few stray values that are not U.S. states.
import pandas as pd
df_ufo = pd.read_csv('ufo_location_shape.csv')
# Remove NaN rows
df_ufo = df_ufo[df_ufo['State'].notna()]
# Only USA states
df_ufo = df_ufo[df_ufo['USA'] == "1"]
# Remove trailing spaces
df_ufo['State'] = df_ufo['State'].str.rstrip()
# Remove random bad data
df_ufo = df_ufo[~df_ufo['State'].isin(['??', 'England/UK', 'ON', 'AB'])]
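To see what each cleaning step does, the sketch below runs the same kind of filters on a tiny made-up frame. The sample rows are hypothetical; only the column names `State` and `USA` come from the dataset used above.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV has many more rows and columns.
sample = pd.DataFrame({
    "State": ["TX ", None, "ON", "CA", "??", "WA  "],
    "USA":   ["1",   "1",  "0",  "1",  "1",  "1"],
})

cleaned = sample[sample["State"].notna()]            # drop rows with no state
cleaned = cleaned[cleaned["USA"] == "1"].copy()      # keep U.S. records only
cleaned["State"] = cleaned["State"].str.rstrip()     # "TX " becomes "TX"
cleaned = cleaned[~cleaned["State"].isin(["??", "England/UK", "ON", "AB"])]

print(cleaned["State"].tolist())
```

Only the rows with a usable U.S. state abbreviation remain after the four steps.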
Raw Data
You could present raw data to your viewers…
df_ufo
… but this is poor for quick insights and understanding by people unfamiliar with the data presented. The viewer is required to hold information and build spatial relations implicitly, which isn’t ideal.
Aggregate Data (Better but not Great)
…Or you could present (slightly better) aggregated data…
df_agg = df_ufo.groupby("State").size().to_frame().reset_index()
df_agg.columns = ["State", "Sightings"]
df_agg.head()
This works, but it is still a static table with a large amount of data. It shares some of the pitfalls of raw data, in that it requires the viewer to build relations that could otherwise be presented for them.
We need to format this data for quick understanding and in a way that can hold interest…
Visualizing with a Choropleth Map
Plotly.py has built-in functionality for displaying geographic data in its choropleth module. A choropleth is a map that shades each geographic region according to the value of a variable. We will use this type of map to visualize UFO sightings across the U.S.
First we need to map the state abbreviations to their respective FIPS codes. We do this by adding a new pandas column for FIPS codes and filling it in based on each row’s abbreviation.
state_codes = {
    'WA': '53', 'DE': '10', 'DC': '11', 'WI': '55', 'WV': '54', 'HI': '15',
    'FL': '12', 'WY': '56', 'PR': '72', 'NJ': '34', 'NM': '35', 'TX': '48',
    'LA': '22', 'NC': '37', 'ND': '38', 'NE': '31', 'TN': '47', 'NY': '36',
    'PA': '42', 'AK': '02', 'NV': '32', 'NH': '33', 'VA': '51', 'CO': '08',
    'CA': '06', 'AL': '01', 'AR': '05', 'VT': '50', 'IL': '17', 'GA': '13',
    'IN': '18', 'IA': '19', 'MA': '25', 'AZ': '04', 'ID': '16', 'CT': '09',
    'ME': '23', 'MD': '24', 'OK': '40', 'OH': '39', 'UT': '49', 'MO': '29',
    'MN': '27', 'MI': '26', 'RI': '44', 'KS': '20', 'MT': '30', 'MS': '28',
    'SC': '45', 'KY': '21', 'OR': '41', 'SD': '46'
}
for index, row in df_agg.iterrows():
    df_agg.at[index, 'fips'] = state_codes[row['State']]
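A vectorized alternative to the row-by-row loop is `Series.map`, which also makes it easy to spot abbreviations that are missing from the lookup (the loop would raise a `KeyError` on those). This is a minimal sketch on made-up counts: the shortened `state_codes` dict and the stray "BC" row are assumptions for illustration.

```python
import pandas as pd

# Abbreviated lookup for illustration; in practice the full state_codes
# dict from the article would be used.
state_codes = {"CA": "06", "TX": "48", "WA": "53"}

# Hypothetical aggregated counts, including one abbreviation ("BC")
# that is not in the lookup.
df_agg = pd.DataFrame({
    "State": ["CA", "TX", "WA", "BC"],
    "Sightings": [120, 95, 80, 3],
})

# Series.map returns NaN for keys missing from the dict instead of raising.
df_agg["fips"] = df_agg["State"].map(state_codes)

# List the abbreviations that could not be matched to a FIPS code.
unmapped = df_agg.loc[df_agg["fips"].isna(), "State"].tolist()
print("Unmapped abbreviations:", unmapped)

# Drop rows that cannot be placed on the map.
df_agg = df_agg.dropna(subset=["fips"])
```

Printing the unmapped values before dropping them is a cheap way to catch bad data that slipped through the cleaning step.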
Next we build the map itself…
import plotly.express as px
fig = px.choropleth(
    data_frame=df_agg,
    geojson=states,
    locations='fips',
    color='Sightings',
    color_continuous_scale="Reds",
    range_color=(0, 400)
)
# Remove the default margins around the map
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
Let’s discuss what’s going on here…
- data_frame: Pandas dataframe containing the sightings and FIPS IDs.
- geojson: GeoJSON state definitions. These will be used to build the state boundaries on the map.
- locations: The dataframe column ‘fips’ used to map the sightings to the states in the GeoJSON.
- color: The dataframe column ‘Sightings’ used to determine the color shade.
- Finally, the remaining arguments define the color palette (color_continuous_scale) and the range of values the color scale covers (range_color).
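Plotly Express matches each `locations` value against an identifier on the GeoJSON features. By default that is the top-level `id` of each feature, and it can be changed with the `featureidkey` argument of `px.choropleth`. If the map renders with no shading, a mismatch here is one likely cause. The helper below is a hypothetical sketch for choosing that key: the function name `pick_featureidkey` and the property name `STATE` are assumptions, so confirm them against your own file.

```python
def pick_featureidkey(geojson, candidate_property="STATE"):
    """Return a featureidkey string based on the first GeoJSON feature.

    candidate_property is an assumed property name; inspect your own file
    to confirm which property holds the FIPS code.
    """
    first = geojson["features"][0]
    if "id" in first:
        return "id"  # Plotly's default lookup location
    if candidate_property in first.get("properties", {}):
        return f"properties.{candidate_property}"
    raise KeyError("No usable identifier found on the GeoJSON features")


# Hypothetical one-feature GeoJSON with no top-level "id".
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"STATE": "53", "NAME": "Washington"},
            "geometry": None,
        },
    ],
}

key = pick_featureidkey(sample)
print(key)
# The result could then be passed along, e.g.
# px.choropleth(..., featureidkey=key)
```

Whichever key is used, its values must have the same type and format as the `fips` column, for example zero-padded strings such as '06'.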
The end result is the following visualization…
As we can see, our data is much easier to present and digest. We have given the viewer attention-grabbing, relatable information. The viewer is no longer required to build abstract relations mentally.
Thanks for following along!