A Visual History of Nobel Prize Winners (Datacamp project)
Explore a dataset from Kaggle containing a century's worth of Nobel Laureates. Who won? Who got snubbed?
The Nobel Prize is perhaps the world's best-known scientific award. Every year it is given to scientists and scholars in chemistry, literature, physics, medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the prize was Eurocentric and male-focused, but nowadays it's not biased in any way. Surely, right?
Well, let's find out! What characteristics do the prize winners have? Which country gets it most often? And has anybody gotten it twice? It's up to you to figure this out.
The dataset used in this project is from The Nobel Foundation on Kaggle.
The most Nobel of Prizes
The Nobel Prize is perhaps the world's best-known scientific award. Apart from the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories of chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?
Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np
# Reading in the Nobel Prize data
nobel = pd.read_csv("datasets/nobel.csv")
# Taking a look at the first several winners
nobel.head(6)
So, who gets the Nobel Prize?
Just looking at the first couple of prize winners, or Nobel laureates as they are also called, we already see a celebrity: Wilhelm Conrad Röntgen, the guy who discovered X-rays. And actually, we see that all of the winners in 1901 were guys who came from Europe. But that was back in 1901. Looking at all winners in the dataset, from 1901 to 2016, which sex and which country are the most commonly represented?
(For country, we will use the birth_country of the winner, as the organization_country is NaN for all shared Nobel Prizes.)
# Display the number of (possibly shared) Nobel Prizes handed
# out between 1901 and 2016
display(len(nobel[(nobel["year"] >= 1901) & (nobel["year"] <= 2016)]))
# Display the number of prizes won by male and female recipients.
display(nobel["sex"].value_counts())
# Display the number of prizes won by the top 10 birth countries.
nobel["birth_country"].value_counts().head(10)
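Raw counts can hide two things: rows with a missing value are silently skipped, and it is hard to judge the size of an imbalance without proportions. The sketch below shows both on a small made-up table. The rows are hypothetical and not taken from nobel.csv; the assumption is that organizations appear in the real data with a missing sex and birth_country.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Male", None],
    "birth_country": ["Germany", "France", "Poland", "France", None],
})

# value_counts() skips missing values by default;
# dropna=False keeps them visible as their own row
counts_with_missing = toy["sex"].value_counts(dropna=False)

# normalize=True returns shares of the non-missing rows instead of raw counts
sex_share = toy["sex"].value_counts(normalize=True)

print(counts_with_missing)
print(sex_share)
```

On the real data, the same two calls on nobel["sex"] would show how many rows have no recorded sex and what share of the remaining prizes went to each sex.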
# Calculating the proportion of USA born winners per decade
nobel['usa_born_winner'] = nobel["birth_country"] == "United States of America"
# Integer division already rounds down, so np.floor is not needed here
nobel['decade'] = (nobel["year"] // 10) * 10
prop_usa_winners = nobel.groupby("decade", as_index=False)['usa_born_winner'].mean()
# Display the proportions of USA born winners per decade
display(prop_usa_winners)
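The decade calculation relies on two small tricks: integer division by 10 drops the last digit of the year, and the mean of a True/False column is the proportion of True values. A minimal sketch on made-up rows (hypothetical data, not from nobel.csv) makes this easy to check by hand:

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "year": [1901, 1909, 1910, 1935, 1939],
    "birth_country": [
        "Germany",
        "United States of America",
        "France",
        "United States of America",
        "United States of America",
    ],
})

# True where the laureate was born in the USA
toy["usa_born_winner"] = toy["birth_country"] == "United States of America"

# 1909 // 10 == 190, and 190 * 10 == 1900, so 1909 falls in the 1900s
toy["decade"] = (toy["year"] // 10) * 10

# The mean of a boolean column is the share of True values in each group
prop = toy.groupby("decade", as_index=False)["usa_born_winner"].mean()
print(prop)
```

Here the 1900s have one USA born winner out of two (0.5), the 1910s none out of one (0.0), and the 1930s two out of two (1.0).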
# Setting the plotting theme
sns.set(style='whitegrid')
# and setting the size of all plots.
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [11, 7]
# Plotting USA born winners
ax = sns.lineplot(data=prop_usa_winners, x="decade", y="usa_born_winner")
# Adding %-formatting to the y-axis
from matplotlib.ticker import PercentFormatter
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
plt.show()
What is the gender of a typical Nobel Prize winner?
So the USA first became the dominant winner of the Nobel Prize in the 1930s and has kept the leading position ever since. But one group that was in the lead from the start, and never seems to let go, is men. Maybe it shouldn't come as a shock that there is some imbalance between how many male and female prize winners there are, but how significant is this imbalance? And is it better or worse within specific prize categories like physics, medicine, literature, etc.?
# Calculating the proportion of female laureates per decade and category
nobel['female_winner'] = nobel["sex"] == "Female"
prop_female_winners = nobel.groupby(["decade", "category"], as_index=False)['female_winner'].mean()
print(prop_female_winners.head())
# Plotting female winners with % winners on the y-axis
ax = sns.lineplot(data=prop_female_winners, x="decade", y="female_winner", hue="category")
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
plt.show()
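When several categories share the same proportion in a decade, their lines sit on top of each other, so a table can be easier to read than the plot. One possible alternative view, sketched on made-up rows (hypothetical data, not from nobel.csv), puts decades in rows and categories in columns:

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "decade": [1900, 1900, 1900, 1910, 1910, 1910],
    "category": ["Physics", "Physics", "Peace", "Physics", "Peace", "Peace"],
    "female_winner": [True, False, True, False, False, True],
})

# Mean of the boolean column per (decade, category) pair,
# then unstack moves the category level into columns
wide = (
    toy.groupby(["decade", "category"])["female_winner"]
    .mean()
    .unstack("category")
)
print(wide)
```

Applied to the real data, the same groupby and unstack on nobel would give one row per decade, with NaN wherever a category had no prizes in that decade.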
The first woman to win the Nobel Prize
The plot above is a bit messy as the lines are overplotting. But it does show some interesting trends and patterns. Overall the imbalance is pretty large with physics, economics, and chemistry having the largest imbalance. Medicine has a somewhat positive trend, and since the 1990s the literature prize is also now more balanced. The big outlier is the peace prize during the 2010s, but keep in mind that this just covers the years 2010 to 2016.
Given this imbalance, who was the first woman to receive a Nobel Prize? And in what category?
# Picking out the first woman to win a Nobel Prize
nobel[nobel["sex"] == "Female"].nsmallest(1, 'year')
# Selecting the laureates that have received 2 or more prizes
nobel.groupby("full_name").filter(lambda group: len(group) >= 2)
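Filtering groups by size is one way to find repeat winners; marking duplicated names is another that avoids the lambda. The sketch below compares the two on made-up rows (the names and years are hypothetical, not from nobel.csv). Both approaches assume that the same laureate is always spelled identically in full_name.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "year": [1903, 1911, 1956, 1972, 1921],
    "full_name": ["Laureate A", "Laureate A", "Laureate B", "Laureate B", "Laureate C"],
})

# Approach 1: keep only the groups that contain 2 or more rows
via_filter = toy.groupby("full_name").filter(lambda group: len(group) >= 2)

# Approach 2: keep=False marks every occurrence of a repeated name, not just the later ones
repeat_rows = toy[toy["full_name"].duplicated(keep=False)]

# Counting prizes per name gives a compact summary of who the repeat winners are
prizes_per_name = toy["full_name"].value_counts()
repeat_names = sorted(prizes_per_name[prizes_per_name >= 2].index)

print(repeat_rows)
print(repeat_names)
```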
How old are you when you get the prize?
The list of repeat winners contains some illustrious names! We again meet Marie Curie, who got the prize in physics for her research on radiation and in chemistry for isolating radium and polonium. John Bardeen got it twice in physics for transistors and superconductivity, Frederick Sanger got it twice in chemistry, and Linus Carl Pauling got it first in chemistry and later in peace for his work in promoting nuclear disarmament. We also learn that organizations can get the prize too, as both the Red Cross and the UNHCR have gotten it more than once.
But how old are you generally when you get the prize?
# Converting birth_date from string to datetime
nobel['birth_date'] = pd.to_datetime(nobel['birth_date'])
# Calculating the age of Nobel Prize winners
nobel['age'] = nobel['year'] - nobel['birth_date'].dt.year
# Plotting the age of Nobel Prize winners
sns.lmplot(x="year", y="age", data=nobel, lowess=True, aspect=2, line_kws={'color' : 'black'})
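Two details of the age calculation are worth keeping in mind. Subtracting the birth year from the prize year is an approximation that can overstate the age by up to one year when the birthday falls after the award date, and a missing birth date simply gives a missing age. A minimal sketch on made-up rows (hypothetical dates, not from nobel.csv):

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "year": [1905, 2010, 1920],
    "birth_date": ["1850-03-14", "1990-12-31", None],
})

# Strings become datetimes; the missing value becomes NaT
toy["birth_date"] = pd.to_datetime(toy["birth_date"])

# .dt.year extracts the year; NaT propagates to NaN in the result
toy["age"] = toy["year"] - toy["birth_date"].dt.year
print(toy)
```

The second row comes out as 20 even though someone born on December 31 would still be 19 for almost all of 2010, which is the off-by-one effect described above.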
Age differences between prize categories
The plot above shows us a lot! We see that people used to be around 55 when they received the prize, but nowadays the average is closer to 65. But there is a large spread in the laureates' ages, and while most are 50+, some are very young.
We also see that the density of points is much higher nowadays than in the early 1900s -- nowadays many more of the prizes are shared, and so there are many more winners. We also see that there was a disruption in awarded prizes around the Second World War (1939 - 1945).
Let's look at age trends within different prize categories.
# Same plot as above, but with a separate plot for each prize category
sns.lmplot(x="year", y="age", data=nobel, lowess=True, aspect=2, line_kws={'color' : 'black'}, row = 'category')
Oldest and youngest winners
More plots with lots of exciting stuff going on! We see that winners of the chemistry, medicine, and physics prizes have gotten older over time. The trend is strongest for physics: the average age used to be below 50, and now it's almost 70. Literature and economics are more stable. We also see that economics is a newer category. But peace shows an opposite trend where winners are getting younger!
In the peace category we also see a winner around 2010 who seems exceptionally young. This raises the question: who are the oldest and youngest people ever to have won a Nobel Prize?
# The oldest winner of a Nobel Prize as of 2016
display(nobel.nlargest(n=1, columns='age'))
# The youngest winner of a Nobel Prize as of 2016
nobel.nsmallest(n=1, columns='age')
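Picking the extremes works even when some rows have no age, because missing values are skipped. The sketch below shows this on made-up rows (hypothetical names and ages, not from nobel.csv) and adds idxmin and idxmax as an alternative way to pull out a single field; the assumption is that organizations in the real data have a missing age.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the real nobel.csv
toy = pd.DataFrame({
    "full_name": ["Laureate A", "Laureate B", "Organization C", "Laureate D"],
    "age": [61.0, 25.0, float("nan"), 88.0],
})

# idxmin / idxmax return the index label of the extreme value, skipping NaN
youngest = toy.loc[toy["age"].idxmin(), "full_name"]
oldest = toy.loc[toy["age"].idxmax(), "full_name"]

# nsmallest returns a one-row DataFrame; .iloc[0] extracts the single name
youngest_via_nsmallest = toy.nsmallest(1, "age")["full_name"].iloc[0]

print(youngest, oldest)
```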
You get a prize!
Hey! You get a prize for making it to the very end of this notebook! It might not be a Nobel Prize, but I made it myself in Paint so it should count for something. But don't despair, Leonid Hurwicz was 90 years old when he got his prize, so it might not be too late for you. Who knows.
Before you leave, what was the name again of the youngest winner ever, who in 2014 got the prize for "[her] struggle against the suppression of children and young people and for the right of all children to education"?
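# The name of the youngest winner of the Nobel Prize as of 2016
youngest_winner = 'Malala'
# Printing just the name rather than the whole Series representation
print(nobel.nsmallest(1, 'age')['full_name'].iloc[0])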