
Bitcoin and Cryptocurrencies: Full dataset, filtering, and reproducibility

Since Bitcoin launched in 2009, hundreds of similar projects based on blockchain technology have emerged. We call these cryptocurrencies (also coins or cryptos in Internet slang). Some are extremely valuable nowadays, and others may have the potential to become extremely valuable in the future¹. In fact, on the 6th of December 2017, Bitcoin had a market capitalization above $200 billion.

The astonishing increase in Bitcoin's market capitalization in 2017.

*¹ WARNING: The cryptocurrency market is exceptionally volatile², and any money you put in might disappear into thin air. Cryptocurrencies mentioned here might be scams similar to Ponzi schemes or have many other issues (overvaluation, technical, etc.). Please do not mistake this for investment advice.*

² Update, March 2020: Well, it turned out to be volatile indeed :D

That said, let's get to business. We will start with a CSV file, datasets/coinmarketcap_06122017.csv, which we conveniently downloaded on the 6th of December 2017 using the coinmarketcap API (NOTE: the public API went private in 2020 and is no longer available).

import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Printing wide DataFrames on a single line instead of wrapping them
pd.set_option('display.expand_frame_repr', False)

# Hiding DeprecationWarning and FutureWarning messages to keep the output clean
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# Setting matplotlib aesthetics for plotting later (Jupyter magics)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.style.use('fivethirtyeight')

# Reading datasets/coinmarketcap_06122017.csv into pandas
dec6 = pd.read_csv('datasets/coinmarketcap_06122017.csv', index_col=0)

# Selecting the 'id' and the 'market_cap_usd' columns
market_cap_raw = dec6[['id', 'market_cap_usd']]

# Inspecting the dataset: info() prints its summary directly (it returns None),
# while head() returns a DataFrame that we display
dec6.info()
display(dec6.head())

# Counting the number of non-null values in each selected column
market_cap_raw.count()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1326 entries, 0 to 1325
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   24h_volume_usd      1270 non-null   float64
 1   available_supply    1031 non-null   float64
 2   id                  1326 non-null   object
 3   last_updated        1326 non-null   int64
 4   market_cap_usd      1031 non-null   float64
 5   max_supply          215 non-null    float64
 6   name                1326 non-null   object
 7   percent_change_1h   1273 non-null   float64
 8   percent_change_24h  1270 non-null   float64
 9   percent_change_7d   1283 non-null   float64
 10  price_btc           1326 non-null   float64
 11  price_usd           1326 non-null   float64
 12  rank                1326 non-null   int64
 13  symbol              1326 non-null   object
 14  total_supply        1211 non-null   float64
dtypes: float64(10), int64(2), object(3)
memory usage: 165.8+ KB

id                1326
market_cap_usd    1031
dtype: int64

Discard the cryptocurrencies without a market capitalization

Why do the count() results for id and market_cap_usd differ above? Because some cryptocurrencies listed on coinmarketcap.com have no known market capitalization. This is represented by NaN in the data, and count() does not count NaNs. These cryptocurrencies are of little interest to us in this analysis, so they are safe to remove.
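To see both effects at once, here is a minimal sketch on a hypothetical three-row frame (the coin names and values are made up, not taken from the dataset): count() skips the NaN, and a query('market_cap_usd > 0') filter drops that row because any comparison with NaN evaluates to False.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: coin-b has no known market cap (NaN).
toy = pd.DataFrame({
    "id": ["coin-a", "coin-b", "coin-c"],
    "market_cap_usd": [1.5e9, np.nan, 2.0e7],
})

# count() skips NaN, so the two columns report different counts.
print(toy.count())

# NaN > 0 evaluates to False, so the query drops the row without a market cap.
filtered = toy.query("market_cap_usd > 0")
print(filtered.count())
```

Note that the `> 0` filter would also drop coins whose market cap is exactly zero; `toy.dropna(subset=["market_cap_usd"])` would remove only the NaN rows.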

# Filtering out rows without a market capitalization
# (any comparison with NaN is False, so NaN rows are dropped)
cap = market_cap_raw.query('market_cap_usd > 0')

# Counting the number of values again
cap.count()

id                1031
market_cap_usd    1031
dtype: int64

How big is Bitcoin compared with the rest of the cryptocurrencies?

At the time of writing, Bitcoin is under serious competition from other projects, but it is still dominant in market capitalization. Let's plot the market capitalization for the top 10 coins as a barplot to better visualize this.

TOP_CAP_TITLE = 'Top 10 market capitalization'
TOP_CAP_YLABEL = '% of total cap'

# Selecting the first 10 rows (the data is ordered by rank) and setting the index
cap10 = cap.head(10).set_index('id')

# Calculating market_cap_perc: each coin's share of the total cap of all coins in "cap"
cap10 = cap10.assign(market_cap_perc=lambda x: x['market_cap_usd'] * 100 / cap['market_cap_usd'].sum())

cap10.info()

# Plotting the barplot with the title defined above
ax = cap10.plot.bar(y='market_cap_perc', title=TOP_CAP_TITLE)

# Annotating the y axis with the label defined above
ax.set_ylabel(TOP_CAP_YLABEL)

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, bitcoin to cardano
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   market_cap_usd   10 non-null     float64
 1   market_cap_perc  10 non-null     float64
dtypes: float64(2)
memory usage: 240.0+ bytes

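The market_cap_perc column divides each coin's cap by the total of all coins in cap, not just the top 10, so the ten bars should add up to less than 100%. A minimal sketch of the same formula on hypothetical toy numbers (five made-up coins, keeping only the top 3):

```python
import pandas as pd

# Hypothetical market caps (USD), already ordered from largest to smallest.
caps = pd.Series([600.0, 200.0, 100.0, 60.0, 40.0],
                 index=["a", "b", "c", "d", "e"])

# Share of the *whole* market, mirroring the market_cap_perc formula.
share_of_total = caps * 100 / caps.sum()

# Keeping only the top 3, as cap10 keeps only the top 10.
top3 = share_of_total.head(3)
print(top3)
print(top3.sum())  # below 100 because the denominator also includes coins d and e
```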

Making the plot easier to read and more informative

While the plot above is informative enough, it can be improved. Bitcoin is so big that the other coins are hard to distinguish. Instead of the percentage, let's use a log10 (base-10 logarithmic) scale of the "raw" capitalization. Plus, let's use color to group similar coins and make the plot more informative¹.

For the color rationale: bitcoin-cash and bitcoin-gold are forks of the bitcoin blockchain², so they share bitcoin's orange. Ethereum and Cardano both offer Turing-complete smart contracts. IOTA and Ripple are not mineable. Dash, Litecoin, and Monero each get their own color.

¹ This coloring is a simplification. There are more differences and similarities that are not being represented here.

² The bitcoin forks are actually very different, but it is out of scope to talk about them here. Please see the warning above and do your own research.
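Why the log scale helps can be checked with a little arithmetic. The sketch below reuses the four large-cap values printed further down in this post: on a linear axis bar heights follow the raw ratios, while on a log-scaled axis the vertical position follows log10 of the value, so every 10x gap takes up the same amount of space.

```python
import numpy as np
import pandas as pd

# Market caps (USD) of the four large caps printed further down in this post.
caps = pd.Series(
    [2.130493e11, 4.352945e10, 2.529585e10, 1.475225e10],
    index=["bitcoin", "ethereum", "bitcoin-cash", "iota"],
)

# Linear scale: how many times larger each cap is than the smallest one.
times_smallest = caps / caps.min()

# Log scale: positions follow log10(cap), so a 10x gap is one unit.
log10_cap = np.log10(caps)

print(pd.DataFrame({"times_smallest": times_smallest.round(1),
                    "log10_cap": log10_cap.round(2)}))
```

Bitcoin is roughly 14 times IOTA's cap here, yet its log10 value is only about 1.16 higher, which is why the smaller bars become readable.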

# One color per bar, matched to cap10's rows by position
COLORS = ['orange', 'green', 'orange', 'cyan', 'cyan', 'blue', 'silver', 'orange', 'red', 'green']

# Plotting market_cap_usd as before, but adding the colors and a log-scaled y-axis
ax = cap10.plot.bar(y='market_cap_usd', color=COLORS, logy=True, title=TOP_CAP_TITLE)

# Annotating the y axis with 'USD'
ax.set_ylabel('USD')

# Final touch! Removing the xlabel as it is not very informative
ax.set_xlabel('')

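The COLORS list is matched to the bars purely by position, so it would silently mislabel the groups if the top 10 came in a different order (for instance, with a newer download). A sketch of an alternative, assuming a hypothetical COLOR_BY_ID dictionary that encodes the grouping described above, looks each color up by coin id instead:

```python
import pandas as pd

# Hypothetical mapping from coin id to color group, following the rationale above.
COLOR_BY_ID = {
    "bitcoin": "orange", "bitcoin-cash": "orange", "bitcoin-gold": "orange",
    "ethereum": "green", "cardano": "green",
    "iota": "cyan", "ripple": "cyan",
    "dash": "blue", "litecoin": "silver", "monero": "red",
}

def colors_for(index, default="gray"):
    """Return one color per bar, looked up by coin id rather than by position."""
    return [COLOR_BY_ID.get(coin_id, default) for coin_id in index]

# Toy index in a different order, including a coin missing from the mapping.
toy_index = pd.Index(["monero", "bitcoin", "iota", "some-new-coin"], name="id")
print(colors_for(toy_index))
```

In the plot above, passing `color=colors_for(cap10.index)` to `cap10.plot.bar(...)` should then give the same grouping regardless of row order.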

What is going on?! Volatility in cryptocurrencies

The cryptocurrency market has been spectacularly volatile since the first exchange opened. This notebook didn't start with a big, bold warning for nothing. Let's explore this volatility a bit more! We will begin by selecting and plotting the 24-hour and 7-day percentage changes, which we already have available.

# Selecting the id and the 24-hour and 7-day percentage change columns
volatility = dec6[['id', 'percent_change_24h', 'percent_change_7d']]

# Setting the index to 'id' and dropping all NaN rows
volatility = volatility.set_index('id').dropna()

# Sorting the DataFrame by percent_change_24h in ascending order
volatility = volatility.sort_values('percent_change_24h')

# Checking the first few rows
volatility.head()

Well, we can already see that things are a bit crazy

It seems you can lose a lot of money quickly on cryptocurrencies. Let's plot the top 10 biggest losers and the top 10 biggest gainers by 24-hour percentage change.
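The top10_subplot function below takes the first and last 10 entries of the Series, so it relies on the Series having been sorted in ascending order first. As a sketch of an alternative that does not depend on prior sorting, pandas' Series.nsmallest and Series.nlargest pick the extremes directly (the toy values are hypothetical, and n is 2 instead of 10 to keep the example small):

```python
import pandas as pd

# Hypothetical 24h percentage changes, deliberately left unsorted.
changes = pd.Series(
    [12.0, -40.0, 3.5, 85.0, -7.2, -95.0],
    index=["a", "b", "c", "d", "e", "f"],
)

n = 2  # the post uses 10
losers = changes.nsmallest(n)                # most negative first
winners = changes.nlargest(n).sort_values()  # ascending, like the right-hand plot
print(losers)
print(winners)
```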

def top10_subplot(volatility_series, title):
    # Making the figure with two side-by-side subplots
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 6))

    # Plotting the top 10 losers with pandas (assumes the Series is sorted ascending)
    losers = volatility_series.iloc[:10]
    ax = losers.plot.bar(color='darkred', ax=axes[0])
    ax.set_xticklabels(losers.index, rotation=45, ha='right')

    # Setting the figure's main title to the text passed as parameter
    fig.suptitle(title)

    # Setting the ylabel of the left subplot to '% change'
    ax.set_ylabel('% change')

    # Same as above, but for the top 10 winners
    winners = volatility_series.iloc[-10:]
    ax = winners.plot.bar(color='darkblue', ax=axes[1])
    ax.set_xticklabels(winners.index, rotation=45, ha='right')

    # Returning the figure and the last (right-hand) axes for later use
    return fig, ax

DTITLE = "24 hours top losers and winners"

# Calling the function above with the 24 hours period series and title DTITLE  
fig, ax = top10_subplot(volatility.percent_change_24h, DTITLE)

Ok, those are... interesting. Let's check the weekly Series too.

800% daily increase?! Why are we doing this tutorial and not buying random coins?¹

After calming down, let's reuse the function defined above to see what is going on weekly instead of daily.

¹ Please take a moment to consider what the red plots imply: some cryptocurrencies lose a huge share of their value in very short periods of time.

volatility7d = volatility.sort_values('percent_change_7d')
print(volatility7d.head(10))

WTITLE = "Weekly top losers and winners"

# Calling the top10_subplot function
fig, ax = top10_subplot(volatility7d.percent_change_7d, WTITLE)

                                 percent_change_24h  percent_change_7d
id                                                                    
royalties                                     -9.06             -99.59
flappycoin                                   -95.85             -96.61
credence-coin                                -94.22             -95.31
cagecoin                                     -36.26             -92.68
tyrocoin                                     -79.02             -87.43
electra                                      -40.59             -81.29
jetcoin                                      -36.03             -80.66
everus                                       -21.99             -76.86
ether-for-the-rest-of-the-world               -3.05             -75.03
landcoin                                     -52.11             -73.62
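To put the losses in the table above into perspective: after a change of c% (negative for a loss), only 1 + c/100 of the value remains, so getting back to the starting point takes a gain of (1 / (1 + c/100) - 1) × 100%. A small sketch of that arithmetic, using two 7-day changes from the table plus a hypothetical -50% move:

```python
def gain_to_recover(pct_change: float) -> float:
    """Percentage gain needed to get back to the starting value after pct_change.

    pct_change is negative for a loss, e.g. -50.0 for a 50% drop.
    """
    remaining = 1 + pct_change / 100  # fraction of the value left after the change
    if remaining <= 0:
        raise ValueError("a change of -100% or worse cannot be recovered")
    return (1 / remaining - 1) * 100


# Two 7-day changes from the table above, plus a hypothetical -50% move.
for coin, change in [("royalties", -99.59), ("flappycoin", -96.61), ("hypothetical", -50.0)]:
    print(f"{coin}: {change}% needs a +{gain_to_recover(change):.0f}% gain to recover")
```

A 50% drop already needs a 100% gain to undo it, and flappycoin would need close to a 30-fold increase just to get back to where it was a week earlier.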

How small is small?

The cryptocurrencies above are fairly obscure, and there is considerable fluctuation between their 24-hour and 7-day percentage changes. As with stocks and many other financial products, the smaller the capitalization, the bigger the risk and reward. Smaller cryptocurrencies are less stable projects in general, and therefore even riskier investments than the bigger ones¹. Let's classify our dataset based on Investopedia's capitalization definitions for company stocks.

¹ Cryptocurrencies are a new asset class, so they are not directly comparable to stocks. Furthermore, there are no limits set in stone for what a "small" or "large" stock is. Finally, some investors argue that bitcoin is similar to gold, which would make it more comparable to a commodity instead.
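A sketch of that classification in a single pass with pandas.cut. It assumes the commonly cited Investopedia-style thresholds (nano below $50 million, micro below $300 million, small below $2 billion, mid below $10 billion, large above that); only the $10 billion, $300 million, and $50 million cut-offs are actually used in this post, so the $2 billion boundary is an assumption here. The values are hypothetical toy numbers:

```python
import numpy as np
import pandas as pd

# Assumed thresholds (USD); each class includes its lower bound (right=False).
CAP_BINS = [0, 50e6, 300e6, 2e9, 10e9, np.inf]
CAP_LABELS = ["nano", "micro", "small", "mid", "large"]

def classify_caps(market_caps: pd.Series) -> pd.Series:
    """Label each market cap with a size class (hypothetical helper)."""
    return pd.cut(market_caps, bins=CAP_BINS, labels=CAP_LABELS, right=False)

# Hypothetical toy values, not taken from the dataset.
toy = pd.Series([2.1e11, 9.4e9, 1.0e9, 1.0e8, 1.0e6],
                index=["a", "b", "c", "d", "e"])
classes = classify_caps(toy)
print(classes.value_counts().sort_index())
```

On the real data, `classify_caps(cap['market_cap_usd'])` should give one label per coin; unlike separate queries, pd.cut guarantees that every non-missing value in range lands in exactly one bin.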

# Selecting the large caps (market cap above $10 billion)
largecaps = cap.query('market_cap_usd > 10000000000')

# Printing out largecaps
print(largecaps)

             id  market_cap_usd
0       bitcoin    2.130493e+11
1      ethereum    4.352945e+10
2  bitcoin-cash    2.529585e+10
3          iota    1.475225e+10

Most coins are tiny

Note that many coins are not comparable to large companies in market cap, so let's deviate from the original Investopedia definition by merging everything at or above $300 million into a single "biggish" category, next to the micro and nano caps.
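When merging categories like this, it is worth checking that the bins cover every coin exactly once: a boundary written with > on one side and < on the other silently skips a coin sitting exactly on the cut-off. A minimal sketch, with a hypothetical merged_counts helper and toy values placed on both cut-offs:

```python
import pandas as pd

def merged_counts(caps: pd.Series) -> dict:
    """Count coins per merged size class (hypothetical helper).

    Lower bounds are inclusive, so every positive cap falls in exactly one class.
    """
    masks = {
        "biggish": caps >= 300e6,
        "micro": (caps >= 50e6) & (caps < 300e6),
        "nano": caps < 50e6,
    }
    # Sanity check: each coin must be counted exactly once across the classes.
    hits = sum(mask.astype(int) for mask in masks.values())
    if not (hits == 1).all():
        raise ValueError("size classes overlap or leave gaps")
    return {label: int(mask.sum()) for label, mask in masks.items()}

# Toy values, including coins sitting exactly on both cut-offs.
toy = pd.Series([2.1e11, 3.0e8, 5.0e7, 4.9e7, 1.0e6])
print(merged_counts(toy))
```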


# Counts the coins in the "cap" DataFrame that match query_string. Returns an int.
def capcount(query_string):
    return cap.query(query_string)['id'].count()

# Labels for the plot
LABELS = ["biggish", "micro", "nano"]

# Using capcount to count the biggish cryptos
biggish = capcount('market_cap_usd >= 300000000')

# Same as above for micro (lower bound inclusive, so no coin falls between bins) ...
micro = capcount('market_cap_usd < 300000000 & market_cap_usd >= 50000000')

# ... and for nano
nano = capcount('market_cap_usd < 50000000')
print(nano)

# Making a list with the 3 counts
values = [biggish, micro, nano]

# Plotting them with matplotlib
plt.bar(LABELS, values)

896


Output of dec6.head():

   24h_volume_usd  available_supply            id  last_updated  market_cap_usd    max_supply          name  percent_change_1h  percent_change_24h  percent_change_7d  price_btc     price_usd  rank  symbol  total_supply
0    9.007640e+09      1.672352e+07       bitcoin    1512549554    2.130493e+11  2.100000e+07       Bitcoin               0.12                7.33              17.45   1.000000  12739.500000     1     BTC  1.672352e+07
1    1.551330e+09      9.616537e+07      ethereum    1512549553    4.352945e+10           NaN      Ethereum              -0.18               -3.93              -7.33   0.036177    452.652000     2     ETH  9.616537e+07
2    1.111350e+09      1.684044e+07  bitcoin-cash    1512549578    2.529585e+10  2.100000e+07  Bitcoin Cash               1.65               -5.51              -4.75   0.120050   1502.090000     3     BCH  1.684044e+07
3    2.936090e+09      2.779530e+09          iota    1512549571    1.475225e+10  2.779530e+09          IOTA              -2.38               83.35             255.82   0.000424      5.307460     4   MIOTA  2.779530e+09
4    2.315050e+08      3.873915e+10        ripple    1512549541    9.365343e+09  1.000000e+11        Ripple               0.56               -3.70             -14.79   0.000019      0.241754     5     XRP  9.999309e+10

Output of volatility.head():

               percent_change_24h  percent_change_7d
id
flappycoin                 -95.85             -96.61
credence-coin              -94.22             -95.31
coupecoin                  -93.93             -61.24
tyrocoin                   -79.02             -87.43
petrodollar                -76.55             542.96