Gain Mastery of Data Visualization Using Matplotlib in Simple Steps

A Beginner’s Guide to Understanding the Secret to Gaining Business Insights

Introduction

Whether you are just starting out or have years of experience as a data professional, data visualization is a crucial tool for gaining insight from your data. Presenting data visually makes its trends, patterns and connections easy to see. Matplotlib is one of the most popular and versatile Python libraries for creating data visualizations.

In this article, we will cover the following:

  1. Installing and Importing Matplotlib

  2. Components of Matplotlib

  3. Creating Basic Plots

  4. Customizing Plots

Prerequisites

While a basic understanding of Python is recommended, you can also follow along if you are new to the programming language. We shall use Jupyter Notebook to show the step-by-step process for visualizing your data. Jupyter Notebook is included in the Anaconda distribution, so downloading and installing Anaconda gives you everything you need.

Installing and Importing Matplotlib

To use Matplotlib, you need to install it first. You can use pip, Python's standard package manager. Open your Jupyter Notebook and run the following command in a cell:

pip install matplotlib

This command downloads the latest version of Matplotlib from the Python Package Index (PyPI) and installs it on your computer. After the installation is complete, you can import Matplotlib into your Jupyter Notebook using the following code:

import matplotlib.pyplot as plt
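To confirm that the installation and import worked, a quick check along the following lines can help. This is only a minimal sketch; the variable name installed_version is chosen here for illustration.

```python
# Confirm that Matplotlib can be imported and report which version is installed
import matplotlib
import matplotlib.pyplot as plt

installed_version = matplotlib.__version__
print("Matplotlib version:", installed_version)
```

If the import fails with a ModuleNotFoundError, the package was most likely installed into a different Python environment than the one your notebook is using.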

Components of Matplotlib

Matplotlib has several components; a few of them are:

  1. Figure: This is a container that holds all the elements of a plot.

  2. Axes: This is the plotting area inside the figure where the data is drawn. A figure can contain one or more axes.

  3. Axis: Each axes has an X axis and a Y axis ("axes" is also the plural of "axis"). They carry the tick marks and labels that tell you which data values the points in the plot correspond to.
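The relationship between these three components can be inspected directly. The following is a small sketch, assuming a figure with two axes placed side by side; the names axes and left_ax are illustrative only.

```python
import matplotlib.pyplot as plt

# One figure (the container) holding two axes (plotting areas) side by side
fig, axes = plt.subplots(nrows=1, ncols=2)

# The figure keeps track of every axes it contains
print("Axes in this figure:", len(fig.axes))

# Each axes owns its own X axis and Y axis objects
left_ax = axes[0]
print(type(left_ax.xaxis).__name__)
print(type(left_ax.yaxis).__name__)
```

Note that the same call with no arguments, plt.subplots(), returns a single axes rather than an array, which is the form used in the rest of this article.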

Creating Basic Plots

We can create a basic plot of some data points using the example below:

# Import matplotlib module
import matplotlib.pyplot as plt 

# Create a figure and an axes 
fig, ax = plt.subplots() 

# Sample data to be plotted
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020] 
population = [9872, 9912, 10098, 11208, 12289, 13897, 13991, 14583, 16721, 17210] 

# Plot the data on the axes 
ax.plot(years, population) 

# Add labels and a title 
ax.set_xlabel('Year') 
ax.set_ylabel('Population (millions)') 
ax.set_title('Population Growth Over Time') 

# Show the plot 
plt.show()

In the above code snippet, we first import the matplotlib.pyplot module and then create a figure and an axes with plt.subplots(). The years and population variables hold the data to be plotted. Typically, this data comes from your database records or an external source. ax.plot(years, population) draws the data on the axes as a line, with years on the X axis and population on the Y axis. We then set the X and Y axis labels and the plot title, and finally display the plot with plt.show().

[Figure: line plot of population against year, titled "Population Growth Over Time"]

Customizing Plots

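You can customize the appearance of your plot by adding features like markers, color, line style and grid lines. The example below draws the same data as a solid blue line with triangle markers and turns on the grid: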

# Import matplotlib module
import matplotlib.pyplot as plt 

# Create a figure and an axes 
fig, ax = plt.subplots() 

# Sample data to be plotted
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020] 
population = [9872, 9912, 10098, 11208, 12289, 13897, 13991, 14583, 16721, 17210] 

# Plot the data as a solid blue line with downward-triangle markers
ax.plot(years, population, linestyle='-', marker='v', color='b')

# Add labels and a title 
ax.set_xlabel('Year') 
ax.set_ylabel('Population (millions)') 
ax.set_title('Population Growth Over Time') 

# Add grid lines, then show the plot
ax.grid(True)
plt.show()
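As a variation on the example above, the same data can be drawn with other style options. This is only a sketch of a few commonly used keyword arguments (linewidth, label, alpha); the particular values are arbitrary choices for illustration, not recommendations.

```python
import matplotlib.pyplot as plt

# Same sample data as in the examples above
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
population = [9872, 9912, 10098, 11208, 12289, 13897, 13991, 14583, 16721, 17210]

fig, ax = plt.subplots()

# Dashed red line, circle markers, a thicker stroke and a legend label
line, = ax.plot(years, population, linestyle='--', marker='o',
                color='red', linewidth=2, label='Population')

# Dotted, semi-transparent grid lines so they stay in the background
ax.grid(True, linestyle=':', alpha=0.5)

# Show the label given to ax.plot() in a legend
ax.legend()

ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
ax.set_title('Population Growth Over Time')

plt.show()
```

Changing one argument at a time and re-running the cell is a quick way to see what each option does.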

The right type of plot depends on the kind of data you have and the insights you want to draw from it. The following are some common types of plots and the kinds of data that suit them:

  1. Line plot: This is most suitable for time series and continuous data and is used to display trends over time, compare multiple datasets, and show continuous data points.

  2. Bar plot: This is recommended for categorical data (nominal or ordinal) and is used to compare values across categories.

  3. Scatter plot: This is used for two continuous variables to show the correlation or relationship between them.

  4. Pie chart: This is used for categorical data to show the relative proportion of different categories.

  5. Heatmap: This is best used for two-dimensional data to visualize relationships between two variables and to identify patterns in matrices.

  6. Box plot (Box-and-Whisker Plot): This is used for continuous data to display the distribution of the data, including its quartiles, and to detect outliers.

As a general rule, consider the nature of your data, the relationship you want to establish, and the message you want to communicate when selecting a plot type. When you are starting out, it helps to experiment with different types of plots to find the one that best fits your specific analysis.
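One way to experiment is to draw each of the plot types listed above side by side. The following is a minimal sketch, not a definitive recipe: all of the data (months, sales, ad_spend, grid_values, scores_by_group) is hypothetical and invented purely for illustration, and the heatmap is drawn with imshow, which is one of several ways to produce a heatmap in Matplotlib.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, invented purely for illustration
months = ['Jan', 'Feb', 'Mar', 'Apr']
sales = [120, 135, 128, 150]
ad_spend = [10, 14, 12, 18]
grid_values = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scores_by_group = [[55, 60, 62, 70, 95], [40, 48, 50, 53, 58]]

# A 2 x 3 grid of axes, one per plot type
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 7))

# Line plot: a trend across ordered points
axes[0, 0].plot(months, sales, marker='o')
axes[0, 0].set_title('Line plot')

# Bar plot: values compared across categories
axes[0, 1].bar(months, sales)
axes[0, 1].set_title('Bar plot')

# Scatter plot: relationship between two continuous variables
axes[0, 2].scatter(ad_spend, sales)
axes[0, 2].set_title('Scatter plot')

# Pie chart: relative share of each category
axes[1, 0].pie(sales, labels=months, autopct='%1.0f%%')
axes[1, 0].set_title('Pie chart')

# Heatmap: a matrix of values shown as colors, with a color scale
heatmap = axes[1, 1].imshow(grid_values, cmap='viridis')
fig.colorbar(heatmap, ax=axes[1, 1])
axes[1, 1].set_title('Heatmap')

# Box plot: distribution, quartiles and outliers for each group
axes[1, 2].boxplot(scores_by_group)
axes[1, 2].set_xticklabels(['Group A', 'Group B'])
axes[1, 2].set_title('Box plot')

# Keep titles and labels from overlapping, then show the figure
fig.tight_layout()
plt.show()
```

Replacing the hypothetical lists with your own data is usually enough to see which of the six views communicates your message most clearly.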