Data Visualization Basics in Google Colab

Introduction to Google Colab and Python Libraries for Data Visualization

Google Colab Setup

Google Colab is a powerful tool for running Python code in your web browser, particularly useful for data analysis and visualizations. Follow these steps to get started:

Step 1: Access Google Colab

Open your web browser and go to the Google Colab website (https://colab.research.google.com).
If prompted, sign in with your Google account.
Click on “New Notebook” to create a new Colab notebook.

Step 2: Using the Google Colab Interface

Interface Overview:
- Code Cells: These cells allow you to write and execute code.
- Text Cells: These cells allow you to write formatted text using Markdown.
Running Code:
- Write your Python code in the code cell.
- Click the Run button (or press Shift + Enter) to execute the code.
Installing Libraries:
- Use !pip install to install any additional libraries needed, directly from the notebook environment.
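
Before installing anything, it can help to check what the runtime already provides. The sketch below uses only the Python standard library; the helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["matplotlib", "seaborn", "plotly"])
for name, ver in versions.items():
    status = ver if ver is not None else "not installed"
    print(f"{name}: {status}")
```

Any package reported as not installed can then be added with !pip install in its own cell.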

Python Libraries for Data Visualization

The most commonly used Python libraries for data visualization include:

Matplotlib
Seaborn
Plotly

Step 3: Installing Required Libraries

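Colab already ships with all three libraries, so this command is only needed to upgrade them or when working in another environment.

!pip install matplotlib seaborn plotly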

Step 4: Importing Libraries

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

Step 5: Basic Data Visualization Examples

Matplotlib Example:

import numpy as np

# Generate some data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a line plot
plt.plot(x, y)
plt.title('Sine Wave using Matplotlib')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

Seaborn Example:

import seaborn as sns
import matplotlib.pyplot as plt

# Load sample data
data = sns.load_dataset('iris')

# Create a scatter plot
sns.scatterplot(data=data, x='sepal_length', y='sepal_width', hue='species')
plt.title('Iris Dataset using Seaborn')
plt.show()

Plotly Example:

import plotly.express as px
import seaborn as sns

# Load sample data
data = sns.load_dataset('iris')

# Create a scatter plot
fig = px.scatter(data, x='sepal_length', y='sepal_width', color='species', title='Iris Dataset using Plotly')
fig.show()

Conclusion

Google Colab makes it easy to start coding with Python for data visualization. By following the steps outlined above, you can set up your environment and create basic visualizations using Matplotlib, Seaborn, and Plotly. These tools provide a strong foundation for more advanced data analysis and visualization tasks.

Basic Plotting Techniques with Matplotlib

Line Plot

import matplotlib.pyplot as plt

# Data
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Plot
plt.plot(x, y)

# Labels
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line Plot Example')

# Show plot
plt.show()

Scatter Plot

import matplotlib.pyplot as plt

# Data
x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Plot
plt.scatter(x, y)

# Labels
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Scatter Plot Example')

# Show plot
plt.show()

Bar Plot

import matplotlib.pyplot as plt

# Data
x = ['A', 'B', 'C', 'D']
y = [23, 45, 56, 78]

# Plot
plt.bar(x, y)

# Labels
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot Example')

# Show plot
plt.show()

Histogram

import matplotlib.pyplot as plt

# Data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10]

# Plot
plt.hist(data, bins=5)

# Labels
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram Example')

# Show plot
plt.show()
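
To see what bins=5 does, the counts can be computed without drawing anything. A minimal sketch using numpy.histogram on the same data, which applies the same equal-width binning that plt.hist uses by default:

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 9, 10]

# Five equal-width bins between min(data) and max(data)
counts, edges = np.histogram(data, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f}: {count}")
```

Each printed row corresponds to one bar of the histogram.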

Pie Chart

import matplotlib.pyplot as plt

# Data
labels = 'A', 'B', 'C', 'D'
sizes = [15, 30, 45, 10]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']

# Plot
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)

# Title
plt.title('Pie Chart Example')

# Show plot
plt.show()

Conclusion

These snippets cover the most common Matplotlib plot types. Run each snippet in its own Colab notebook cell to see the corresponding plot.
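
When several of these plots belong together, they can share one figure. A minimal sketch using plt.subplots and the Axes methods, reusing the line and bar data from the examples above:

```python
import matplotlib.pyplot as plt

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: line plot
ax_line.plot([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])
ax_line.set_title('Line Plot Example')
ax_line.set_xlabel('x-axis')
ax_line.set_ylabel('y-axis')

# Right panel: bar plot
ax_bar.bar(['A', 'B', 'C', 'D'], [23, 45, 56, 78])
ax_bar.set_title('Bar Plot Example')
ax_bar.set_xlabel('Categories')
ax_bar.set_ylabel('Values')

fig.tight_layout()  # keep titles and labels from overlapping
plt.show()
```

Working with the Axes objects directly (ax.plot, ax.set_title) instead of the plt functions makes it explicit which panel each call affects.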

Advanced Visualization with Seaborn

Overview

In this section, we’ll focus on creating advanced visualizations using Seaborn. Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive statistical graphics.

Required Libraries

Make sure the following imports are in place:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

Dataset

For demonstration purposes, we’ll use the built-in tips dataset provided by Seaborn.

# Load the 'tips' dataset
tips = sns.load_dataset("tips")

1. Pairplot

A pairplot allows you to visualize pairwise relationships in a dataset. It’s particularly useful for exploring data and understanding relationships between different variables.

# Pairplot with hue based on 'sex'
sns.pairplot(tips, hue="sex")
plt.show()

2. Heatmap

Heatmaps are great for visualizing matrix-like data, especially for showing correlations between variables.

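# Compute the correlation matrix from the numeric columns only
# (tips also has categorical columns such as 'sex' and 'day')
corr = tips.corr(numeric_only=True)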

# Create a heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
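
Correlation is only defined for numeric columns, so it helps to see what the matrix looks like on a small table. A sketch with made-up values (the DataFrame below is illustrative and not part of tips):

```python
import pandas as pd

# Hypothetical data: 'b' rises with 'a', 'c' falls as 'a' rises
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "label": ["w", "x", "y", "z"],  # non-numeric, must be excluded
})

# Keep numeric columns, then compute pairwise Pearson correlations
corr = df.select_dtypes("number").corr()
print(corr)
```

Values near 1 or -1 indicate a strong linear relationship, which is what the warm and cool ends of the heatmap's color scale highlight.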

3. Boxplot with Facets

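Boxplots are useful for showing the distribution of data and outliers. Faceting with catplot's col argument draws one panel per subset, which makes the subsets easy to compare.

# Boxplot with facets: one panel per value of 'time' (Lunch, Dinner)
sns.catplot(x="day", y="total_bill", hue="smoker", col="time", kind="box", data=tips)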
plt.show()
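
In a boxplot the box spans the first to third quartile, and with the default settings the whiskers reach at most 1.5 times the interquartile range beyond the box; points outside that range are drawn as outliers. A sketch of that rule with NumPy, on made-up bill amounts:

```python
import numpy as np

bills = np.array([10, 12, 13, 14, 15, 16, 18, 50])  # hypothetical values

q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Default whisker rule: 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```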

4. Violin Plot

Violin plots combine the benefits of boxplots and density plots. They show the distribution of the data across different categories.

# Violin plot
sns.violinplot(x="day", y="total_bill", hue="sex", split=True, data=tips)
plt.show()

5. Jointplot

Jointplots allow you to visualize a bivariate relationship along with the univariate distributions of each variable.

# Jointplot
sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg')
plt.show()

6. PairGrid

A PairGrid builds a matrix of plots and lets you choose a different plot type for the upper triangle, the lower triangle, and the diagonal.

# PairGrid with customized plots
g = sns.PairGrid(tips, hue="sex")
g.map_upper(sns.kdeplot)  # colors come from hue, so no cmap is passed
g.map_lower(plt.scatter)
g.map_diag(sns.kdeplot, lw=3)
g.add_legend()
plt.show()

7. Swarm Plot

Swarm plots show all data points while avoiding overlap, providing insight into the distribution and relationships between variables.

# Swarm plot
sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips)
plt.show()

8. LM Plot

An lmplot (linear model plot) draws a scatter plot together with a fitted linear regression line, here one line per value of hue.

# LM plot
sns.lmplot(x="total_bill", y="tip", hue="sex", data=tips)
plt.show()
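
The line drawn by lmplot is, by default, an ordinary least-squares fit. A sketch of the same kind of fit with numpy.polyfit, on made-up bill and tip values chosen to lie exactly on a line:

```python
import numpy as np

# Hypothetical data: tip = 0.1 * total_bill + 1
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = np.array([2.0, 3.0, 4.0, 5.0])

# A degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"tip = {slope:.2f} * total_bill + {intercept:.2f}")
```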

These examples demonstrate powerful ways to visualize and analyze your data using Seaborn in Google Colab. Incorporate them into your project to create compelling and informative visualizations.

Interactive Visualizations with Plotly

Introduction

Plotly is a powerful data visualization library that enables the creation of interactive charts and plots. This section will guide you through the implementation of interactive visualizations using Plotly.

Loading Data

This demonstration uses the Gapminder sample dataset that ships with Plotly Express.

import plotly.express as px
import plotly.graph_objects as go
import pandas as pd

# Loading sample data
data = px.data.gapminder()

Scatter Plot

Create an interactive scatter plot showing life expectancy versus GDP per capita.

fig = px.scatter(data, 
                 x="gdpPercap", 
                 y="lifeExp",
                 size="pop",  # marker size by population; size_max only takes effect when size is set
                 color="continent",
                 hover_name="country",
                 log_x=True,
                 size_max=60,
                 animation_frame="year",
                 title="Life Expectancy vs GDP per Capita",
                 labels={"gdpPercap": "GDP per Capita", "lifeExp": "Life Expectancy"}
                 )
fig.show()

Line Plot

Create a line plot of average life expectancy over the years.

average_life_expectancy = data.groupby('year', as_index=False)['lifeExp'].mean()

fig = px.line(average_life_expectancy, 
              x="year", 
              y="lifeExp", 
              title="Average Life Expectancy Over Years",
              labels={"year": "Year", "lifeExp": "Life Expectancy"}
              )
fig.show()
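
The groupby call above collapses one row per country and year into one row per year. A sketch on a tiny made-up table with the same column names:

```python
import pandas as pd

# Hypothetical rows in the same shape as the Gapminder data
sample = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "year": [2002, 2002, 2007, 2007],
    "lifeExp": [60.0, 70.0, 62.0, 74.0],
})

# as_index=False keeps 'year' as a regular column, ready for px.line
yearly_mean = sample.groupby("year", as_index=False)["lifeExp"].mean()
print(yearly_mean)
```

Note that this is an unweighted mean across countries: every country counts equally regardless of population.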

Bar Plot

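Create an interactive bar plot of average GDP per capita by continent in a particular year.

# Filter data for a specific year
year_data = data[data['year'] == 2007]

# Average across countries so each bar is a continent-level mean
# (plotting year_data directly would stack one segment per country)
continent_gdp = year_data.groupby('continent', as_index=False)['gdpPercap'].mean()

fig = px.bar(continent_gdp, 
             x='continent', 
             y='gdpPercap', 
             color='continent', 
             title="Average GDP per Capita by Continent in 2007",
             labels={"continent": "Continent", "gdpPercap": "GDP per Capita"}
             )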
fig.show()

Histogram

Create a histogram of the distribution of life expectancy.

fig = px.histogram(data, 
                   x="lifeExp", 
                   nbins=30,
                   title="Life Expectancy Distribution",
                   labels={"lifeExp": "Life Expectancy"}
                   )
fig.show()

Interactive Dashboard

Combine multiple plots into an interactive dashboard using subplots.

from plotly.subplots import make_subplots

# Setting up subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=("Life Expectancy vs GDP", 
                                                    "Average Life Expectancy Over Years", 
                                                    "GDP per Capita by Continent", 
                                                    "Life Expectancy Distribution"))

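# Adding scatter plot (px creates one trace per continent, so add them all)
scatter = px.scatter(data, 
                     x="gdpPercap", 
                     y="lifeExp",
                     color="continent")
for trace in scatter.data:
    fig.add_trace(trace, row=1, col=1)
# A log axis is a layout setting that traces do not carry, so set it on the subplot
fig.update_xaxes(type="log", row=1, col=1)

# Adding line plot
line = px.line(average_life_expectancy, x="year", y="lifeExp")
fig.add_trace(line.data[0], row=1, col=2)

# Adding bar plot (continent-level means rather than stacked per-country values)
continent_means = year_data.groupby('continent', as_index=False)['gdpPercap'].mean()
bar = px.bar(continent_means, x='continent', y='gdpPercap', color='continent')
for trace in bar.data:
    fig.add_trace(trace, row=2, col=1)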

# Adding histogram
hist = px.histogram(data, x="lifeExp", nbins=30)
fig.add_trace(hist.data[0], row=2, col=2)

# Updating layout
fig.update_layout(height=800, width=1200, title_text="Interactive Dashboard of Gapminder Data")

fig.show()

Conclusion

With these building blocks you can create a range of interactive visualizations with Plotly and combine them into a single dashboard. Interactive features such as hover labels, zooming, and animation make it easier to explore the data.

Google Colab: Real-world Data Visualization Projects

Project: Visualizing Global COVID-19 Data

Objective

Visualize global COVID-19 statistics to analyze trends and patterns using data from a reliable source such as Our World in Data.

Data Source

Data from “Our World In Data” (https://ourworldindata.org/coronavirus-source-data)

Step-by-step Implementation

Load and Inspect Data
Preprocessing
Plotting Trends Over Time
Comparing Countries
Interactive Visualizations

1. Load and Inspect Data

import pandas as pd

# Load the data directly from the URL
url = 'https://covid.ourworldindata.org/data/owid-covid-data.csv'
data = pd.read_csv(url)

# Display the first few rows
data.head()
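
Beyond head(), the shape and the number of missing values per column are worth a look before preprocessing. A sketch on a small inline CSV (the rows are invented stand-ins for the real file, so it runs without a download):

```python
import io

import pandas as pd

# Hypothetical rows using a few of the dataset's column names
csv_text = """date,location,total_cases,new_cases
2020-03-01,Exampleland,10,
2020-03-02,Exampleland,,5
2020-03-03,Exampleland,18,3
"""
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # 'date' is still plain text at this point
missing = sample.isna().sum()
print(missing)         # missing values per column
```

The same three calls work unchanged on the full data DataFrame.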

2. Preprocessing

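Keep only the relevant columns and handle missing values. Cumulative columns (the total_* and people_* ones) are forward-filled within each location so that a missing report does not make a running total drop to zero; anything still missing is then set to zero.

# Select relevant columns
columns = [
    'date', 'location', 'total_cases', 'new_cases', 
    'total_deaths', 'new_deaths', 'total_vaccinations', 
    'people_vaccinated', 'people_fully_vaccinated'
]
# .copy() avoids pandas' SettingWithCopyWarning on the assignments below
data = data[columns].copy()

# Convert date column to datetime
data['date'] = pd.to_datetime(data['date'])

# Forward-fill cumulative columns per location, in date order
data = data.sort_values(['location', 'date'])
cumulative = ['total_cases', 'total_deaths', 'total_vaccinations',
              'people_vaccinated', 'people_fully_vaccinated']
data[cumulative] = data.groupby('location')[cumulative].ffill()

# Fill the remaining missing values (e.g. before the first report) with zeros
data = data.fillna(0)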

# Display the first few rows after preprocessing
data.head()
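
How missing values are filled matters for cumulative columns such as total_cases: a zero makes the running total appear to reset, while carrying the last known value forward within each location keeps the curve from dipping. A sketch of the difference on made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative counts with one missing report per location
sample = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "total_cases": [10, np.nan, 18, 5, 7, np.nan],
})

# Option 1: zeros, so the running total dips to 0 on the missing day
zero_filled = sample["total_cases"].fillna(0)

# Option 2: forward-fill per location, so the last known total is carried over
forward_filled = sample.groupby("location")["total_cases"].ffill()

print(pd.DataFrame({"zeros": zero_filled, "ffill": forward_filled}))
```

Zeros remain a reasonable choice for daily columns such as new_cases, where a missing value usually means nothing was reported.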

3. Plotting Trends Over Time

Plot global trends for total cases and total deaths.

import matplotlib.pyplot as plt

# The dataset already contains a 'World' aggregate (alongside continent and
# income-group aggregates), so summing every location would double count
global_data = data[data['location'] == 'World'][['date', 'total_cases', 'total_deaths']]

# Plot the trends
plt.figure(figsize=(14, 7))
plt.plot(global_data['date'], global_data['total_cases'], label='Total Cases')
plt.plot(global_data['date'], global_data['total_deaths'], label='Total Deaths')
plt.xlabel('Date')
plt.ylabel('Count')
plt.title('Global COVID-19 Total Cases and Total Deaths Over Time')
plt.legend()
plt.show()

4. Comparing Countries

Compare the COVID-19 trends of multiple countries.

# Filter data for specific countries
countries = ['United States', 'India', 'Brazil']
filtered_data = data[data['location'].isin(countries)]

# Plot the trends for each country
plt.figure(figsize=(14, 7))
for country in countries:
    country_data = filtered_data[filtered_data['location'] == country]
    plt.plot(country_data['date'], country_data['total_cases'], label=f'Total Cases - {country}')
    plt.plot(country_data['date'], country_data['total_deaths'], label=f'Total Deaths - {country}')

plt.xlabel('Date')
plt.ylabel('Count')
plt.title('COVID-19 Total Cases and Total Deaths Over Time by Country')
plt.legend()
plt.show()
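
For side-by-side comparison it is often easier to reshape the long table so that each country becomes a column. A sketch with pivot on made-up rows:

```python
import pandas as pd

# Hypothetical long-format rows: one row per (date, location)
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"]),
    "location": ["India", "Brazil", "India", "Brazil"],
    "total_cases": [3, 2, 5, 4],
})

# Wide format: dates as the index, one column per country
wide_df = long_df.pivot(index="date", columns="location", values="total_cases")
print(wide_df)
```

Calling wide_df.plot() on the result would then draw one line per country without an explicit loop.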

5. Interactive Visualizations

Create interactive versions of the same plots using Plotly.

import plotly.express as px

# Interactive plot for the total cases and deaths over time
fig = px.line(global_data, x='date', y=['total_cases', 'total_deaths'], 
              labels={'value':'Count', 'variable':'Metric'},
              title='Global COVID-19 Total Cases and Total Deaths Over Time')

# Display plot in Google Colab
fig.show()

# Interactive comparison between countries
fig_country = px.line(filtered_data, x='date', y='total_cases', color='location',
                      labels={'total_cases':'Total Cases', 'location':'Country'},
                      title='COVID-19 Total Cases Over Time by Country')

# Display plot in Google Colab
fig_country.show()

Conclusion

By following these steps, you can effectively visualize and analyze real-world COVID-19 data, drawing meaningful insights through both static and interactive plots. This practical implementation uses data from a reliable source and showcases the capabilities of various plotting libraries in Google Colab.
