Introduction to Data Visualization in Python
Setting Up the Environment
- Ensure you have Python installed on your machine. You can download it from python.org.
- Install the required libraries using pip (pandas and NumPy are included because the examples import them):
pip install matplotlib seaborn pandas numpy
Importing Libraries
Start by importing the necessary libraries.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
Basic Plotting with Matplotlib
Line Plot
# Sample Data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linestyle='-', color='b', label='Sine Wave')
plt.title('Line Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()
Bar Plot
# Sample Data
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]
# Plot
plt.figure(figsize=(8, 5))
plt.bar(categories, values, color='skyblue')
plt.title('Bar Plot Example')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
Scatter Plot
# Sample Data
x = np.random.rand(50)
y = np.random.rand(50)
# Plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='r', marker='x')
plt.title('Scatter Plot Example')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
Enhanced Plotting with Seaborn
Histogram
# Sample Data
data = np.random.randn(1000)
# Plot
plt.figure(figsize=(8, 6))
sns.histplot(data, kde=True, color='purple')
plt.title('Histogram Example with Seaborn')
plt.xlabel('Data')
plt.ylabel('Frequency')
plt.show()
Box Plot
# Sample Data
data = pd.DataFrame({
    'Category': np.random.choice(['A', 'B', 'C'], 100),
    'Values': np.random.randn(100)
})
# Plot
plt.figure(figsize=(8, 6))
sns.boxplot(x='Category', y='Values', data=data, palette='Set3')
plt.title('Box Plot Example with Seaborn')
plt.xlabel('Category')
plt.ylabel('Values')
plt.show()
Pair Plot
# Sample Data
data = sns.load_dataset('iris')
# Plot
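g = sns.pairplot(data, hue='species', palette='bright', markers=['o', 's', 'D'])
# pairplot returns a grid of axes, so set a figure-level title rather than plt.title
g.fig.suptitle('Pair Plot Example with Seaborn', y=1.02)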
plt.show()
Conclusion
This covers the basic introduction to data visualization in Python using Matplotlib and Seaborn. By following these examples, you can create various types of plots to visualize your data effectively.
Setting Up the Environment
To set up the environment for a data visualization project with Seaborn and Matplotlib, follow these steps. This guide assumes you have already installed Python.
Step 1: Create a Virtual Environment
Navigate to your project directory:
cd path/to/your/project
Create a virtual environment:
python -m venv venv
Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
Step 2: Install Required Libraries
Upgrade pip:
pip install --upgrade pip
Install Seaborn and Matplotlib:
pip install seaborn matplotlib
Verify installation by checking the versions:
python -c "import seaborn as sns; import matplotlib; print('Seaborn:', sns.__version__, 'Matplotlib:', matplotlib.__version__)"
Step 3: Set Up Jupyter Notebook (Optional but Recommended)
Install Jupyter Notebook:
pip install notebook
Start Jupyter Notebook:
jupyter notebook
- Navigate to the provided URL, typically http://localhost:8888/tree, in your web browser.
Step 4: Configure Matplotlib Defaults (Optional)
Set the defaults at the top of your script or notebook:
import matplotlib.pyplot as plt
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'axes.titlesize': 16,
    'axes.labelsize': 14,
    'xtick.labelsize': 12,
    'ytick.labelsize': 12,
    'legend.fontsize': 12
})
Alternatively, save these settings in a Python script called plot_config.py for future reuse:
def set_plot_defaults():
    import matplotlib.pyplot as plt
    plt.rcParams.update({
        'figure.figsize': (10, 6),
        'axes.titlesize': 16,
        'axes.labelsize': 14,
        'xtick.labelsize': 12,
        'ytick.labelsize': 12,
        'legend.fontsize': 12
    })
- Then, you can import and use set_plot_defaults() in your main scripts.
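If you only want these defaults for a single figure rather than the whole session, Matplotlib's rc_context can apply them temporarily. The following is a minimal sketch, assuming a small illustrative subset of the settings; the dictionary name temporary_defaults is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical subset of the defaults discussed in this step
temporary_defaults = {"figure.figsize": (10, 6), "axes.titlesize": 16}

size_before = list(plt.rcParams["figure.figsize"])

# Settings apply only inside the 'with' block
with plt.rc_context(temporary_defaults):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 8])
    ax.set_title("Temporary defaults")
    size_inside = list(plt.rcParams["figure.figsize"])

# Outside the block the previous settings are restored
size_after = list(plt.rcParams["figure.figsize"])
figure_size = list(fig.get_size_inches())
plt.close(fig)
```

This keeps global state untouched, which can be convenient when one notebook mixes figures with different sizing needs.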
Step 5: Test the Environment Setup
- Create a simple test script or Jupyter Notebook cell:
import seaborn as sns
import matplotlib.pyplot as plt
# Load example dataset
data = sns.load_dataset('iris')
# Create a simple plot
sns.scatterplot(data=data, x='sepal_length', y='sepal_width', hue='species')
plt.title('Sepal Length vs Sepal Width')
plt.show()
This will create a scatter plot using the Iris dataset, ensuring that your environment is correctly set up for data visualization with Seaborn and Matplotlib in Python.
Basic Plots with Matplotlib
Below are examples of creating basic plots using Matplotlib:
Line Plot
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Creating the plot
plt.plot(x, y)
# Adding title and labels
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Displaying the plot
plt.show()
Scatter Plot
import matplotlib.pyplot as plt
# Data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Creating the plot
plt.scatter(x, y)
# Adding title and labels
plt.title('Simple Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Displaying the plot
plt.show()
Bar Plot
import matplotlib.pyplot as plt
# Data
categories = ['A', 'B', 'C', 'D']
values = [3, 7, 5, 2]
# Creating the plot
plt.bar(categories, values)
# Adding title and labels
plt.title('Simple Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
# Displaying the plot
plt.show()
Histogram
import matplotlib.pyplot as plt
# Data
data = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
# Creating the plot
plt.hist(data, bins=5)
# Adding title and labels
plt.title('Simple Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Displaying the plot
plt.show()
Pie Chart
import matplotlib.pyplot as plt
# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0) # explode the 1st slice (i.e. 'A')
# Creating the plot
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%', shadow=True, startangle=140)
# Adding title
plt.title('Simple Pie Chart')
# Displaying the plot
plt.show()
By following these implementations, you can create various basic plots using Matplotlib to visualize different types of data effectively.
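The examples above rely on the implicit pyplot interface. As a rough sketch of the alternative, the same line plot can be built on explicit Figure and Axes objects, which tends to scale better once a script manages several plots. Saving to an in-memory buffer here is only an assumption made to keep the example free of file paths.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create the figure and axes explicitly instead of relying on pyplot state
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, y, marker="o", label="y values")
ax.set_title("Simple Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()

# Write the figure as PNG into memory; pass a file name instead to save to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```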
Basic Plots with Seaborn
Seaborn is a Python visualization library based on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. In this section, we will cover how to create some basic plots with Seaborn.
Importing Libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np  # needed for estimator=np.mean in the bar plot example
Loading Example Dataset
We will use the built-in ‘tips’ dataset in Seaborn for our examples.
# Load the 'tips' dataset
df = sns.load_dataset('tips')
Scatter Plot
Scatter plots are used to observe relationships between variables.
# Scatter plot with regression line
sns.lmplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot of Total Bill vs Tip')
plt.show()
# Scatter plot without regression line
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title('Scatter Plot of Total Bill vs Tip')
plt.show()
Line Plot
Line plots are used to visualize data points by connecting them with lines.
# Line plot
sns.lineplot(x='size', y='total_bill', data=df)
plt.title('Line Plot of Size vs Total Bill')
plt.show()
Histogram
Histograms are used to visualize the distribution of a single numerical variable.
# Histogram
sns.histplot(df['total_bill'], bins=30, kde=True)
plt.title('Histogram of Total Bill')
plt.show()
Box Plot
Box plots are used to show the distribution of quantitative data and compare between groups.
# Box plot
sns.boxplot(x='day', y='total_bill', data=df)
plt.title('Box Plot of Total Bill by Day')
plt.show()
Bar Plot
Bar plots are useful for visualizing the count or mean of a categorical variable.
# Bar plot of count per day
sns.countplot(x='day', data=df)
plt.title('Count Plot of Days')
plt.show()
# Bar plot of mean total_bill per day
sns.barplot(x='day', y='total_bill', data=df, estimator=np.mean)
plt.title('Mean Total Bill per Day')
plt.show()
By following the code snippets above, you can create various basic plots using Seaborn to visualize your data effectively. You can customize these plots further by referring to the Seaborn documentation for additional parameters and styling options.
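To make the role of the estimator in sns.barplot more concrete, here is a small sketch that compares the bar heights with a plain pandas groupby mean. The data frame is made up for illustration (it only mimics the column names of the 'tips' dataset), so the numbers carry no meaning.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data standing in for the 'tips' dataset
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 50.0, 15.0, 25.0],
})

# The per-day mean computed directly with pandas
means = df.groupby("day", sort=False)["total_bill"].mean()

# barplot aggregates with the mean by default and draws one bar per day
fig, ax = plt.subplots()
sns.barplot(x="day", y="total_bill", data=df, ax=ax)
ax.set_title("Mean Total Bill per Day")

# Heights of the drawn bars, for comparison with the groupby result
bar_heights = [patch.get_height() for patch in ax.patches]
plt.close(fig)
```

Each value in means should appear among bar_heights, which is a quick way to check what a Seaborn aggregate plot is actually showing.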
Customizing Plots in Matplotlib
To customize plots in Matplotlib, we will look at different aspects such as title, axis labels, legend, and styles.
1. Importing Libraries
First, ensure that you have imported the necessary libraries:
import matplotlib.pyplot as plt
import numpy as np
2. Generating Sample Data
Create some sample data for demonstration.
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
3. Basic Customizations
3.1 Setting Titles and Labels
plt.plot(x, y1, label='Sine Wave')
plt.plot(x, y2, label='Cosine Wave')
plt.title("Sine and Cosine Waves") # Set the title
plt.xlabel("X-axis: Time (s)") # Set the x-axis label
plt.ylabel("Y-axis: Amplitude") # Set the y-axis label
3.2 Adding a Legend
plt.legend(loc='upper right') # Set the location of the legend
3.3 Customizing Lines and Markers
plt.plot(x, y1, color='blue', linestyle='--', linewidth=2, marker='o', markersize=5)
plt.plot(x, y2, color='red', linestyle='-', linewidth=1, marker='x', markersize=5)
3.4 Adding a Grid
plt.grid(True) # Display a grid
3.5 Adjusting Axis Limits
plt.xlim(0, 10) # Set x-axis limits
plt.ylim(-1.5, 1.5) # Set y-axis limits
3.6 Applying Styles
Matplotlib comes with several styles. Applying them can drastically change the appearance of your plot.
plt.style.use('seaborn-v0_8-darkgrid') # Apply a pre-defined style (named 'seaborn-darkgrid' before Matplotlib 3.6)
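Style names differ between Matplotlib versions, so it can help to check plt.style.available before applying one. The sketch below assumes a hypothetical helper, pick_style, that returns the first candidate the installed version knows about, and applies it only within a with block.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np


def pick_style(candidates):
    """Return the first style name that this Matplotlib installation provides."""
    for name in candidates:
        if name in plt.style.available:
            return name
    return "default"  # always accepted by Matplotlib


# Candidate names are an assumption; adjust them to the styles you prefer
style = pick_style(["seaborn-v0_8-darkgrid", "seaborn-darkgrid", "ggplot"])

x = np.linspace(0, 10, 100)

# style.context limits the style to this block instead of changing it globally
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="Sine Wave")
    ax.set_title(f"Styled with '{style}'")
    ax.legend(loc="upper right")

plt.close(fig)
```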
3.7 Displaying the Plot
plt.show() # Render the plot
Full Example
Bringing it all together, here’s a full example:
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Customizing plots
plt.style.use('seaborn-v0_8-darkgrid') # Apply style (named 'seaborn-darkgrid' before Matplotlib 3.6)
plt.plot(x, y1, label='Sine Wave', color='blue', linestyle='--', linewidth=2, marker='o', markersize=5)
plt.plot(x, y2, label='Cosine Wave', color='red', linestyle='-', linewidth=1, marker='x', markersize=5)
plt.title("Sine and Cosine Waves") # Title
plt.xlabel("X-axis: Time (s)") # X-axis label
plt.ylabel("Y-axis: Amplitude") # Y-axis label
plt.legend(loc='upper right') # Legend
plt.grid(True) # Grid
plt.xlim(0, 10) # X-axis limits
plt.ylim(-1.5, 1.5) # Y-axis limits
plt.show() # Show plot
You now have a plot with customized titles, labels, legends, styling, and other elements that enhance its visual clarity and aesthetic appeal. This code can be directly run in a Python environment where Matplotlib is installed.
Customizing Plots in Seaborn
Customizing Seaborn plots involves modifying aesthetics, axes, titles, legends, and other elements to make the visuals more informative and appealing. Below are practical implementations to achieve these customizations:
Import Necessary Libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Sample Data
data = pd.DataFrame({
    'Category': ['A', 'B', 'C', 'D'],
    'Values': [4, 3, 8, 6]
})
Basic Plot Customization
- Customizing Colors
sns.set(style='whitegrid') # Set style
plt.figure(figsize=(8, 5)) # Set figure size
# Bar Plot with custom colors
bar_plot = sns.barplot(x='Category', y='Values', data=data, palette='viridis')
- Adding Titles and Labels
bar_plot.set_title('Custom Bar Plot Title', fontsize=16) # Add title with custom font size
bar_plot.set_xlabel('Category Axis', fontsize=14) # Add x-axis label with custom font size
bar_plot.set_ylabel('Values Axis', fontsize=14) # Add y-axis label with custom font size
- Customizing Axes
# Customizing axis limits and tick parameters
bar_plot.set(ylim=(0, 10), xticks=[0, 1, 2, 3], yticks=[0, 2, 4, 6, 8, 10])
# Rotating x-axis labels for better readability
for item in bar_plot.get_xticklabels():
    item.set_rotation(45)
- Adding Annotations
# Adding annotations to bars
for idx, row in data.iterrows():
    bar_plot.text(idx, row['Values'] + 0.2, row['Values'], color='black', ha="center")
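As an alternative to placing each annotation by hand, Matplotlib 3.4 and later offer Axes.bar_label, which labels every bar in a container. The sketch below uses a plain Matplotlib bar chart with the same sample values; since Seaborn draws on ordinary Matplotlib axes, the same idea should carry over, though that is an assumption to verify against your Seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [4, 3, 8, 6]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, values, color="skyblue")

# Label each bar with its height; padding is the gap in points above the bar
labels = ax.bar_label(bars, padding=3)
ax.set_ylim(0, 10)

label_texts = [text.get_text() for text in labels]
plt.close(fig)
```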
Advanced Plot Customization
- Customizing Legends
# Creating a line plot with different styles for legend customization example
line_plot = sns.lineplot(x='Category', y='Values', data=data, label='Line 1', color='blue')
# Customize legend
line_plot.legend(title='Legend Title', loc='upper left', fontsize='large', title_fontsize='13')
- FacetGrid for Complex Customization
# Creating a FacetGrid for multi-plot customization
facet = sns.FacetGrid(data, col="Category", col_wrap=2, height=4, aspect=1.5)
facet.map(sns.barplot, 'Category', 'Values')
# Adding titles and customizations to each facet
# (facets follow the row order of 'data' here, so each facet is paired with its own row)
for ax, (_, row) in zip(facet.axes.flat, data.iterrows()):
    ax.set_title(ax.get_title().replace('Category = ', 'Category: '))
    ax.set_xlabel('Custom X Label')
    ax.set_ylabel('Custom Y Label')
    # Annotate only this facet's bar; assumes the single bar is drawn at x=0
    ax.text(0, row['Values'] + 0.2, row['Values'], ha="center")
- Customizing Grids and Styles
# Customizing the grid style
sns.set(style='whitegrid', context='talk') # 'talk' context for larger elements
# Customizing ticks
sns.set_style("ticks", {"xtick.major.size": 8, "ytick.major.size": 8})
plt.figure(figsize=(8, 5))
# Regenerate a bar plot with new grid customizations
bar_plot = sns.barplot(x='Category', y='Values', data=data, palette='pastel')
Display Plot
# To ensure the plot renders in some environments
plt.show()
By integrating these codes into your Seaborn workflow, you can effectively customize various aspects of your visualizations to enhance readability and presentation quality.
Advanced Visualization Techniques with Matplotlib
1. Introduction
In this section, we will explore advanced visualization techniques using Matplotlib. We will cover the following topics:
- Subplots and Combining Multiple Plots
- 3D Plots
- Customizing Color Maps
- Creating Animations
2. Subplots and Combining Multiple Plots
Code Example
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Creating subplots
fig, axs = plt.subplots(2, 1, figsize=(10, 8))
axs[0].plot(x, y1, label='Sin(x)')
axs[0].set_title('Sine Wave')
axs[0].legend()
axs[1].plot(x, y2, label='Cos(x)', color='r')
axs[1].set_title('Cosine Wave')
axs[1].legend()
plt.tight_layout()
plt.show()
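For larger layouts, plt.subplots returns a 2D array of axes that can be iterated with axs.flat. The following sketch assumes four illustrative curves and shares both axes across a 2x2 grid so the panels stay directly comparable.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Illustrative curves, one per panel
curves = {
    "sin(x)": np.sin(x),
    "cos(x)": np.cos(x),
    "sin(2x)": np.sin(2 * x),
    "cos(2x)": np.cos(2 * x),
}

# sharex/sharey keep the same limits on every panel
fig, axs = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)

# axs is a 2x2 array; .flat walks it row by row
for ax, (name, y) in zip(axs.flat, curves.items()):
    ax.plot(x, y)
    ax.set_title(name)

fig.tight_layout()
panel_titles = [ax.get_title() for ax in axs.flat]
plt.close(fig)
```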
3. 3D Plots
Code Example
from mpl_toolkits.mplot3d import Axes3D
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
x, y = np.meshgrid(x, y)
z = np.sin(np.sqrt(x**2 + y**2))
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis')
ax.set_title("3D Surface Plot")
plt.show()
4. Customizing Color Maps
Code Example
data = np.random.rand(10, 10)
plt.figure(figsize=(8, 6))
plt.imshow(data, cmap='coolwarm', interpolation='nearest')
plt.colorbar()
plt.title("Custom Color Map")
plt.show()
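Beyond the built-in maps, a color map can be defined from a list of colors with LinearSegmentedColormap.from_list. This is a minimal sketch; the map name and the three colors are arbitrary choices for illustration, and vmin/vmax pin the color scale so that separate plots remain comparable.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap

rng = np.random.default_rng(0)  # seeded so the figure is reproducible
data = rng.random((10, 10))

# Interpolate smoothly between three named colors (arbitrary illustrative choice)
custom_cmap = LinearSegmentedColormap.from_list(
    "navy_white_red", ["navy", "white", "firebrick"]
)

fig, ax = plt.subplots(figsize=(8, 6))
image = ax.imshow(data, cmap=custom_cmap, vmin=0.0, vmax=1.0, interpolation="nearest")
fig.colorbar(image, ax=ax, label="Value")
ax.set_title("Custom Color Map from a Color List")

color_limits = image.get_clim()
lowest_color = custom_cmap(0.0)  # RGBA tuple at the bottom of the scale
plt.close(fig)
```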
5. Creating Animations
Code Example
import matplotlib.animation as animation
fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 100)
line, = ax.plot(x, np.sin(x))
def update(frame):
    line.set_ydata(np.sin(x + frame / 10))
    return line,
ani = animation.FuncAnimation(fig, update, frames=100, interval=50, blit=True)
plt.show()
These examples illustrate some advanced visualization techniques you can use with Matplotlib to enhance your data visualizations in Python.
Advanced Visualization Techniques with Seaborn
In this section, we’ll cover some advanced visualization techniques using Seaborn to help you create more informative and beautiful visualizations. We will explore:
- Heatmaps
- Pairplots
- FacetGrid
- JointPlots
- Violin Plots
Heatmaps
Heatmaps are useful for visualizing matrix-like data, showing patterns within the data matrix.
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
# Keyword arguments are required by pivot in pandas 2.0 and later
data = sns.load_dataset("flights").pivot(index="month", columns="year", values="passengers")
# Create a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data, annot=True, fmt="d", cmap="YlGnBu")
plt.title("Heatmap of Flight Passengers Over Years")
plt.show()
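The pivot step is what turns long-format records into the matrix a heatmap expects. The sketch below illustrates it on a tiny made-up table that borrows the column names of the flights dataset; unlike the real dataset, months are plain strings here, so they sort alphabetically.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up long-format data: one row per (month, year) observation
long_data = pd.DataFrame({
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "year": [1949, 1949, 1950, 1950],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold passenger counts
wide = long_data.pivot(index="month", columns="year", values="passengers")

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(wide, annot=True, fmt="d", cmap="YlGnBu", ax=ax)
ax.set_title("Heatmap from Pivoted Data")

annotation_count = len(ax.texts)  # one annotation per cell
plt.close(fig)
```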
Pairplots
Pairplots are used to visualize relationships between multiple variables in a dataset.
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
data = sns.load_dataset("iris")
# Create a pairplot
sns.pairplot(data, hue="species", palette="husl")
plt.suptitle("Pairplot of Iris Data", y=1.02)
plt.show()
FacetGrid
FacetGrid is used for plotting multiple graphs based on the categories of a variable.
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
data = sns.load_dataset("tips")
# Create a FacetGrid
g = sns.FacetGrid(data, col="time", row="smoker", margin_titles=True)
g.map(sns.scatterplot, "total_bill", "tip")
plt.subplots_adjust(top=0.9)
g.fig.suptitle("FacetGrid of Tips Data")
plt.show()
JointPlots
JointPlots are useful for visualizing the relationship between two variables along with their marginal distributions.
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
data = sns.load_dataset("penguins")
# Create a jointplot
sns.jointplot(x="flipper_length_mm", y="bill_length_mm", data=data, kind="hex", color="k")
plt.suptitle("Jointplot of Penguins Data", y=1.02)
plt.show()
Violin Plots
Violin plots are used for visualizing the distribution of the data and its probability density.
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
data = sns.load_dataset("tips")
# Create a violin plot
plt.figure(figsize=(10, 6))
sns.violinplot(x="day", y="total_bill", hue="sex", data=data, palette="muted", split=True)
plt.title("Violin Plot of Tips Data by Day and Sex")
plt.show()
You can integrate these advanced techniques into your existing project to elevate the quality and informativeness of your visualizations.
Comparative Analysis of Seaborn and Matplotlib
For this section, we will perform a comparative analysis of Seaborn and Matplotlib by generating similar visualizations using both libraries. This will illustrate their differences in terms of syntax, aesthetics, and functionalities.
Dataset
To ensure a fair comparison, we will use the same dataset for both Seaborn and Matplotlib. Let’s use the famous Iris dataset, loaded here through scikit-learn (install it separately with pip install scikit-learn).
Code Implementation
Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
Scatter Plot Comparison
Seaborn Implementation
# Seaborn Scatter Plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='sepal width (cm)', hue='species')
plt.title('Seaborn Scatter Plot of Sepal Length vs. Sepal Width')
plt.show()
Matplotlib Implementation
# Matplotlib Scatter Plot
plt.figure(figsize=(10, 6))
species_mapping = {'setosa': 'r', 'versicolor': 'g', 'virginica': 'b'}
for species, color in species_mapping.items():
    subset = iris_df[iris_df['species'] == species]
    plt.scatter(subset['sepal length (cm)'], subset['sepal width (cm)'], color=color, label=species)
plt.title('Matplotlib Scatter Plot of Sepal Length vs. Sepal Width')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)')
plt.legend()
plt.show()
Histogram Comparison
Seaborn Implementation
# Seaborn Histogram
plt.figure(figsize=(10, 6))
sns.histplot(data=iris_df, x='sepal length (cm)', hue='species', multiple='stack')
plt.title('Seaborn Histogram of Sepal Length')
plt.show()
Matplotlib Implementation
# Matplotlib Histogram
plt.figure(figsize=(10, 6))
for species, color in species_mapping.items():
    plt.hist(iris_df[iris_df['species'] == species]['sepal length (cm)'], bins=15, color=color, alpha=0.5, label=species)
plt.title('Matplotlib Histogram of Sepal Length')
plt.xlabel('sepal length (cm)')
plt.ylabel('Frequency')
plt.legend()
plt.show()
Pair Plot Comparison
Seaborn Implementation
# Seaborn Pair Plot
sns.pairplot(iris_df, hue='species', height=2.5)
plt.suptitle('Seaborn Pair Plot', y=1.02)
plt.show()
Matplotlib Implementation
# Matplotlib Pair Plot
from pandas.plotting import scatter_matrix
# scatter_matrix (from pandas, built on Matplotlib) creates its own figure, so no plt.figure call is needed
scatter_matrix(iris_df, alpha=0.8, figsize=(12, 12), diagonal='hist', marker='o', c=iris.target, cmap='viridis')
plt.suptitle('Matplotlib Pair Plot', y=1.02)
plt.show()
Conclusion
From these examples, we see that:
- Seaborn provides a higher-level API for statistical graphics, with built-in themes and color palettes that make it easy to create attractive, complex visualizations.
- Matplotlib is more versatile and offers a more granular level of control over the style and layout of plots. However, it often requires more lines of code to achieve the same results as Seaborn.
This comparative analysis should give you a practical understanding of when to use each library and help you appreciate their respective strengths in data visualization tasks.
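The two libraries are not mutually exclusive: Seaborn's axes-level functions draw on, and return, a regular Matplotlib Axes, so a plot can be created with Seaborn and then refined with Matplotlib calls. Here is a minimal sketch on randomly generated data (the column names x, y, and group are placeholders).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)  # seeded so the figure is reproducible
df = pd.DataFrame({
    "x": rng.normal(size=50),
    "y": rng.normal(size=50),
    "group": rng.choice(["a", "b"], size=50),
})

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn handles grouping, colors, and the legend
returned_ax = sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax)

# Matplotlib handles the fine-grained adjustments on the same axes
ax.axhline(0, color="gray", linestyle="--", linewidth=1)
ax.set_xlim(-3, 3)
ax.set_title("Seaborn draws, Matplotlib refines")

x_limits = ax.get_xlim()
plt.close(fig)
```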
Case Studies and Practical Applications
Case Study 1: Analyzing Sales Trends with Matplotlib and Seaborn
Problem Statement:
A retail company wants to analyze its sales data over the past year to identify trends and make data-driven decisions. We will use Matplotlib for detailed customization and Seaborn for quick and informative visuals.
Data Preparation:
Assume we have the following columns in our sales data:
- date: The date of the sales entry
- sales: The amount of sales
- category: Product category
Implementation:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
data = pd.read_csv('sales_data.csv')
# Convert 'date' column to datetime
data['date'] = pd.to_datetime(data['date'])
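# Resample to monthly sales, summing only the numeric 'sales' column
# (pandas 2.2 and later prefer the alias 'ME' over 'M')
monthly_sales = data.resample('M', on='date')[['sales']].sum()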
# Plot monthly sales using Matplotlib
plt.figure(figsize=(10, 5))
plt.plot(monthly_sales.index, monthly_sales['sales'], marker='o')
plt.title('Monthly Sales Trend')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
# Plot sales distribution by category using Seaborn
plt.figure(figsize=(10, 5))
sns.boxplot(x='category', y='sales', data=data)
plt.title('Sales Distribution by Category')
plt.xlabel('Category')
plt.ylabel('Sales')
plt.show()
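Since sales_data.csv is not provided, the monthly aggregation can be tried on synthetic data first. The sketch below assumes one made-up record per day with a constant sales value, and groups by calendar month using to_period, which is one of several ways to aggregate by month in pandas.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for sales_data.csv: one sale of 1.0 per day for three months
dates = pd.date_range("2023-01-01", "2023-03-31", freq="D")
data = pd.DataFrame({
    "date": dates,
    "sales": 1.0,
    "category": ["A", "B"] * (len(dates) // 2),
})

# Group by calendar month and sum only the numeric 'sales' column
monthly_sales = data.groupby(data["date"].dt.to_period("M"))["sales"].sum()
month_labels = [str(period) for period in monthly_sales.index]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(monthly_sales.index.to_timestamp(), monthly_sales.values, marker="o")
ax.set_title("Monthly Sales Trend (synthetic data)")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
ax.grid(True)
plt.close(fig)
```

With one unit sold per day, each monthly total simply equals the number of days in that month, which makes the aggregation easy to sanity-check.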
Case Study 2: Visualizing Customer Demographics
Problem Statement:
A marketing team needs to understand the demographic distribution of customers to tailor their marketing strategies. We will create visualizations to highlight age and income distributions among customers.
Data Preparation:
Assume we have the following columns in our customer data:
- customer_id: Unique identifier for customers
- age: Age of the customer
- income: Income of the customer
Implementation:
# Load data
customer_data = pd.read_csv('customer_data.csv')
# Age distribution using Seaborn
plt.figure(figsize=(10, 5))
sns.histplot(customer_data['age'], bins=20, kde=True)
plt.title('Customer Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
# Income distribution using Matplotlib
plt.figure(figsize=(10, 5))
plt.hist(customer_data['income'], bins=20, edgecolor='black')
plt.title('Customer Income Distribution')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
Case Study 3: Performance Metrics Visualization
Problem Statement:
A software development team wants to visualize key performance metrics such as code commits, bug fixes, and feature deployments over time.
Data Preparation:
Assume we have the following columns in our performance metrics data:
- week: The week of the record
- commits: Number of code commits
- bug_fixes: Number of bug fixes
- feature_deployments: Number of new features deployed
Implementation:
# Load data
performance_data = pd.read_csv('performance_metrics.csv')
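# Convert 'week' column to datetime
# (assumes 'YYYY-WW' strings such as '2023-05'; appending '-1' pins each week to its Monday)
performance_data['week'] = pd.to_datetime(performance_data['week'] + '-1', format='%Y-%W-%w')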
# Plotting performance metrics trends
plt.figure(figsize=(10, 5))
# Commits
sns.lineplot(x='week', y='commits', data=performance_data, marker='o', label='Commits')
# Bug fixes
sns.lineplot(x='week', y='bug_fixes', data=performance_data, marker='o', label='Bug Fixes')
# Feature deployments
sns.lineplot(x='week', y='feature_deployments', data=performance_data, marker='o', label='Feature Deployments')
plt.title('Weekly Performance Metrics')
plt.xlabel('Week')
plt.ylabel('Count')
plt.legend()
plt.grid(True)
plt.show()
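How the week column is parsed depends on its actual format, which the case study does not specify. As a sketch under the assumption that weeks are stored as 'YYYY-WW' strings (with %W week numbering, where weeks start on Monday), a weekday can be appended so that each week resolves to a concrete date. The values below are made up.

```python
import pandas as pd

# Made-up records; the 'YYYY-WW' week format is an assumption
performance_data = pd.DataFrame({
    "week": ["2023-01", "2023-02", "2023-05"],
    "commits": [12, 18, 9],
})

# A year and week number alone do not identify a date, so append '-1' (Monday)
# and parse year, week of year (%W), and weekday (%w) together
performance_data["week_start"] = pd.to_datetime(
    performance_data["week"] + "-1", format="%Y-%W-%w"
)

week_starts = performance_data["week_start"].dt.strftime("%Y-%m-%d").tolist()
print(week_starts)
```

If the file stores ISO dates instead (for example '2023-01-30'), a plain pd.to_datetime call without a format string is likely sufficient.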
These case studies provide real-world applications demonstrating how to leverage Matplotlib and Seaborn for data visualization in different scenarios. This implementation covers various aspects of data visualization, including temporal trends, categorical distributions, and performance metrics, making it readily applicable for practical usage.