Matplotlib: Data Visualization

Introduction

Matplotlib is Python’s most widely used plotting library for creating static, interactive, and animated visualizations. It turns raw data into charts and graphs that make patterns easier to see.

Note

“A picture is worth a thousand words” - especially in data analysis! Matplotlib makes it easy to create publication-quality charts that tell your data’s story. 📊


Installing Matplotlib

pip install matplotlib
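
To confirm that the installation worked, one quick check is to import the package and print its version. This is only a minimal sketch; the variable name version is illustrative.

```python
import matplotlib

# __version__ holds the installed release as a string
version = matplotlib.__version__
print(f"Matplotlib version: {version}")
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.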

Baby Steps: Your First Plot

import matplotlib.pyplot as plt

# Simple line plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.show()

Adding Labels and Title:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('My First Plot')
plt.grid(True)  # Add grid
plt.show()

Note

Always label your axes and add a title! A chart without labels is like a story without context.
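
The same labeled plot can also be written in Matplotlib's object-oriented style, which the grouped bar chart and subplot examples later in this chapter rely on. The sketch below reuses the data from the first plot: plt.subplots() returns a Figure and an Axes, and the labels are set with ax.set_xlabel(), ax.set_ylabel(), and ax.set_title() instead of the plt functions.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# plt.subplots() with no arguments creates one Figure containing one Axes
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('X-axis Label')
ax.set_ylabel('Y-axis Label')
ax.set_title('My First Plot')
ax.grid(True)  # Add grid
```

Call plt.show() afterwards to display the figure. Both styles produce the same chart; the Axes style tends to be easier to follow once a figure holds more than one plot.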


Line Graphs

Single Line:

import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
confidence = [9, 8, 7, 5, 4, 3, 5, 6]

plt.plot(weeks, confidence, marker='o', linestyle='-', color='blue', linewidth=2)
plt.xlabel('Weeks into Semester')
plt.ylabel('Confidence Level (1-10)')
plt.title('The Reality of Learning Python')
plt.grid(True, alpha=0.3)
plt.show()

Multiple Lines:

import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
confidence = [9, 8, 7, 5, 4, 3, 5, 6]
motivation = [10, 9, 7, 6, 4, 3, 4, 5]
sleep_hours = [7, 6, 6, 5, 4, 3, 4, 5]

plt.plot(weeks, confidence, marker='o', label='Confidence')
plt.plot(weeks, motivation, marker='s', label='Motivation')
plt.plot(weeks, sleep_hours, marker='^', label='Sleep Hours')

plt.xlabel('Weeks into Semester')
plt.ylabel('Level / Hours')
plt.title('Student Life: A Visual Tragedy')
plt.legend()  # Show legend
plt.grid(True, alpha=0.3)
plt.show()

Real-World Example: Learning Journey

import matplotlib.pyplot as plt
import numpy as np

# The emotional journey of learning Python
events = ['Start', 'Hello\nWorld', 'Variables', 'Loops', 'Functions',
          'OOP', 'Debugging', 'Projects', 'Realization']
weeks = np.arange(len(events))
confidence = [7, 10, 8, 6, 4, 3, 2, 5, 4]

plt.figure(figsize=(12, 6))
plt.plot(weeks, confidence, marker='o', linewidth=2, markersize=8,
         color='#2962FF', markerfacecolor='orange')

# Annotate key moments
plt.annotate('Peak of Confidence', xy=(1, 10), xytext=(1.5, 9),
             arrowprops=dict(facecolor='green', shrink=0.05))
plt.annotate('Valley of Despair', xy=(6, 2), xytext=(6.5, 3),
             arrowprops=dict(facecolor='red', shrink=0.05))

plt.xticks(weeks, events, rotation=45)
plt.xlabel('Learning Milestones')
plt.ylabel('Confidence Level (1-10)')
plt.title('The Emotional Journey of Learning Python\n(Emotional Support: You\'re Not Alone!)')
plt.grid(True, alpha=0.3)
plt.ylim(0, 11)
plt.tight_layout()
plt.show()

Bar Charts

Vertical Bars:

import matplotlib.pyplot as plt

subjects = ['Python', 'Java', 'WebDev', 'Database']
scores = [85, 78, 92, 88]

plt.bar(subjects, scores, color='skyblue', edgecolor='black')
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('My Grades')
plt.ylim(0, 100)
plt.show()

Horizontal Bars:

import matplotlib.pyplot as plt

subjects = ['Python', 'Java', 'WebDev', 'Database']
scores = [85, 78, 92, 88]

plt.barh(subjects, scores, color='lightcoral')
plt.xlabel('Scores')
plt.ylabel('Subjects')
plt.title('My Grades (Horizontal)')
plt.xlim(0, 100)
plt.show()

Grouped Bars:

import matplotlib.pyplot as plt
import numpy as np

activities = ['Studying', 'Gaming', 'Social Media', 'Sleep', 'Procrastination']
claimed_hours = [8, 2, 1, 8, 1]
actual_hours = [3, 6, 4, 5, 6]

x = np.arange(len(activities))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
bars1 = ax.bar(x - width/2, claimed_hours, width, label='Claimed Hours',
               color='lightgreen', edgecolor='black')
bars2 = ax.bar(x + width/2, actual_hours, width, label='Actual Hours',
               color='salmon', edgecolor='black')

ax.set_xlabel('Activities')
ax.set_ylabel('Hours per Day')
ax.set_title('Time Spent vs Time Claimed to Spend\n(Why Time Tracking Apps Make Students Cry)')
ax.set_xticks(x)
ax.set_xticklabels(activities, rotation=45, ha='right')  # ha='right' keeps rotated labels under their ticks
ax.legend()
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

Note

Grouped bar charts are perfect for comparing two or more sets of data across categories. Notice how “Actual Hours” tells a very different story than “Claimed Hours”! 😅
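
For more than two groups, the x - width/2 and x + width/2 offsets can be generalized. The sketch below is one possible approach, not the only one: it splits 80% of each category slot evenly between the groups and centers the cluster on the tick. The "Planned" series and all of the numbers are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

activities = ['Studying', 'Gaming', 'Sleep']
# Hypothetical hours per day, invented purely for illustration
groups = {
    'Claimed': [8, 2, 8],
    'Actual': [3, 6, 5],
    'Planned': [6, 3, 7],
}

x = np.arange(len(activities))
n_groups = len(groups)
width = 0.8 / n_groups  # the bars of one category share 80% of its slot

fig, ax = plt.subplots(figsize=(8, 5))
for i, (label, hours) in enumerate(groups.items()):
    # Shift each group so the cluster stays centered on the tick at x
    offset = (i - (n_groups - 1) / 2) * width
    ax.bar(x + offset, hours, width, label=label, edgecolor='black')

ax.set_xticks(x)
ax.set_xticklabels(activities)
ax.set_ylabel('Hours per Day')
ax.set_title('Grouped Bars with Three Groups')
ax.legend()
```

With two groups the same formula gives offsets of -width/2 and +width/2, which matches the example above.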


Scatter Plots

Basic Scatter Plot:

import matplotlib.pyplot as plt

study_hours = [2, 4, 6, 8, 10, 3, 5, 7, 9, 11]
grades = [50, 60, 70, 80, 90, 55, 65, 75, 85, 95]

plt.scatter(study_hours, grades, color='purple', s=100, alpha=0.6)
plt.xlabel('Study Hours per Week')
plt.ylabel('Grade (%)')
plt.title('Study Hours vs Grades')
plt.grid(True, alpha=0.3)
plt.show()

Advanced Scatter with Categories:

import matplotlib.pyplot as plt
import numpy as np

# Coffee consumption vs code quality study
np.random.seed(42)

# Different student categories
tea_drinkers_coffee = np.random.uniform(0, 2, 10)
tea_drinkers_bugs = np.random.uniform(10, 30, 10)

moderate_coffee = np.random.uniform(2, 5, 10)
moderate_bugs = np.random.uniform(5, 20, 10)

addicts_coffee = np.random.uniform(5, 8, 10)
addicts_bugs = np.random.uniform(15, 40, 10)

zombies_coffee = np.random.uniform(8, 12, 10)
zombies_bugs = np.random.uniform(30, 60, 10)

plt.figure(figsize=(10, 6))
plt.scatter(tea_drinkers_coffee, tea_drinkers_bugs,
            label='Tea Drinkers', s=100, alpha=0.6, marker='o')
plt.scatter(moderate_coffee, moderate_bugs,
            label='Moderate Coffee Users', s=100, alpha=0.6, marker='s')
plt.scatter(addicts_coffee, addicts_bugs,
            label='Caffeine Addicts', s=100, alpha=0.6, marker='^')
plt.scatter(zombies_coffee, zombies_bugs,
            label='Energy Drink Zombies', s=100, alpha=0.6, marker='D')

plt.xlabel('Coffee Cups per Day')
plt.ylabel('Bugs per 100 Lines of Code')
plt.title('Relationship Between Coffee Consumption and Code Quality\n(Correlation ≠ Causation... or does it?)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Pie Charts

import matplotlib.pyplot as plt

activities = ['Actually Studying', 'Planning to Study', 'Netflix Research',
              'Social Media\nNetworking', 'Existential Crisis', 'Sleep']
time_spent = [5, 15, 25, 30, 15, 10]
colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#96ceb4', '#ffeaa7', '#dfe6e9']
explode = (0, 0.1, 0.1, 0.1, 0, 0)  # Explode some slices

plt.figure(figsize=(10, 8))
plt.pie(time_spent, labels=activities, colors=colors, autopct='%1.1f%%',
        startangle=90, explode=explode, shadow=True)
plt.title('Where My Semester Went\n(A Pie Chart of Regret)', fontsize=14, fontweight='bold')
plt.axis('equal')  # Equal aspect ratio ensures circular pie
plt.tight_layout()
plt.show()

Note

Pie charts are great for showing proportions, but use them sparingly. If you have more than 6-7 categories, consider a bar chart instead!
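
As a rough sketch of the bar chart alternative, the example below shows eight categories as sorted horizontal bars, which are usually easier to compare than thin pie slices. The categories and percentages are hypothetical and chosen only so that they add up to 100.

```python
import matplotlib.pyplot as plt

# Hypothetical time shares (percent), invented for illustration
time_spent = {
    'Netflix Research': 22,
    'Social Media': 20,
    'Planning to Study': 15,
    'Existential Crisis': 12,
    'Sleep': 10,
    'Snacking': 9,
    'Gaming': 7,
    'Actually Studying': 5,
}

# Sort ascending so the largest bar ends up at the top of the chart
items = sorted(time_spent.items(), key=lambda pair: pair[1])
labels = [name for name, _ in items]
values = [share for _, share in items]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(labels, values, color='#4ecdc4', edgecolor='black')
ax.set_xlabel('Share of Time (%)')
ax.set_title('Where My Semester Went (Bar Chart Version)')
fig.tight_layout()
```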


Histograms

import matplotlib.pyplot as plt
import numpy as np

# Generate random exam scores
np.random.seed(42)
scores = np.random.normal(70, 15, 100)  # mean=70, std=15, 100 students

plt.figure(figsize=(10, 6))
plt.hist(scores, bins=20, color='steelblue', edgecolor='black', alpha=0.7)
plt.xlabel('Exam Scores')
plt.ylabel('Number of Students')
plt.title('Distribution of Exam Scores')
plt.axvline(scores.mean(), color='red', linestyle='--', linewidth=2,
            label=f'Mean: {scores.mean():.1f}')
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.show()

Subplots (Multiple Plots)

import matplotlib.pyplot as plt
import numpy as np

# Create 2x2 grid of plots
fig, axs = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Line graph
weeks = np.arange(1, 9)
confidence = [9, 8, 7, 5, 4, 3, 5, 6]
axs[0, 0].plot(weeks, confidence, marker='o', color='blue')
axs[0, 0].set_title('Confidence Over Time')
axs[0, 0].grid(True, alpha=0.3)

# Plot 2: Bar chart
subjects = ['Py', 'Java', 'Web', 'DB']
scores = [85, 78, 92, 88]
axs[0, 1].bar(subjects, scores, color='green')
axs[0, 1].set_title('Subject Scores')

# Plot 3: Scatter plot
np.random.seed(42)  # fixed seed so the random points are the same on every run
study = np.random.rand(20) * 10
grades = study * 8 + np.random.rand(20) * 10
axs[1, 0].scatter(study, grades, color='red', alpha=0.6)
axs[1, 0].set_title('Study vs Grades')

# Plot 4: Pie chart
activities = ['Study', 'Gaming', 'Sleep', 'Other']
hours = [20, 30, 25, 25]  # named 'hours' to avoid clashing with Python's time module
axs[1, 1].pie(hours, labels=activities, autopct='%1.1f%%')
axs[1, 1].set_title('Time Distribution')

plt.tight_layout()
plt.show()

Customization and Styling

import matplotlib.pyplot as plt
import numpy as np

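# Use a built-in style. 'seaborn-v0_8-darkgrid' needs Matplotlib 3.6 or newer
# (older releases call it 'seaborn-darkgrid'). Other options: 'ggplot', 'bmh', 'fivethirtyeight'
plt.style.use('seaborn-v0_8-darkgrid')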

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, linewidth=3, color='#2962FF')
plt.xlabel('X', fontsize=14, fontweight='bold')
plt.ylabel('Y', fontsize=14, fontweight='bold')
plt.title('Sine Wave', fontsize=16, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()

# Reset to default style
plt.style.use('default')
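
Style names vary between Matplotlib versions, so it can help to list what is installed before picking one. The sketch below also shows plt.style.context(), which applies a style only inside a with-block; the cosine data and the title are placeholders.

```python
import matplotlib.pyplot as plt
import numpy as np

# plt.style.available lists the style names this installation knows about
print(sorted(plt.style.available))

x = np.linspace(0, 10, 100)

# The style applies only inside the with-block,
# so there is no need to reset to 'default' afterwards
with plt.style.context('ggplot'):
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(x, np.cos(x), linewidth=2)
    ax.set_title('Cosine Wave (ggplot style)')
```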

Saving Figures

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.title('My Plot')

# Save in different formats
plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')
plt.savefig('my_plot.pdf', bbox_inches='tight')
plt.savefig('my_plot.jpg', dpi=150, bbox_inches='tight')

plt.show()

Note

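Use bbox_inches='tight' to avoid cutting off labels, and call plt.savefig() before plt.show(): once the plot window has been closed, saving may produce a blank image. Higher DPI (dots per inch) means a sharper image but a larger file. 300 DPI is a good choice for printing.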
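
To see the effect of DPI on file size, the following sketch saves the same figure twice and compares the results. It writes to a temporary folder so that no files are left behind; the file names are arbitrary, and the exact byte counts will differ from machine to machine.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title('My Plot')

# Save the same figure at two resolutions and record the file sizes
with tempfile.TemporaryDirectory() as folder:
    low_path = os.path.join(folder, 'plot_72dpi.png')
    high_path = os.path.join(folder, 'plot_300dpi.png')
    fig.savefig(low_path, dpi=72, bbox_inches='tight')
    fig.savefig(high_path, dpi=300, bbox_inches='tight')
    low_size = os.path.getsize(low_path)
    high_size = os.path.getsize(high_path)

print(f'72 DPI:  {low_size} bytes')
print(f'300 DPI: {high_size} bytes')
```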


Tasks

Task 1: Emotional Journey Visualization

Create a line graph titled “The Emotional Journey of Learning Python” with weeks (1-12) on the x-axis and confidence level (1-10) on the y-axis. Include these data points: Week 1 (confidence=9, “First Hello World”), Week 3 (confidence=7, “Variables”), Week 5 (confidence=4, “Discovered Debugging”), Week 8 (confidence=6, “Understood OOP”), Week 12 (confidence=3, “Realized How Much I Don’t Know”). Add annotations for the key events.

Hint: Use plt.annotate() with xy and xytext parameters. Use plt.plot() with marker='o' for data points.

Task 2: Time Reality Check Bar Chart

Create a grouped bar chart comparing “Time Claimed” vs “Actual Time” for 5 activities: Studying, Gaming, Social Media, Productive Procrastination, Sleep. Use a different color for each group. Add error bars representing a ±2 hour “Self-Deception Margin”. Include the subtitle “Why Time Tracking Apps Make Students Cry”.

Hint: Use plt.bar() with x - width/2 and x + width/2 to position the grouped bars. Pass the yerr parameter to add error bars.
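
The yerr parameter has not appeared in the examples so far, so here is a minimal sketch of it on a plain (ungrouped) bar chart. It assumes the same ±2 hour margin for every bar and borrows three values from the “Actual Hours” example; the grouped version is left for the task.

```python
import matplotlib.pyplot as plt

activities = ['Studying', 'Gaming', 'Sleep']
hours = [3, 6, 5]  # taken from the "Actual Hours" example
margin = 2         # a single number applies the same +/- margin to every bar

fig, ax = plt.subplots()
# yerr draws a vertical error bar on each bar; capsize adds the end caps
bars = ax.bar(activities, hours, yerr=margin, capsize=5,
              color='salmon', edgecolor='black')
ax.set_ylabel('Hours per Day')
ax.set_title('Bars with Error Bars')
```

yerr also accepts a list with one value per bar when the margins differ.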

Task 3: Coffee vs Code Quality Scatter Plot

Create a scatter plot with coffee_cups_per_day (0-10) on the x-axis and bugs_per_hundred_lines (0-50) on the y-axis. Create 4 categories: Tea Drinkers (0-2 cups), Moderate (2-5 cups), Addicts (5-8 cups), Zombies (8+ cups). Use a different color and marker for each. Add a trend line using np.polyfit() and np.poly1d().

Hint: Generate random data for each category. Use np.polyfit(x, y, 1) for linear fit, then plt.plot() for trend line.
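
As an illustration of the trend-line technique, the sketch below fits a straight line to the study-hours data from the basic scatter plot rather than to the coffee data, so the task itself stays open. The names slope, intercept, trend, and x_line are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Study-hours data from the basic scatter plot example
study_hours = np.array([2, 4, 6, 8, 10, 3, 5, 7, 9, 11])
grades = np.array([50, 60, 70, 80, 90, 55, 65, 75, 85, 95])

# A degree-1 fit returns the coefficients [slope, intercept]
slope, intercept = np.polyfit(study_hours, grades, 1)
trend = np.poly1d([slope, intercept])  # trend(x) = slope * x + intercept

# Evaluate the line on evenly spaced x values so it draws left to right
x_line = np.linspace(study_hours.min(), study_hours.max(), 50)

fig, ax = plt.subplots()
ax.scatter(study_hours, grades, color='purple', s=100, alpha=0.6)
ax.plot(x_line, trend(x_line), color='red', linestyle='--',
        label=f'Trend: y = {slope:.2f}x + {intercept:.2f}')
ax.set_xlabel('Study Hours per Week')
ax.set_ylabel('Grade (%)')
ax.legend()
```

Because this sample data lies exactly on a line, the fit comes out as a slope of 5 and an intercept of 40 (up to floating-point rounding). Random data will give a noisier fit.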

Task 4: Semester Distribution Pie Chart

Create a pie chart titled “Where My Semester Went” with slices: Actually Studying (8%), Planning to Study (20%), Netflix Research (25%), Social Media Networking (22%), Existential Crisis (15%), Sleep (10%). Use custom colors reflecting emotional states. Explode the “Netflix Research” slice. Add percentage labels.

Hint: Pass explode as a tuple, with 0.1 for each slice to pull out and 0 for the rest. Use autopct='%1.1f%%' to show percentages.

Task 5: Multi-Plot Dashboard

Create a 2×2 subplot figure showing: (1) Line: Motivation over 8 weeks, (2) Bar: Grades in 4 subjects, (3) Scatter: Study hours vs Assignment scores, (4) Histogram: Distribution of daily screen time. Add appropriate titles, labels, and styling to each subplot.

Hint: Use fig, axs = plt.subplots(2, 2, figsize=(12, 10)). Access plots using axs[row, col]. Use plt.tight_layout() to prevent overlap.


Summary

  • Matplotlib is the most widely used data visualization library for Python (a third-party package, installed with pip)

  • plt.plot() creates line graphs

  • plt.bar() and plt.barh() create bar charts

  • plt.scatter() creates scatter plots

  • plt.pie() creates pie charts

  • plt.hist() creates histograms

  • Always add labels, titles, and legends for clarity

  • Use plt.subplots() to create multiple plots

  • Customize colors, markers, line styles for better visuals

  • Save figures using plt.savefig()

  • Matplotlib works seamlessly with NumPy and Pandas