Matplotlib: Data Visualization¶
Introduction¶
Matplotlib is one of Python’s most widely used libraries for creating static, interactive, and animated visualizations. It turns data into charts and graphs that make patterns easier to see.
Note
“A picture is worth a thousand words” - especially in data analysis! Matplotlib makes it easy to create publication-quality charts that tell your data’s story. 📊
Installing Matplotlib¶
pip install matplotlib
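To confirm the install worked, a quick check like the one below can help. It only assumes the package imports under its usual name; the version printed depends on your environment.

```python
import matplotlib

# Print the installed version; the exact number depends on your environment.
print(matplotlib.__version__)
```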
Baby Steps: Your First Plot¶
import matplotlib.pyplot as plt
# Simple line plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.show()
Adding Labels and Title:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('My First Plot')
plt.grid(True) # Add grid
plt.show()
Note
Always label your axes and add a title! A chart without labels is like a story without context.
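The same labels can also be set through an Axes object, which is the interface the grouped-bar and subplot examples later in this chapter use. The sketch below is one possible way to do it, not the only one. The `matplotlib.use("Agg")` line is an assumption added so the snippet runs without a display; remove it and add `plt.show()` at the end to open a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
ax.plot(x, y)

# Axes.set() applies several properties in a single call
ax.set(xlabel='X-axis Label', ylabel='Y-axis Label', title='My First Plot')
ax.grid(True)
```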
Line Graphs¶
Single Line:
import matplotlib.pyplot as plt
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
confidence = [9, 8, 7, 5, 4, 3, 5, 6]
plt.plot(weeks, confidence, marker='o', linestyle='-', color='blue', linewidth=2)
plt.xlabel('Weeks into Semester')
plt.ylabel('Confidence Level (1-10)')
plt.title('The Reality of Learning Python')
plt.grid(True, alpha=0.3)
plt.show()
Multiple Lines:
import matplotlib.pyplot as plt
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
confidence = [9, 8, 7, 5, 4, 3, 5, 6]
motivation = [10, 9, 7, 6, 4, 3, 4, 5]
sleep_hours = [7, 6, 6, 5, 4, 3, 4, 5]
plt.plot(weeks, confidence, marker='o', label='Confidence')
plt.plot(weeks, motivation, marker='s', label='Motivation')
plt.plot(weeks, sleep_hours, marker='^', label='Sleep Hours')
plt.xlabel('Weeks into Semester')
plt.ylabel('Level / Hours')
plt.title('Student Life: A Visual Tragedy')
plt.legend() # Show legend
plt.grid(True, alpha=0.3)
plt.show()
Real-World Example: Learning Journey¶
import matplotlib.pyplot as plt
import numpy as np
# The emotional journey of learning Python
events = ['Start', 'Hello\nWorld', 'Variables', 'Loops', 'Functions',
'OOP', 'Debugging', 'Projects', 'Realization']
weeks = np.arange(len(events))
confidence = [7, 10, 8, 6, 4, 3, 2, 5, 4]
plt.figure(figsize=(12, 6))
plt.plot(weeks, confidence, marker='o', linewidth=2, markersize=8,
color='#2962FF', markerfacecolor='orange')
# Annotate key moments
plt.annotate('Peak of Confidence', xy=(1, 10), xytext=(1.5, 9),
arrowprops=dict(facecolor='green', shrink=0.05))
plt.annotate('Valley of Despair', xy=(6, 2), xytext=(6.5, 3),
arrowprops=dict(facecolor='red', shrink=0.05))
plt.xticks(weeks, events, rotation=45)
plt.xlabel('Learning Milestones')
plt.ylabel('Confidence Level (1-10)')
plt.title('The Emotional Journey of Learning Python\n(Emotional Support: You\'re Not Alone!)')
plt.grid(True, alpha=0.3)
plt.ylim(0, 11)
plt.tight_layout()
plt.show()
Bar Charts¶
Vertical Bars:
import matplotlib.pyplot as plt
subjects = ['Python', 'Java', 'WebDev', 'Database']
scores = [85, 78, 92, 88]
plt.bar(subjects, scores, color='skyblue', edgecolor='black')
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('My Grades')
plt.ylim(0, 100)
plt.show()
Horizontal Bars:
import matplotlib.pyplot as plt
subjects = ['Python', 'Java', 'WebDev', 'Database']
scores = [85, 78, 92, 88]
plt.barh(subjects, scores, color='lightcoral')
plt.xlabel('Scores')
plt.ylabel('Subjects')
plt.title('My Grades (Horizontal)')
plt.xlim(0, 100)
plt.show()
Grouped Bars:
import matplotlib.pyplot as plt
import numpy as np
activities = ['Studying', 'Gaming', 'Social Media', 'Sleep', 'Procrastination']
claimed_hours = [8, 2, 1, 8, 1]
actual_hours = [3, 6, 4, 5, 6]
x = np.arange(len(activities))
width = 0.35
fig, ax = plt.subplots(figsize=(10, 6))
bars1 = ax.bar(x - width/2, claimed_hours, width, label='Claimed Hours',
color='lightgreen', edgecolor='black')
bars2 = ax.bar(x + width/2, actual_hours, width, label='Actual Hours',
color='salmon', edgecolor='black')
ax.set_xlabel('Activities')
ax.set_ylabel('Hours per Day')
ax.set_title('Time Spent vs Time Claimed to Spend\n(Why Time Tracking Apps Make Students Cry)')
ax.set_xticks(x)
ax.set_xticklabels(activities, rotation=45)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Note
Grouped bar charts are perfect for comparing two or more sets of data across categories. Notice how “Actual Hours” tells a very different story than “Claimed Hours”! 😅
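When the gap between two groups is the point of the chart, printing each value above its bar can make the comparison easier to read. The sketch below uses three of the activities from the example above and assumes Matplotlib 3.4 or newer, where `Axes.bar_label()` is available. The `matplotlib.use("Agg")` line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

activities = ['Studying', 'Gaming', 'Sleep']
claimed_hours = [8, 2, 8]
actual_hours = [3, 6, 5]

x = np.arange(len(activities))
width = 0.35

fig, ax = plt.subplots()
bars1 = ax.bar(x - width/2, claimed_hours, width, label='Claimed Hours')
bars2 = ax.bar(x + width/2, actual_hours, width, label='Actual Hours')

# Write each bar's height just above it (requires Matplotlib 3.4+)
labels1 = ax.bar_label(bars1, padding=3)
labels2 = ax.bar_label(bars2, padding=3)

ax.set_xticks(x)
ax.set_xticklabels(activities)
ax.set_ylabel('Hours per Day')
ax.legend()
```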
Scatter Plots¶
Basic Scatter Plot:
import matplotlib.pyplot as plt
study_hours = [2, 4, 6, 8, 10, 3, 5, 7, 9, 11]
grades = [50, 60, 70, 80, 90, 55, 65, 75, 85, 95]
plt.scatter(study_hours, grades, color='purple', s=100, alpha=0.6)
plt.xlabel('Study Hours per Week')
plt.ylabel('Grade (%)')
plt.title('Study Hours vs Grades')
plt.grid(True, alpha=0.3)
plt.show()
Advanced Scatter with Categories:
import matplotlib.pyplot as plt
import numpy as np
# Coffee consumption vs code quality (randomly generated data, for illustration only)
np.random.seed(42)
# Different student categories
tea_drinkers_coffee = np.random.uniform(0, 2, 10)
tea_drinkers_bugs = np.random.uniform(10, 30, 10)
moderate_coffee = np.random.uniform(2, 5, 10)
moderate_bugs = np.random.uniform(5, 20, 10)
addicts_coffee = np.random.uniform(5, 8, 10)
addicts_bugs = np.random.uniform(15, 40, 10)
zombies_coffee = np.random.uniform(8, 12, 10)
zombies_bugs = np.random.uniform(30, 60, 10)
plt.figure(figsize=(10, 6))
plt.scatter(tea_drinkers_coffee, tea_drinkers_bugs,
label='Tea Drinkers', s=100, alpha=0.6, marker='o')
plt.scatter(moderate_coffee, moderate_bugs,
label='Moderate Coffee Users', s=100, alpha=0.6, marker='s')
plt.scatter(addicts_coffee, addicts_bugs,
label='Caffeine Addicts', s=100, alpha=0.6, marker='^')
plt.scatter(zombies_coffee, zombies_bugs,
label='Energy Drink Zombies', s=100, alpha=0.6, marker='D')
plt.xlabel('Coffee Cups per Day')
plt.ylabel('Bugs per 100 Lines of Code')
plt.title('Relationship Between Coffee Consumption and Code Quality\n(Correlation ≠ Causation... or does it?)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Pie Charts¶
import matplotlib.pyplot as plt
activities = ['Actually Studying', 'Planning to Study', 'Netflix Research',
'Social Media\nNetworking', 'Existential Crisis', 'Sleep']
time_spent = [5, 15, 25, 30, 15, 10]
colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#96ceb4', '#ffeaa7', '#dfe6e9']
explode = (0, 0.1, 0.1, 0.1, 0, 0) # Explode some slices
plt.figure(figsize=(10, 8))
plt.pie(time_spent, labels=activities, colors=colors, autopct='%1.1f%%',
startangle=90, explode=explode, shadow=True)
plt.title('Where My Semester Went\n(A Pie Chart of Regret)', fontsize=14, fontweight='bold')
plt.axis('equal') # Equal aspect ratio ensures circular pie
plt.tight_layout()
plt.show()
Note
Pie charts are great for showing proportions, but use them sparingly. If you have more than 6-7 categories, consider a bar chart instead!
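As a rough illustration of the bar-chart alternative, the sketch below redraws the six values from the pie chart above as a sorted horizontal bar chart. The same pattern should carry over to longer category lists. The `matplotlib.use("Agg")` line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

activities = ['Actually Studying', 'Planning to Study', 'Netflix Research',
              'Social Media Networking', 'Existential Crisis', 'Sleep']
time_spent = [5, 15, 25, 30, 15, 10]

# Sort from smallest to largest; barh draws the first item at the bottom,
# so the largest category ends up at the top of the chart
pairs = sorted(zip(time_spent, activities))
sorted_values = [value for value, _ in pairs]
sorted_labels = [label for _, label in pairs]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(sorted_labels, sorted_values, color='steelblue')
ax.set_xlabel('Share of Time (%)')
ax.set_title('Where My Semester Went')
fig.tight_layout()
```

Bars share a common baseline, so small differences between categories are usually easier to judge than with pie slices.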
Histograms¶
import matplotlib.pyplot as plt
import numpy as np
# Generate random exam scores
np.random.seed(42)
scores = np.random.normal(70, 15, 100) # mean=70, std=15, 100 students
plt.figure(figsize=(10, 6))
plt.hist(scores, bins=20, color='steelblue', edgecolor='black', alpha=0.7)
plt.xlabel('Exam Scores')
plt.ylabel('Number of Students')
plt.title('Distribution of Exam Scores')
plt.axvline(scores.mean(), color='red', linestyle='--', linewidth=2,
label=f'Mean: {scores.mean():.1f}')
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.show()
Subplots (Multiple Plots)¶
import matplotlib.pyplot as plt
import numpy as np
# Create 2x2 grid of plots
fig, axs = plt.subplots(2, 2, figsize=(12, 10))
# Plot 1: Line graph
weeks = np.arange(1, 9)
confidence = [9, 8, 7, 5, 4, 3, 5, 6]
axs[0, 0].plot(weeks, confidence, marker='o', color='blue')
axs[0, 0].set_title('Confidence Over Time')
axs[0, 0].grid(True, alpha=0.3)
# Plot 2: Bar chart
subjects = ['Py', 'Java', 'Web', 'DB']
scores = [85, 78, 92, 88]
axs[0, 1].bar(subjects, scores, color='green')
axs[0, 1].set_title('Subject Scores')
# Plot 3: Scatter plot
study = np.random.rand(20) * 10
grades = study * 8 + np.random.rand(20) * 10
axs[1, 0].scatter(study, grades, color='red', alpha=0.6)
axs[1, 0].set_title('Study vs Grades')
# Plot 4: Pie chart
activities = ['Study', 'Gaming', 'Sleep', 'Other']
time = [20, 30, 25, 25]
axs[1, 1].pie(time, labels=activities, autopct='%1.1f%%')
axs[1, 1].set_title('Time Distribution')
plt.tight_layout()
plt.show()
Customization and Styling¶
import matplotlib.pyplot as plt
import numpy as np
# Use a built-in style ('seaborn-v0_8-*' names need Matplotlib 3.6+; older releases use 'seaborn-darkgrid')
plt.style.use('seaborn-v0_8-darkgrid') # or 'ggplot', 'bmh', 'fivethirtyeight'
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, linewidth=3, color='#2962FF')
plt.xlabel('X', fontsize=14, fontweight='bold')
plt.ylabel('Y', fontsize=14, fontweight='bold')
plt.title('Sine Wave', fontsize=16, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()
# Reset to default style
plt.style.use('default')
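Style names vary between Matplotlib versions, so it can be worth checking `plt.style.available` before applying one. The sketch below falls back to 'ggplot' if the preferred style is missing, and uses `plt.style.context()` so the style applies to one figure only, with no manual reset. The fallback choice and the `matplotlib.use("Agg")` line are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

preferred = 'seaborn-v0_8-darkgrid'

# Fall back to 'ggplot' if the preferred style is missing in this version
style = preferred if preferred in plt.style.available else 'ggplot'

# The style is active only inside the with-block
with plt.style.context(style):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title(f'Styled with {style}')
```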
Saving Figures¶
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.title('My Plot')
# Save in different formats
plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')
plt.savefig('my_plot.pdf', bbox_inches='tight')
plt.savefig('my_plot.jpg', dpi=150, bbox_inches='tight')
plt.show()
Note
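Call plt.savefig() before plt.show(): once the plot window closes, the figure may be cleared and the saved file can come out blank. Use bbox_inches='tight' to avoid cutting off labels. For raster formats such as PNG and JPG, a higher DPI (dots per inch) gives a sharper image but a larger file; 300 DPI is a common choice for printing. PDF output is vector-based, so it stays sharp at any zoom level.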
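To see the DPI trade-off in practice, the sketch below saves the same figure as a PNG at two resolutions and compares the file sizes. It writes into a temporary folder so nothing is left behind; the filenames and the two DPI values are arbitrary choices for this example, and `matplotlib.use("Agg")` is an assumption for running without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
ax.set_title('My Plot')

sizes = {}
with tempfile.TemporaryDirectory() as folder:
    for dpi in (72, 300):
        path = os.path.join(folder, f'my_plot_{dpi}.png')
        fig.savefig(path, dpi=dpi, bbox_inches='tight')
        sizes[dpi] = os.path.getsize(path)  # size in bytes

print(sizes)  # the 300 DPI file should be noticeably larger
```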
Tasks¶
Task 1: Emotional Journey Visualization
Create a line graph titled “The Emotional Journey of Learning Python” with weeks (1-12) on x-axis and confidence level (1-10) on y-axis. Include data points for: Week 1 (confidence=9, “First Hello World”), Week 3 (confidence=7, “Variables”), Week 5 (confidence=4, “Discovered Debugging”), Week 8 (confidence=6, “Understood OOP”), Week 12 (confidence=3, “Realized How Much I Don’t Know”). Add annotations for key events.
Hint: Use plt.annotate() with xy and xytext parameters. Use plt.plot() with marker='o' for data points.
Task 2: Time Reality Check Bar Chart
Create a grouped bar chart comparing “Time Claimed” vs “Actual Time” for 5 activities: Studying, Gaming, Social Media, Productive Procrastination, Sleep. Use different colors for each group. Add error bars representing ±2 hours “Self-Deception Margin”. Include subtitle: “Why Time Tracking Apps Make Students Cry”.
Hint: Use plt.bar() with x - width/2 and x + width/2 for grouped bars. Add yerr parameter for error bars.
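As a small illustration of the `yerr` parameter (on different data, so the task is still yours to solve), the sketch below adds a symmetric error bar to each bar of the grades chart from earlier. The ±5 margin is a made-up value for this example, and `matplotlib.use("Agg")` is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

subjects = ['Python', 'Java', 'WebDev', 'Database']
scores = [85, 78, 92, 88]
margin = 5  # hypothetical +/- 5 point uncertainty, same for every bar

fig, ax = plt.subplots()
# yerr draws a vertical error bar on each bar; capsize adds the small end caps
container = ax.bar(subjects, scores, yerr=margin, capsize=4,
                   color='skyblue', edgecolor='black')
ax.set_ylabel('Scores')
ax.set_ylim(0, 100)
```

`yerr` also accepts a list with one value per bar when the margins differ.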
Task 3: Coffee vs Code Quality Scatter Plot
Create a scatter plot with coffee_cups_per_day (0-10) on x-axis and bugs_per_hundred_lines (0-50) on y-axis. Create 4 categories: Tea Drinkers (0-2 cups), Moderate (2-5 cups), Addicts (5-8 cups), Zombies (8+ cups). Use different colors and markers for each. Add a trend line using np.polyfit() and np.poly1d().
Hint: Generate random data for each category. Use np.polyfit(x, y, 1) for linear fit, then plt.plot() for trend line.
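The trend-line step can be sketched on the study-hours data from the basic scatter plot above. `np.polyfit(x, y, 1)` returns the slope and intercept of the best-fit line, and `np.poly1d` turns them into a callable function. The `matplotlib.use("Agg")` line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

study_hours = np.array([2, 4, 6, 8, 10, 3, 5, 7, 9, 11])
grades = np.array([50, 60, 70, 80, 90, 55, 65, 75, 85, 95])

# Degree-1 fit: coefficients come back highest power first
slope, intercept = np.polyfit(study_hours, grades, 1)
trend = np.poly1d([slope, intercept])

fig, ax = plt.subplots()
ax.scatter(study_hours, grades, color='purple', alpha=0.6, label='Students')

# Evaluate the fitted line across the observed x-range
xs = np.linspace(study_hours.min(), study_hours.max(), 50)
ax.plot(xs, trend(xs), color='red', linestyle='--',
        label=f'Trend: y = {slope:.1f}x + {intercept:.1f}')
ax.legend()
```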
Task 4: Semester Distribution Pie Chart
Create a pie chart titled “Where My Semester Went” with slices: Actually Studying (8%), Planning to Study (20%), Netflix Research (25%), Social Media Networking (22%), Existential Crisis (15%), Sleep (10%). Use custom colors reflecting emotional states. Explode the “Netflix Research” slice. Add percentage labels.
Hint: Use explode parameter as tuple with 0.1 for slices to explode. Use autopct='%1.1f%%' for percentages.
Task 5: Multi-Plot Dashboard
Create a 2×2 subplot figure showing: (1) Line: Motivation over 8 weeks, (2) Bar: Grades in 4 subjects, (3) Scatter: Study hours vs Assignment scores, (4) Histogram: Distribution of daily screen time. Add appropriate titles, labels, and styling to each subplot.
Hint: Use fig, axs = plt.subplots(2, 2, figsize=(12, 10)). Access plots using axs[row, col]. Use plt.tight_layout() to prevent overlap.
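Besides `axs[row, col]`, the array returned by `plt.subplots()` can be looped over with `axs.flat`, which is handy for settings shared by every panel. The sketch below only sets titles and grids, and the panel names are placeholders. The `matplotlib.use("Agg")` line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2, figsize=(12, 10))
titles = ['Line', 'Bar', 'Scatter', 'Histogram']  # placeholder panel names

# axs is a 2x2 NumPy array; axs.flat visits the panels row by row
for ax, title in zip(axs.flat, titles):
    ax.set_title(title)
    ax.grid(True, alpha=0.3)

fig.tight_layout()
```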
Summary¶
Matplotlib is the de facto standard third-party library for data visualization in Python
plt.plot() creates line graphs
plt.bar() and plt.barh() create bar charts
plt.scatter() creates scatter plots
plt.pie() creates pie charts
plt.hist() creates histograms
Always add labels, titles, and legends for clarity
Use plt.subplots() to create multiple plots
Customize colors, markers, line styles for better visuals
Save figures using plt.savefig()
Matplotlib works seamlessly with NumPy and Pandas