NumPy Basics: Arrays and Numerical Computing

Introduction

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides powerful tools for working with arrays, mathematical operations, and statistical analysis.

Note

NumPy is essential for data science, machine learning, and scientific computing. It’s much faster than regular Python lists for numerical operations because it uses optimized C code under the hood.


Installing NumPy

pip install numpy

Why NumPy?

The Problem with Python Lists:

# Using regular Python lists (slow for large data)
numbers = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in numbers]
print(doubled)  # [2, 4, 6, 8, 10]

The NumPy Solution:

import numpy as np

# Using NumPy arrays (fast and efficient)
numbers = np.array([1, 2, 3, 4, 5])
doubled = numbers * 2
print(doubled)  # [ 2  4  6  8 10]

Note

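NumPy operations are vectorized - they apply to entire arrays at once, with the loop running in compiled code. On large arrays this is often 10-100x faster than an equivalent Python loop, though the exact speedup depends on the operation, the array size, and your machine.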
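
If you want to check the speed difference on your own machine, a rough timing sketch like the one below can help. The array size and repeat count are arbitrary choices made for this illustration, and the numbers you see will vary from run to run.

```python
import timeit

import numpy as np

# Hypothetical workload: one million integers (the size is an arbitrary choice)
n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

def double_with_loop():
    # Pure-Python list comprehension: one interpreter step per element
    return [x * 2 for x in py_list]

def double_with_numpy():
    # Vectorized multiply: the loop runs in compiled code
    return np_arr * 2

# Both approaches must produce the same values
assert double_with_loop() == double_with_numpy().tolist()

loop_time = timeit.timeit(double_with_loop, number=5)
numpy_time = timeit.timeit(double_with_numpy, number=5)
print(f"List comprehension: {loop_time:.4f} s")
print(f"NumPy:              {numpy_time:.4f} s")
print(f"Speedup:            {loop_time / numpy_time:.1f}x")
```

For very small arrays the gap shrinks or can even reverse, because creating the array has its own fixed cost.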


Baby Steps: Creating NumPy Arrays

1. From Python Lists

import numpy as np

# 1D array
arr1 = np.array([1, 2, 3, 4, 5])
print(arr1)           # [1 2 3 4 5]
print(type(arr1))     # <class 'numpy.ndarray'>

# 2D array (matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
# [[1 2 3]
#  [4 5 6]]

# Check dimensions
print(arr1.shape)     # (5,)
print(arr2.shape)     # (2, 3) - 2 rows, 3 columns

2. Using NumPy Functions

# Array of zeros
zeros = np.zeros(5)
print(zeros)          # [0. 0. 0. 0. 0.]

# Array of ones
ones = np.ones((3, 4))
print(ones)
# [[1. 1. 1. 1.]
#  [1. 1. 1. 1.]
#  [1. 1. 1. 1.]]

# Range of values
range_arr = np.arange(0, 10, 2)  # start, stop, step
print(range_arr)      # [0 2 4 6 8]

# Evenly spaced values
linear = np.linspace(0, 1, 5)    # start, stop, count
print(linear)         # [0.   0.25 0.5  0.75 1.  ]

# Random numbers (uniform floats in the range [0, 1))
random_arr = np.random.rand(3, 3)    # 3x3 array of random values
print(random_arr)

Note

np.arange() is similar to Python’s range(), but returns a NumPy array. np.linspace() is useful when you need a specific number of evenly spaced points.
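
The difference between the two is easiest to see side by side. This is a small sketch, with values picked only for illustration: np.arange() takes a step and leaves out the stop value, while np.linspace() takes a count and includes the stop value unless you pass endpoint=False.

```python
import numpy as np

# arange: fixed step, stop value is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped)        # [0.   0.25 0.5  0.75]

# linspace: fixed count, stop value is included by default
spaced = np.linspace(0, 1, 5)
print(spaced)         # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace exclude the stop value too
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)      # [0.   0.25 0.5  0.75]
```

With non-integer steps, np.linspace() is usually the safer choice, because floating-point rounding can change how many elements np.arange() returns.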


Array Attributes and Information

import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(arr.shape)      # (3, 4) - shape of array
print(arr.ndim)       # 2 - number of dimensions
print(arr.size)       # 12 - total number of elements
print(arr.dtype)      # int32 or int64 - data type

# Change data type
arr_float = arr.astype(float)
print(arr_float.dtype)  # float64
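
Going the other way (float to int) deserves a little care. The sketch below uses made-up sample values to show that astype(int) cuts off the fractional part instead of rounding.

```python
import numpy as np

prices = np.array([1.9, 2.4, -1.9])   # hypothetical sample values

# astype(int) drops the fractional part (truncates toward zero)
truncated = prices.astype(int)
print(truncated)      # [ 1  2 -1]

# Round first if nearest-integer behaviour is wanted
rounded = np.round(prices).astype(int)
print(rounded)        # [ 2  2 -2]

# astype returns a new array; the original keeps its dtype
print(prices.dtype)   # float64
```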

Array Indexing and Slicing

import numpy as np

# 1D array indexing
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])         # 10 - first element
print(arr[-1])        # 50 - last element
print(arr[1:4])       # [20 30 40] - slice

# 2D array indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[0, 0])   # 1 - row 0, column 0
print(matrix[1, 2])   # 6 - row 1, column 2
print(matrix[0])      # [1 2 3] - entire first row
print(matrix[:, 0])   # [1 4 7] - entire first column

# Slicing 2D arrays
print(matrix[0:2, 1:3])
# [[2 3]
#  [5 6]]
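
One thing that differs from Python lists: a basic NumPy slice is a view of the original data, not a copy. The short sketch below illustrates this and shows .copy() as one way to get an independent array.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: it shares memory with the original array
top_row = matrix[0, :]
top_row[0] = 100
print(matrix[0, 0])   # 100 - the original changed too

# Use .copy() when an independent array is needed
safe_row = matrix[1, :].copy()
safe_row[0] = -1
print(matrix[1, 0])   # 4 - the original is untouched
```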

Basic Array Operations

import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

# Element-wise operations
print(arr1 + arr2)    # [11 22 33 44]
print(arr1 - arr2)    # [ -9 -18 -27 -36]
print(arr1 * arr2)    # [ 10  40  90 160]
print(arr2 / arr1)    # [10. 10. 10. 10.]
print(arr1 ** 2)      # [ 1  4  9 16]

# Scalar operations
print(arr1 + 10)      # [11 12 13 14]
print(arr1 * 2)       # [2 4 6 8]

# Comparison operations
print(arr1 > 2)       # [False False  True  True]
print(arr2 == 30)     # [False False  True False]

Statistical Functions (Important!)

import numpy as np

# Student test scores
scores = np.array([45, 67, 89, 56, 78, 90, 34, 88, 76, 82])

print(f"Mean (Average): {np.mean(scores):.2f}")        # 70.50
print(f"Median (Middle): {np.median(scores):.2f}")     # 77.00
print(f"Standard Deviation: {np.std(scores):.2f}")     # 18.58
print(f"Variance: {np.var(scores):.2f}")               # 345.25
print(f"Minimum: {np.min(scores)}")                    # 34
print(f"Maximum: {np.max(scores)}")                    # 90
print(f"Sum: {np.sum(scores)}")                        # 705

Note

Standard Deviation tells you how spread out the data is. Higher std = more variation. Lower std = more consistent values. If your std is higher than your average score… you might want to study more consistently! 📚
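
By default, np.std() computes the population standard deviation (it divides by n). If your data is a sample from a larger group, you may want the sample version, which divides by n - 1. Here is a small sketch using the same scores, with a manual calculation as a cross-check.

```python
import numpy as np

scores = np.array([45, 67, 89, 56, 78, 90, 34, 88, 76, 82])

# Default (ddof=0): population standard deviation, divides by n
population_std = np.std(scores)

# ddof=1: sample standard deviation, divides by n - 1
sample_std = np.std(scores, ddof=1)

# Manual check of the default formula: sqrt(mean((x - mean)^2))
manual_std = np.sqrt(np.mean((scores - scores.mean()) ** 2))

print(f"Population std: {population_std:.2f}")   # 18.58
print(f"Sample std:     {sample_std:.2f}")       # 19.59
print(f"Manual std:     {manual_std:.2f}")       # 18.58
```

The same ddof parameter works with np.var().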


Real-World Example: Student Performance Analysis

import numpy as np

# Weekly study hours for 10 students
study_hours = np.array([5, 12, 8, 15, 3, 20, 7, 10, 18, 6])

# Their exam scores (out of 100)
exam_scores = np.array([45, 78, 60, 85, 35, 95, 55, 70, 92, 50])

# Statistical analysis
print("=== Study Hours Analysis ===")
print(f"Average study time: {np.mean(study_hours):.2f} hours")
print(f"Standard deviation: {np.std(study_hours):.2f} hours")
print(f"Most hours studied: {np.max(study_hours)}")
print(f"Least hours studied: {np.min(study_hours)}")

print("\n=== Exam Scores Analysis ===")
print(f"Average score: {np.mean(exam_scores):.2f}")
print(f"Standard deviation: {np.std(exam_scores):.2f}")
print(f"Highest score: {np.max(exam_scores)}")
print(f"Lowest score: {np.min(exam_scores)}")

# Find correlation (do more study hours = better scores?)
correlation = np.corrcoef(study_hours, exam_scores)[0, 1]
print(f"\nCorrelation: {correlation:.2f}")
if correlation > 0.7:
    print("Strong positive correlation - studying helps!")
elif correlation > 0.4:
    print("Moderate correlation - studying somewhat helps")
else:
    print("Weak correlation - maybe study smarter, not harder?")
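
If you are curious what np.corrcoef() is doing, the Pearson correlation can be worked out by hand from the deviations of each array. The sketch below is only an illustration of the formula (the names dx, dy, and manual_r are made up here), and it checks the result against NumPy's own value.

```python
import numpy as np

study_hours = np.array([5, 12, 8, 15, 3, 20, 7, 10, 18, 6])
exam_scores = np.array([45, 78, 60, 85, 35, 95, 55, 70, 92, 50])

# Deviations of each value from its own mean
dx = study_hours - study_hours.mean()
dy = exam_scores - exam_scores.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
manual_r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

library_r = np.corrcoef(study_hours, exam_scores)[0, 1]
print(np.isclose(manual_r, library_r))   # True
```

Keep in mind that a high correlation shows the two arrays move together; on its own it does not prove that one causes the other.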

Advanced: Boolean Indexing and Filtering

import numpy as np

scores = np.array([45, 67, 89, 56, 78, 90, 34, 88, 76, 82])

# Filter scores above 75
high_scorers = scores[scores > 75]
print(high_scorers)   # [89 78 90 88 76 82]

# Count how many passed (>= 40)
passed = scores[scores >= 40]
print(f"Students passed: {len(passed)}/{len(scores)}")

# Multiple conditions
good_scores = scores[(scores >= 60) & (scores <= 85)]
print(good_scores)    # [67 78 76 82]
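
A few more mask tricks build on the same idea. This is a short sketch on the same scores: summing a mask counts the True values, | and ~ combine or invert conditions, and np.where() can label each element.

```python
import numpy as np

scores = np.array([45, 67, 89, 56, 78, 90, 34, 88, 76, 82])

# A comparison produces a boolean mask; True counts as 1 when summed
passed_mask = scores >= 40
print(np.sum(passed_mask))          # 9

# | means "or", ~ means "not"; each condition needs its own parentheses
extremes = scores[(scores < 40) | (scores > 85)]
print(extremes)                     # [89 90 34 88]

not_passed = scores[~passed_mask]
print(not_passed)                   # [34]

# np.where picks between two values element by element
labels = np.where(passed_mask, "pass", "fail")
print(labels[6])                    # fail
```

Python's and / or keywords do not work on arrays here; use & and | instead.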

Array Reshaping and Manipulation

import numpy as np

# Create 1D array
arr = np.arange(1, 13)
print(arr)            # [ 1  2  3  4  5  6  7  8  9 10 11 12]

# Reshape to 3x4 matrix
matrix = arr.reshape(3, 4)
print(matrix)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

# Flatten back to 1D
flat = matrix.flatten()
print(flat)           # [ 1  2  3  4  5  6  7  8  9 10 11 12]

# Transpose (flip rows and columns)
transposed = matrix.T
print(transposed)
# [[ 1  5  9]
#  [ 2  6 10]
#  [ 3  7 11]
#  [ 4  8 12]]
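
Two details are worth trying out: reshape() can infer one dimension for you if you pass -1, and it raises an error when the new shape does not fit the number of elements. A minimal sketch:

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements

# -1 asks NumPy to infer that dimension from the total size
auto = arr.reshape(-1, 4)
print(auto.shape)                 # (3, 4)

# The new shape must account for every element
try:
    arr.reshape(5, 3)             # 15 slots for 12 elements
except ValueError as err:
    print("Cannot reshape:", err)

# flatten() always returns a copy, so edits do not reach the original
flat = auto.flatten()
flat[0] = 99
print(auto[0, 0])                 # 1
```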

Real-World Application: Grade Book Analysis

import numpy as np

# 5 students, 4 subjects (Python, Java, Web Dev, Database)
grades = np.array([
    [85, 78, 90, 88],  # Student 1
    [67, 72, 68, 70],  # Student 2
    [92, 88, 95, 90],  # Student 3
    [45, 50, 48, 52],  # Student 4
    [78, 82, 80, 85]   # Student 5
])

# Calculate average per student (axis=1 means across columns)
student_averages = np.mean(grades, axis=1)
print("Student averages:", student_averages)

# Calculate average per subject (axis=0 means across rows)
subject_averages = np.mean(grades, axis=0)
print("Subject averages:", subject_averages)

# Find top performer
top_student = np.argmax(student_averages)
print(f"Top student: Student {top_student + 1}")

# Find easiest subject
easiest_subject = np.argmax(subject_averages)
subjects = ['Python', 'Java', 'Web Dev', 'Database']
print(f"Easiest subject: {subjects[easiest_subject]}")

# Find students who need help (average < 60)
struggling = np.where(student_averages < 60)[0]
print(f"Students needing help: {struggling + 1}")

Note

axis=0 moves down the rows and collapses them, so you get one result per column. axis=1 moves across the columns and collapses them, so you get one result per row. A quick check: the axis you pass is the one that disappears from the shape of the result.
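
Checking the shape of each result is a quick way to confirm which axis was collapsed. The tiny grade table below is made up for this illustration.

```python
import numpy as np

# Hypothetical small grade table: 2 students (rows) x 3 subjects (columns)
grades = np.array([[80, 90, 70],
                   [60, 50, 40]])

per_subject = grades.mean(axis=0)   # collapses the rows
print(per_subject.shape)            # (3,)
print(per_subject)                  # [70. 70. 55.]

per_student = grades.mean(axis=1)   # collapses the columns
print(per_student.shape)            # (2,)
print(per_student)                  # [80. 50.]

overall = grades.mean()             # no axis: a single number for everything
print(overall)                      # 65.0
```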


Advanced: Random Data Generation

import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Random integers from 1 to 6 (the upper bound 7 is excluded)
dice_rolls = np.random.randint(1, 7, size=10)
print("Dice rolls:", dice_rolls)

# Random floats (0 to 1)
probabilities = np.random.rand(5)
print("Random probabilities:", probabilities)

# Random from normal distribution (mean=100, std=15)
iq_scores = np.random.normal(100, 15, size=20)
print("IQ scores:", iq_scores)

# Random choice from array
students = np.array(['Alice', 'Bob', 'Charlie', 'Diana'])
lucky_winner = np.random.choice(students)
print(f"Random winner: {lucky_winner}")
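
Newer NumPy code often uses a Generator object created with np.random.default_rng() instead of the global np.random functions. The sketch below mirrors the example above with that interface. It is an alternative, not a replacement for the code shown, and the values it produces differ from those of np.random.seed(42).

```python
import numpy as np

# A Generator with its own seed, independent of the global np.random state
rng = np.random.default_rng(42)

dice_rolls = rng.integers(1, 7, size=10)      # integers from 1 to 6
probabilities = rng.random(5)                 # floats in [0.0, 1.0)
iq_scores = rng.normal(100, 15, size=20)      # mean=100, std=15
students = np.array(['Alice', 'Bob', 'Charlie', 'Diana'])
lucky_winner = rng.choice(students)

print("Dice rolls:", dice_rolls)
print(f"Random winner: {lucky_winner}")

# The same seed reproduces the same sequence
repeat = np.random.default_rng(42).integers(1, 7, size=10)
print(np.array_equal(dice_rolls, repeat))     # True
```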

Complete Example: Semester Performance Tracker

import numpy as np

# 18 weeks of motivation levels (1-10 scale), entered by hand
motivation = np.array([
    9, 8, 7, 6, 5, 4, 3, 4, 5, 3, 2, 3, 4, 5, 6, 4, 3, 2
])

print("=== MCA Student Motivation Analysis ===")
print(f"Weeks tracked: {len(motivation)}")
print(f"Mean motivation: {np.mean(motivation):.2f}")
print(f"Median motivation: {np.median(motivation):.2f}")
print(f"Standard deviation: {np.std(motivation):.2f}")
print(f"Highest motivation: {np.max(motivation)} (Week {np.argmax(motivation) + 1})")
print(f"Lowest motivation: {np.min(motivation)} (Week {np.argmin(motivation) + 1})")

# Reality check
if np.std(motivation) > np.mean(motivation):
    print("\n⚠️  Reality Check: Your motivation varies more than it exists!")
    print("Standard deviation > Mean suggests emotional roller coaster mode.")

# Count bad weeks
bad_weeks = np.sum(motivation < 5)
print(f"\nWeeks below 5 motivation: {bad_weeks}/{len(motivation)}")

# Status label based on the mean (a simple threshold, not a real significance test)
if np.mean(motivation) < 5:
    print("Status: Statistically Depressing 😢")
elif np.mean(motivation) < 7:
    print("Status: Statistically Surviving 😐")
else:
    print("Status: Statistically Thriving 🎉")

Advanced Mathematical Operations

import numpy as np

arr = np.array([1, 4, 9, 16, 25])

# Mathematical functions
print(np.sqrt(arr))           # [1. 2. 3. 4. 5.]
print(np.exp(np.array([1, 2, 3])))  # [ 2.71828183  7.3890561  20.08553692]
print(np.log(arr))            # [0.         1.38629436 2.19722458 2.77258872 3.21887582]

# Trigonometric functions
angles = np.array([0, 30, 45, 60, 90])
radians = np.deg2rad(angles)
print(np.sin(radians))

# Rounding
values = np.array([1.23456, 2.34567, 3.45678])
print(np.round(values, 2))    # [1.23 2.35 3.46]
print(np.floor(values))       # [1. 2. 3.]
print(np.ceil(values))        # [2. 3. 4.]
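
Two small surprises are worth knowing about. np.round() sends exact halves to the nearest even number, and trigonometric results carry tiny floating-point errors, so they are best compared with a tolerance. A short sketch with illustrative values:

```python
import numpy as np

# np.round sends exact halves to the nearest even value
halves = np.array([0.5, 1.5, 2.5, 3.5])
print(np.round(halves))           # [0. 2. 2. 4.]

# Trigonometric results carry floating-point error, so compare with a tolerance
angles = np.array([0, 30, 90])
sines = np.sin(np.deg2rad(angles))
print(np.allclose(sines, [0.0, 0.5, 1.0]))   # True
```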

Tasks

Task 1: Daily Motivation Tracker

Create a NumPy array representing your daily motivation levels (1-10) for 18 weeks (126 days). Use np.random.randint(1, 11, 126) to generate data. Calculate mean, median, standard deviation, and determine if your motivation is “statistically significant” or “statistically depressing”. Print a reality check if std > mean.

Hint: Use np.mean(), np.median(), np.std(). Compare std with mean for the reality check.

Task 2: Grade Book Manager

Create a 2D array for 8 students and 5 subjects with random grades (0-100). Calculate: (a) Average grade per student, (b) Average grade per subject, (c) Overall class average, (d) Find top student and easiest subject, (e) Count students with average >= 75.

Hint: Use axis=1 for student averages, axis=0 for subject averages. Use np.argmax() to find indices.

Task 3: Coffee vs Code Quality Study

Create two arrays: coffee_cups (0-10) and bugs_per_100_lines (0-50) for 20 developers. Use np.random to generate data. Calculate correlation using np.corrcoef(). Determine if the relationship is positive (more coffee = more bugs) or negative (more coffee = fewer bugs).

Hint: correlation = np.corrcoef(arr1, arr2)[0, 1]. If correlation > 0, it’s positive; if < 0, it’s negative.

Task 4: Exam Score Analyzer with Filtering

Generate 50 random exam scores (0-100). Create separate arrays for: (a) Students who passed (>= 40), (b) Students with distinction (>= 75), (c) Students who failed (< 40). Calculate statistics for each group and print percentage distribution.

Hint: Use boolean indexing: passed = scores[scores >= 40]. Use len() to count elements.

Task 5: Multi-Dimensional Performance Dashboard

Create a 3D array representing [10 students × 4 subjects × 3 exams]. Generate random grades. Calculate: (a) Each student’s overall average, (b) Each subject’s average across all students and exams, (c) Each exam’s difficulty (lower average = harder), (d) Find the best performing student-subject combination.

Hint: Use np.random.randint(40, 100, (10, 4, 3)) for 3D array. Use multiple axis parameters in mean: np.mean(arr, axis=(1,2)) averages across subjects and exams.


Summary

  • NumPy provides fast, efficient arrays for numerical computing

  • Create arrays using np.array(), np.zeros(), np.ones(), np.arange(), np.linspace()

  • Access elements using indexing and slicing similar to Python lists

  • Perform vectorized operations (faster than loops)

  • Use statistical functions: mean(), median(), std(), min(), max()

  • Boolean indexing allows filtering data based on conditions

  • axis parameter controls direction of operations (axis=0 gives one result per column, axis=1 gives one result per row)

  • Use reshape() to change array dimensions

  • NumPy is the foundation for Pandas, Matplotlib, and most data science libraries