ScalingΒΆ
OverviewΒΆ
SequencesΒΆ
1. Query Optimization (First Priority)ΒΆ
-- Before optimization
SELECT * FROM users WHERE age > 25;
-- After optimization
SELECT id, name FROM users
WHERE age > 25
LIMIT 100;
2. Database IndexingΒΆ
-- Add indexes for frequently queried columns
CREATE INDEX idx_user_age ON users(age);
CREATE INDEX idx_user_email ON users(email);
-- Composite index for multiple columns
CREATE INDEX idx_user_age_city ON users(age, city);
3. Caching ImplementationΒΆ
# Example using Redis cache
def get_user(user_id):
# Try cache first
cached_user = redis_cache.get(f"user:{user_id}")
if cached_user:
return cached_user
# If not in cache, get from DB and cache it
user = database.query(f"SELECT * FROM users WHERE id = {user_id}")
redis_cache.set(f"user:{user_id}", user, expire=3600)
return user
4. Vertical Scaling (Scale Up)ΒΆ
- Increase CPU
- Add more RAM
- Upgrade to faster storage (SSD/NVMe)
5. Database PartitioningΒΆ
-- Example of date-based partitioning
CREATE TABLE orders (
order_id INT,
order_date DATE,
amount DECIMAL
) PARTITION BY RANGE (YEAR(order_date)) (
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025)
);
6. Read ReplicasΒΆ
7. Horizontal Scaling (Sharding)ΒΆ
8. Database Type OptimizationΒΆ
Right choice DB based on demand
Transactions β PostgreSQL/MySQL
Analytics β ClickHouse/Redshift
Caching β Redis
Document Storage β MongoDB
Graph Data β Neo4j
9. Microservices SplitΒΆ
10. Geographic DistributionΒΆ
US Region:
- Primary DB
- Read Replicas
EU Region:
- Primary DB
- Read Replicas
Asia Region:
- Primary DB
- Read Replicas
PartitioningΒΆ
1.Types of PartitioningΒΆ
A. Range PartitioningΒΆ
-- Partition by date range
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p_2022 VALUES LESS THAN (2023),
PARTITION p_2023 VALUES LESS THAN (2024),
PARTITION p_2024 VALUES LESS THAN (2025),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Partition by amount range
CREATE TABLE orders (
order_id INT,
amount DECIMAL
) PARTITION BY RANGE (amount) (
PARTITION p_small VALUES LESS THAN (100),
PARTITION p_medium VALUES LESS THAN (1000),
PARTITION p_large VALUES LESS THAN (10000),
PARTITION p_huge VALUES LESS THAN MAXVALUE
);
ASCII Visualize
ALL SALES DATA
βββββββββββββββββββββ
β Database β
βββββββββββ¬ββββββββββ
β
βββββββ΄ββββββ¬βββββββββββββ¬βββββββββββββ
β β β β
βββββΌββββ βββββΌββββ βββββΌββββ βββββΌββββ
β 2022 β β 2023 β β 2024 β βFuture β
β Sales β β Sales β β Sales β β Sales β
βββββββββ βββββββββ βββββββββ βββββββββ
B. List PartitioningΒΆ
-- Partition by region
CREATE TABLE customers (
id INT,
name VARCHAR(100),
country VARCHAR(50)
) PARTITION BY LIST (country) (
PARTITION p_europe VALUES IN ('France', 'Germany', 'Spain'),
PARTITION p_asia VALUES IN ('China', 'Japan', 'India'),
PARTITION p_americas VALUES IN ('USA', 'Canada', 'Brazil')
);
ASCII Visualize
CUSTOMER DATA
ββββββββββββββββββββ
β Database β
ββββββββββ¬ββββββββββ
β
βββββββββββΌββββββββββ
β β β
βββββΌββββ βββββΌββββ βββββΌββββ
βEUROPE β β ASIA β β USA β
βFrance β β China β β Texas β
βSpain β β Japan β β NYC β
βItaly β β India β β LA β
βββββββββ βββββββββ βββββββββ
C. Hash PartitioningΒΆ
-- Automatically distribute data across 4 partitions
CREATE TABLE users (
id INT,
email VARCHAR(100)
) PARTITION BY HASH(id) PARTITIONS 4;
ASCII Visualize
USER DATA (Hash by ID)
βββββββββββββββββββββββββββ
β Database β
ββββββββββββββ¬βββββββββββββ
β
βββββββββ΄βββββββ
β Hash(ID) β
βββββββββ¬βββββββ
βββββββββββΌββββββββββ
β β β
ββββΌββββ ββββΌββββ ββββΌββββ
β P0 β β P1 β β P2 β
βID%3=0β βID%3=1β βID%3=2β
ββββββββ ββββββββ ββββββββ
2. Composite Partitioning (Combining Multiple Types)ΒΆ
-- Partition by year and then by month
CREATE TABLE transactions (
id INT,
trans_date DATE,
amount DECIMAL
) PARTITION BY RANGE (YEAR(trans_date))
SUBPARTITION BY HASH(MONTH(trans_date))
SUBPARTITIONS 12 (
PARTITION p_2023 VALUES LESS THAN (2024),
PARTITION p_2024 VALUES LESS THAN (2025)
);
ASCII Visualize
SALES DATABASE
ββββββββββββββββββββββββββββ
β 2023 β
βββββββββββββ¬βββββββββββββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
βββββΌββββ βββββΌββββ βββββΌββββ
β Q1 β β Q2 β β Q3 β
ββββ¬βββββ ββββ¬βββββ ββββ¬βββββ
β β β
ββJan ββApr ββJul
ββFeb ββMay ββAug
ββMar ββJun ββSep
3. Partition ManagementΒΆ
A. Adding New PartitionsΒΆ
-- Add new partition for next year
ALTER TABLE sales ADD PARTITION (
PARTITION p_2025 VALUES LESS THAN (2026)
);
B. Removing PartitionsΒΆ
C. Reorganizing PartitionsΒΆ
4. Querying Partitioned TablesΒΆ
A. Partition Pruning (Automatic)ΒΆ
-- Database will only scan relevant partitions
SELECT * FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';
B. Partition SelectionΒΆ
5. Best PracticesΒΆ
A. Sizing GuidelinesΒΆ
-- Keep partitions similarly sized
CREATE TABLE logs (
id INT,
log_date DATE,
message TEXT
) PARTITION BY RANGE (UNIX_TIMESTAMP(log_date)) (
PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2023-04-01')),
PARTITION p2 VALUES LESS THAN (UNIX_TIMESTAMP('2023-07-01')),
PARTITION p3 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-01'))
);
6. Common Use CasesΒΆ
A. Time-Based Data ManagementΒΆ
-- Archiving old data
CREATE TABLE archive_sales
SELECT * FROM sales PARTITION (p_2022);
ALTER TABLE sales DROP PARTITION p_2022;
B. Geographic Data DistributionΒΆ
-- Regional data access
CREATE TABLE user_data (
id INT,
region VARCHAR(50),
data TEXT
) PARTITION BY LIST (region) (
PARTITION p_us VALUES IN ('us-east', 'us-west'),
PARTITION p_eu VALUES IN ('eu-central', 'eu-west'),
PARTITION p_asia VALUES IN ('asia-east', 'asia-south')
);
ASCII Visualize
Before Partitioning After Partitioning
ββββββββββββββββββββ βββββββββββ βββββββββββ
β ββββββββββββββββ β β βββββββ β β βββββββ β
β ββββββββββββββββ β β βββββββ β β βββββββ β
β ββββββββββββββββ β => β βββββββ β β βββββββ β
β ββββββββββββββββ β βββββββββββ βββββββββββ
ββββββββββββββββββββ βββββββββββ βββββββββββ
β βββββββ β β βββββββ β
Single Table β βββββββ β β βββββββ β
βββββββββββ βββββββββββ
Partitioned Tables
7. Monitoring PartitionsΒΆ
-- View partition information
SELECT
PARTITION_NAME,
TABLE_ROWS,
DATA_LENGTH,
INDEX_LENGTH
FROM
INFORMATION_SCHEMA.PARTITIONS
WHERE
TABLE_NAME = 'sales';
Query Flow in Partitioned Table:
Query: SELECT * FROM sales WHERE date = '2023-06-15'
βββββββββββββββ
β Query β
βββββββ¬ββββββββ
β
βΌ
βββββββββββββββ βββββββββββββ
β Partition ββββββ 2022 Data β (Skip)
β Manager β βββββββββββββ
βββββββ¬ββββββββ βββββββββββββ
ββββββββββββββ 2023 Data β (Search)
β βββββββββββββ
β βββββββββββββ
ββββββββββββββ 2024 Data β (Skip)
βββββββββββββ
Key BenefitsΒΆ
- Improved query performance
- Easier maintenance
- Better data lifecycle management
- Improved backup/recovery options
- Parallel query execution potential
Common Pitfalls to AvoidΒΆ
- Over-partitioning
- Wrong partition key selection
- Not planning for growth
- Ignoring maintenance overhead
- Not considering query patterns
Horizontal Scaling (Scaling Out)ΒΆ
DefinitionsΒΆ
- The process of adding more machines to your existing system to handle increased load and distribute it across multiple servers
- Instead of upgrading existing hardware (vertical scaling / scale up), you add more instances of servers
- Distributes load across multiple servers while keeping data synchronized
Key ComponentsΒΆ
A. Load BalancerΒΆ
- A device/software that distributes incoming network traffic across multiple servers
- Ensures no single server bears too much load
- Provides failover if one server goes down
B. Replication (Read Path)ΒΆ
- Process of copying data from one database server (master) to others (replicas)
- Improves read performance and provides redundancy
- Helps distribute database load across multiple servers
C. Sharding (Write Path)ΒΆ
- Technique of splitting database into smaller parts (shards)
- Each shard contains unique portions of data
- Improves write performance and handles large datasets
ExamplesΒΆ
Social Media PlatformΒΆ
Before Scaling:
Single Server
- 1 million users
- Slow photo uploads
- Delayed notifications
- Frequent crashes
After Horizontal Scaling:
βββ Load Balancer
βββ Server 1: User Authentication
βββ Server 2: Photo Processing
βββ Server 3: Notification System
βββ Database Layer
βββ Shard 1: Users A-M
βββ Shard 2: Users N-Z
E-commerce WebsiteΒΆ
Before: Single server handling:
- Product catalog
- User accounts
- Orders
- Payment processing
After Horizontal Scaling:
βββ Load Balancer
βββ Web Servers (3 instances)
β βββ Product Browsing
βββ Application Servers
β βββ Server 1: User Management
β βββ Server 2: Order Processing
β βββ Server 3: Payment Handling
βββ Database Layer
βββ Products DB (Master + 2 replicas)
βββ Orders DB (Sharded by date)
βββ Users DB (Sharded by region)
Video Streaming ServiceΒΆ
Original Setup:
- Single server
- Limited concurrent streams
- Buffering issues
- Regional delays
Horizontally Scaled:
βββ Global Load Balancer
βββ US Region
β βββ Content Servers (5)
β βββ User Database Shard
βββ EU Region
β βββ Content Servers (4)
β βββ User Database Shard
βββ Asia Region
βββ Content Servers (6)
βββ User Database Shard
Banking ApplicationΒΆ
Traditional Setup:
- Single core banking server
- Limited transaction processing
- Slow during peak hours
Scaled Architecture:
βββ Load Balancer
βββ Transaction Servers
β βββ Server 1: Deposits
β βββ Server 2: Withdrawals
β βββ Server 3: Transfers
βββ Database Layer
βββ Account DB
β βββ Shard 1 (A-M)
β βββ Shard 2 (N-Z)
βββ Transaction DB
βββ Current Month (Master)
βββ Historical (Replicas)



