Skip to content

ScalingΒΆ

OverviewΒΆ

Expand to show Database Scaling Cheatsheet

pic1 pic2 pic3 pic4

SequencesΒΆ

1. Query Optimization (First Priority)ΒΆ

-- Before optimization
SELECT * FROM users WHERE age > 25;

-- After optimization
SELECT id, name FROM users
WHERE age > 25
LIMIT 100;

2. Database IndexingΒΆ

-- Add indexes for frequently queried columns
CREATE INDEX idx_user_age ON users(age);
CREATE INDEX idx_user_email ON users(email);

-- Composite index for multiple columns
CREATE INDEX idx_user_age_city ON users(age, city);

3. Caching ImplementationΒΆ

# Example using Redis cache
def get_user(user_id):
    # Try cache first
    cached_user = redis_cache.get(f"user:{user_id}")
    if cached_user:
        return cached_user

    # If not in cache, get from DB and cache it
    user = database.query(f"SELECT * FROM users WHERE id = {user_id}")
    redis_cache.set(f"user:{user_id}", user, expire=3600)
    return user

4. Vertical Scaling (Scale Up)ΒΆ

  • Increase CPU
  • Add more RAM
  • Upgrade to faster storage (SSD/NVMe)
Initial: 2 CPU, 4GB RAM, HDD
↓
Upgrade: 4 CPU, 8GB RAM, SSD
↓
Further: 8 CPU, 16GB RAM, NVMe

5. Database PartitioningΒΆ

-- Example of date-based partitioning
CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025)
);

6. Read ReplicasΒΆ

Primary DB (Writes)
     ↓
   ↙  ↓  β†˜
Replica1 Replica2 Replicas3 (Reads)

7. Horizontal Scaling (Sharding)ΒΆ

Shard 1: Users A-F
Shard 2: Users G-M
Shard 3: Users N-S
Shard 4: Users T-Z

8. Database Type OptimizationΒΆ

Right choice DB based on demand

Transactions β†’ PostgreSQL/MySQL
Analytics β†’ ClickHouse/Redshift
Caching β†’ Redis
Document Storage β†’ MongoDB
Graph Data β†’ Neo4j

9. Microservices SplitΒΆ

Monolithic DB
     ↓
User DB | Order DB | Product DB

10. Geographic DistributionΒΆ

US Region:
- Primary DB
- Read Replicas

EU Region:
- Primary DB
- Read Replicas

Asia Region:
- Primary DB
- Read Replicas

PartitioningΒΆ

1.Types of PartitioningΒΆ

A. Range PartitioningΒΆ

-- Partition by date range
CREATE TABLE sales (
    id INT,
    sale_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p_2022 VALUES LESS THAN (2023),
    PARTITION p_2023 VALUES LESS THAN (2024),
    PARTITION p_2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Partition by amount range
CREATE TABLE orders (
    order_id INT,
    amount DECIMAL
) PARTITION BY RANGE (amount) (
    PARTITION p_small VALUES LESS THAN (100),
    PARTITION p_medium VALUES LESS THAN (1000),
    PARTITION p_large VALUES LESS THAN (10000),
    PARTITION p_huge VALUES LESS THAN MAXVALUE
);

ASCII Visualize

ALL SALES DATA
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     Database      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β”‚
    β”Œβ”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚           β”‚            β”‚            β”‚
β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”   β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”   β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”   β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”
β”‚ 2022  β”‚   β”‚ 2023  β”‚   β”‚ 2024  β”‚   β”‚Future β”‚
β”‚ Sales β”‚   β”‚ Sales β”‚   β”‚ Sales β”‚   β”‚ Sales β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”˜

B. List PartitioningΒΆ

-- Partition by region
CREATE TABLE customers (
    id INT,
    name VARCHAR(100),
    country VARCHAR(50)
) PARTITION BY LIST (country) (
    PARTITION p_europe VALUES IN ('France', 'Germany', 'Spain'),
    PARTITION p_asia VALUES IN ('China', 'Japan', 'India'),
    PARTITION p_americas VALUES IN ('USA', 'Canada', 'Brazil')
);

ASCII Visualize

CUSTOMER DATA
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚    Database      β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚         β”‚         β”‚
β”Œβ”€β”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”
β”‚EUROPE β”‚ β”‚ ASIA  β”‚ β”‚ USA   β”‚
β”‚France β”‚ β”‚ China β”‚ β”‚ Texas β”‚
β”‚Spain  β”‚ β”‚ Japan β”‚ β”‚ NYC   β”‚
β”‚Italy  β”‚ β”‚ India β”‚ β”‚ LA    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”˜

C. Hash PartitioningΒΆ

-- Automatically distribute data across 4 partitions
CREATE TABLE users (
    id INT,
    email VARCHAR(100)
) PARTITION BY HASH(id) PARTITIONS 4;

ASCII Visualize

USER DATA (Hash by ID)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚       Database          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
     β”‚  Hash(ID)    β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚         β”‚         β”‚
β”Œβ”€β”€β–Όβ”€β”€β”€β”  β”Œβ”€β”€β–Όβ”€β”€β”€β”  β”Œβ”€β”€β–Όβ”€β”€β”€β”
β”‚ P0   β”‚  β”‚ P1   β”‚  β”‚ P2   β”‚
β”‚ID%3=0β”‚  β”‚ID%3=1β”‚  β”‚ID%3=2β”‚
β””β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”˜

2. Composite Partitioning (Combining Multiple Types)ΒΆ

-- Partition by year and then by month
CREATE TABLE transactions (
    id INT,
    trans_date DATE,
    amount DECIMAL
) PARTITION BY RANGE (YEAR(trans_date))
  SUBPARTITION BY HASH(MONTH(trans_date))
  SUBPARTITIONS 12 (
    PARTITION p_2023 VALUES LESS THAN (2024),
    PARTITION p_2024 VALUES LESS THAN (2025)
);

ASCII Visualize

SALES DATABASE
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚         2023             β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚               β”‚               β”‚
β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”       β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”       β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”
β”‚ Q1    β”‚       β”‚ Q2    β”‚       β”‚ Q3    β”‚
β””β”€β”€β”¬β”€β”€β”€β”€β”˜       β””β”€β”€β”¬β”€β”€β”€β”€β”˜       β””β”€β”€β”¬β”€β”€β”€β”€β”˜
   β”‚               β”‚               β”‚
   β”œβ”€Jan           β”œβ”€Apr           β”œβ”€Jul
   β”œβ”€Feb           β”œβ”€May           β”œβ”€Aug
   └─Mar           └─Jun           └─Sep

3. Partition ManagementΒΆ

A. Adding New PartitionsΒΆ

-- Add new partition for next year
ALTER TABLE sales ADD PARTITION (
    PARTITION p_2025 VALUES LESS THAN (2026)
);

B. Removing PartitionsΒΆ

-- Remove old data partition
ALTER TABLE sales DROP PARTITION p_2022;

C. Reorganizing PartitionsΒΆ

-- Optimize/defragment a partition
ALTER TABLE sales REORGANIZE PARTITION p_2023;

4. Querying Partitioned TablesΒΆ

A. Partition Pruning (Automatic)ΒΆ

-- Database will only scan relevant partitions
SELECT * FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';

B. Partition SelectionΒΆ

-- Explicitly select partition
SELECT * FROM sales PARTITION (p_2023)
WHERE amount > 1000;

5. Best PracticesΒΆ

A. Sizing GuidelinesΒΆ

-- Keep partitions similarly sized
CREATE TABLE logs (
    id INT,
    log_date DATE,
    message TEXT
) PARTITION BY RANGE (UNIX_TIMESTAMP(log_date)) (
    PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2023-04-01')),
    PARTITION p2 VALUES LESS THAN (UNIX_TIMESTAMP('2023-07-01')),
    PARTITION p3 VALUES LESS THAN (UNIX_TIMESTAMP('2023-10-01'))
);

6. Common Use CasesΒΆ

A. Time-Based Data ManagementΒΆ

-- Archiving old data
CREATE TABLE archive_sales
SELECT * FROM sales PARTITION (p_2022);
ALTER TABLE sales DROP PARTITION p_2022;

B. Geographic Data DistributionΒΆ

-- Regional data access
CREATE TABLE user_data (
    id INT,
    region VARCHAR(50),
    data TEXT
) PARTITION BY LIST (region) (
    PARTITION p_us VALUES IN ('us-east', 'us-west'),
    PARTITION p_eu VALUES IN ('eu-central', 'eu-west'),
    PARTITION p_asia VALUES IN ('asia-east', 'asia-south')
);

ASCII Visualize

Before Partitioning          After Partitioning
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚    β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚
β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚    β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚
β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ => β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚
β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚
      Single Table      β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚ β”‚ β–‘β–‘β–‘β–‘β–‘β–‘β–‘ β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         Partitioned Tables

7. Monitoring PartitionsΒΆ

-- View partition information
SELECT
    PARTITION_NAME,
    TABLE_ROWS,
    DATA_LENGTH,
    INDEX_LENGTH
FROM
    INFORMATION_SCHEMA.PARTITIONS
WHERE
    TABLE_NAME = 'sales';

Query Flow in Partitioned Table:

Query: SELECT * FROM sales WHERE date = '2023-06-15'

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Query     β”‚
β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚
      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Partition  │───→│ 2022 Data β”‚ (Skip)
β”‚  Manager    β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      │───────────→│ 2023 Data β”‚ (Search)
      β”‚            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      └───────────→│ 2024 Data β”‚ (Skip)
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key BenefitsΒΆ

  • Improved query performance
  • Easier maintenance
  • Better data lifecycle management
  • Improved backup/recovery options
  • Parallel query execution potential

Common Pitfalls to AvoidΒΆ

  • Over-partitioning
  • Wrong partition key selection
  • Not planning for growth
  • Ignoring maintenance overhead
  • Not considering query patterns

Horizontal Scaling (Scaling Out)ΒΆ

DefinitionsΒΆ

  • The process of adding more machines to your existing system to handle increased load and distribute it across multiple servers
  • Instead of upgrading existing hardware (vertical scaling / scale up), you add more instances of servers
  • Distributes load across multiple servers while keeping data synchronized

Key ComponentsΒΆ

A. Load BalancerΒΆ

  • A device/software that distributes incoming network traffic across multiple servers
  • Ensures no single server bears too much load
  • Provides failover if one server goes down

B. Replication (Read Path)ΒΆ

  • Process of copying data from one database server (master) to others (replicas)
  • Improves read performance and provides redundancy
  • Helps distribute database load across multiple servers

C. Sharding (Write Path)ΒΆ

  • Technique of splitting database into smaller parts (shards)
  • Each shard contains unique portions of data
  • Improves write performance and handles large datasets

ExamplesΒΆ

Social Media PlatformΒΆ

Before Scaling:
Single Server
- 1 million users
- Slow photo uploads
- Delayed notifications
- Frequent crashes

After Horizontal Scaling:
└── Load Balancer
    β”œβ”€β”€ Server 1: User Authentication
    β”œβ”€β”€ Server 2: Photo Processing
    β”œβ”€β”€ Server 3: Notification System
    └── Database Layer
        β”œβ”€β”€ Shard 1: Users A-M
        └── Shard 2: Users N-Z

E-commerce WebsiteΒΆ

Before: Single server handling:
- Product catalog
- User accounts
- Orders
- Payment processing

After Horizontal Scaling:
└── Load Balancer
    β”œβ”€β”€ Web Servers (3 instances)
    β”‚   └── Product Browsing
    β”œβ”€β”€ Application Servers
    β”‚   β”œβ”€β”€ Server 1: User Management
    β”‚   β”œβ”€β”€ Server 2: Order Processing
    β”‚   └── Server 3: Payment Handling
    └── Database Layer
        β”œβ”€β”€ Products DB (Master + 2 replicas)
        β”œβ”€β”€ Orders DB (Sharded by date)
        └── Users DB (Sharded by region)

Video Streaming ServiceΒΆ

Original Setup:
- Single server
- Limited concurrent streams
- Buffering issues
- Regional delays

Horizontally Scaled:
└── Global Load Balancer
    β”œβ”€β”€ US Region
    β”‚   β”œβ”€β”€ Content Servers (5)
    β”‚   └── User Database Shard
    β”œβ”€β”€ EU Region
    β”‚   β”œβ”€β”€ Content Servers (4)
    β”‚   └── User Database Shard
    └── Asia Region
        β”œβ”€β”€ Content Servers (6)
        └── User Database Shard

Banking ApplicationΒΆ

Traditional Setup:
- Single core banking server
- Limited transaction processing
- Slow during peak hours

Scaled Architecture:
└── Load Balancer
    β”œβ”€β”€ Transaction Servers
    β”‚   β”œβ”€β”€ Server 1: Deposits
    β”‚   β”œβ”€β”€ Server 2: Withdrawals
    β”‚   └── Server 3: Transfers
    └── Database Layer
        β”œβ”€β”€ Account DB
        β”‚   β”œβ”€β”€ Shard 1 (A-M)
        β”‚   └── Shard 2 (N-Z)
        └── Transaction DB
            β”œβ”€β”€ Current Month (Master)
            └── Historical (Replicas)