System Design Key Concepts

In this guide, I share how I approach system design questions with a clear framework.

Apr 04, 2025

If there is one thing I am perennially thankful for from my two years in consulting at the beginning of my career, it's the ability to find structure and a framework for solving any problem. There is always scope to improvise on the spot, but a framework gives you tools to get started, get over that initial brain freeze, and work your way to a solution.

I never thought my consulting gig would come in so handy when preparing for system design questions. When I first started preparing and read “Design YouTube”, I felt lost. How do you even tackle a question like that? And if these were the questions I had to prepare for, I wasn't sure I would be able to clear the interviews.

As usual, brute force and consistency helped. I read one case study every day and thought through the problem. The next day, I reproduced the answer from memory and checked where I went wrong. After refining those answers over 20 days, I had a structure that let me answer any question by explaining the system design clearly and thinking through most of the key concepts.

NOTE BEFORE GETTING STARTED :

This article doesn't cover it, but I highly recommend drawing the system design as you explain it, if possible. I used whimsical.io or any online drawing board to explain my solutions.
I leveraged plenty of resources that are super helpful. Sharing a few below :
1. Grokking the System Design Interview by Educative.io
2. System Design Interview by Alex Xu
3. Medium articles from companies like Netflix
4. YouTube channels like ByteByteGo

FRAMEWORK :

Now, before starting any question, I keep three concepts in mind :

CLARIFICATION
ASSUMPTIONS
CONSTRAINTS

CLARIFICATION :

Here, make sure you clarify three main parts of the problem :

Scope
Scale
Latency

Scope of the problem :

Identify which use-cases you will solve for. For instance, for designing YouTube, I can think of the following use-cases :

video upload
playback
and homepage/feed

There could be more, like analytics, monetization, etc., but we want to keep it simple for now.

Scale :

Understand the scale you are solving for. This tells you how many users to plan for and how to support the system with high availability and fallback.

How many users are expected—millions or billions?
How many concurrent users/viewers should the system support?
What’s the expected volume of video uploads per day?
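
Once you have answers to these questions, a quick back-of-envelope calculation helps turn them into rough numbers. Here is a minimal sketch in Python. Every figure in it (uploads per day, average file size, number of stored resolutions) is a made-up placeholder for illustration, not real YouTube data, and the function names are my own.

```python
# Back-of-envelope capacity estimate. All inputs are hypothetical placeholders.

def daily_storage_tb(uploads_per_day: int, avg_video_mb: float, renditions: int) -> float:
    """Return raw storage added per day in terabytes (1 TB = 1,000,000 MB)."""
    total_mb = uploads_per_day * avg_video_mb * renditions
    return total_mb / 1_000_000


def average_uploads_per_second(uploads_per_day: int) -> float:
    """Spread the daily upload volume evenly over 86,400 seconds."""
    return uploads_per_day / 86_400


if __name__ == "__main__":
    uploads = 1_000_000   # assumed uploads per day
    size_mb = 200.0       # assumed average source file size
    renditions = 4        # assumed number of stored resolutions
    print(f"{daily_storage_tb(uploads, size_mb, renditions):.0f} TB/day")
    print(f"{average_uploads_per_second(uploads):.1f} uploads/s on average")
```

With these assumed inputs, the estimate comes to 800 TB of new storage per day, which is enough to justify talking about petabyte-scale storage later in the interview. Peak traffic is usually higher than the average, so treat the per-second figure as a floor.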

ASSUMPTIONS :

Here, think about two things : functional assumptions and non-functional assumptions.

Functional Assumptions :

I assume the system will solve for these use-cases :

User Registration and Login
Video Playback
Video Metadata
Search

Non-functional Assumptions :

Users can stream videos in multiple resolutions.
The system supports eventual consistency for things like view count and likes.
Video processing (transcoding) happens asynchronously after upload.
We'll use cloud storage for videos (e.g., S3 or equivalent).

What does CAC (Clarification, Assumptions, Constraints) do for me? It buys me time: I not only understand the problem but also get time to find a solution. It also helps me think on a deeper level about the seven factors in my 7-step solution below.

R D S L A C S

R : Requirement Clarification (Already done as a part of CAC)

D : Data Modeling

S : Scalability

L : Latency

A : Availability vs Consistency

C : Caching Strategy

S : Security and Rate Limiting

Requirement Clarification :

As a part of requirement clarification, we have already asked questions about clarification, assumptions and constraints, and that in itself helps you collect requirements. Once you have asked them, summarize what you heard.

For instance, for designing YouTube, I would summarize it as follows :

Based on our discussion, I’ll assume our MVP includes user authentication, video upload, playback, a homepage feed, and basic search functionality.

Key constraints I’ll keep in mind include ensuring low latency playback, high throughput for uploads and streaming, scalable storage to handle petabytes of data, and regional fault tolerance for high availability. I’ll also aim for cost efficiency in storage and compute.

Data Modeling :

Which data tables would I need?

Users(user_id, email, username, created_at)
Videos(video_id, user_id, title, description, upload_time, tags, status, view_count)
Views(video_id, user_id, timestamp)
Subscriptions(follower_id, followed_id)
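
To make the tables concrete on a whiteboard, I find it helps to write them as typed records. The sketch below mirrors the four tables above as Python dataclasses. The column names come from the list; the types, the default values, and the allowed values for status are my assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    user_id: int
    email: str
    username: str
    created_at: datetime


@dataclass
class Video:
    video_id: int
    user_id: int                 # uploader, references User.user_id
    title: str
    description: str
    upload_time: datetime
    tags: list[str] = field(default_factory=list)
    status: str = "processing"   # assumed values: processing, ready, failed
    view_count: int = 0


@dataclass
class View:
    video_id: int
    user_id: int
    timestamp: datetime


@dataclass
class Subscription:
    follower_id: int   # the user who subscribes
    followed_id: int   # the user being followed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    video = Video(video_id=1, user_id=42, title="Demo", description="", upload_time=now)
    print(video.status, video.view_count)
```

Starting a video in a "processing" state fits the assumption that transcoding happens asynchronously after upload.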

What kind of database would I need?

I would use a relational DB for users and videos, and NoSQL for comments, likes, etc.

A relational DB for users and videos, because the data has clear, well-defined relationships and requires strong consistency and data integrity.

NoSQL for comments and likes, because these are write-heavy, high fan-out workloads, which NoSQL stores typically handle better. Also, comments can vary (text, replies, reactions), and NoSQL allows flexible schemas without frequent migrations.

Scalability :

How will I handle user growth and video uploads? How will I scale horizontally?

Shard metadata storage by video_id/user_id :
- Instead of keeping all metadata (video titles, descriptions, tags, etc.) in one big database table, we divide (shard) it across multiple database instances based on some key like video_id or user_id.
- Why it helps: Prevents bottlenecks in a single database and allows parallel read/write operations across shards.
Partition user-generated content (uploads, views) by time or region :
- Store data like uploads, views, or comments in separate partitions based on timestamp (daily/monthly) or user location.
- Why it helps: Limits the amount of data scanned in each query and enables more efficient storage and retrieval.
Use Kafka for ingesting video processing jobs :
- Kafka acts as a message broker, queuing up video upload events to be picked up by downstream services like transcoding, thumbnail generation, and metadata extraction.
- Why it helps: Decouples services, smooths traffic spikes, and supports reliable processing of millions of uploads, with ordering preserved within each partition.
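
The sharding idea above can be sketched in a few lines. This is a minimal example in Python, assuming a fixed number of shards and simple hash-modulo routing; the shard count and the function name are hypothetical.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a video_id or user_id to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), because hash() on
    strings is randomized per process and would route the same key to
    different shards after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for video_id in ("video_1", "video_2", "video_3"):
        print(video_id, "-> shard", shard_for(video_id))
```

One trade-off worth mentioning in the interview: with plain modulo routing, changing the number of shards remaps most keys. Consistent hashing is the usual way to limit that reshuffling.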

Latency :

Minimize the time it takes for users to load pages, see content, and start video playback.

Preload Personalized Homepage Feed

Precompute and cache homepage video recommendations for each user during off-peak hours or when the user is active.
Why it helps: Reduces delay when users open the app or site—recommendations appear instantly instead of being computed in real-time.

Use Async Processing for Uploads (Transcoding, Thumbnail Generation)

After upload, tasks like converting video to different formats or generating thumbnails are offloaded to background workers, so users aren't kept waiting.
Why it helps: Improves responsiveness—users get confirmation of upload quickly without waiting for processing to finish.
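
Here is a minimal sketch of that upload flow in Python. It uses an in-process queue and a background thread as a stand-in for a real broker and worker fleet, so it only illustrates the shape of the idea. The names handle_upload and transcode are hypothetical.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()   # stand-in for a real message broker
processed: list = []                # records which videos were handled


def transcode(video_id: str) -> None:
    """Placeholder for the real transcoding and thumbnail work."""
    processed.append(video_id)


def worker() -> None:
    """Background worker: pull jobs off the queue until a None sentinel arrives."""
    while True:
        video_id = jobs.get()
        if video_id is None:
            jobs.task_done()
            break
        transcode(video_id)
        jobs.task_done()


def handle_upload(video_id: str) -> str:
    """Enqueue the job and return immediately; processing happens later."""
    jobs.put(video_id)
    return "upload accepted"


if __name__ == "__main__":
    threading.Thread(target=worker, daemon=True).start()
    print(handle_upload("v1"))
    jobs.join()   # wait for background work only so the demo can exit cleanly
    print(processed)
```

The point to highlight is that handle_upload returns before transcode runs, so the user is not kept waiting on processing.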

CDN Ensures Low Latency Video Streaming

Videos are distributed and served from Content Delivery Network (CDN) edge servers located close to users geographically.
Why it helps: Reduces round-trip latency and bandwidth usage by avoiding distant servers.

Availability vs Consistency :

Ensure the system stays operational and responsive even when components fail. The goal is to degrade gracefully, not crash.

Replicate Services Across Multiple Zones

Deploy services (e.g., upload service, user service) in multiple geographic regions or availability zones.
Why it helps: If one zone goes down (e.g., due to a network outage), the system can route traffic to another zone.

Use Message Queues (Kafka/SQS) to Decouple Services

Place a buffer between services using a queue so if one service fails, the others can keep running.
Why it helps: Prevents cascading failures and allows retry logic.
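
The retry logic mentioned above can be sketched as follows. This is a simplified Python example, assuming an in-memory list of messages that stands in for Kafka or SQS, and a hypothetical retry budget of three attempts. Messages that still fail are collected in a simple dead-letter list.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical retry budget


def drain(messages: list, handler: Callable[[str], None],
          max_attempts: int = MAX_ATTEMPTS) -> list:
    """Process every message, re-enqueueing failures up to max_attempts.

    Returns the messages that failed on every attempt.
    """
    pending = deque((msg, 1) for msg in messages)
    dead_letters = []
    while pending:
        msg, attempt = pending.popleft()
        try:
            handler(msg)
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))   # retry later
            else:
                dead_letters.append(msg)             # give up on this message
    return dead_letters
```

Because a failed message goes to the back of the queue, one bad message does not block the others, which is the decoupling benefit in miniature.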

Store Multiple Video Formats Redundantly in Cloud Storage

Videos are stored in multiple formats (480p, 720p, etc.) and duplicated across regions.
Why it helps: Keeps playback available even if some formats or locations are unavailable.

Caching Strategy :

Reduce load on databases and backend services, and improve response times for frequently accessed data.

CDN Caches Video Content at Edge Nodes

Content Delivery Network (CDN) stores copies of video files (e.g., MP4 in various resolutions) on servers geographically close to users.
Why it helps: Reduces latency and bandwidth usage by avoiding long-distance requests to central servers.

Redis Caches Frequently Accessed Metadata

Video Metadata
- Titles, thumbnails, view counts (especially for popular videos)
- Speeds up page loads and reduces database queries
Homepage Feed
- Personalized or trending videos are precomputed and cached per user or segment
- Ensures instant loading on app/web open
Trending/Popular Videos
- High-traffic content is frequently requested; caching keeps it readily available
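
The metadata caching above usually follows a cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. Here is a minimal sketch in Python. The TTLCache class is a tiny in-memory stand-in for Redis that I wrote for illustration; it is not the Redis client API, and the 60-second TTL is an assumption.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory stand-in for Redis with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}   # key -> (expires_at, value)

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_video_metadata(video_id: str, cache: TTLCache,
                       load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    cached = cache.get(video_id)
    if cached is not None:
        return cached
    value = load_from_db(video_id)
    cache.set(video_id, value)
    return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    fake_db = lambda vid: {"video_id": vid, "title": "Demo"}   # hypothetical loader
    print(get_video_metadata("v1", cache, fake_db))   # miss, loads from the fake DB
    print(get_video_metadata("v1", cache, fake_db))   # hit, served from the cache
```

A short TTL suits fields like view counts, where the design already accepts eventual consistency.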

Security and Rate Limiting :

Protect the platform from abuse, ensure data privacy, and maintain trust with users.

OAuth 2.0 for Authentication

Token-based authorization framework; login is typically layered on top of it with OpenID Connect.
Enables third-party access without sharing user credentials.

Rate Limiting via API Gateway

Mitigates spam and some DDoS traffic by limiting requests per user or IP (e.g., 100 requests/minute).
Helps manage load and ensures fair usage.
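
To show how a per-user limit like 100 requests/minute might work, here is a minimal fixed-window rate limiter in Python. It is a sketch under simple assumptions (one process, in-memory counters); a real API gateway would provide this as configuration, and the class name is my own.

```python
import time
from typing import Callable


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._counters = {}   # client_id -> (window index, request count)

    def allow(self, client_id: str) -> bool:
        window_index = int(self._clock() // self._window)
        index, count = self._counters.get(client_id, (window_index, 0))
        if index != window_index:
            count = 0   # a new window has started, reset the counter
        if count >= self._limit:
            self._counters[client_id] = (window_index, count)
            return False
        self._counters[client_id] = (window_index, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=100, window_seconds=60)
    results = [limiter.allow("user_1") for _ in range(101)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

Fixed windows are easy to explain but allow short bursts at window boundaries. Sliding-window and token-bucket limiters are common alternatives if the interviewer pushes on that.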

So, with the RDSLACS strategy, we were able to think through the seven most important parameters and explain, in a structured way, how we would solve the problem.

Veeraj’s Substack