Design Twitter
Functional Requirements
- Tweet
- Re-Tweet
- Follow
- Search
Capacity Estimation
- 150 Million Daily Active Users
- 350 Million Monthly Active Users
- 1.5 Billion User Accounts
- 500 Million Tweets/day
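
As a rough sanity check, the figures above can be turned into per-second and per-user rates. This is only back-of-the-envelope arithmetic: it assumes tweets are spread evenly over the day, which real traffic is not, so peak load would be higher than the average shown here.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_active_users = 150_000_000
tweets_per_day = 500_000_000

# Average write rate, assuming a perfectly even spread over the day.
avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY

# Average number of tweets each daily active user would post.
tweets_per_dau = tweets_per_day / daily_active_users

print(f"average write rate: {avg_tweets_per_second:,.0f} tweets/s")
print(f"tweets per daily active user: {tweets_per_dau:.1f}")
```

The average works out to roughly 5,800 tweet writes per second, or a little over 3 tweets per daily active user.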
User Categorization
- famous: millions of followers
- live: currently accessing Twitter
- active: have accessed Twitter recently, e.g. within the last 3 hours
- passive: have not accessed Twitter recently
- inactive: deleted account
1. User Onboarding
User Service
- User Service helps in onboarding new users
- User information is stored in an RDBMS cluster
- Information is also cached in Redis
- When a GET request arrives to get a user's info by id, it is retrieved from Redis
  - If the information is not found, the RDBMS is queried and Redis is updated
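
The read path described above is the cache-aside pattern. A minimal sketch follows, using plain dictionaries as stand-ins for Redis and the RDBMS cluster. The class and method names are hypothetical and chosen only for illustration.

```python
from typing import Optional


class UserService:
    """Cache-aside lookup; dicts stand in for Redis and the RDBMS."""

    def __init__(self, cache: dict, db: dict):
        self.cache = cache  # stand-in for Redis
        self.db = db        # stand-in for the RDBMS cluster

    def get_user(self, user_id: int) -> Optional[dict]:
        # 1. Try the cache first.
        user = self.cache.get(user_id)
        if user is not None:
            return user
        # 2. On a miss, fall back to the database.
        user = self.db.get(user_id)
        # 3. Populate the cache so the next read is served from it.
        if user is not None:
            self.cache[user_id] = user
        return user
```

A real deployment would also need an expiry or invalidation policy so that profile updates do not leave stale entries in the cache. The notes do not specify one.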
2. Follower-Follow
Graph Service
- generates the graph of users and their followers
- the graph is stored in the RDBMS
- table: user(A) \(\rightarrow\) users who follow (A)
- table: user(A) \(\rightarrow\) users (A) is following
- information is cached in Redis
- When a GET request arrives for the list of followers, the information is looked up in Redis.
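
The two tables above could look like the following sketch, which uses an in-memory SQLite database as a stand-in for the RDBMS. The table names, column names, and helper functions are assumptions made for illustration. Both tables are written in one transaction so the two views of a follow relationship stay consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- user(A) -> users who follow A
    CREATE TABLE followers (
        user_id INTEGER, follower_id INTEGER,
        PRIMARY KEY (user_id, follower_id)
    );
    -- user(A) -> users A is following
    CREATE TABLE following (
        user_id INTEGER, followee_id INTEGER,
        PRIMARY KEY (user_id, followee_id)
    );
""")


def follow(conn, follower_id, followee_id):
    # "with conn" commits both inserts together or rolls both back.
    with conn:
        conn.execute("INSERT OR IGNORE INTO followers VALUES (?, ?)",
                     (followee_id, follower_id))
        conn.execute("INSERT OR IGNORE INTO following VALUES (?, ?)",
                     (follower_id, followee_id))


def get_followers(conn, user_id):
    rows = conn.execute(
        "SELECT follower_id FROM followers WHERE user_id = ? "
        "ORDER BY follower_id", (user_id,))
    return [row[0] for row in rows]


def get_following(conn, user_id):
    rows = conn.execute(
        "SELECT followee_id FROM following WHERE user_id = ? "
        "ORDER BY followee_id", (user_id,))
    return [row[0] for row in rows]
```

Keeping both directions materialized makes each lookup a single indexed read, at the cost of writing every follow twice.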
3. Live Websocket Notification
Live Websocket Notification
- Users who are Live are connected using a WebSocket.
- Events are pushed to Kafka as they happen.
- Live Websocket Notification Service consumes messages from Kafka and notifies all the connected Live users.
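
The consume-and-notify loop can be sketched as below. This is a simplified in-memory model, not a real Kafka or WebSocket client: a `deque` stands in for the Kafka topic, and each connection is modeled as a list acting as that socket's outbox. All names are hypothetical.

```python
from collections import deque


class LiveNotificationService:
    def __init__(self):
        # user_id -> outbox list standing in for an open WebSocket
        self.connections = {}

    def connect(self, user_id):
        self.connections[user_id] = []

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def consume(self, events: deque):
        # Drain the topic and push every event to all connected users.
        while events:
            event = events.popleft()
            for outbox in self.connections.values():
                outbox.append(event)
```

Users who are not connected simply have no entry in the registry, so they receive nothing from this path.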
4. Tweet Service
User Timeline
- the tweets that the user has posted
Home Timeline
- the combined view of tweets of the users that the user follows
Tweet Ingestion Service
- User tweets are sent to the ingestion service
- It is responsible only for tweet writes, not reads
- If media is associated with the tweet:
  - it contacts Short URL Service, which provides a unique URL
  - the media, along with the short URL, is forwarded to Asset Service, which stores the media on a CDN
- Tweets are forwarded to Kafka. Live users connected over WebSockets will receive the tweet.
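
The write path above can be sketched as follows. Everything here is a stand-in: a counter plays the role of Short URL Service, a dictionary plays Asset Service and the CDN, and a list plays the Kafka topic. The URL format and field names are invented for illustration.

```python
import itertools


class TweetIngestionService:
    def __init__(self):
        self._url_ids = itertools.count(1)
        self.assets = {}  # stand-in for Asset Service / CDN: short URL -> media
        self.topic = []   # stand-in for the Kafka topic

    def _short_url(self):
        # Hypothetical Short URL Service: returns a unique URL per call.
        return f"https://example.com/m/{next(self._url_ids)}"

    def ingest(self, user_id, text, media=None):
        tweet = {"user_id": user_id, "text": text}
        if media is not None:
            url = self._short_url()
            self.assets[url] = media   # forward media to Asset Service
            tweet["media_url"] = url   # the tweet carries only the URL
        self.topic.append(tweet)       # forward the tweet to Kafka
        return tweet
```

Only the short URL travels with the tweet, so downstream consumers never handle the media payload itself.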
Tweet Service
- provides APIs to read tweets
Tweet Processor
- for active users, Tweet Processor Service caches the timeline in a Redis cluster
- To generate a user timeline:
  - It talks to Graph Service to get the ids of all the followers
  - It talks to User Service to get the user details
  - It consumes tweets from Kafka and inserts them into the user timeline
- Finally, the user timeline is cached in the Redis cluster
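
This is a fan-out-on-write approach: each new tweet is pushed into the cached timelines of the author's active followers. A minimal sketch, with dictionaries standing in for Graph Service and the Redis cluster, is shown below. The function name, the data shapes, and the newest-first ordering are assumptions.

```python
def process_tweet(tweet, followers_of, active_users, timeline_cache):
    """Insert one consumed tweet into the cached timelines of active followers.

    followers_of   -- stand-in for Graph Service: author id -> follower ids
    active_users   -- set of user ids currently classed as active
    timeline_cache -- stand-in for Redis: user id -> list of tweet ids
    """
    for follower_id in followers_of.get(tweet["user_id"], []):
        if follower_id in active_users:
            # Newest tweet goes to the front of the cached timeline.
            timeline_cache.setdefault(follower_id, []).insert(0, tweet["t_id"])


followers_of = {"U1": ["U2", "U3", "U4"]}
active_users = {"U2", "U3", "U4"}
timeline_cache = {"U2": [101]}

process_tweet({"user_id": "U1", "t_id": 105},
              followers_of, active_users, timeline_cache)
```

After the call, tweet 105 sits at the head of the cached timelines of U2, U3, and U4. Followers outside `active_users` are skipped and would be served by the read-time path instead.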
Timeline Service
- Generates the timeline for passive users, users whose timeline is not cached
- To build a user's timeline:
  - the list of all users that the user is following is fetched from Graph Service
  - User Service provides details about the users
  - if any media is associated with a tweet, Asset Service is contacted to retrieve that media
  - Tweet Service provides the tweets of the users that the user is following
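
For users without a cached timeline, the timeline is assembled at read time (fan-out on read). The sketch below covers only the merge step and assumes each author's tweets arrive already sorted newest first as `(timestamp, tweet_id)` tuples. The notes do not specify an ordering, so the recency sort, the function name, and the `limit` parameter are all assumptions.

```python
import heapq
import itertools


def build_timeline(user_id, following_of, tweets_by_user, limit=20):
    """Merge the tweets of everyone `user_id` follows, newest first.

    following_of   -- stand-in for Graph Service: user id -> followed ids
    tweets_by_user -- stand-in for Tweet Service: author id -> list of
                      (timestamp, tweet_id), each list sorted newest first
    """
    per_author = [tweets_by_user.get(author, [])
                  for author in following_of.get(user_id, [])]
    # heapq.merge lazily merges the pre-sorted lists in descending order.
    merged = heapq.merge(*per_author, key=lambda t: t[0], reverse=True)
    return list(itertools.islice(merged, limit))
```

Because the merge is lazy, only the first `limit` entries are materialized, even if the followed accounts have long histories.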
Tweet Read Flow
Active Users
- Let U1 be followed by U2, U3, U4.
- U1 tweets t1 with tweet id t_id: 105
- Tweet Processor Service queries Redis
  - t1 is inserted in the timeline of U2, U3, U4
-