Requirements

Functional

  1. Support 1:1 chats
  2. Support Group chats
  3. Should be able to upload media: video, images
  4. Should indicate: sent, read, last seen

Non Functional

  1. Highly Available
  2. Low Latency
  3. Highly Scalable

Capacity Estimation

Let’s assume

  • Daily Active Users: 500 Million
  • On average each user sends 50 messages daily
        Average daily messages = 500 million * 50 = 25 billion

Each message on average is 100 bytes.

        Average daily storage required = 25 billion * 100 = 2.5 TB/day

We can store 10 years of chat history

        2.5 TB * 365 days * 10 years= 9.2 PetaBytes

Bandwidth

        2.5 TB / 8400 secs ~ 30 MB/sec

Architecture


Database Design

  • To store messages, we do not require RDBMS as we will not be performing any complex relational ops like joins.
  • key-value based databases are most suitable: Cassandra, HBase, BigTable
  • Most of the read/write requests are to retrieve messages between two users.
Row Key: {(from user, to user), time)}


Database Schema

CREATE TABLE tb_messages (
        message_id uuid,
        from_user text,
        to_user text,
        body text,
        class text,
        time timeuuid,
        PRIMARY KEY ((from_user, to_user), time)
) WITH CLUSTERING ORDER BY (time ASC);

To retrieve messages between two users:

SELECT * FROM tb_messages WHERE from_user = 'Alice' AND to_user = 'Blob' 
ORDER BY time DESC ;

For groups and users

We need two tables to capture the relationship between users and groups.

  • The Group Membership table is used for message broadcasting.
GroupID UserID
GRP123 U756
  • The User Membership table is used for listing users in a group.
UserID GroupID
U756 GRP123

API Design

We keep our APIs simple:

  1. send_message(user_id, receiver_id, channel_type, message) : Here channel_id can be either UserID or GroupID
  2. get_message(user_id, user_id2, channel_type, earliest_message_id) : The earliest_message_id is the latest message locally available on the client. It is used as the sort key to range query the chat tables.

Services Overview

  1. WebSocket Handler
  2. WebSocket Manager
  3. Message Service
  4. Group Message Service
  5. Group Service
  6. User Service
  7. Asset Service

Design

Concepts

  1. We can maintain WebSockets or Long polling between user devices and servers to push notifications.
  2. We can keep track of users connected to a particular server using a Redis Cache.

Components

Connection Components

  1. WebSocket Handler: The user connects to this WebSocket handler which keeps an open connection between user and server to push notifications.
  2. WebSocket Manager: This service stores the information about which users connected to which WebSocket handler in a Redis Cache The Redis Cache holds information:
    • all users connected to WebSocket handler
    • which user is connected to which WebSocket handler

When connection breaks between user and WebSocket and the user re-establishes connection with another handler, the information is updated to Redis.



Message Components

Message Service

  • WebSocket handler talks to message service to store/retrieve messages.
  • It exposes APIs to retrieve messages
  • Message Service sits on top of the database

Message Flow



Message Send/Recieve

  1. UserA connects to a WebSocket Handler on a port. This information is sent to Websocket Manager which stores it in its Redis Cache. This mapping [UserID: WebSocket] can be later used by other WebSocket Handler who want to send messages to UserA. UserA is assigned port: 10562.
  2. UserA sends the message to WebSocket Handler intended to UserB
  3. If the WebSocket handler does not know which web server is connected to userB, it contacts WebSocket Manager for the information. Websocket Manager maintains the information of all the connected users in its Redis Cache. In parallel, the message is also sent to the Message service which stores it in the Cassandra database.
  4. Once the Websocket handler receives the socket address of userB, it forwards the message to that WebSocket handler.

User Online

  1. If the user is online an ack is received from UserB and forwarded to the UserA as well as updated to the database

User Offline

  1. If UserB is offline, the message gets stored in the database with status set as not delivered.
  2. When UserB connects to any WebSocket handler, the handler contacts messaging service to check if there are any pending messages for UserB.


Group Messaging

Group Service

Group service stores what users are associated with which group.

Group Message Handler

The group Message handler sits on top of the Kafka queue and is responsible for sending messages to all the members of the group.



Message Flow

  1. User sends a Msg and the group that it needs to be sent to.
  2. Message service stores the message in the database as well as in a Kafka queue.
  3. Group Message Handler retrieves the messages from Kafka queue and queries Group Service to find out the users associated with that group.
  4. The user to group mapping can be stored in MySQL cluster which connects to Group Service


Media Messages

For a media message, ideally, it is compressed and encrypted at the sender’s end. Similarly, decryption and uncompression happen at the receiver’s end. When a user sends a media message following steps are taken:

  1. Media is uploaded to a server and an id is generated that is sent back to the user.
  2. This id is forwarded to the receiver. The receiver searches this id and downloads the media associated with it.

The media is stored in the S3 database. Depending upon the usage it may be stored to CDN as well.


References: