Overview

This post presents a high-level yet production-grade system design for an Uber-like ride-hailing platform. The design focuses on scalability, low latency, high availability, and real-time geospatial matching while handling massive write throughput for driver location updates.


Functional Requirements

Core Requirements

  • Riders must be able to request a ride.
  • Drivers must be able to accept or reject ride requests.
  • The system must match riders with nearby available drivers.
  • Fare estimation and calculation must be supported.
  • Real-time driver location tracking must be available.
  • The system must manage the complete trip lifecycle (Requested → Matched → In Progress → Completed).

Secondary Requirements

  • Surge pricing.
  • Rider and driver ratings.
  • Scheduled rides.

Non-Functional Requirements

  1. Low Latency (Critical)
    Ride matching must occur within seconds, and driver movement updates must appear on the rider's map in near real time.

  2. High Availability
    The platform must operate continuously across regions. Regional or node-level failures must not cause system-wide downtime.

  3. High Scalability
    The system must handle significant traffic spikes during peak hours, bad weather, or large events.

  4. Consistency Trade-offs
    • Strong consistency is required for payments and trip state transitions (e.g., ensuring a trip is accepted by only one driver).
    • Eventual consistency is acceptable for location data, where minor delays are tolerable.

  5. Reliability
    Once a driver accepts a ride, the system must guarantee that the booking is not lost.

Capacity Estimation

Assumptions

  • Total registered users: 300 million
  • Daily active riders: 10 million
  • Active drivers: 1 million
  • Average trips per rider per day: 1
  • Total trips per day: 10 million

Traffic Estimation (RPS)

  • Seconds per day = 86,400 ≈ 100,000 (rounded for back-of-envelope math)
  • Average ride request rate ≈ 10,000,000 / 100,000 = ~100 RPS (peak load will be several times higher)

Location Update Load

  • Driver location update frequency: every 5 seconds
  • Updates per second:
    1,000,000 / 5 = 200,000 writes/sec

Bandwidth Estimation

  • Size per location update ≈ 100 bytes
  • Total bandwidth ≈ 200,000 × 100 bytes = 20 MB/sec

Storage Estimation

  • Trip record size ≈ 1 KB
  • Daily trip storage ≈ 10 GB/day
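
For readers who want to tweak the assumptions, the arithmetic above condenses into a few lines of Python; every constant is one of the assumptions stated in this section.

# Back-of-envelope capacity check (all inputs are the assumptions above).
DAILY_TRIPS = 10_000_000
ACTIVE_DRIVERS = 1_000_000
UPDATE_INTERVAL_S = 5           # driver location update period
UPDATE_SIZE_BYTES = 100         # approximate size of one location update
TRIP_RECORD_BYTES = 1_024       # ~1 KB per trip record
SECONDS_PER_DAY = 86_400        # rounded to ~100,000 in the text

ride_rps = DAILY_TRIPS / SECONDS_PER_DAY                      # ~116 RPS
location_wps = ACTIVE_DRIVERS / UPDATE_INTERVAL_S             # 200,000 writes/sec
bandwidth_mb_s = location_wps * UPDATE_SIZE_BYTES / 1e6       # ~20 MB/sec
trip_storage_gb_day = DAILY_TRIPS * TRIP_RECORD_BYTES / 1e9   # ~10 GB/day

print(f"ride requests:      ~{ride_rps:.0f} RPS")
print(f"location writes:    {location_wps:,.0f}/sec")
print(f"location bandwidth: ~{bandwidth_mb_s:.0f} MB/sec")
print(f"trip storage:       ~{trip_storage_gb_day:.1f} GB/day")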

Database Design

1. Persistent Storage (Relational Database)

Rider

  • rider_id
  • name
  • email
  • phone
  • rating
  • payment_method_id

Driver

  • driver_id
  • name
  • email
  • phone
  • rating
  • status (AVAILABLE, BUSY, OFFLINE)
  • vehicle_details

Trip

  • trip_id
  • pickup_location (lat, long)
  • drop_location (lat, long)
  • driver_id
  • rider_id
  • start_time
  • end_time
  • status (REQUESTED, MATCHED, IN_PROGRESS, COMPLETED, CANCELLED)
  • fare

Relational databases (e.g., PostgreSQL) are used here for strong consistency and durability.


2. Real-Time Location Storage (In-Memory / NoSQL)

Driver location updates generate extremely high write throughput, far more than a traditional relational database can absorb efficiently.

Driver Location Store (Redis / Cassandra)

  • Key: driver_id
  • Value: { latitude, longitude, timestamp }
  • TTL: 30 seconds

If a driver's updates stop arriving, the key simply expires and the driver automatically disappears from the live view.
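
A minimal sketch of this expiring-key pattern with the redis-py client; the key naming scheme here is illustrative, not prescribed by the design.

import json
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

def record_location(driver_id: str, lat: float, lon: float, ts: int) -> None:
    # Store the latest position under a per-driver key with a 30-second TTL.
    # Stale drivers age out automatically; no cleanup job is needed.
    r.set(
        f"driver:location:{driver_id}",
        json.dumps({"latitude": lat, "longitude": lon, "timestamp": ts}),
        ex=30,
    )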


3. Geospatial Indexing

Efficiently finding nearby drivers is the core challenge.

Naïve range queries on latitude and longitude do not scale. A spatial index is required.

Supported approaches:

  • Geohash: Encodes latitude/longitude into a string. Shared prefixes imply proximity.
  • QuadTree: Hierarchical spatial partitioning.

Geospatial Index Table

  • Key: geohash
  • Value: list of driver_ids in that cell

Redis internally uses geohashing for its geospatial commands.
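
To make the prefix property concrete, below is a compact, self-contained geohash encoder (the standard interleaved-bit algorithm). Nearby coordinates yield strings that share a prefix, so "find nearby drivers" reduces to a prefix lookup; the printed hashes are illustrative.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    even_bit = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even_bit else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        even_bit = not even_bit
        bit_count += 1
        if bit_count == 5:  # emit one base32 character per 5 bits
            chars.append(BASE32[bits])
            bits = bit_count = 0
    return "".join(chars)

# Two nearby points in San Francisco:
print(geohash_encode(37.7749, -122.4194))  # "9q8yyk8"
print(geohash_encode(37.7750, -122.4195))  # shares the "9q8yy" prefix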


API Design

1. Request a Ride (Rider)

Endpoint
POST /v1/trips

Headers
Authorization: Bearer <rider_token>

Request Body

{
  "pickup_lat": 37.7749,
  "pickup_long": -122.4194,
  "dropoff_lat": 37.7849,
  "dropoff_long": -122.4094,
  "service_type": "uberX"
}

Response

{
  "trip_id": "7645-8732-abcd",
  "status": "SEARCHING",
  "estimated_fare": 25.50
}

The response returns immediately with a trip_id while driver matching proceeds asynchronously.


2. Accept a Ride (Driver)

Endpoint
POST /v1/trips/{trip_id}/accept

Headers
Authorization: Bearer <driver_token>

Request Body

{
  "driver_id": "8888-9999-xxxx"
}

Response

  • 200 OK on success
  • 409 Conflict if another driver accepted the trip first

3. Driver Location Update

Endpoint
POST /v1/driver/location

Headers
Authorization: Bearer <driver_token>

Request Body

{
  "lat": 37.7750,
  "long": -122.4195,
  "azimuth": 120,
  "speed": 45
}

This endpoint handles ~200,000 RPS and must be highly optimized.


Real-Time Location Delivery Architecture

Publish–Subscribe Model

  • Drivers publish location updates to the backend.
  • Riders receive updates via persistent WebSocket connections.
  • The backend acts as a broker between drivers and riders.

Connection Management Layer

Because millions of concurrent WebSocket connections cannot be handled by a single server, a dedicated Push Gateway / Connection Manager layer is introduced.

  • Riders maintain WebSocket connections to gateway servers.
  • Driver updates arrive at arbitrary backend nodes.
  • Updates are propagated via an internal message bus (Redis Pub/Sub or Kafka).
  • The appropriate gateway pushes updates to connected riders.
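
As a toy illustration of this fan-out, the sketch below assumes Redis Pub/Sub as the internal bus and a recent version of the Python websockets library for rider connections; the channel name, handshake, and message shape are all invented for the example.

import asyncio
import json

import redis.asyncio as aioredis
import websockets

connections = {}  # rider_id -> open WebSocket connection

async def handle_rider(ws):
    # Illustrative handshake: the rider identifies itself in the first frame.
    rider_id = json.loads(await ws.recv())["rider_id"]
    connections[rider_id] = ws
    try:
        await ws.wait_closed()
    finally:
        connections.pop(rider_id, None)

async def fan_out():
    pubsub = aioredis.Redis().pubsub()
    await pubsub.subscribe("trip_updates")  # illustrative channel name
    async for msg in pubsub.listen():
        if msg["type"] != "message":
            continue
        update = json.loads(msg["data"])
        ws = connections.get(update["rider_id"])
        if ws is not None:
            await ws.send(json.dumps(update))  # push to that rider's socket

async def main():
    async with websockets.serve(handle_rider, "0.0.0.0", 8765):
        await fan_out()

asyncio.run(main())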

Microservices Architecture

Service Breakdown

  1. Trip Service
    • Manages trip lifecycle and state transitions.
  2. Ride Matching Service
    • Executes geospatial queries to find nearby available drivers.
  3. User / Driver Service
    • Manages profiles, ratings, and metadata (SQL-backed).
  4. Driver Location Service
    • High-throughput ingestion of location updates (Redis-backed).
  5. Fare Estimation Service
    • Calculates the fare of the ride.

This separation avoids shared mutable state and enables independent scaling.


Redis for Location Management

Why Redis?

  1. In-Memory Performance
    RAM-based storage enables millions of operations per second with sub-millisecond latency.

  2. Native Geospatial Support
    Redis provides built-in commands:

    • GEOADD drivers <lon> <lat> <driver_id>
    • GEORADIUS drivers <lon> <lat> <radius> km

These commands eliminate the need for custom geospatial math.
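
For illustration, the same operations through the redis-py client. Note that Redis 6.2 deprecated GEORADIUS in favor of GEOSEARCH, which behaves equivalently for this use case; the key and member names below are illustrative.

import redis

r = redis.Redis()

# Register/refresh a driver's position in the "drivers" geo set.
r.geoadd("drivers", (-122.4194, 37.7749, "driver_42"))  # lon, lat, member

# Find drivers within 3 km of a pickup point (GEOSEARCH under the hood).
nearby = r.geosearch(
    "drivers",
    longitude=-122.4194,
    latitude=37.7749,
    radius=3,
    unit="km",
)
print(nearby)  # e.g. [b"driver_42"]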


Fault Tolerance and Self-Healing

Redis stores only ephemeral driver location data.

  • If Redis crashes, all in-memory location data is lost.
  • Drivers resend location updates every 5 seconds.
  • The system fully rebuilds accurate state within seconds.

This self-healing behavior makes persistence unnecessary for real-time location data, while all critical data (users, trips, payments) remains safely stored in durable databases.


Kafka-Based Location Pipeline

To further improve scalability and resilience, Apache Kafka is introduced.

Benefits

  • Decoupling producers from consumers.
  • Buffering during traffic spikes.
  • Fan-out to multiple consumers (matching, analytics, fraud detection).

Data Flow

  1. Driver app → API Gateway
  2. Ingestion service publishes to Kafka topic (driver_locations)
  3. Location updater service consumes from Kafka
  4. Location updater writes to Redis
  5. Ride Matching Service queries Redis for real-time driver positions

This architecture supports massive scale with strong fault isolation.
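
Steps 3 and 4 condense into a small consumer loop. A sketch using kafka-python and redis-py; the topic name comes from the flow above, while the broker address and consumer group id are illustrative.

import json

import redis
from kafka import KafkaConsumer

r = redis.Redis()
consumer = KafkaConsumer(
    "driver_locations",
    bootstrap_servers="kafka:9092",        # illustrative broker address
    group_id="location-updater",
    value_deserializer=lambda v: json.loads(v),
)

# Consume location events and project them into the Redis geo index.
for event in consumer:
    loc = event.value
    r.geoadd("drivers", (loc["long"], loc["lat"], loc["driver_id"]))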


Microservices Deep Dive

API Gateway

Responsibilities

  • Authentication and authorization (JWT / OAuth)
  • Request validation and schema enforcement
  • Rate limiting and abuse protection
  • Routing requests to backend microservices

Interactions

  • Entry point for Rider and Driver apps
  • Forwards requests to Trip Service, Driver Location Service, and User / Driver Service
  • Contains no business logic

Trip Service

Responsibilities

  • Owns the complete trip lifecycle
  • Enforces strong consistency for trip state transitions
  • Guarantees that a trip can be accepted by only one driver

Owned Data

  • Trips table (SQL, transactional)

APIs

  • POST /v1/trips
  • POST /v1/trips/{trip_id}/accept
  • GET /v1/trips/{trip_id}

Interactions

  • Called synchronously by API Gateway
  • Triggers Ride Matching Service asynchronously
  • Publishes trip updates to Push Gateway

Ride Matching Service

Responsibilities

  • Finds the optimal available driver for a trip
  • Applies distance, availability, and service-type filters
  • Executes geospatial queries

Characteristics

  • Stateless
  • Horizontally scalable

Interactions

  • Reads driver locations from Redis
  • Proposes driver assignment to Trip Service

Fare Estimation Service

Responsibilities

  • Calculates estimated fare before a ride is booked
  • Applies pricing rules based on distance, time, and service type
  • Incorporates surge multipliers and dynamic pricing
  • Returns a price range to the rider for transparency

Characteristics

  • Stateless
  • Pure computation service
  • Horizontally scalable

Inputs

  • Pickup location (lat, long)
  • Drop-off location (lat, long)
  • Ride type (uberX, uberXL, etc.)
  • Current surge multiplier

Outputs

  • Estimated fare range
  • Pricing breakdown (base fare, distance cost, time cost, surge)

Interactions

  • Called synchronously by Trip Service during ride creation
  • May query:
    • Pricing configuration store
    • Surge Pricing component (in-memory or Redis)
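
A minimal sketch of such a pricing computation; the rate card, minimum fare, and the ±10% range are illustrative assumptions, not real pricing.

from dataclasses import dataclass

# Illustrative rate card per service type; real values would live in the
# pricing configuration store mentioned above.
RATES = {
    "uberX": {"base": 2.50, "per_km": 1.20, "per_min": 0.35, "min_fare": 6.00},
}

@dataclass
class FareEstimate:
    min_fare: float
    max_fare: float
    currency: str = "GBP"

def estimate_fare(service_type: str, distance_km: float,
                  duration_min: float, surge: float = 1.0) -> FareEstimate:
    rate = RATES[service_type]
    fare = (rate["base"]
            + rate["per_km"] * distance_km
            + rate["per_min"] * duration_min) * surge
    fare = max(fare, rate["min_fare"])
    # Return a range (here +/-10%) rather than a point estimate.
    return FareEstimate(round(fare * 0.9, 2), round(fare * 1.1, 2))

print(estimate_fare("uberX", distance_km=8.0, duration_min=22, surge=1.2))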

Driver Location Service

Responsibilities

  • Ingests high-volume driver location updates
  • Publishes location events to Kafka
  • Maintains real-time driver positions in Redis

Owned Data

  • Redis geospatial sets (ephemeral)

Interactions

  • Receives traffic via API Gateway
  • Produces events to Kafka
  • Kafka consumers update Redis using GEOADD

User / Driver Service

Responsibilities

  • Manages rider and driver profiles
  • Stores ratings and metadata
  • Maintains driver availability status

Owned Data

  • Users and drivers (SQL)

Interactions

  • Queried by Trip Service and Ride Matching Service

Push Gateway (Connection Manager)

Responsibilities

  • Maintains persistent WebSocket connections with riders
  • Pushes real-time trip and driver location updates
  • Scales independently from core backend services

Interactions

  • Consumes events from Kafka or internal message bus
  • Delivers updates to connected clients

End-to-End Action Flows

Flow 1: Rider Requests a Ride

Sequence

  1. Rider App → API Gateway
  2. API Gateway → Trip Service
  3. Trip Service → Fare Estimation Service (sync)
  4. Fare Estimation Service returns estimated fare
  5. Trip Service persists trip with status REQUESTED
  6. Trip Service triggers Ride Matching Service asynchronously

Payload Example

Request

{
  "pickup_lat": 37.7749,
  "pickup_long": -122.4194,
  "dropoff_lat": 37.7849,
  "dropoff_long": -122.4094,
  "service_type": "uberX"
}

Response

{
  "trip_id": "7645-8732-abcd",
  "status": "SEARCHING",
  "estimated_fare": {
    "min": 22.00,
    "max": 28.50,
    "currency": "GBP"
  }
}

Logical Diagram

Rider → API Gateway → Trip Service → Ride Matching Service

Flow 2: Driver Accepts a Ride

Sequence

  1. Driver App → API Gateway
  2. API Gateway → Trip Service
  3. Trip Service performs transactional state update
  4. Trip Service notifies Push Gateway

Consistency Guarantee

  • Database transaction or optimistic locking ensures only one driver can accept the trip
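
One common way to realize this guarantee is a conditional UPDATE that only succeeds while the trip is still unclaimed. A minimal sketch with psycopg2, reusing the Trips schema from the database design section (the REQUESTED → MATCHED transition is assumed to be the accept step):

import psycopg2

def accept_trip(conn, trip_id: str, driver_id: str) -> bool:
    # Atomically claim the trip: the UPDATE matches only while the trip is
    # still REQUESTED, so exactly one driver can win the race.
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            UPDATE trips
               SET driver_id = %s, status = 'MATCHED'
             WHERE trip_id = %s AND status = 'REQUESTED'
            """,
            (driver_id, trip_id),
        )
        return cur.rowcount == 1  # False -> respond 409 Conflict

conn = psycopg2.connect("dbname=rides")
if not accept_trip(conn, "7645-8732-abcd", "8888-9999-xxxx"):
    print("409 Conflict: trip already accepted")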

Flow 3: Driver Location Update

Sequence

  1. Driver App → API Gateway
  2. API Gateway → Driver Location Service
  3. Driver Location Service publishes event to Kafka
  4. Kafka consumer updates Redis using GEOADD

Logical Diagram

Driver → Gateway → Location Service → Kafka → Redis
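
The producing side of this flow might look like the sketch below (kafka-python; the broker address is illustrative). Keying messages by driver_id pins each driver's updates to a single partition, which preserves per-driver ordering for downstream consumers.

import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # illustrative broker address
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
)

def publish_location(driver_id: str, lat: float, lon: float) -> None:
    # Key by driver_id so one driver's updates stay in order.
    producer.send(
        "driver_locations",
        key=driver_id,
        value={"driver_id": driver_id, "lat": lat, "long": lon,
               "timestamp": int(time.time())},
    )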

Flow 4: Real-Time Location Push to Rider

Sequence

  1. Location update consumed from Kafka
  2. Push Gateway receives update
  3. Push Gateway sends update via WebSocket to Rider App

Payload Example

{
  "driver_id": "d123",
  "lat": 37.7750,
  "long": -122.4195
}

Why This Architecture Works

  • Fare estimation is isolated into a dedicated stateless service, allowing pricing logic to evolve independently from trip state management
  • Strong consistency is limited to the Trip Service
  • High-throughput paths are asynchronous and event-driven
  • Redis stores only ephemeral, self-healing state
  • Kafka provides buffering, decoupling, and fan-out
  • Each microservice scales independently

Summary

This design demonstrates how Uber-like systems handle:

  • Massive real-time data ingestion
  • Low-latency geospatial queries
  • Event-driven communication
  • Ephemeral yet reliable state management