A Simple Architecture for Crypto Trading Bot
Joshua Ali
Background
The first stage of algorithmic trading involves data collection. You need data for many things, such as paper trading, backtesting, and training ML models. More data lets you build more sophisticated trading systems, but equally important is the software architecture you use to process, store, and manage that data. In this blog, I will discuss the system design I use for building my trading system: the services and databases involved, and how to connect them all together.
Some important data from exchanges include raw trades, liquidations, candlesticks (Klines), and Open Interest candles. One issue I found is that crypto exchanges typically do not let you poll raw trade or liquidation data from their APIs. Therefore, you must rely on their WebSocket streams running 24/7 to ingest this data. Historical market data is also becoming less available from these exchanges. For example, Binance no longer allows you to download historical liquidation data. This is problematic for backtesting, as liquidation data can be quite useful.
It is therefore imperative to build dedicated services for data collection, with backup systems in place in case of a failure.
Choosing the Right Database: QuestDB
My database of choice is QuestDB, which is praised for its high ingestion throughput and its focus on finance-related domains. I first learned about QuestDB in a blog post from quant Dean Markwick. I previously used TimescaleDB, another time-series database, but I never fully trusted it for high-ingestion workloads, so at the time I was limited in the indicators I could build.
A major challenge you'll face is storage. Each symbol can have tens of thousands of trades per day. You will need many gigabytes (GB) of disk space to store them, maybe even hundreds if you plan on ingesting trades for 100+ symbols across multiple exchanges, spanning several months. With QuestDB, you can mitigate this by configuring a Time-To-Live (TTL) on database tables to automatically purge data older than a set age (e.g., 30 days). I usually use a one-month TTL on my trade tables, and a much higher TTL on my Kline table because Kline data consumes negligible disk space.
Here's an example of how to create a trades table with TTL configuration:
CREATE TABLE IF NOT EXISTS trades (
    symbol SYMBOL,
    price DOUBLE,
    quantity DOUBLE,
    timestamp TIMESTAMP,
    is_buyer_maker BOOLEAN,
    exchange SYMBOL,
    trade_id STRING,
    value DOUBLE
) TIMESTAMP(timestamp)
PARTITION BY DAY
TTL 15 DAYS
DEDUP UPSERT KEYS(timestamp, trade_id, exchange, symbol);
This configuration provides several benefits:
- Automatic partitioning by day for optimal query performance
- Deduplication to prevent duplicate trade records
- Automatic cleanup of data older than 15 days
- Designated timestamp ordering for efficient time-series queries
If your goal is to collect as much data as possible for backtesting or model training, and you have the storage capacity, you can simply increase the TTL or omit it entirely. You can also create new tables for other data types like Klines, Open Interest, and Funding Rates, customising the schema and partitioning according to your needs. For quick access to aggregated data, you can create Materialized Views: these are essentially new tables that aggregate the base table at specific intervals. Here's how I create OHLC views over the trades table at several intervals:
const intervals = [
  { name: '10s', partition: 'HOUR', ttl: '6h' },
  { name: '1m', partition: 'DAY', ttl: '7d' },
  { name: '5m', partition: 'DAY', ttl: '7d' },
  { name: '15m', partition: 'DAY', ttl: '7d' },
  { name: '30m', partition: 'DAY', ttl: '14d' },
  { name: '1h', partition: 'DAY', ttl: '30d' },
  { name: '4h', partition: 'DAY', ttl: '60d' },
  { name: '1d', partition: 'WEEK', ttl: '5w' }
];

// Create OHLC views for trades
for (const interval of intervals) {
  const viewName = `trades_OHLC_${interval.name}`;
  const createViewSQL = `
    CREATE MATERIALIZED VIEW IF NOT EXISTS ${viewName}
    WITH BASE trades REFRESH INCREMENTAL
    AS (
      SELECT
        timestamp,
        symbol,
        exchange,
        first(price) AS open,
        max(price) AS high,
        min(price) AS low,
        last(price) AS close,
        sum(quantity) AS volume,
        sum(value) AS value
      FROM trades
      SAMPLE BY ${interval.name}
    ) PARTITION BY ${interval.partition} TTL ${interval.ttl};
  `;

  await this.executeHttpQuery(createViewSQL);
  console.log(`Created materialized view: ${viewName}`);
}
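Downstream services can then query these views like ordinary tables. As a quick illustration, here is a read against the 1-minute view created by the loop above, reusing the same executeHttpQuery helper; the symbol is a placeholder:
// Pull the last hour of 1-minute candles for one symbol from the view above
const recentCandlesSQL = `
  SELECT timestamp, symbol, open, high, low, close, volume
  FROM trades_OHLC_1m
  WHERE symbol = 'BTCUSDT' AND timestamp > dateadd('h', -1, now());
`;
const candles = await this.executeHttpQuery(recentCandlesSQL);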
Indicators of Interest
I will briefly discuss some indicators you can build with raw trades, liquidations, and Klines.
Volume Profile: Construct granular volume profiles over customised time ranges. These are useful for identifying prices with high-volume nodes, which act like magnets for price action.
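As a sketch of how this looks against the raw trades table, the query below tallies volume at each (rounded) traded price for one day; the symbol, date, and dollar-level rounding are placeholder choices, and it relies on QuestDB grouping implicitly by the non-aggregated expression:
// Top traded price levels for one day; adjust the rounding for coarser bins
const volumeProfileSQL = `
  SELECT round(price, 0) AS price_level,
         sum(quantity) AS volume
  FROM trades
  WHERE symbol = 'BTCUSDT' AND timestamp IN '2024-01-15'
  ORDER BY volume DESC
  LIMIT 20;
`;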
Order Flow: Aggregate trades into footprint candles, which show the buy and sell volume at each price level. Some useful information that can be gauged from this (a starter query is sketched after the list below):
- Stacked Imbalances: Identifying successive price levels dominated by either buyers or sellers
- Volume Delta / Cumulative Volume Delta Analysis: Tracking the difference between buy and sell volume at each price level
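As a starting point, a per-minute volume delta can be computed straight from the trades table; a full footprint would additionally bucket by price level. The symbol below is again a placeholder:
// Buys minus sells per 1-minute candle; is_buyer_maker = true means the aggressor was a seller
const volumeDeltaSQL = `
  SELECT timestamp,
         sum(CASE WHEN is_buyer_maker THEN -quantity ELSE quantity END) AS volume_delta
  FROM trades
  WHERE symbol = 'BTCUSDT'
  SAMPLE BY 1m;
`;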
Liquidations: Liquidation heatmaps and visible-range profiles are quite popular amongst discretionary traders and are certainly something that should be utilised in your trading bots.
Higher Timeframe (HTF) Data: Klines, Open Interest, and Funding Rate data are still very important because they let you perform broader analysis using a plethora of widely available indicator libraries. As I mentioned earlier, this data is readily available from exchanges, so the architecture described below includes a separate worker for collecting it as well.
Architecture Overview

I will now discuss the system architecture for data collection. It consists of three primary microservices, each optimised for a specific responsibility:
Core Services
1. Market Ingestor Service
- Purpose: Real-time data collection and persistence
- Connections: Multiple crypto exchanges (Binance, Coinbase, Kraken, etc.) via WebSockets
- Data Types: Raw trades, liquidation events, order book snapshots
- Storage: High-frequency writes to QuestDB via TCP with automatic partitioning
2. Kline Worker Service
- Purpose: Periodic aggregated data collection
- Method: RESTful API polling with configurable intervals
- Data Types: Kline data (multiple timeframes), Open Interest, funding rates
3. Signals Engine
- Purpose: Real-time technical analysis and signal generation
- Frequency: High-frequency execution (sub-second to minute intervals)
- Processing: Computes technical indicators, volume profiles, and custom signals
- Storage: Results cached in Redis for ultra-low latency access
The Market Ingestor service connects to QuestDB and to the exchange WebSocket streams. Its primary task is to write to the database, but what happens if it goes down? One solution is to have a backup instance running that can take over, ideally with a different provider in a different location. But what if the database goes down? This is where you could introduce a queueing solution such as Kafka that sits between the ingestor and the database: once the database is back up and running, the buffered trades are written into the database and ingestion resumes.
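To make the write path concrete, here is a minimal sketch of the ingestor loop. It assumes Binance's aggTrade stream format and the Sender API from the @questdb/nodejs-client package; reconnection logic, batching, and the Kafka buffer are left out:
import WebSocket from 'ws';
import { Sender } from '@questdb/nodejs-client';

// Connect to QuestDB over ILP/TCP and to a single Binance trade stream
const sender = Sender.fromConfig('tcp::addr=localhost:9009;');
await sender.connect();
const ws = new WebSocket('wss://fstream.binance.com/ws/btcusdt@aggTrade');

ws.on('message', async (raw) => {
  const t = JSON.parse(raw.toString());
  await sender
    .table('trades')
    .symbol('symbol', t.s)
    .symbol('exchange', 'binance')
    .stringColumn('trade_id', String(t.a))
    .floatColumn('price', Number(t.p))
    .floatColumn('quantity', Number(t.q))
    .floatColumn('value', Number(t.p) * Number(t.q))
    .booleanColumn('is_buyer_maker', t.m)
    .at(t.T, 'ms'); // trade time in milliseconds
  await sender.flush(); // flushing per message keeps the sketch simple; batch in practice
});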
The Kline Worker starts up by connecting to the exchange WebSockets, passively listening for candle closes and inserting them into the database. There is a one-time startup job that polls historical Kline data and writes it. Thereafter, the service runs on autopilot, functioning similarly to the Market Ingestor in that it just writes Kline closes from the WebSocket to the database. If there is a WebSocket connection error or a restart, polling the REST API can take over until the stream reconnects. The table is deduplicated, meaning both sources can write the same candle and we won't need to worry about duplicates.
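For reference, a Kline table along these lines might look like the following; the column set, monthly partitioning, and the 12-month TTL are just example choices:
// DEDUP lets the WebSocket listener and the REST backfill write the same candle safely
const createKlinesSQL = `
  CREATE TABLE IF NOT EXISTS klines (
    symbol SYMBOL,
    exchange SYMBOL,
    timeframe SYMBOL,
    open DOUBLE,
    high DOUBLE,
    low DOUBLE,
    close DOUBLE,
    volume DOUBLE,
    timestamp TIMESTAMP
  ) TIMESTAMP(timestamp)
  PARTITION BY MONTH
  TTL 12 MONTHS
  DEDUP UPSERT KEYS(timestamp, symbol, exchange, timeframe);
`;
await this.executeHttpQuery(createKlinesSQL);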
The Signals Engine is a service that reads data and performs indicator computation. It could be a cron job that runs on a schedule. A great scalable option to build this on is AWS Lambda: it has a generous free tier of 400,000 GB-seconds per month, and from my past experience, running it every minute won't exceed that. The computed technical data is stored in Redis for fast O(1) access across various applications, such as the trading bot and the web app.
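As a sketch of the caching side, using the node-redis client; the key naming and the two-minute expiry are just assumptions for illustration:
import { createClient } from 'redis';

// Cache the latest computed signals per symbol so the bot and web app can read them in O(1)
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function cacheSignals(symbol, signals) {
  await redis.set(`signals:${symbol}`, JSON.stringify(signals), { EX: 120 });
}

await cacheSignals('BTCUSDT', { cvd: 1250.4, vwap: 64321.5, updatedAt: Date.now() });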
This architecture provides the foundation for building a crypto trading bot.