Published: June 23, 2022

TimeScaleDB Deployment: Docker Containers and EC2 Setup

When it comes to efficiently handling time-series data, TimeScaleDB emerges as a powerful open-source solution built on top of PostgreSQL. This guide explores two essential deployment approaches: running TimeScaleDB in a Docker container locally with passthrough mount capabilities, and setting it up manually on an Amazon EC2 instance for cloud-based operations.

Why TimeScaleDB?

TimeScaleDB introduces the concept of a hypertable, which provides seamless data partitioning while presenting a virtual table abstraction across all your data. This powerful partitioning enables faster queries by quickly excluding irrelevant data, along with improved query planning and execution. With TimeScaleDB, you get the benefits of a single PostgreSQL table while handling large-scale time-series data effectively.

Docker Container with TimeScaleDB & Passthrough

Docker containers have revolutionized application deployment, and they can do wonders for your TimeScaleDB setup too. By running TimeScaleDB in a Docker container with passthrough to a local directory, you can avoid the usual disk space limitations found in container environments.

This approach is particularly valuable for development environments where you want to avoid the complications of installing and configuring database servers directly on your machine.

Here are the steps to get TimeScaleDB up and running in a Docker container:

  1. Check Running Containers - First, ensure that there are no conflicting containers running by executing docker ps.

  2. Pull the TimeScaleDB Image - Obtain the latest TimeScaleDB image for PostgreSQL 14:

docker pull timescale/timescaledb:latest-pg14
  1. Initialize Volumes - Create a new Docker container with TimeScaleDB and PostgreSQL by using a host directory as a data volume. When you hit the 10GB limit for default root volume size in Docker containers, using a passthrough mount to local disk is the recommended approach over expanding the container volume.
rm -rf pylocal_data && mkdir pylocal_data && cd pylocal_data
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=ndmtppa11! -v /"$(pwd)":/var/lib/postgresql/data timescale/timescaledb:latest-pg14
💡

There are some forums on forums.docker as well as github related to git-bash mucking up docker mount. The solution is to escape the path conversion, by prefixing with /. So the working docker command to run docker from git-bash with present working directory uses /${PWD}: instead of $PWD:

  1. Access PostgreSQL Environment - Execute an interactive bash shell on the container:
winpty docker exec -it timescaledb bash
  1. Connect to PostgreSQL - Enter the container's PostgreSQL environment using psql -U postgres. The psql client is the native command-line client for PostgreSQL.

  2. Create a New Database - Within the PostgreSQL environment, create a new database using CREATE DATABASE [DATABASE_NAME].

    • Use \l to view a database listing
    • Use \c [DATABASE_NAME] to connect to a different database
    • Find more slash commands in the PostgreSQL documentation
  3. Create Tables and Schema - Use your SQL to create the necessary tables.

    • Use \d [TABLE_NAME] to view table details including data types and columns

With these steps, you'll have a TimeScaleDB instance running smoothly in a Docker container with a passthrough mount to local disk. You can confirm this setup by using docker inspect <container> to view the binding configuration.

Setting up TimeScaleDB on Amazon EC2

If you prefer a cloud-based solution, you can manually set up TimeScaleDB on an Amazon EC2 instance. Unfortunately, TimeScaleDB is not available as an extension through AWS RDS due to licensing restrictions.

TimescaleDB's licensing model means that Amazon cannot offer the "community" version that includes many advanced features. While Timescale offers a fully-managed cloud service on AWS, it can become quite expensive for larger deployments.

  1. Install TimeScaleDB from Pre-built Cloud Image - Once signed in to your EC2 dashboard, navigate to Images > AMIs. In the search bar, change to Public images and type Timescale to find available TimescaleDB images. Select your desired image and click Launch instance from image.

  2. SSH into the EC2 Instance - Access the instance using SSH. You can reference this SSH guide for detailed instructions.

  3. Become Root User - Execute sudo -i or sudo -s to gain root privileges.

  4. Configure the Database - Run the timescaledb-tune script, which is included with the timescaledb-tools package. Note the config directory from the logs (typically /etc/postgresql/14/main/).

  5. Restart PostgreSQL Service - After running the tune script, restart the service: sudo systemctl restart postgresql.service.

  6. Set up the TimescaleDB Extension - Connect to PostgreSQL and create your database:

CREATE DATABASE tsdb;
\c tsdb
CREATE EXTENSION IF NOT EXISTS timescaledb;
  1. Configure Security Group - In the AWS EC2 console, navigate to your security group and edit inbound rules. Enable port 5432 with Type as PostgreSQL, Protocol as TCP, and source as 0.0.0.0/0 (or restrict to specific IP addresses for better security).

  2. Update PostgreSQL Configuration - Modify postgresql.conf to allow external connections:

sed -i 's/'localhost'/'*'/g' /etc/postgresql/14/main/postgresql.conf
sed -i 's/#listen_addresses/listen_addresses/g' /etc/postgresql/14/main/postgresql.conf
  1. Update Authentication Configuration - Add the following line to pg_hba.conf to allow external connections:
echo "host    all             all             0.0.0.0/0               md5" >> /etc/postgresql/14/main/pg_hba.conf
  1. Set PostgreSQL Password - In the PostgreSQL environment, set a password for the postgres user:
ALTER USER postgres PASSWORD 'Password123!';
  1. Restart PostgreSQL - Apply all configuration changes: sudo service postgresql restart

  2. Connect to the Database - Use pgAdmin or any PostgreSQL client to connect using the EC2 public IP, port 5432, and your credentials.

💡

In addition, you may want to change "max_connections = 25" to "max_connections = 500" to accommodate more connections. To do this, you can use something like: sed -i 's/max_connections = 25/max_connections = 500/g' ./conf.txt

Scale Out with TimescaleDB Multi-Node

A common criticism of PostgreSQL is that once you max out your database instance, you can't scale-out effectively. This is why Timescale created TimescaleDB Multi-node, a solution for linking multiple PostgreSQL nodes to scale out ingest and query performance.

With Multi-node, you can achieve insertion rates of over 1 million rows per second. If you need this level of performance, or prefer a more managed deployment for production, TimescaleDB Cloud may be the easiest option. The UI, dashboard, and analytics features are quite impressive as well.

TimeScaleDB presents a robust and flexible solution for efficiently handling time-series data. Whether you choose to deploy it locally with Docker or in the cloud on Amazon EC2, TimeScaleDB provides the scalability and performance needed for effective time-series data management.


References