Provisioning Real-Time PostgreSQL with TimeScaleDB: Local Docker Container with Passthrough Mount & Cloud Configuration in an EC2

Published: June 23, 2022

When it comes to efficiently handling time-series data, TimeScaleDB emerges as a powerful open-source solution built on top of PostgreSQL. In this article, we'll explore how to provision and configure TimeScaleDB for two different setups: running it within a Docker container locally with a passthrough mount to a local directory as well as setting it up manually on an Amazon EC2 instance as a cloud-based database.

Why TimeScaleDB?

TimeScaleDB introduces the concept of a hypertable, which provides seamless data partitioning while presenting a virtual table abstraction across all your data. This powerful partitioning enables faster queries by quickly excluding irrelevant data, along with improved query planning and execution. With TimeScaleDB, you get the benefits of a single PostgreSQL table while handling large-scale time-series data effectively.

Docker Container with TimeScaleDB & Passthrough

Docker containers have revolutionized application deployment, and they can do wonders for your TimeScaleDB setup too. By running TimeScaleDB in a Docker container with passthrough to a local directory, you can avoid the usual disk space limitations found in container environments, making it a perfect choice for setting up heavier databases locally. Additionally, for development purposes, using Docker containers means avoiding the complications of installing and configuring database servers.

Here are the steps to get TimeScaleDB up and running in a Docker container:

  1. Check Running Containers: First, ensure that there are no conflicting containers running by executing the command docker ps.
  2. Pull the TimeScaleDB Image: Obtain the latest TimeScaleDB image for PostgreSQL 14:
docker pull timescale/timescaledb:latest-pg14
  1. Initialize volumes: Create a new Docker container with TimeScaleDB and PostgreSQL by using a Host Directory as a Data volume.
    • When we hit the limit of 10Gb for default root volume size for docker containers, using a larger root volume is tempting. However, it's likely because you're doing something that can be done a better way. This is why in this post, I suggest instead that we use a storage container (if another 10Gb would be sufficient) or use this passthrough mount to local disk.
rm -rf pylocal_data && mkdir pylocal_data && cd pylocal_data
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=ndmtppa11! -v /"$(pwd)":/var/lib/postgresql/data timescale/timescaledb:latest-pg14
💡

There are some forums on forums.docker as well as github related to git-bash mucking up docker mount. The solution is to escape the path conversion, by prefixing with /. So the working docker command to run docker from git-bash with present working directory uses /${PWD}: instead of $PWD:

  1. Access PostgreSQL Environment: Next, execute an interactive bash shell on the container:
winpty docker exec -it timescaledb bash
  1. Enter the container's PostgreSQL environment using psql -U postgres. The psql client is the native command-line client for PostgreSQL.
  2. Create a New Database: Within the PostgreSQL environment, create a new database using CREATE DATABASE [DATABASE_NAME].
    • use \l to view a database listing
    • \c [DATABASE_NAME] will make a different database connection with the \c command in lowercase.
    • Find more slash commands here
  3. Create Tables and Schema: use your SQL to create the necessary tables.
    • \d [TABLE_name] will obtain the details of a table in PostgreSQL such as the data types and columns.

With these steps, you'll have a TimeScaleDB instance running smoothly in a Docker container, with a passthrough mount to local disk! You can confirm this by using docker inspect <container> to view the binding.

Setting up TimeScaleDB on Amazon EC2

If you prefer a cloud-based solution, you can manually set up TimeScaleDB on an Amazon EC2 instance. Unfortunately, TimescaleDB is not within the list of available extensions through AWS RDS. Further, TimescaleDB's licensing means that Amazon cannot offer the "community" version of TimescaleDB that includes many of its advanced features. Timescale (the company) does offer a fully-managed cloud on AWS, but having used it, it can rack up quite a bill. Its the same issue when it comes to Amazon Timestream...

TimeScale DB Setup In an EC2

  1. Installing TimeScaleDB from a Pre-built Cloud Image:
    • Once you are signed in to your EC2 dashboard, navigate to Images > AMIs.
    • In the search bar, change the search to Public images and type Timescale to find all available TimescaleDB images.
    • Select the image you want to use, and click "Launch instance from image".
  2. SSH into the EC2 Instance:. Access the instance using SSH. I have a Blog Post on how to do this.
  3. To become root user type:
    • sudo -i or sudo -s
  4. The easiest way to configure your database is to run the timescaledb-tune script, which is included with the timescaledb-tools package.
    • From the logs of the timescaledb-tune command, you should note the config directory. For me it was /etc/postgresql/14/main/
  5. After running the timescaledb-tune script, you need to restart the PostgreSQL service for the configuration changes to take effect. To restart the service, run sudo systemctl restart postgresql.service.
  6. set up the TimescaleDB extension:
    • connect to the PostgreSQL instance as the postgres superuser via sudo -u postgres psql
    • create an empty database. For example, to create a database called tsdb
CREATE DATABASE tsdb;
  • connect to the database you just created
\c tsdb
  • create the TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb
  1. Opening the port in the AWS EC2
    • Go to security group for the EC2
    • under Actions menu select edit inbound rules
    • enable port 5432 (for PostgreSQL server), choose Type as Postgresql, Protocol as TCP, port range as 5432. and source as 0.0.0.0/0 to accept all the incoming requests or you can provide a particular IP address to give access
  2. SSH back into the instance
  3. To log the conf use:
cat /etc/postgresql/14/main/postgresql.conf
  1. Update PostgreSQL Configuration: Modify postgresql.conf to allow connections from any IP address using the following commands:
sed -i 's/'localhost'/'*'/g' /etc/postgresql/14/main/postgresql.conf
sed -i 's/#listen_addresses/listen_addresses/g' /etc/postgresql/14/main/postgresql.conf
  • In reality, you may want to specify which IPs are allowed to connect to the database.
  1. Update pg_hba.conf: In the same directory, add the following line at the end of pg_hba.conf to allow all IPs to connect to the database:
echo "host    all             all             0.0.0.0/0               md5" >> /etc/postgresql/14/main/pg_hba.conf
  1. In postgres again, add a password with:
    • ALTER USER postgres PASSWORD 'Password123!';
  2. Restart the postgres service with sudo service postgresql restart
  3. Connect to the Database: Use pgAdmin to connect to the database using the public IP of the EC2 instance, port 5432, and the credentials you created.
💡

In addition, you may want to change "max_connections = 25" to "max_connections = 500" to accommodate more connections. To do this, you can use something like: sed -i 's/max_connections = 25/max_connections = 500/g' ./conf.txt

Scale Out with TimescaleDB Multi-Node

A common criticism of Postgres is that once you max out your database instance you can’t scale-out effectively. This is why Timescale created TimescaleDB Multi-node, a way of linking multiple PostgreSQL nodes to scale out ingest and query performance, while inserting over 1 million rows per second. If you need this, or you want a more managed deployment for Production, TimescaleDB cloud may be the easiest option. The UI, dashboard and analytics are also quite nice admittedly.

In conclusion though, TimeScaleDB presents a robust and flexible solution for efficiently handling time-series data. Whether you choose to deploy it locally with Docker or in the cloud on Amazon EC2, the power of TimeScaleDB is a must-have for your time-series data management.


Resources