- Git Disasters and Process Debt
- Is Code Rotting Due To AI?
- The Integration Illusion
- When MCP Fails
- Context Engineering
- Stop Email Spoofing with DMARC
- SOTA Embedding Retrieval: Gemini + pgvector for Production Chat
- A Review of Agentic Design Patterns
- Building AI Agents for Automated Podcasts
- Rediscovering Cursor
- GraphRAG > Traditional Vector RAG
- Cultural Bias in LLMs
- Mapping out the AI Landscape with Topic Modelling
- Sustainable Cloud Computing: Carbon-Aware AI
- Defensive Technology for the Next Decade of AI
- Situational Awareness: The Decade Ahead
- Mechanistic Interpretability: A Survey
- Why I Left Ubuntu
- Multi-Agent Collaboration
- Building Better Retrieval Systems
- Building an Automated Newsletter-to-Summary Pipeline with Zapier AI Actions vs AWS SES & Lambda
- Local AI Image Generation
- Deploying a Distributed Ray Python Server with Kubernetes, EKS & KubeRay
- Making the Switch to Linux for Development
- Scaling Options Pricing with Ray
- The Async Worker Pool
- Browser Fingerprinting: Introducing My First NPM Package
- Reading Data from @socket.io/redis-emitter without Using a Socket.io Client
- Socket.io Middleware for Redux Store Integration
- Sharing TypeScript Code Between Microservices: A Guide Using Git Submodules
- Efficient Dataset Storage: Beyond CSVs
- Why I switched from Plain React to Next.js 13
- Deploy & Scale Socket.io Containers in ECS with Elasticache
- Implementing TOTP Authentication in Python using PyOTP
- Simplifying Lambda Layer ARNs and Creating Custom Layers in AWS
- ›TimeScaleDB Deployment: Docker Containers and EC2 Setup
- How to SSH into an EC2 Instance Using PuTTY
When it comes to efficiently handling time-series data, TimeScaleDB emerges as a powerful open-source solution built on top of PostgreSQL. This guide explores two essential deployment approaches: running TimeScaleDB in a Docker container locally with passthrough mount capabilities, and setting it up manually on an Amazon EC2 instance for cloud-based operations.
Why TimeScaleDB?
TimeScaleDB introduces the concept of a hypertable
, which provides seamless data partitioning while presenting a virtual table abstraction across all your data. This powerful partitioning enables faster queries by quickly excluding irrelevant data, along with improved query planning and execution. With TimeScaleDB, you get the benefits of a single PostgreSQL table while handling large-scale time-series data effectively.
Docker Container with TimeScaleDB & Passthrough
Docker containers have revolutionized application deployment, and they can do wonders for your TimeScaleDB setup too. By running TimeScaleDB in a Docker container with passthrough to a local directory, you can avoid the usual disk space limitations found in container environments.
This approach is particularly valuable for development environments where you want to avoid the complications of installing and configuring database servers directly on your machine.
Here are the steps to get TimeScaleDB up and running in a Docker container:
-
Check Running Containers - First, ensure that there are no conflicting containers running by executing
docker ps
. -
Pull the TimeScaleDB Image - Obtain the latest TimeScaleDB image for PostgreSQL 14:
docker pull timescale/timescaledb:latest-pg14
- Initialize Volumes - Create a new Docker container with TimeScaleDB and PostgreSQL by using a host directory as a data volume. When you hit the 10GB limit for default root volume size in Docker containers, using a passthrough mount to local disk is the recommended approach over expanding the container volume.
rm -rf pylocal_data && mkdir pylocal_data && cd pylocal_data
docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=ndmtppa11! -v /"$(pwd)":/var/lib/postgresql/data timescale/timescaledb:latest-pg14
There are some forums on forums.docker as well as github related to git-bash mucking up docker mount. The solution is to escape the path conversion, by prefixing with /
. So the working docker command to run docker from git-bash with present working directory uses /${PWD}:
instead of $PWD:
- Access PostgreSQL Environment - Execute an interactive
bash
shell on the container:
winpty docker exec -it timescaledb bash
-
Connect to PostgreSQL - Enter the container's PostgreSQL environment using
psql -U postgres
. Thepsql
client is the native command-line client for PostgreSQL. -
Create a New Database - Within the PostgreSQL environment, create a new database using
CREATE DATABASE [DATABASE_NAME]
.- Use
\l
to view a database listing - Use
\c [DATABASE_NAME]
to connect to a different database - Find more slash commands in the PostgreSQL documentation
- Use
-
Create Tables and Schema - Use your SQL to create the necessary tables.
- Use
\d [TABLE_NAME]
to view table details including data types and columns
- Use
With these steps, you'll have a TimeScaleDB instance running smoothly in a Docker container with a passthrough mount to local disk. You can confirm this setup by using docker inspect <container>
to view the binding configuration.
Setting up TimeScaleDB on Amazon EC2
If you prefer a cloud-based solution, you can manually set up TimeScaleDB on an Amazon EC2 instance. Unfortunately, TimeScaleDB is not available as an extension through AWS RDS due to licensing restrictions.
TimescaleDB's licensing model means that Amazon cannot offer the "community" version that includes many advanced features. While Timescale offers a fully-managed cloud service on AWS, it can become quite expensive for larger deployments.
-
Install TimeScaleDB from Pre-built Cloud Image - Once signed in to your EC2 dashboard, navigate to Images > AMIs. In the search bar, change to Public images and type
Timescale
to find available TimescaleDB images. Select your desired image and click Launch instance from image. -
SSH into the EC2 Instance - Access the instance using SSH. You can reference this SSH guide for detailed instructions.
-
Become Root User - Execute
sudo -i
orsudo -s
to gain root privileges. -
Configure the Database - Run the
timescaledb-tune
script, which is included with thetimescaledb-tools
package. Note the config directory from the logs (typically/etc/postgresql/14/main/
). -
Restart PostgreSQL Service - After running the tune script, restart the service:
sudo systemctl restart postgresql.service
. -
Set up the TimescaleDB Extension - Connect to PostgreSQL and create your database:
CREATE DATABASE tsdb;
\c tsdb
CREATE EXTENSION IF NOT EXISTS timescaledb;
-
Configure Security Group - In the AWS EC2 console, navigate to your security group and edit inbound rules. Enable port
5432
with Type as PostgreSQL, Protocol as TCP, and source as0.0.0.0/0
(or restrict to specific IP addresses for better security). -
Update PostgreSQL Configuration - Modify
postgresql.conf
to allow external connections:
sed -i 's/'localhost'/'*'/g' /etc/postgresql/14/main/postgresql.conf
sed -i 's/#listen_addresses/listen_addresses/g' /etc/postgresql/14/main/postgresql.conf
- Update Authentication Configuration - Add the following line to
pg_hba.conf
to allow external connections:
echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/14/main/pg_hba.conf
- Set PostgreSQL Password - In the PostgreSQL environment, set a password for the postgres user:
ALTER USER postgres PASSWORD 'Password123!';
-
Restart PostgreSQL - Apply all configuration changes:
sudo service postgresql restart
-
Connect to the Database - Use pgAdmin or any PostgreSQL client to connect using the EC2 public IP, port
5432
, and your credentials.
In addition, you may want to change "max_connections = 25"
to "max_connections = 500"
to accommodate more connections. To do this, you can use something like: sed -i 's/max_connections = 25/max_connections = 500/g' ./conf.txt
Scale Out with TimescaleDB Multi-Node
A common criticism of PostgreSQL is that once you max out your database instance, you can't scale-out effectively. This is why Timescale created TimescaleDB Multi-node
, a solution for linking multiple PostgreSQL nodes to scale out ingest and query performance.
With Multi-node, you can achieve insertion rates of over 1 million rows per second. If you need this level of performance, or prefer a more managed deployment for production, TimescaleDB Cloud may be the easiest option. The UI, dashboard, and analytics features are quite impressive as well.
TimeScaleDB presents a robust and flexible solution for efficiently handling time-series data. Whether you choose to deploy it locally with Docker or in the cloud on Amazon EC2, TimeScaleDB provides the scalability and performance needed for effective time-series data management.
References
- TimescaleDB: "Self Hosted Installation Docs", TimescaleDB Documentation
- Yavar: "Allowing Remote Login to the PostgreSQL Database", Medium, 2020
- GeeksforGeeks: "Sed Command in Linux/Unix with examples", GeeksforGeeks