
teleport

Build Status

A trigger-based Postgres replicator that captures data changes in real time via DML triggers, and replicates DDL migrations either through DDL event triggers or by diffing schema changes when event triggers are not available. In other words, a complete replicator that works without any special permissions on the database, just like the ones you don't have on AWS RDS.

Yes, you read it right

How it works

When DDL event triggers are not available, Teleport diffs the current schema on a configurable time interval and replicates new tables, columns, indexes and so on from the source to the target. Inserted, updated or deleted rows are detected by triggers on the source, which generate events that teleport transforms into batches for the appropriate targets.
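
Teleport installs its triggers automatically, but the general technique can be sketched by hand. The following is an illustrative example of trigger-based change capture, not teleport's actual implementation; the table and function names here are hypothetical:

```sql
-- Illustrative sketch of trigger-based DML capture; all names are hypothetical.
CREATE TABLE replication_events (
  id         serial PRIMARY KEY,
  table_name text NOT NULL,
  operation  text NOT NULL,
  row_data   json,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION capture_dml() RETURNS trigger AS $$
BEGIN
  -- Record which table changed, how, and the affected row as JSON.
  INSERT INTO replication_events (table_name, operation, row_data)
  VALUES (TG_TABLE_NAME, TG_OP,
          row_to_json(CASE WHEN TG_OP = 'DELETE' THEN OLD ELSE NEW END));
  RETURN NULL; -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- One trigger per replicated table:
CREATE TRIGGER my_table_capture
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE capture_dml();
```

A batcher process can then drain `replication_events` in order, group events and ship them to the target, which is roughly the pipeline described above.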

If teleport fails to apply a batch of new/updated rows due to a schema change that is not yet reflected on the target, it will queue the batch, apply the schema change and then apply the failed batches again. This ensures data consistency even after running migrations and changing the source schema.

Currently only source databases running Postgres >= 9.2.16 are supported. DDL event triggers are only available on Postgres >= 9.3; on AWS RDS, event triggers are only available on Postgres >= 9.4.9. Teleport requires that all replicated tables have a primary key.
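
Since every replicated table needs a primary key, it can be useful to audit the source schema before setting up replication. This is a generic catalog query (not part of teleport) that lists tables in the public schema lacking one:

```sql
-- List tables in the public schema that have no primary key.
SELECT c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'            -- ordinary tables only
  AND n.nspname = 'public'
  AND NOT EXISTS (
    SELECT 1
    FROM pg_constraint con
    WHERE con.conrelid = c.oid
      AND con.contype = 'p'      -- 'p' = primary key constraint
  );
```

Any table this query returns would need a primary key added before teleport can install its triggers on it.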

Features

Teleport can replicate all of the following:

  • INSERT/UPDATE/DELETE rows
  • Tables/columns
  • Composite types
  • Enums
  • Schemas
  • Functions
  • Indexes
  • Extensions

Install

go get -u github.com/pagarme/teleport

Getting started

Each running instance of teleport is responsible for managing a host, exposing an HTTP API to receive batches from other instances. For master-slave replication you should run one teleport instance for the source host (master) and another for the target host (slave), and set the API of the target as the destination for the data fetched from the source.

Configuring the source instance

For the source, create a config file named source_config.yml:

batch_size: 10000
max_events_per_batch: 10000
use_event_triggers: true # Available for Postgres >= 9.3
processing_intervals:
  batcher: 100
  transmitter: 100
  applier: 100
  vacuum: 500
  ddlwatcher: 5000
database:
  name: "finops-db"
  database: "postgres"
  hostname: "postgres.mydomain.com"
  username: "teleport"
  password: "root"
  port: 5432
server:
  hostname: "0.0.0.0"
  port: 3000
targets:
  my-target:
    target_expression: "public.*"
    endpoint:
      hostname: "target.mydomain.com"
      port: 3001
    apply_schema: "test"

For each target under the targets section, it's possible to define a target_expression, which determines which tables will be replicated. The expression should be schema-qualified.

You should also set an apply_schema, which defines the schema the data will be applied to on the target, and an endpoint pointing at the target teleport instance.
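
Multiple targets can sit side by side under the targets section. For instance, a hypothetical second target could ship every table in a reporting schema to another instance (all names below are illustrative):

```yaml
targets:
  my-target:
    target_expression: "public.*"
    endpoint:
      hostname: "target.mydomain.com"
      port: 3001
    apply_schema: "test"
  reporting-target:                      # hypothetical second target
    target_expression: "reporting.*"     # every table in the reporting schema
    endpoint:
      hostname: "reporting.mydomain.com"
      port: 3002
    apply_schema: "reporting"
```

Each target gets its own batches, so the two destinations can fall behind or catch up independently.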

Configuring the target instance

For the target, create a config file named target_config.yml:

batch_size: 10000
max_events_per_batch: 10000
processing_intervals:
  batcher: 100
  transmitter: 100
  applier: 100
  vacuum: 500
  ddlwatcher: 5000
database:
  name: "my-target"
  database: "postgres"
  hostname: "postgres-replica.mydomain.com"
  username: "teleport"
  password: "root"
  port: 5432
server:
  hostname: "target.mydomain.com"
  port: 3001

You may have noticed that this config file does not include a targets section, simply because this instance is not the source for any host. You can, however, use an instance as both source and target by including a targets section.
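
For example, a chained setup where the target forwards data onward would add a targets section to target_config.yml. The hostnames and target names below are hypothetical:

```yaml
# Added to target_config.yml: this instance now also acts as a source.
targets:
  downstream:
    target_expression: "test.*"          # forward the schema the data was applied to
    endpoint:
      hostname: "downstream.mydomain.com"
      port: 3002
    apply_schema: "test"
```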

Initial load

It's possible to generate initial-load batches on the source that will be transmitted to the target. To do an initial load, run on the source:

$ teleport -config source_config.yml -mode initial-load -load-target my-target

This will create batches on the source that will be transmitted to my-target as soon as teleport starts running.

Starting up

You may start instances before the end of the initial load. This will replicate data as it's extracted from the source to the target, and further modifications will be replicated and applied later on.

On the source, teleport will diff, group and batch events and transmit batches to the target. On the target, batches will be applied in the same order as they occurred on the source.

On source, run:

$ teleport -config source_config.yml

On target, run:

$ teleport -config target_config.yml

Teleport is now up and running! \o/

Sentry Support

Teleport has native Sentry support. To enable it, add the Sentry DSN to the config:

sentry_endpoint: https://user:[email protected]/8

Performance

We've been using teleport to replicate a fairly large production database (150GB) with ~50 DML updates per second, and performance has been quite satisfactory. Under our normal load, each teleport instance uses ~150MB of memory, with no significant CPU usage or spikes.

As teleport relies on (very light) triggers for data replication, the source database performance may be slightly affected, but impacts were negligible for our use cases.

Initial load uses Postgres' COPY FROM to load data, which makes it very fast. The initial load of our entire 150GB database took around 14 hours using db.m4.xlarge RDS instances for both source and target.

Tests

$ docker-compose run test

License

The MIT license.


Troubleshooting

  • Since version 0.4.0, the ids of Teleport's internal tables have been changed to bigserial and bigint. The code is backward compatible and works with the previous version, which used serial and int. If you are running into integer overflow, you will need to either drop the current tables or alter them manually. See the changelog for details.
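
If you choose to alter the tables manually, the change amounts to widening the id columns. A sketch, using a hypothetical internal table name (consult the changelog for the actual table list):

```sql
-- Hypothetical example: widen an internal table's id column from int to bigint.
-- Replace teleport.events with the actual internal table names from the changelog.
ALTER TABLE teleport.events ALTER COLUMN id TYPE bigint;
```

Note that this rewrites the table and takes an exclusive lock, so it is best done while teleport is stopped.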

Contributors

greenboxal, gustavolivrare, joaolucasl, kevin-cantwell, pedrofranceschi, thalesmello


Issues

Cannot connect source to Heroku Postgres instances

Amazing tool! However, Heroku requires sslmode=require to connect to their databases from outside, and it looks like sslmode is being explicitly disabled: https://github.com/pagarme/teleport/blob/master/database/database.go#L42

I tried forking, modifying the sslmode and still I can't connect, so forgive me as I'm not 100% sure this is the problem.

Error log:

source_1 | 2016/05/25 21:27:29 Erro starting database: %!(EXTRA *pq.Error=pq: no pg_hba.conf entry for host "<IP REDACTED>", user "<USER REDACTED>", database "<DATABASE REDACTED>", SSL off)

Question: use cases

This is very cool.

I am curious why this was written specifically. Maybe others are too?

When you have a cluster you can apply changes to one DB instance, and the changes propagate to all others. But there must be a multitude of use cases this solves, and probably a few you only discover after running a big PostgreSQL system. Hence, the use cases are valuable to know.

Implement vacuum

A vacuum is needed to delete replicated/ignored events and applied batches.

Cannot install triggers on tables which do not have a primary key

If the source schema contains tables without a primary key, installing triggers logs this error and panics:

source_1 | 2016/05/26 18:26:24 Error installing triggers on table table_name: table table_name does not have primary key!
source_1 | panic: runtime error: invalid memory address or nil pointer dereference
source_1 | [signal 0xb code=0x1 addr=0x0 pc=0x4963ff]
source_1 |
source_1 | goroutine 1 [running]:
source_1 | github.com/pagarme/teleport/database.(*Table).Diff(0xc8200f0540, 0x0, 0x0, 0xc820015cd0, 0x6, 0x0, 0x0, 0x0)
source_1 |  /go/src/github.com/pagarme/teleport/database/table.go:95 +0x5bf
source_1 | github.com/pagarme/teleport/batcher/ddldiff.Diff(0xaf7680, 0x0, 0x0, 0xc8204a8000, 0x2aa, 0x400, 0xc820015cd0, 0x6, 0x0, 0x0, ...)
source_1 |  /go/src/github.com/pagarme/teleport/batcher/ddldiff/ddldiff.go:22 +0x240
source_1 | github.com/pagarme/teleport/batcher/ddldiff.Diff(0xaf7680, 0x0, 0x0, 0xc82049d620, 0x1, 0x1, 0xc820015cd0, 0x6, 0x0, 0x0, ...)
source_1 |  /go/src/github.com/pagarme/teleport/batcher/ddldiff/ddldiff.go:27 +0xaeb
source_1 | github.com/pagarme/teleport/database.(*Ddl).Diff(0xc8201139f0, 0x0, 0x0, 0x0)
source_1 |  /go/src/github.com/pagarme/teleport/database/ddl.go:63 +0x1f6
source_1 | github.com/pagarme/teleport/loader.(*Loader).createDDLBatch(0xc820113d30, 0x0, 0x0, 0x0, 0x0, 0x0)
source_1 |  /go/src/github.com/pagarme/teleport/loader/ddl.go:27 +0x297
source_1 | github.com/pagarme/teleport/loader.(*Loader).Load(0xc820113d30, 0x0, 0x0)
source_1 |  /go/src/github.com/pagarme/teleport/loader/loader.go:46 +0x10a
source_1 | main.main()
source_1 |  /go/src/github.com/pagarme/teleport/main.go:112 +0x1331

Multi-Master Replication

Hi,

though there is mention of multiple targets and master-master replication in the README and issues, I'd like to ask whether multi-master is supported. If so, any deployment experiences (first- to third-hand knowledge)?

We have an upcoming deployment with 1 central node and roughly 10 satellites (over WAN) with little actual load (the satellites are POS's, ~1 write/minute), and I'm wondering whether to go with Teleport over Bucardo if possible. DDL propagation is really nice to have in this case. Also, it's kind of hard to trust 10k lines of Perl for this, even though it's one of the nicest Perl scripts I've seen so far.

Thanks,
~lwk

Test actions

Actions need to be tested to check that they affect the database in the expected way. Testing for case sensitivity in columns might also be a good idea.
