dbus

yet another databus that transfer/transform pipeline data between plugins

dbus

      $$\       $$\                                       
      $$ |      $$ |                                      
 $$$$$$$ |      $$$$$$$\        $$\   $$\        $$$$$$$\ 
$$  __$$ |      $$  __$$\       $$ |  $$ |      $$  _____|
$$ /  $$ |      $$ |  $$ |      $$ |  $$ |      \$$$$$$\  
$$ |  $$ |      $$ |  $$ |      $$ |  $$ |       \____$$\ 
\$$$$$$$ |      $$$$$$$  |      \$$$$$$  |      $$$$$$$  |
 \_______|      \_______/        \______/       \_______/ 

yet another databus that transfer/transform pipeline data between plugins.

dbus is not yet a 1.0. We’re writing more tests, fixing bugs, working on TODOs.

Features

dbus supports powerful and scalable directed graphs of data routing, transformation and system mediation logic.

  • Designed for extension
    • plugin architecture
    • build your own plugins and more
    • enables rapid development and effective testing
  • Data Provenance
    • track dataflow from beginning to end
    • visualized dataflow
    • rich metrics feed into tsdb
    • online manual mediation of the dataflow
    • RESTful API
    • monitoring with alert
  • Distributed Deployment
    • shard/balance/auto rebalance
    • linear scale
  • Delivery Guarantee
    • loss tolerant
    • high throuput vs low latency
    • back pressure
  • Robustness
    • race condition detected
    • edge cases fully covered
    • network jitter tested
    • dependent components failure tested
  • Systemic Quality
    • hot reload
    • dryrun throughput 1.9M packets/s

Getting Started

1. Installing

To start using dbus, install Go and run go get:

$ go get -u github.com/funkygao/dbus

2. Create config file

Please find sample config files in etc/ directory.

3. Run the server

$ $GOPATH/dbusd -conf $myfile

Dependencies

dbus itself has no external dependencies.

But the plugins might have. For example, MysqlbinlogInput uses zookeeper for sharding/balance/election.

Plugins

Input

  • MysqlbinlogInput
  • RedisbinlogInput
  • KafkaInput
  • MockInput

Filter

  • MysqlbinlogFilter
  • MockFilter

Output

  • KafkaOutput
  • ESOutput
  • MockOutput

Configuration

  • KafkaOutput async mode with batch=1024/500ms, ack=WaitForAll
  • KafkaOutput retry=3, retry.backoff=350ms
  • Mysql binlog positioner commit every 1s, channal buffer 100

FAQ

mysql binlog

  • is it totally data loss tolerant?

if the binlog exceeds 1MB, it will be discarded(lost)

  • ERROR 1236 (HY000): Could not find first log file name in binary log index file

the checkpointed binlog position is gone on master, reset the zk znode and replication will automatically resume

why not canal?

  • no Delivery Guarantee
  • no Data Provenance
  • no integration with kafka
  • only hot standby deployment mode, we need sharding load
  • dbus is a dataflow engine, while canal only support mysql binlog pipeline

TODO

  • [ ] batcher only retries after full batch ack’ed, add timer?
  • [ ] sharding binlog across the dbusd cluster
    • [ ] integration with helix
  • [ ] pack.Payload reuse memory, json.NewEncoder(os.Stdout)
  • [ ] router finding matcher is slow
  • [X] hot reload on config file changed
  • [X] each Input have its own recycle chan, one block will not block others
  • X [zabbix] invalid table id 2968, no correspond table map event
  • [X] make canal, high cpu usage
    • because CAS backoff 1us, cpu busy
  • [X] ugly design of Input/Output ack mechanism
    • we might learn from storm bolt ack
  • [X] some goroutine leakage
  • [X] pipeline
    • 1 input, multiple output
    • filter to dispatch dbs of a single binlog to different output
  • [X] kill Packet.input field
  • [X] visualized flow throughput like nifi
    • dump uses dag pkg
    • pipeline
  • [X] router metrics
  • [X] dbusd api server
  • [X] logging
  • [X] share zkzone instance
  • [X] presence and standby mode
  • [X] graceful shutdown
  • [X] master must drain before leave cluster
  • [X] KafkaOutput metrics
    • binlog tps
    • kafka tps
    • lag
  • [X] hub is shared, what if a plugin blocks others
    • currently, I have no idea how to solve this issue
  • [X] Batcher padding
  • [X] shutdown kafka
  • [X] zk checkpoint vs kafka checkpoint
  • [X] kafka follower stops replication
  • [X] can a mysql instance with miltiple databases have multiple Log/Position?
  • [X] kafka sync produce in batch
  • [X] DDL binlog
    • drop table y;
  • [X] trace async producer Successes channel and mark as processed
  • [X] metrics
  • [X] telemetry and alert
  • [X] what if replication conn broken
  • [X] position will be stored in zk
  • [X] play with binlog_row_image
  • [ ] project feature for multi-tenant
  • [ ] bug fix
    • [ ] kill dbusd, dbusd-slave did not leave cluster
    • [ ] next log position leads to failure after resume
    • [ ] KafkaOutput only support 1 partition topic for MysqlbinlogInput
    • [X] table id issue
    • [X] what if invalid position
    • [X] router stat wrong Total:142,535,625 0.00B speed:22,671/s 0.00B/s max: 0.00B/0.00B
    • [X] ffjson marshalled bytes has NL before the ending bracket
  • [ ] test cases
    • [X] restart mysql master
    • [X] mysql kill process
    • [X] race detection
    • [ ] tc drop network packets and high latency
    • [ ] mysql binlog zk session expire
    • [X] reset binlog pos, and check kafka didn’t recv dup events
    • [X] MysqlbinlogInput max_event_length
    • [X] min.insync.replicas=2, shutdown 1 kafka broker then start
  • [ ] GTID
    • place config to central zk znode and watch changes
  • [ ] Known issues
  • [ ] Roadmap
    • pubsub audit reporter
    • universal kafka listener and outputer

Memo

  • mysqlbinlog input peak with mock output

    • 140k event per second
    • 30k row event per second
    • 260Mb network bandwidth
    • KafkaOutput 35K msg per second
    • it takes 2h25m to zero lag for platform of 2d lag
  • dryrun MockInput -> MockOutput

    • 1.8M packet/s

Related Repositories

php-dbus

php-dbus

PHP DBus extension ...

debian-golang-dbus

debian-golang-dbus

https://anonscm.debian.org/cgit/pkg-go/packages/golang-dbus.git ...

spotify-dbus

spotify-dbus

Simple gem and utility bin for controlling the Spotify player via the DBus ...

dbus-factory

dbus-factory

mirrored from https://cr.deepin.io/#/admin/projects/dbus-factory ...

ruby-dbus-gnome-keyring-playground

ruby-dbus-gnome-keyring-playground

Some playground for ruby/dbus/gnome-keyring ...