
# streamt

dbt for streaming — Build declarative streaming pipelines with Kafka, Flink, and Connect



## What is streamt?

streamt brings the beloved dbt workflow to real-time streaming. Define your streaming pipelines declaratively using YAML and SQL, then let streamt handle compilation, validation, and deployment to Kafka, Flink, and Kafka Connect.

```yaml title="stream_project.yml"
project:
  name: payments-pipeline
  version: "1.0.0"

sources:
  - name: payments_raw
    topic: payments.raw.v1
    description: Raw payment events from checkout service

models:
  - name: payments_validated
    description: Validated payments with fraud scores
    sql: |
      SELECT
        payment_id,
        customer_id,
        amount,
        CASE WHEN amount > 10000 THEN 'HIGH_RISK' ELSE 'NORMAL' END AS risk_level
      FROM {{ source("payments_raw") }}
      WHERE status IS NOT NULL
```

## Why streamt?

### Declarative First

Define what you want, not how to build it. Write YAML and SQL; streamt generates the Kafka topics, Flink jobs, and Connect configurations for you.

### Built-in Lineage

Automatic dependency tracking from your SQL. See exactly how data flows from sources through transformations to downstream consumers.
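
The graph is built from the templated references in each model's SQL. Only `{{ source(...) }}` appears in the examples on this page; the dbt-style `ref()` macro for model-to-model references in the sketch below is an assumption, by analogy with dbt:

```yaml
models:
  - name: payments_high_risk
    description: High-risk payments, derived from payments_validated
    sql: |
      SELECT payment_id, customer_id, amount
      FROM {{ ref("payments_validated") }}  -- ref() is hypothetical, by analogy with dbt
      WHERE risk_level = 'HIGH_RISK'
```

Parsing these references yields the full dependency graph with no separately maintained lineage metadata.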

### Governance & Quality

Enforce naming conventions, partition requirements, and data classification. Run schema, sample, and continuous tests on your streams.
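
What those rules look like isn't specified on this page; as a sketch of the idea only (every key below is hypothetical), project-level configuration might declare:

```yaml
# Hypothetical governance block -- all key names are illustrative, not streamt's actual schema
governance:
  naming:
    topic_pattern: "^[a-z]+(\\.[a-z0-9_]+)+\\.v[0-9]+$"  # e.g. payments.raw.v1
  topics:
    min_partitions: 6          # reject topics declared with fewer partitions
  classification:
    required: true             # every column must declare a classification
    levels: [public, internal, confidential]
```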

### Plan Before Apply

Review changes before deployment. See what topics will be created, which Flink jobs will be updated, and what connectors will be modified.
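
The exact report format isn't shown on this page; purely as an illustration (resource names and symbols are invented), a plan might read:

```text
$ streamt plan

  + topic      payments.clean.v1      create (12 partitions)
  ~ flink      payments_validated     update (SQL changed)
  - connector  legacy_payments_sink   remove
```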


## How It Works

```mermaid
graph LR
    A[YAML + SQL] --> B[Parse & Validate]
    B --> C[Compile Artifacts]
    C --> D[Plan Changes]
    D --> E[Apply to Infrastructure]

    E --> F[Kafka Topics]
    E --> G[Flink Jobs]
    E --> H[Connect Connectors]

    style A fill:#e0e7ff,stroke:#5046e5
    style E fill:#d1fae5,stroke:#059669
```

1. Define your sources, models, tests, and exposures in YAML
2. Validate syntax, references, and governance rules
3. Compile to infrastructure-specific artifacts
4. Plan to see what will change
5. Apply to deploy to Kafka, Flink, and Connect

## Quick Example

```yaml title="sources/payments.yml"
sources:
  - name: payments_raw
    topic: payments.raw.v1
    description: Raw payment events from checkout
    owner: payments-team
    schema:
      registry: confluent
      subject: payments-raw-value
    columns:
      - name: payment_id
        description: Unique payment identifier
      - name: amount
        description: Payment amount in cents
        classification: internal
```
```yaml title="models/payments_clean.yml"
models:
  - name: payments_clean
    description: Cleaned and validated payments
    sql: |
      SELECT
        payment_id,
        customer_id,
        amount,
        currency,
        created_at
      FROM {{ source("payments_raw") }}
      WHERE payment_id IS NOT NULL
        AND amount > 0
    advanced:
      topic:
        name: payments.clean.v1
        partitions: 12
        config:
          retention.ms: 604800000  # 7 days
```
```yaml title="tests/payments_tests.yml"
tests:
  - name: payments_schema_test
    model: payments_clean
    type: schema
    assertions:
      - not_null:
          columns: [payment_id, customer_id, amount]
      - accepted_values:
          column: currency
          values: [USD, EUR, GBP]

  - name: payments_sample_test
    model: payments_clean
    type: sample
    sample_size: 1000
    assertions:
      - range:
          column: amount
          min: 0
          max: 1000000
```
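
The third test type, `continuous`, is only named on this page, not shown. A sketch of what such a test might look like, where the assertion keys are assumptions rather than streamt's actual schema:

```yaml
tests:
  - name: payments_freshness_test
    model: payments_clean
    type: continuous            # evaluated on the live stream rather than a sample
    assertions:
      - freshness:              # hypothetical assertion key
          max_lag: 60s          # hypothetical: fail if no events arrive for 60s
```

Exposures (see Core Concepts below) document downstream consumers. Assuming a structure parallel to sources and models, one might look like:

```yaml
exposures:
  - name: fraud_dashboard
    description: Real-time fraud monitoring dashboard
    owner: risk-team
    depends_on: [payments_clean]   # hypothetical key naming the upstream model
```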
```bash
# Validate your project
streamt validate

# See what will be deployed
streamt plan

# Deploy to infrastructure
streamt apply

# Run tests
streamt test

# View lineage
streamt lineage
```

## Core Concepts

| Concept | Description |
| --- | --- |
| Source | External data entry point (a Kafka topic produced by another system) |
| Model | A transformation that creates new data streams |
| Test | A quality assertion (schema, sample, or continuous) |
| Exposure | Documentation of downstream consumers |
| Materialization | How a model is deployed (`topic`, `virtual_topic`, `flink`, `sink`) |

Learn more about concepts →


## Materializations

Materializations are automatically inferred from your SQL:

| Type | Auto-detected when | Infrastructure |
| --- | --- | --- |
| `virtual_topic` | Stateless SQL with a Gateway configured | Conduktor Gateway |
| `flink` | Stateful SQL (GROUP BY, JOIN, windows) | Flink SQL job |
| `flink` | `ML_PREDICT` / `ML_EVALUATE` | Confluent Flink |
| `sink` | `from:` without SQL | Kafka Connect |
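
For instance, a windowed aggregation is stateful, so a model like the sketch below would be inferred as `flink` and compiled to a Flink SQL job. The window syntax here is Flink's `TUMBLE` group-window form; whether streamt requires that exact dialect is an assumption:

```yaml
models:
  - name: payments_hourly_totals
    description: Hourly payment totals per customer
    sql: |
      SELECT
        customer_id,
        TUMBLE_START(created_at, INTERVAL '1' HOUR) AS window_start,
        SUM(amount) AS total_amount
      FROM {{ source("payments_raw") }}
      GROUP BY customer_id, TUMBLE(created_at, INTERVAL '1' HOUR)
```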

Learn more about materializations →


## Installation

```bash
pip install streamt
```

Or with all optional dependencies:

pip install "streamt[all]"

Full installation guide →




**Ready to build streaming pipelines the declarative way?** [Get Started](getting-started/quickstart.md){ .md-button .md-button--primary }