
Kafka Connect WebSocket Source Connector

Stream real-time data from any WebSocket endpoint directly into Apache Kafka with automatic reconnection, authentication support, and comprehensive monitoring.


Features

Real-Time Streaming

Connect to any WebSocket endpoint (ws:// or wss://) and stream data directly into Kafka topics with minimal latency.

Automatic Reconnection

Built-in reconnection logic with configurable intervals ensures resilient connections even when endpoints are unstable.

Authentication Support

Bearer token authentication and custom headers for connecting to secured WebSocket APIs.

Subscription Messages

Send subscription messages after connecting, as required by exchanges like Binance and Coinbase, or by custom WebSocket servers.

Built-in Monitoring

Comprehensive metrics via JMX, detailed logging, and integration with Prometheus/Grafana for production observability.

Message Buffering

Configurable in-memory queue to handle traffic bursts and optimize throughput to Kafka.
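The buffering pattern can be sketched with a bounded queue: the WebSocket thread enqueues frames while the Kafka Connect task drains them. This is an illustrative Python sketch of the idea only; the names, queue size, and drop-on-full overflow policy are assumptions, not the connector's actual classes or configuration.

```python
import queue

# Bounded in-memory buffer (small cap here just to make overflow visible).
buf = queue.Queue(maxsize=5)

# Producer side: enqueue incoming frames; this sketch drops when full
# (blocking or back-pressure would be alternative overflow policies).
dropped = 0
for i in range(8):
    try:
        buf.put_nowait(f"msg-{i}")
    except queue.Full:
        dropped += 1

# Consumer side: the poll loop drains whatever is buffered, in order.
drained = []
while not buf.empty():
    drained.append(buf.get_nowait())

print(len(drained), dropped)  # 5 buffered messages survive, 3 are dropped
```

A larger queue absorbs longer bursts at the cost of more memory and more potential loss on a crash (see the delivery-semantics note below).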

Quick Example

Stream Bitcoin trades from Binance into Kafka. First, save the connector configuration as binance-connector.json:

```json
{
  "name": "binance-btcusdt-connector",
  "config": {
    "connector.class": "io.conduktor.connect.websocket.WebSocketSourceConnector",
    "tasks.max": "1",
    "websocket.url": "wss://stream.binance.com:9443/ws",
    "kafka.topic": "binance-btcusdt-trades",
    "websocket.subscription.message": "{\"method\":\"SUBSCRIBE\",\"params\":[\"btcusdt@trade\"],\"id\":1}",
    "websocket.reconnect.enabled": "true"
  }
}
```

Then register the connector through the Kafka Connect REST API:

```shell
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @binance-connector.json
```

Finally, verify that trades are flowing into the topic:

```shell
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic binance-btcusdt-trades \
  --from-beginning
```

Use Cases

The connector is ideal for streaming real-time data from:

  • Cryptocurrency Exchanges - Live trades, order books, and ticker data (Binance, Coinbase, Kraken)
  • IoT Devices - Sensor data and telemetry streams
  • Financial Markets - Stock tickers, forex rates, market data
  • Collaboration Tools - Chat messages, presence updates, notifications
  • Gaming Platforms - Live scores, player updates, game events
  • Custom WebSocket APIs - Any service exposing a WebSocket endpoint

Architecture

```mermaid
graph LR
    A[WebSocket Endpoint] -->|OkHttp Client| B[Message Queue]
    B -->|Poll| C[Kafka Connect Task]
    C -->|Produce| D[Kafka Topic]

    style A fill:#3b82f6
    style B fill:#f59e0b
    style C fill:#10b981
    style D fill:#ef4444
```

The connector uses OkHttp's WebSocket client for reliable connections, maintains an in-memory queue for buffering, and integrates seamlessly with Kafka Connect's task framework.
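The flow above can be illustrated with a dependency-free analogue: a callback stands in for OkHttp's WebSocket message handler and pushes frames into a queue, which a poll() function drains in batches the way a Kafka Connect SourceTask does. This mirrors the data flow only; the real connector is Java and uses the Connect framework.

```python
import queue
import threading

q = queue.Queue()

def on_message(frame):
    """WebSocket callback stand-in: buffer each incoming frame."""
    q.put(frame)

def poll(max_batch=100):
    """SourceTask.poll() stand-in: drain up to max_batch buffered frames."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

# Simulate three frames arriving on a receiver thread, then one poll cycle.
receiver = threading.Thread(
    target=lambda: [on_message(f"frame-{i}") for i in range(3)])
receiver.start()
receiver.join()

records = poll()
print(records)  # ['frame-0', 'frame-1', 'frame-2']
```

Decoupling receipt from polling is what lets the connector absorb bursts: the WebSocket thread never blocks on Kafka, and the task batches whatever has accumulated since its last poll.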

Why This Connector?

| Feature | Kafka Connect WebSocket | Custom Consumer | REST Polling |
|---|---|---|---|
| Real-time data | ✅ True push model | ✅ True push | ❌ Polling delays |
| Kafka integration | ✅ Native | ⚠️ Manual | ⚠️ Manual |
| Reconnection logic | ✅ Built-in | ⚠️ Custom code | ⚠️ Custom code |
| Monitoring | ✅ JMX/Prometheus | ⚠️ Custom | ⚠️ Custom |
| Deployment | ✅ Kafka Connect | ❌ Separate service | ❌ Separate service |
| Maintenance | ✅ Integrated | ⚠️ DIY | ⚠️ DIY |

Performance

  • Throughput: Handles 10,000+ messages/second on standard hardware
  • Latency: < 10ms from WebSocket receipt to Kafka produce
  • Reliability: Automatic reconnection with exponential backoff
  • Scalability: Configurable queue size for traffic bursts

Operational Features

Includes comprehensive operational tooling:

  • Detailed error handling and structured logging
  • JMX metrics for Prometheus/Grafana integration
  • Automatic reconnection with exponential backoff
  • Configurable message buffering
  • Extensive troubleshooting documentation
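The reconnection schedule mentioned above follows the standard exponential-backoff shape: each retry waits longer than the last, up to a cap. The base delay, multiplier, and cap below are illustrative values, not the connector's actual configuration keys or defaults.

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6):
    """Yield successive reconnect delays, doubling each time up to a cap."""
    delay = base
    for _ in range(attempts):
        yield delay
        delay = min(delay * factor, cap)

print(list(backoff_delays()))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Capping the delay keeps recovery time bounded once the endpoint comes back; production implementations often also add random jitter so many clients don't reconnect in lockstep.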

⚠️ Important: Data Delivery Semantics

At-Most-Once Delivery

This connector provides at-most-once delivery semantics due to architectural limitations:

  • In-memory buffering: Messages in the queue are lost on shutdown/crashes
  • No replay capability: WebSocket protocol doesn't support message replay
  • Single task limitation: WebSocket connections don't shard like Kafka partitions

Use cases: Best suited for telemetry, monitoring, and scenarios where occasional data loss is acceptable. For critical data, consider additional validation or complementary ingestion methods. See the README for detailed mitigation strategies.
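One consumer-side validation idea, sketched below: Binance trade IDs (the "t" field) increase monotonically per symbol, so gaps in the consumed IDs reveal messages lost during a reconnect or restart. This is an illustration of the detection approach only, not part of the connector, and it applies only to sources that expose a sequence number.

```python
def find_gaps(trade_ids):
    """Return (missing_count, gap_ranges) for a sorted stream of trade IDs."""
    gaps = []
    for prev, cur in zip(trade_ids, trade_ids[1:]):
        if cur > prev + 1:
            gaps.append((prev + 1, cur - 1))  # inclusive range of lost IDs
    missing = sum(hi - lo + 1 for lo, hi in gaps)
    return missing, gaps

missing, gaps = find_gaps([100, 101, 102, 105, 106, 110])
print(missing, gaps)  # 5 missing IDs: 103-104 and 107-109
```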


License

Apache License 2.0 - see LICENSE for details.