May 12, 2025
How to stream Kafka data into Elasticsearch with millisecond latency
We wrote a guide detailing how we built a low-latency data pipeline for a fintech client that required near-real-time search. The team opted for Kafka Connect with the Elasticsearch Sink Connector over custom consumers, favoring its simplicity and configurability, and fine-tuned parameters such as flush.timeout.ms, linger.ms, and batch.size to balance throughput against end-to-end latency.
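As a rough illustration of that tuning surface, here is a minimal sink configuration of the kind the guide describes, as it would be posted to the Kafka Connect REST API. The connector name, topic, endpoint, and numeric values are hypothetical placeholders, not the client's actual settings:

```json
{
  "name": "es-sink-events",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "events",
    "connection.url": "http://localhost:9200",
    "tasks.max": "4",
    "key.ignore": "true",
    "schema.ignore": "true",
    "batch.size": "2000",
    "linger.ms": "5",
    "flush.timeout.ms": "10000",
    "max.in.flight.requests": "5"
  }
}
```

Keeping linger.ms in the single-digit milliseconds bounds how long a record can sit in the connector's buffer before a bulk request is sent, while batch.size and max.in.flight.requests control how aggressively buffered records are pushed to Elasticsearch.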
To ensure consistent serialization and minimize parsing overhead, the pipeline used JSON or Avro, with schemas managed by Confluent Schema Registry. Data enrichment was handled through Elasticsearch ingest pipelines: adding geo-tags, parsing user agents, and flattening nested data structures.
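A sketch of what such an ingest pipeline might look like, created with PUT _ingest/pipeline/enrich-events. The field names (client_ip, user_agent, payload.order.id) are illustrative assumptions, and each processor is skipped when its source field is absent:

```json
{
  "description": "Geo-tag client IPs, parse user agents, and flatten one nested field",
  "processors": [
    { "geoip":      { "field": "client_ip",        "target_field": "geo",      "ignore_missing": true } },
    { "user_agent": { "field": "user_agent",       "target_field": "ua",       "ignore_missing": true } },
    { "rename":     { "field": "payload.order.id", "target_field": "order_id", "ignore_missing": true } }
  ]
}
```

Doing this work in an ingest pipeline keeps enrichment out of the Kafka consumers entirely; documents are transformed on the Elasticsearch side as they are indexed.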
Monitoring was integral: the team tracked Kafka consumer lag, event-to-index latency, and Elasticsearch ingest and refresh rates to maintain performance and meet SLAs. Indexing efficiency was further improved by setting the refresh_interval to 5 seconds, employing index templates with fast analyzers, and keeping shard sizes under 30 GB.
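Those index-side settings could be captured in a composable index template along these lines; the index pattern, shard count, and field mappings here are hypothetical, and a sub-30 GB shard target is typically enforced with ILM rollover rather than a static setting:

```json
{
  "index_patterns": ["events-*"],
  "template": {
    "settings": {
      "number_of_shards": 4,
      "refresh_interval": "5s"
    },
    "mappings": {
      "properties": {
        "order_id":  { "type": "keyword" },
        "message":   { "type": "text", "analyzer": "simple" },
        "timestamp": { "type": "date" }
      }
    }
  }
}
```

keyword fields skip analysis entirely, and the built-in simple analyzer avoids heavier analysis chains, which is one way to read the guide's advice about fast analyzers.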
This comprehensive approach enabled the client to achieve millisecond-level latency across millions of daily events. For more details, read the full article here: https://dattell.com/data-architecture-blog/how-we-stream-kafka-data-into-elasticsearch-with-millisecond-latency/