StarTree Cloud brings real-time analytics to Apache Iceberg
StarTree has expanded its StarTree Cloud platform to support Apache Iceberg, enabling organisations to conduct real-time analytics directly on data stored in their data lakehouse, without the need for data duplication or complex data pipelines.
This development enables StarTree Cloud to act as both the analytic and serving layer over Iceberg, allowing businesses to generate interactive insights for internal and external applications using data stored in open formats. The approach is designed to meet the growing demand for fast, reliable access to large volumes of data, particularly in scenarios that serve external users and AI-based solutions requiring consistent, low-latency responses.
Industry trends
Kishore Gopalakrishna, Cofounder and Chief Executive Officer at StarTree, commented on the increasing requirements for high-speed analytics in customer- and agent-facing environments, alongside the adoption of Iceberg for data management. He said:
"We're seeing explosive growth in customer-facing, and increasingly agent-facing, data products that demand sub-second responsiveness and fresh insights. At the same time, Iceberg is emerging as the industry standard for managing historical data at scale. As these two trends converge, StarTree is delivering unique value by acting as a real-time serving layer for Iceberg, empowering companies to serve millions of external users and AI agents securely, scalably, and without moving data."
Open formats such as Apache Iceberg (a table format) and Parquet (a columnar file format) have gained popularity for managing large-scale data in lakehouse environments. However, these formats define only how data is stored and organised; they do not execute queries, so fast, interactive access depends on the engine reading them. Existing query engines that operate on these formats often struggle to meet the performance requirements of analytical applications that interact with end users or require high concurrency. To work around these constraints, many companies have employed reverse ETL pipelines or converted data into proprietary formats to serve applications, resulting in data latency, operational complexity, and increased costs.
Technical approach
StarTree's recent platform enhancements aim to remove the need for such workarounds by delivering real-time query acceleration on native Iceberg tables. The company explains that it achieves performance gains by combining open formats, such as Parquet and Iceberg, with indexing techniques from Apache Pinot. This combination supports high-performance queries and interactive analytics directly on the original data sources, without requiring data movement, duplication, or format transformation.
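To illustrate in general terms why an index can answer a filtered query without reading every row, the following is a minimal Python sketch. It is not StarTree's or Pinot's implementation; the sample rows, column names, and function names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for records in a data file.
rows = [
    {"country": "US", "amount": 10},
    {"country": "DE", "amount": 25},
    {"country": "US", "amount": 5},
    {"country": "IN", "amount": 40},
]


def build_inverted_index(rows, column):
    """Map each distinct value in `column` to the positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def sum_with_scan(rows, column, value, measure):
    """Baseline: inspect every row to evaluate the filter."""
    return sum(row[measure] for row in rows if row[column] == value)


def sum_with_index(rows, index, value, measure):
    """Use the index to read only the rows that match the filter."""
    return sum(rows[position][measure] for position in index.get(value, []))


# Built once, then reused by every query that filters on "country".
country_index = build_inverted_index(rows, "country")
```

Both functions return the same total for a given filter, but the indexed version touches only the matching rows. The cost of building the index is paid once and amortised across many queries, which is the trade-off that matters under high concurrency.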
Key features announced for StarTree Cloud include native support for both Apache Iceberg and Parquet, real-time indexing and aggregations (with numerical, text, JSON, and geo index support), intelligent materialised views through the StarTree Index, as well as local caching and query pruning for improved concurrency and query speed. The solution also offers intelligent prefetching to minimise unnecessary data scans, and enables serving data directly from Iceberg without intermediate storage layers.
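Query pruning, one of the features listed above, can be sketched in a few lines of Python. The example assumes per-file minimum and maximum values are available for a timestamp column (Iceberg metadata records per-file column bounds of this kind); the `FileStats` class, file names, and `prune_files` function are hypothetical and do not represent StarTree's actual code.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical summary of one data file: its path and timestamp bounds."""
    path: str
    min_ts: int
    max_ts: int


def prune_files(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


# Illustrative file list; in practice these bounds would come from table metadata.
files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# A query for timestamps 150..220 only needs to open two of the three files.
candidates = prune_files(files, 150, 220)
```

Files whose value range cannot overlap the filter are skipped before any data is read, so the amount of data scanned shrinks as filters become more selective.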
Unlike alternatives such as Presto, Trino, or ClickHouse, which often rely on batch processing and full table scans, StarTree's approach is designed for low-latency and high-concurrency environments. This is particularly relevant for interactive dashboards, real-time data products, and operational analytical workloads with stringent service-level agreements.
Industry analysis
Paul Nashawaty, Principal Analyst at theCUBE Research, commented on the trend towards broader adoption of Iceberg within data lakehouses and the performance gaps present in current solutions. He noted:
"Apache Iceberg is rapidly becoming the de facto standard for managing large-scale analytical data in the open data lakehouse - adoption has surged by over 60% year-over-year, according to theCUBE Research. But as more organizations look to power real-time, customer-facing applications with this data, a clear gap has emerged in the market. StarTree's ability to serve Iceberg data with sub-second latency and without data duplication is a unique and timely advancement. It addresses a critical performance need for accessing historical data in modern data products."
The availability of real-time data analytics directly on Iceberg is expected to enable companies to maximise their investment in data lakehouse architectures and offer enhanced, intelligent experiences to end-users, without incurring the technical debt associated with traditional data duplication or maintaining additional, complex pipelines.
StarTree Cloud support for Apache Iceberg is currently available in private preview.