Organisations face significant challenges when moving operational data into data warehouses and data lakes.

  • The top three challenges identified in the new study are reliable infrastructure performance, protecting sensitive data, and synchronising multiple data sources into lakes and warehouses.
  • Organisations use a wide range of data lakes, notably Amazon S3 and Lake Formation, Databricks Delta Lake and Google Cloud Platform.
  • The three biggest pain points are time inefficiency, schema complexity and parallel architectures.

Enterprises Increasingly Depend on Data Streaming for AI and Operations

As organisations scale artificial intelligence (AI) and real-time analytics, data streaming has become central to both operational and AI-driven systems. However, a growing reliance on diverse tools for ingesting streaming data into data lakes and data warehouses is creating major operational and governance challenges.

This is according to a new research report from Conduktor, maker of an intelligent data hub for streaming data and AI.

Survey of Senior IT Leaders Highlights Key Ingestion Challenges

The study surveyed 200 senior IT and data executives from large enterprises with annual revenues of $50 million or more. When asked about moving operational data into data lakes and warehouses, respondents identified several persistent challenges:

  • Infrastructure: Scaling and managing reliable data pipelines
  • Security: Protecting sensitive data as it moves into storage systems
  • Integration: Connecting and synchronising multiple data sources
  • Governance: Controlling, validating and tracking data ingestion
  • Skills Gap: Limited in-house expertise to build and maintain ingestion pipelines

These issues are becoming more pronounced as organisations attempt to operationalise AI models that rely on continuous, real-time data.

Wide Adoption of Multiple Data Lakes and Warehouses

The research found that enterprises rely on a broad mix of data lake and warehouse technologies, contributing to architectural complexity.

Popular data lake platforms include:

  • Amazon S3 and Lake Formation
  • Databricks Delta Lake
  • Google Cloud Platform (GCP)

On the data warehouse side, respondents reported using:

  • Google BigQuery
  • Amazon Redshift
  • Azure Synapse Analytics
  • IBM Db2 Warehouse

Fragmented Tooling Dominates Streaming Data Ingestion

To move data from streaming platforms into lakes or warehouses, organisations reported using multiple ingestion approaches:

  • 73% build custom streaming pipelines using Spark or Flink
  • 69% use Kafka Connect or similar connectors
  • 50% rely on fully managed services such as Amazon Firehose or Snowpipe
  • 49% use micro-batching before loading data
  • 28% deploy ELT/ETL tools like Fivetran or Airbyte

While flexible, this fragmented tooling landscape significantly slows data delivery and increases operational overhead.
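
As an illustration of the most common approach, the following is a minimal sketch of a custom pipeline that reads a Kafka topic with Spark Structured Streaming and lands it in a data lake as Parquet. The broker address, topic name and bucket paths are hypothetical, and the job assumes Spark is launched with the spark-sql-kafka connector package on the classpath.

    # Minimal sketch of the "custom streaming pipeline" approach:
    # Kafka topic -> Spark Structured Streaming -> Parquet files in a lake.
    # Broker, topic and bucket names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

    # Subscribe to the operational topic on the Kafka cluster.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
        .option("subscribe", "orders")                       # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers binary key/value columns; cast the payload before landing it.
    payload = events.select(
        col("key").cast("string"),
        col("value").cast("string"),
        col("timestamp"),
    )

    # Write micro-batches to the lake; the checkpoint directory makes the
    # pipeline restartable without duplicating output files.
    query = (
        payload.writeStream
        .format("parquet")
        .option("path", "s3a://example-lake/raw/orders/")           # placeholder
        .option("checkpointLocation", "s3a://example-lake/_chk/orders/")
        .trigger(processingTime="1 minute")
        .start()
    )

    query.awaitTermination()

Each of the other approaches in the list trades this hand-built flexibility for convenience: connectors and managed services remove pipeline code but add per-tool configuration and governance surfaces.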

Fragmentation Slows Data Access and Decision-Making

According to the research, the proliferation of data lakes, warehouses and ingestion tools is creating severe bottlenecks. The top three pain points identified were:

  • Time inefficiency: Difficulty collecting, connecting and analysing data in a centralised way
  • Schema complexity: Inconsistent data structures increasing operational complexity
  • Parallel architectures: Multiple systems requiring additional resources to manage

These challenges reduce the speed at which teams can deliver reliable data to analysts, engineers and AI systems.
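
The schema pain point in particular is easy to reproduce: when upstream producers rename fields or change types, downstream loads fail or silently corrupt tables. Below is a minimal sketch of one common defence, validating each record against an expected shape and quarantining non-conforming events; the field names and sample records are hypothetical.

    # Minimal sketch of guarding a load against schema drift: records that do
    # not match the expected shape go to a dead-letter list instead of the
    # warehouse. The schema and sample records are hypothetical.
    EXPECTED = {"order_id": int, "amount": float, "currency": str}

    def conforms(record: dict) -> bool:
        """True when the record has exactly the expected fields and types."""
        return (
            set(record) == set(EXPECTED)
            and all(isinstance(record[f], t) for f, t in EXPECTED.items())
        )

    good, dead_letter = [], []
    for rec in [
        {"order_id": 1, "amount": 9.99, "currency": "EUR"},    # conforms
        {"order_id": "2", "amount": 5.00, "currency": "EUR"},  # type drift
        {"order_id": 3, "amount": 12.50},                      # dropped field
    ]:
        (good if conforms(rec) else dead_letter).append(rec)

    print(f"loaded {len(good)}, quarantined {len(dead_letter)}")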

Governance Emerges as a Critical Priority for AI Readiness

Nicolas Orban, CEO of Conduktor, said the findings highlight the urgent need for stronger governance as streaming data adoption accelerates—particularly for AI use cases.

“As data streaming adoption grows, especially for AI, organisations need to address the importance of governance,” Orban said.

He noted that managing multiple data lakes and ingestion tools—each with different governance models, schemas and latency profiles—creates operational risk.

Unified Platforms Key to Reducing Data Chaos

Orban warned that fragmented data architectures often result in missed insights, duplicated efforts and poor decision-making.

“Fragmented data creates chaos. With Conduktor, organisations can unify operational data into a single platform, providing full visibility and control while significantly improving IT team productivity,” he added.

Global Streaming Data Market Set for Strong Growth

The report aligns with broader market trends. According to Dataintelo, the global streaming data processing software market was valued at approximately $9.5 billion in 2023 and is projected to reach $23.8 billion by 2032, representing a compound annual growth rate (CAGR) of 10.8%.

Dataintelo attributes this growth to the rising demand for real-time data processing, driven by the explosion of data from IoT devices, social media platforms and enterprise systems.
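
As a quick sanity check, the implied growth rate can be recomputed from those endpoints; the sketch below assumes the 2023 figure compounds over the nine years to 2032.

    # Recompute the implied CAGR from the report's endpoints, assuming nine
    # compounding years (2023 -> 2032).
    start, end, years = 9.5, 23.8, 9  # $bn, $bn, years

    cagr = (end / start) ** (1 / years) - 1
    print(f"implied CAGR: {cagr:.1%}")  # ~10.7%, in line with the reported 10.8%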

Enterprises Must Simplify to Scale AI Successfully

As organisations expand AI deployments and real-time analytics, the research suggests that simplifying data ingestion architectures and improving governance will be critical to unlocking the full value of streaming data.

Learn more about Conduktor’s streaming data hub here: https://conduktor.io/
