DataPelago Unveils World’s First Universal Data Processing Engine

Accelerate any Engine,
on any Hardware,
on any Data.

[ ANY ENGINE ]

Spark
Trino
Flink
Presto

[ ANY HARDWARE ]

GPU
CPU
FPGA
NVIDIA
AMD

[ ANY DATA ]

Iceberg
Delta Lake
Hudi
Video
Image
Text
Audio

What we do

Deliver 10X+ performance/$ advantage for
Lakehouse Analytics

Process all your data
no matter the size or type

Structured, semi-structured, or unstructured: we accelerate data processing through a common platform. Whether you are training a foundational model, fine-tuning one, adding RAG support, or analyzing data for insights, DataPelago can power your workloads.

Discover new value
that was previously not viable

90% of data is never tapped for its value because of processing cost and time. Unlock insights from massive datasets in business time. Extract content from unstructured data for better-quality RAG and fine-tuning pipelines.

Why we do it

We make it viable to extract value from all the data in the world, so humanity can capture every insight, cure, invention, and opportunity.

/ pharmaceutical
/ transportation
/ sports
/ biotech & gene research
/ communication services
/ finance
/ energy
/ industrial
/ agriculture
/ semiconductor

How we do it

Introducing DataPelago’s Universal Data Processing Engine

Who we serve

For Data and AI practitioners

GenAI Data Pipelines

Apply GenAI faster

Process multi-modal data for GenAI with DataPelago. Whether you are extracting text or images, filtering and cleaning, chunking, tokenizing, or embedding, DataPelago accelerates every step of your GenAI pipeline. From foundational model training to fine-tuning to RAG, deploy GenAI applications faster and keep them fresh with the latest data.


Lakehouse Analytics
McAfee
Samsung
Akad
HiddenLayer
Twingo

Testimonials

New possibilities created with DataPelago

The exponential growth of semi-structured and unstructured data along with rapid Gen AI/AI adoption is driving innovation, not only in AI, but in data management and data processing. McAfee has been proud to partner with DataPelago on the design of their technology that shows promising results, including significant performance and cost improvements on certain workloads. Congratulations on your product launch!
Steve Grobman, Executive VP and CTO, McAfee
Samsung SDS America has been working with DataPelago to evaluate their data processing platform in our AWS VPC, leveraging Accelerated Computing Infrastructure (GPUs). In testing with sample data, we’ve seen promising results in terms of performance and cost efficiency compared to traditional compute engines. DataPelago's platform shows potential in modernizing architecture and unifying data processing pipelines for GenAI and analytics, handling structured, semi-structured, and unstructured data types. This collaboration aligns with our interest in exploring innovative solutions that separate compute and storage, enhancing flexibility and reducing vendor lock-in.
Prashant Vithlani, Head of Division | Cloud Business, Samsung SDS America
Twingo is proud to partner with and serve as an official reseller for DataPelago, delivering cutting-edge Big Data solutions to the Israeli market. As an early design partner, we are excited to offer DataPelago’s unified data processing platform, accelerating engines like Spark and Trino using advanced CPU and GPU infrastructure across any data lakehouse format, including Iceberg, Hudi, and Delta Lake. The benchmarks from our collaboration are groundbreaking, reducing Total Cost of Ownership and delivering exceptional value. This partnership reinforces our commitment to innovation and next-gen solutions for data-driven organizations.
Golan Nahum, Founder & CEO, Twingo
The growth in the volume of data processed by security systems is exponential as the adoption of AI and GenAI in cybersecurity continues to grow. DataPelago enables cost-effective expansion of AI/GenAI and cybersecurity systems by transforming the economics of data processing with its heterogeneous accelerated computing engine. As a security practitioner, I am excited with its modular architecture which allows for seamless plug-and-play integration with open-source components like Spark and Apache Gluten, ensuring frictionless deployment without any vendor lock-in.
Malcolm Harkins, Chief Security and Trust Officer, HiddenLayer & ex-CISO, Intel Corp
As Director of Engineering at Uber and Presto Foundation GB Chair, I have extensive experience developing and running open-source analytics software at an enterprise scale. Our workloads typically included heavy scan/filter/join operations, which are ideal for hardware acceleration. It's exciting to see how DataPelago disrupts the industry by accelerating open-source frameworks like Presto and Spark with custom hardware infrastructure. I'm particularly impressed with their dynamic mapping to heterogeneous computing elements and reconfigurable run-time techniques. By accelerating open-source frameworks, I think DataPelago will significantly transform today's performance/$ paradigm and reshape the economics of data processing.
Girish Baliga, Ex-Director of Engineering, Uber & Chair of the Presto Foundation
At Akad Seguros, innovation is woven into our DNA, fueling our unwavering commitment to exceptional customer service. Our partnership with DataPelago exemplifies this dedication, as we modernize our data architecture and unify processing pipelines for GenAI and data analysis. Leveraging DataPelago's advanced platform, we can seamlessly process structured, semi-structured, and unstructured data, reducing our costs by more than 50% and enhancing operational performance. By fully utilizing AWS’s Accelerated Computing (GPU) infrastructure, this collaboration is transforming our capacity to deliver superior results and elevating the quality of service for our customers.
Andre Fichel, CTO, Akad Seguros
I’ve been privileged to be around some of the brightest minds in technology over the last several decades and it's clear to me that Rajan Goyal, co-founder and CEO, possesses the vision, intellect, experience and passion to build a truly great and innovative company. I'm excited to participate in one of the next Silicon Valley success stories!
Paula Hurd, Advisor and Investor
Congratulations to DataPelago on their launch and announcement that their engine will extend Gluten, Substrait and Velox to deliver the benefits of accelerated computing for Spark to address the performance and cost challenges in the Apache Spark community. Apache Gluten is designed to reuse Apache Spark's whole control flow, while offloading the compute-intensive data processing part to high performance native libraries in the backend. DataPelago is taking this quantum leap forward by extending Gluten with native accelerated computing enhancements, yielding orders of magnitude performance and cost improvements for Spark workloads!
Binwei Yang, Apache Gluten Initiator
Try it now

Ready to experience the new economics of data at scale?

Get in touch

Fill out the form and a DataPelago team member will reach out.