Unlocking OSS Spark: How ShareChat Used DataPelago to Accelerate Analytical Data Pipelines While Reducing Costs by 50%

With DataPelago Accelerator for Spark, we were finally able to complete our heaviest OLAP cube jobs on OSS Spark—something that had been impossible due to data skew and performance bottlenecks. This opens the door for a full migration from managed platforms without compromising speed or reliability while reducing our costs by 50%.

Arya Ketan, Distinguished Engineer & VP of Data, ShareChat

Customer Profile

Mohalla Tech Private Limited powers ShareChat and Moj, India’s premier regional-language social media platforms, serving over 350 million monthly active users across 15 languages. Valued at $5 billion, the company is known for its rapid growth, deep vernacular focus, and significant acquisitions, including MX TakaTak for $700 million.

Opportunity

ShareChat processes terabytes of data daily through its Apache Spark-based Lakehouse. Its "Cube Service" runs ~300 queries daily to build OLAP cubes that power experimentation and analytics, using a bronze-to-gold ETL pipeline. While the self-serve platform democratizes data access for data scientists, this autonomy created unpredictable workload variability that strained the infrastructure.

Growing data volumes drove up processing costs and caused compute-intensive jobs to fail due to data skew and resource hotspots. The self-service platform amplified these issues: because users schedule their own jobs, workload spikes were unpredictable, leading to cascading failures and cost overruns across an operation serving 350 million monthly active users.

Solution

DataPelago Accelerator for Spark (DPA) addressed ShareChat's three critical requirements: ensuring that compute-intensive jobs complete, accelerating runtimes to improve SLAs, and reducing costs by at least 33%. Deployment took just hours. DPA's plug-and-play architecture eliminated the need for code rewrites or data migration while maintaining full Spark compatibility, and its built-in intelligence targets demanding workloads by optimizing joins and aggregations over skewed data.

The results: previously failing jobs now run successfully, job speeds increased 2X, and costs fell by 50%, well beyond the 33% target. DPA also addressed partition hotspots and stabilized the self-service cube platform under heavy load, all while preserving ShareChat's existing pipelines.
