RISING · DATA ENGINEERING
Pipelines, ETL, streaming and analytics at scale. The Data Engineering repositories climbing fastest right now, ranked by momentum: the stars per day that humans are adding, not just the total already earned.
catalog of 2026-06-15 · velocity from the daily distribution · the one history nobody else keeps
- 01 · GokuMohandas/Made-With-ML · Jupyter Notebook · Learn how to develop, deploy and iterate on production-grade ML applications. · 48K ★ · ▲ 71.8/day
- 02 · DataTalksClub/data-engineering-zoomcamp · Jupyter Notebook · Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼 · 42K ★ · ▲ 32.6/day
- 03 · binhnguyennus/awesome-scalability · The Patterns of Scalable, Reliable, and Performant Large-Scale Systems · 72K ★ · ▲ 20.7/day
- 04 · umami-software/umami · TypeScript · Umami is a modern, privacy-focused analytics platform. An open-source alternative to Google Analytics, Mixpanel and Amplitude. · 37K ★ · ▲ 16.3/day
- 05 · PostHog/posthog · Python · 🦔 PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feat · 35K ★ · ▲ 15.2/day
- 06 · grafana/grafana · TypeScript · The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elastics · 74K ★ · ▲ 13.1/day
- 07 · apache/superset · TypeScript · Apache Superset is a Data Visualization and Data Exploration Platform · 73K ★ · ▲ 13.1/day
- 08 · metabase/metabase · Clojure · The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊 · 48K ★ · ▲ 13.1/day
- 09 · duckdb/duckdb · C++ · DuckDB is an analytical in-process SQL database management system · 39K ★ · ▲ 13.1/day
- 10 · langfuse/langfuse · TypeScript · 🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, Op · 29K ★ · ▲ 12/day
- 11 · ClickHouse/ClickHouse · C++ · ClickHouse® is a real-time analytics database management system · 48K ★ · ▲ 10.9/day
- 12 · plausible/analytics · Elixir · Open source, privacy-first web analytics. Lightweight, cookie-free Google Analytics alternative. Self-hosted or cloud. · 27K ★ · ▲ 6.9/day
- 13 · openobserve/openobserve · TypeScript · Open source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. A sophisticated, simple and highly performan · 19K ★ · ▲ 6.9/day
- 14 · apache/airflow · Python · Apache Airflow - A platform to programmatically author, schedule, and monitor workflows · 46K ★ · ▲ 6.5/day
- 15 · apache/spark · Scala · Apache Spark - A unified analytics engine for large-scale data processing · 43K ★ · ▲ 6.5/day
- 16 · eugeneyan/applied-ml · 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. · 30K ★ · ▲ 5.2/day
- 17 · apache/flink · Java · Apache Flink · 26K ★ · ▲ 5.2/day
- 18 · mindsdb/minds · Dockerfile · General-purpose AI designed for knowledge workers — creators, strategists, and operators — and individuals seeking AI systems they can truly control to help the · 39K ★ · ▲ 4.4/day
- 19 · timescale/timescaledb · C · A time-series database for high-performance real-time analytics packaged as a Postgres extension · 23K ★ · ▲ 3.4/day
- 20 · vectordotdev/vector · Rust · A high-performance observability data pipeline. · 22K ★ · ▲ 3.4/day
- 21 · airbytehq/airbyte · Python · Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud. · 21K ★ · ▲ 3.4/day
- 22 · apache/kafka · Java · Apache Kafka - A distributed event streaming platform · 33K ★ · ▲ 2.2/day
- 23 · webtorrent/webtorrent · JavaScript · ⚡️ Streaming torrent client for the web · 31K ★ · ▲ 1.7/day
- 24 · donnemartin/data-science-ipython-notebooks · Python · Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pan · 29K ★ · ▲ 1.7/day
- 25 · getredash/redash · Python · Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data. · 29K ★ · ▲ 1.7/day
- 26 · kestra-io/kestra · Java · Event Driven Orchestration & Scheduling Platform for Mission Critical Applications · 27K ★ · ▲ 1.7/day
- 27 · PrefectHQ/prefect · Python · Prefect is a workflow orchestration framework for building resilient data pipelines in Python. · 23K ★ · ▲ 1.7/day
- 28 · matomo-org/matomo · PHP · Empowering People Ethically 🚀 — Matomo is hiring! Join us → https://matomo.org/jobs Matomo is the leading open-source alternative to Google Analytics, giving y · 22K ★ · ▲ 1.7/day
- 29 · qax-os/excelize · Go · Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets · 21K ★ · ▲ 1.7/day
- 30 · amark/gun · JavaScript · An open source cybersecurity protocol for syncing decentralized graph data. · 19K ★ · ▲ 1.7/day
- 31 · asciinema/asciinema · Rust · Terminal session recorder, streamer and player 📹 · 17K ★ · ▲ 1.7/day
- 32 · koel/koel · PHP · Music streaming solution that works. · 17K ★ · ▲ 1.7/day
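
The list is ordered by stars gained per day rather than by total stars. The catalog does not spell out how it derives that velocity, so the following is only a minimal sketch of the general idea, assuming two star-count snapshots per repository. The `Snapshot` record, the function names, and the repository names and numbers are all hypothetical and not taken from the catalog.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """Star count for one repository on one day (hypothetical record shape)."""
    repo: str
    day: date
    stars: int


def stars_per_day(first: Snapshot, last: Snapshot) -> float:
    """Average stars gained per day between two snapshots of the same repo."""
    if first.repo != last.repo:
        raise ValueError("snapshots must belong to the same repository")
    days = (last.day - first.day).days
    if days <= 0:
        raise ValueError("last snapshot must be later than the first")
    return (last.stars - first.stars) / days


def rank_by_momentum(pairs):
    """Sort (first, last) snapshot pairs by stars per day, fastest first."""
    scored = [(first.repo, stars_per_day(first, last)) for first, last in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative numbers only, not taken from the catalog.
pairs = [
    (Snapshot("example/large-repo", date(2026, 6, 1), 70_000),
     Snapshot("example/large-repo", date(2026, 6, 15), 70_140)),
    (Snapshot("example/small-repo", date(2026, 6, 1), 5_000),
     Snapshot("example/small-repo", date(2026, 6, 15), 5_700)),
]
ranking = rank_by_momentum(pairs)
for repo, velocity in ranking:
    print(f"{repo}: {velocity:.1f} stars/day")
```

In this sketch the smaller repository ranks first because it gains more stars per day, even though its total is far lower. That is the same distinction the list draws between momentum and accumulated stars.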