
RISING · DATA ENGINEERING

Pipelines, ETL, streaming and analytics at scale. These are the Data Engineering repositories climbing fastest right now, ranked by momentum: the stars per day humans are adding, not just the total each has already earned.

catalog of 2026-06-15 · velocity from the daily distribution · the one history nobody else keeps
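The page does not say how its stars-per-day figure is derived, so the following is only a minimal sketch of one plausible reading: take dated star-count snapshots per repository, divide the stars gained by the days elapsed, and sort by that rate. The function names, the snapshot layout, and the repositories and counts in `history` are all hypothetical and made up for illustration.

```python
from datetime import date


def stars_per_day(snapshots):
    """Average stars gained per day between the earliest and latest snapshot.

    `snapshots` is a list of (date, total_stars) pairs (assumed layout).
    """
    ordered = sorted(snapshots)
    (first_day, first_total), (last_day, last_total) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need snapshots on at least two distinct days")
    return (last_total - first_total) / days


def rank_by_momentum(history):
    """Return repository names ordered by stars per day, fastest first."""
    return sorted(history, key=lambda repo: stars_per_day(history[repo]), reverse=True)


# Hypothetical data: a large repository gaining slowly and a smaller one gaining fast.
history = {
    "example/big-but-slow": [(date(2026, 6, 1), 70000), (date(2026, 6, 15), 70028)],
    "example/small-but-fast": [(date(2026, 6, 1), 20000), (date(2026, 6, 15), 20280)],
}

ranking = rank_by_momentum(history)
```

Under this reading, a repository with fewer total stars can outrank a much larger one, which matches the ordering in the list, where totals are not monotonic.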

  1. GokuMohandas/Made-With-ML · Jupyter Notebook · Learn how to develop, deploy and iterate on production-grade ML applications. · 48K stars · 71.8/day
  2. DataTalksClub/data-engineering-zoomcamp · Jupyter Notebook · Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼 · 42K stars · 32.6/day
  3. binhnguyennus/awesome-scalability · The Patterns of Scalable, Reliable, and Performant Large-Scale Systems · 72K stars · 20.7/day
  4. umami-software/umami · TypeScript · Umami is a modern, privacy-focused analytics platform. An open-source alternative to Google Analytics, Mixpanel and Amplitude. · 37K stars · 16.3/day
  5. PostHog/posthog · Python · 🦔 PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feat… · 35K stars · 15.2/day
  6. grafana/grafana · TypeScript · The open and composable observability and data visualization platform. Visualize metrics, logs, and traces from multiple sources like Prometheus, Loki, Elastics… · 74K stars · 13.1/day
  7. apache/superset · TypeScript · Apache Superset is a Data Visualization and Data Exploration Platform · 73K stars · 13.1/day
  8. metabase/metabase · Clojure · The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data :bar_chart: · 48K stars · 13.1/day
  9. duckdb/duckdb · C++ · DuckDB is an analytical in-process SQL database management system · 39K stars · 13.1/day
  10. langfuse/langfuse · TypeScript · 🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, Op… · 29K stars · 12/day
  11. ClickHouse/ClickHouse · C++ · ClickHouse® is a real-time analytics database management system · 48K stars · 10.9/day
  12. plausible/analytics · Elixir · Open source, privacy-first web analytics. Lightweight, cookie-free Google Analytics alternative. Self-hosted or cloud. · 27K stars · 6.9/day
  13. openobserve/openobserve · TypeScript · Open source observability platform for logs, metrics, traces, frontend monitoring, pipelines and LLM observability. A sophisticated, simple and highly performan… · 19K stars · 6.9/day
  14. apache/airflow · Python · Apache Airflow - A platform to programmatically author, schedule, and monitor workflows · 46K stars · 6.5/day
  15. apache/spark · Scala · Apache Spark - A unified analytics engine for large-scale data processing · 43K stars · 6.5/day
  16. eugeneyan/applied-ml · 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production. · 30K stars · 5.2/day
  17. apache/flink · Java · Apache Flink · 26K stars · 5.2/day
  18. mindsdb/minds · Dockerfile · General-purpose AI designed for knowledge workers — creators, strategists, and operators — and individuals seeking AI systems they can truly control to help the… · 39K stars · 4.4/day
  19. timescale/timescaledb · C · A time-series database for high-performance real-time analytics packaged as a Postgres extension · 23K stars · 3.4/day
  20. vectordotdev/vector · Rust · A high-performance observability data pipeline. · 22K stars · 3.4/day
  21. airbytehq/airbyte · Python · Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both self-hosted and Cloud. · 21K stars · 3.4/day
  22. apache/kafka · Java · Apache Kafka - A distributed event streaming platform · 33K stars · 2.2/day
  23. webtorrent/webtorrent · JavaScript · ⚡️ Streaming torrent client for the web · 31K stars · 1.7/day
  24. donnemartin/data-science-ipython-notebooks · Python · Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pan… · 29K stars · 1.7/day
  25. getredash/redash · Python · Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data. · 29K stars · 1.7/day
  26. kestra-io/kestra · Java · Event Driven Orchestration & Scheduling Platform for Mission Critical Applications · 27K stars · 1.7/day
  27. PrefectHQ/prefect · Python · Prefect is a workflow orchestration framework for building resilient data pipelines in Python. · 23K stars · 1.7/day
  28. matomo-org/matomo · PHP · Empowering People Ethically 🚀 — Matomo is hiring! Join us → https://matomo.org/jobs Matomo is the leading open-source alternative to Google Analytics, giving y… · 22K stars · 1.7/day
  29. qax-os/excelize · Go · Go language library for reading and writing Microsoft Excel™ (XLAM / XLSM / XLSX / XLTM / XLTX) spreadsheets · 21K stars · 1.7/day
  30. amark/gun · JavaScript · An open source cybersecurity protocol for syncing decentralized graph data. · 19K stars · 1.7/day
  31. asciinema/asciinema · Rust · Terminal session recorder, streamer and player 📹 · 17K stars · 1.7/day
  32. koel/koel · PHP · Music streaming solution that works. · 17K stars · 1.7/day