The biggest Python topics of 2023
Data Engineering Pipelines
This topic revolves around open-source tools and frameworks in the fields of data engineering and software engineering, focusing on dataflows, pipeline development, and machine learning operations. It covers a range of technologies such as XGBoost, Fugue, Hamilton, Feast, Meltano, Prefect, and more, offering solutions for tasks like enhancing classification models, orchestrating data workflows, and managing machine learning pipelines efficiently. The intersection of data engineering, software engineering, and machine learning practices is explored through a variety of tools and resources designed to streamline data processing and model deployment.
prefect: Workflow Orchestration Tool Project
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines
https://github.com/PrefectHQ/prefect
feast: Feature Store for Machine Learning Project
Feature Store for Machine Learning
https://github.com/feast-dev/feast
modelscope: Model-as-a-Service Platform for Machine Learning Project
ModelScope: bring the notion of Model-as-a-Service to life.
https://github.com/modelscope/modelscope
Improving Classification Models With XGBoost Article
How can you improve a classification model while avoiding overfitting? Once you have a model, what tools can you use to explain it to others? This week on the show, we talk with author and Python trainer Matt Harrison about his new book Effective XGBoost: Tuning, Understanding, and Deploying Classification Models.
https://realpython.com/podcasts/rpp/169/
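One common guard against overfitting in boosted models, though not necessarily the one discussed in the episode, is early stopping. You keep adding trees while a validation metric improves, and you stop once it has gone a fixed number of rounds without improving. XGBoost offers this through its `early_stopping_rounds` option. The pure-Python sketch below only illustrates the stopping rule on a hypothetical list of per-round validation losses. It does not call XGBoost.

```python
from typing import Sequence, Tuple


def early_stopping(val_losses: Sequence[float], patience: int) -> Tuple[int, float]:
    """Return (best_round, best_loss), stopping after `patience` rounds without improvement.

    `val_losses` is a hypothetical sequence of validation losses, one per boosting round.
    """
    best_round, best_loss = 0, float("inf")
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the "rounds without improvement" clock.
            best_round, best_loss = round_idx, loss
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop adding trees.
            break
    return best_round, best_loss


if __name__ == "__main__":
    losses = [0.69, 0.50, 0.41, 0.38, 0.39, 0.40, 0.42, 0.35]
    print(early_stopping(losses, patience=3))
```

With `patience=3` the loop stops at round 6 and keeps round 3 as the best model. It never sees the later 0.35, which shows the trade-off: a small patience saves compute and limits overfitting, but it can stop before a late improvement.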
ML System Design: 200 Case Studies Article
A collection of links to 200 different blog posts / case studies from leaders in the ML space. Learn how companies such as Netflix and Airbnb implement and use ML in their organizations.
https://www.evidentlyai.com/ml-system-design
The Dangers Behind Image Resizing Article
When training an ML model on image data, you likely want smaller, consistently sized images. That means resizing images in your pipeline, and assuming that resizing behaves the same across libraries can cause unforeseen problems.
https://zuru.tech/blog/the-dangers-behind-image-resizing
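To see how two "resize" implementations can disagree, here is a minimal pure-Python sketch. It is not code from the article. It downsamples the same row of pixels three ways: nearest-neighbor sampling from the left edge of each output pixel, nearest-neighbor sampling from its center, and area (box) averaging. Real libraries differ in subtler ways, such as interpolation kernels, antialiasing, and pixel-center conventions. The functions below are illustrative assumptions, not any library's actual algorithm.

```python
from typing import List


def resize_nearest_edge(row: List[int], new_len: int) -> List[int]:
    """Nearest neighbor, sampling at the left edge of each output pixel."""
    scale = len(row) / new_len
    return [row[min(int(i * scale), len(row) - 1)] for i in range(new_len)]


def resize_nearest_center(row: List[int], new_len: int) -> List[int]:
    """Nearest neighbor, sampling at the center of each output pixel."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_area(row: List[int], new_len: int) -> List[float]:
    """Average each block of source pixels (sketch handles integer factors only)."""
    factor = len(row) // new_len
    if factor * new_len != len(row):
        raise ValueError("this sketch only supports integer downscale factors")
    return [sum(row[i * factor:(i + 1) * factor]) / factor for i in range(new_len)]


if __name__ == "__main__":
    # High-frequency pattern: alternating black (0) and white (255) pixels.
    row = [0, 255] * 4
    print(resize_nearest_edge(row, 4))    # all black
    print(resize_nearest_center(row, 4))  # all white
    print(resize_area(row, 4))            # uniform gray
```

The same 8-pixel input produces an all-black, an all-white, or a uniform gray 4-pixel output depending only on the resizing convention. A model trained on one convention and served with another sees systematically different inputs.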
Failed ML Project About Real Estate Article
“There aren’t enough failed data science projects out there. Usually, projects only show up in public if they work. I think that’s a shame. If we learn more from our successes than our failures, it makes sense to share more failures to help those around us.”
https://www.datafantic.com/failed-project-how-bad-is-the-real-estate-market-getting/
fugue: Unified Interface for Distributed Computing Project
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://github.com/fugue-project/fugue
meltano: CLI for ELT+ Project
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
https://github.com/meltano/meltano
hamilton: Micro-Framework for Defining Dataflows Project Started in 2023
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. It runs and scales everywhere Python does.
https://github.com/dagworks-inc/hamilton
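Hamilton's central convention is that each plain Python function is a node in the dataflow. The function's name is the value it produces, and its parameter names are the values it depends on. The toy resolver below is a sketch of that idea using `inspect.signature`, not Hamilton's API. The node functions (`avg_spend`, `total_signups`, `spend_per_signup`) and the `execute` helper are hypothetical, and the sketch does no cycle detection.

```python
import inspect
from typing import Any, Callable, Dict, Iterable, List


# Hypothetical nodes: name = output, parameter names = dependencies.
def avg_spend(spend: List[float]) -> float:
    return sum(spend) / len(spend)


def total_signups(signups: List[int]) -> int:
    return sum(signups)


def spend_per_signup(spend: List[float], total_signups: int) -> float:
    return sum(spend) / total_signups


def execute(nodes: Iterable[Callable[..., Any]], outputs: List[str],
            inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Toy DAG executor: resolve each requested output from function signatures."""
    registry = {fn.__name__: fn for fn in nodes}
    cache: Dict[str, Any] = dict(inputs)  # raw inputs plus computed node values

    def resolve(name: str) -> Any:
        if name in cache:
            return cache[name]
        if name not in registry:
            raise KeyError(f"no input or node named {name!r}")
        fn = registry[name]
        # Each parameter name points at an input or at another node.
        kwargs = {param: resolve(param) for param in inspect.signature(fn).parameters}
        cache[name] = fn(**kwargs)
        return cache[name]

    return {name: resolve(name) for name in outputs}


if __name__ == "__main__":
    result = execute(
        [avg_spend, total_signups, spend_per_signup],
        outputs=["avg_spend", "spend_per_signup"],
        inputs={"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]},
    )
    print(result)
```

Because each node is an ordinary function, it can be unit-tested on its own, and the dependency graph (the lineage) can be read straight from the signatures. That is the property the description calls "testable, modular, self-documenting."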
Xorbits: Compatible, Scalable Data Science Project
Scalable Python DS & ML, in an API compatible & lightning fast way.
https://github.com/xorbitsai/xorbits
bytewax: Python Stateful Stream Processing OSS Framework Project
Python Stream Processing
https://github.com/bytewax/bytewax
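"Stateful" stream processing means an operator keeps state across events instead of handling each event in isolation. A running total per key is the simplest example. The generator below is a minimal pure-Python sketch of that idea. It is not bytewax's API, which provides its own dataflow operators, and the `(key, value)` event format is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Tuple


def running_totals(events: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Emit (key, running_total) after every event, keeping per-key state."""
    state: DefaultDict[str, int] = defaultdict(int)
    for key, value in events:
        state[key] += value          # update this key's state
        yield key, state[key]        # emit the updated aggregate immediately


if __name__ == "__main__":
    stream = [("sensor_a", 1), ("sensor_b", 5), ("sensor_a", 2), ("sensor_b", 1)]
    for update in running_totals(stream):
        print(update)
```

Because the generator consumes one event at a time and emits results as it goes, it works on unbounded streams. A real framework adds what this sketch lacks: distributing state across workers, recovering it after failures, and windowing by time.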
sematic: An Open-Source ML Pipeline Development Toolkit Project
An open-source ML pipeline development platform
https://github.com/sematic-ai/sematic
cleanvision: Find Issues in Image Datasets Project
Automatically find issues in image datasets and practice data-centric computer vision.
https://github.com/cleanlab/cleanvision
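As a rough illustration of the kind of checks such a tool automates, the sketch below flags two common dataset issues: exact duplicate images and overly dark images. It is not cleanvision's API or algorithm. Images are represented as hypothetical flat lists of 8-bit grayscale pixel values, and the darkness threshold of 30 is an arbitrary assumption.

```python
import hashlib
from typing import Dict, List, Sequence


def find_exact_duplicates(images: Dict[str, Sequence[int]]) -> List[List[str]]:
    """Group image names whose pixel bytes hash identically."""
    groups: Dict[str, List[str]] = {}
    for name, pixels in images.items():
        digest = hashlib.sha256(bytes(pixels)).hexdigest()  # pixels must be 0..255
        groups.setdefault(digest, []).append(name)
    return [names for names in groups.values() if len(names) > 1]


def find_dark_images(images: Dict[str, Sequence[int]], threshold: float = 30.0) -> List[str]:
    """Flag images whose mean brightness falls below an (assumed) threshold."""
    return [name for name, pixels in images.items() if sum(pixels) / len(pixels) < threshold]


if __name__ == "__main__":
    dataset = {
        "a.png": [0, 10, 20, 5],
        "b.png": [200, 180, 220, 240],
        "c.png": [200, 180, 220, 240],
    }
    print(find_exact_duplicates(dataset))
    print(find_dark_images(dataset))
```

Exact hashing only catches byte-identical copies. Near-duplicates, such as re-encoded or slightly cropped images, need perceptual hashes or embeddings, which is where a dedicated library earns its keep.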
ML-Recipes: Collection of Machine Learning Recipes Project
A collection of stand-alone Python machine learning recipes
https://github.com/rougier/ML-Recipes
pipeless-ai: Open-Source Computer Vision Framework Project Started in 2023
An open-source computer vision framework to build and deploy apps in minutes
https://github.com/pipeless-ai/pipeless