Here are some curated examples from GitHub of Pachyderm in action.
This notebook provides an introduction to Pachyderm, using the pachctl command-line utility to illustrate the basics of data repositories and pipelines.
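For context, the basic pachctl workflow such an introduction covers looks roughly like the following sketch. The repo, file, and spec names are illustrative, and a running Pachyderm cluster with pachctl already configured is assumed:

```shell
# Create a versioned data repository
pachctl create repo images

# Add a file to the repo's master branch (this creates a commit)
pachctl put file images@master:/photo.jpg -f ./photo.jpg

# Create a pipeline from a JSON pipeline spec (edges.json is a placeholder)
pachctl create pipeline -f edges.json

# Inspect the resulting repos and pipelines
pachctl list repo
pachctl list pipeline
```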
A machine learning pipeline to train a regression model on the Boston Housing Dataset to predict the value of homes.
Extends the original Boston Housing Prices example to show a multi-pipeline DAG and data rollbacks.
A spout is a type of pipeline that ingests streaming data (message queues, database transaction logs, event notifications, etc.), acting as a bridge between an external data stream and a Pachyderm repository.
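As a sketch, a spout pipeline spec omits the usual `input` field and sets `spout` instead; the user code runs continuously and writes incoming records to `/pfs/out`. The pipeline name, command, and image below are hypothetical, and the exact `/pfs/out` semantics depend on the Pachyderm version:

```json
{
  "pipeline": { "name": "message-consumer" },
  "spout": {},
  "transform": {
    "cmd": ["python3", "/app/consume.py"],
    "image": "example/queue-consumer:latest"
  }
}
```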
Train and deploy a fully automated financial market sentiment BERT model. As new data is manually labeled, the model automatically retrains and redeploys.
Train an object detector on the COCO128 dataset with Lightning Flash, modify predictions with Label Studio, and version everything in Pachyderm.
A notebook showing how to use the JupyterLab Pachyderm Mount Extension to mount Pachyderm data repositories into your Notebook environment.
A notebook introducing Jsonnet pipeline specs and showing how to use them to templatize common pipelines.
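A minimal Jsonnet pipeline spec is an ordinary pipeline spec wrapped in a function whose arguments are filled in at creation time; the names, command, and image below are illustrative:

```jsonnet
function(name, src)
{
  pipeline: { name: name + "-pipeline" },
  input: { pfs: { name: "src", repo: src, glob: "/*" } },
  transform: {
    cmd: ["python3", "/app/process.py"],
    image: "example/processor:latest"
  }
}
```

It can then be instantiated with something like `pachctl create pipeline --jsonnet spec.jsonnet --arg name=edges --arg src=images`, producing a concrete pipeline per set of arguments.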
Incorporate data versioning into any labeling project with Label Studio and Pachyderm.
Shows how to create a Pachyderm pipeline that automatically versions and saves data labeled in Superb.ai for use in downstream machine learning workflows.
Uses Pachyderm to create crowdsourced annotation jobs for news headlines in Toloka, aggregate the labeled data, and train a model.
Create a churn analysis model for a music streaming service with Pachyderm and Snowflake using the Data Warehouse integration.
End-to-end example demonstrating the full ML training process of a fraud detection model with Spark, MLlib, MLflow, and Pachyderm.
This example demonstrates how you can evaluate a model or function in a distributed manner on multiple sets of parameters.
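In Pachyderm terms, this pattern typically combines a glob that splits the parameter sets into independent datums with a `parallelism_spec` that fans them out across workers. A hedged sketch of such a spec follows (the repo, command, and image are hypothetical):

```json
{
  "pipeline": { "name": "grid-eval" },
  "input": { "pfs": { "repo": "parameters", "glob": "/*" } },
  "parallelism_spec": { "constant": 4 },
  "transform": {
    "cmd": ["python3", "/app/evaluate.py"],
    "image": "example/evaluator:latest"
  }
}
```

Each file matched by the glob becomes its own datum, so each parameter set is evaluated independently and the results are collected in the pipeline's output repo.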