Introducing Pachyderm 2.11

August 28, 2024

We’re thrilled to announce the release of Pachyderm 2.11!

Your data is powerful, and Pachyderm 2.11 is ready to help you make the most of it with less effort. Read on for the new features, enhancements, and security options in this release.

Pipeline Templates in Console

Getting started in our Console UI is easier than ever with expanded support for Jsonnet pipeline templates. These templates offer a simple, repeatable way to create pipelines for your most common use cases, saving you time and energy. When creating a pipeline, you can now choose from several built-in templates in Console, including Snowflake and Hugging Face, with more on the way! After you select a template, Console displays its parameters with helpful descriptions and default values where provided. You can also write your own templates in Jsonnet, document their parameters in a short YAML block, and then use them in Console by pointing it at the template URL.
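To make the idea of a parameterized template concrete, here is a minimal Python sketch of what a template does conceptually: it takes a few documented parameters, fills in defaults, and produces a pipeline definition. This is an illustration only, not a Jsonnet template and not how Console implements templates. The parameter names, the default image, and the command are assumptions made up for this example; the output layout is loosely modeled on a pipeline spec.

```python
import json

# Hypothetical parameter documentation: each parameter has a description
# and an optional default (None means the parameter is required).
TEMPLATE_PARAMS = {
    "name": {"description": "Name of the pipeline", "default": None},
    "input_repo": {"description": "Repository to read from", "default": None},
    "glob": {"description": "Glob pattern that splits input into datums", "default": "/*"},
    "image": {"description": "Container image to run", "default": "python:3.11-slim"},
}


def render_pipeline(**args):
    """Fill the template with the given arguments and return a spec dict."""
    unknown = set(args) - set(TEMPLATE_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    values = {}
    for key, meta in TEMPLATE_PARAMS.items():
        value = args.get(key, meta["default"])
        if value is None:
            raise ValueError(f"missing required parameter: {key}")
        values[key] = value

    # Illustrative output shape; the command is a placeholder.
    return {
        "pipeline": {"name": values["name"]},
        "input": {"pfs": {"repo": values["input_repo"], "glob": values["glob"]}},
        "transform": {"image": values["image"], "cmd": ["python3", "/app/main.py"]},
    }


if __name__ == "__main__":
    spec = render_pipeline(name="edges", input_repo="images")
    print(json.dumps(spec, indent=2))
```

Keeping the parameter documentation next to the template body is what lets a UI show descriptions and defaults before anything is created.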

Console Metadata Experience

With Pachyderm 2.11, Console now lets you create, edit, and delete metadata on the following Pachyderm objects: cluster, project, repository, branch, and commit. You can work with metadata in whichever way suits you best: via the API, the command line, or Console.

You will also find new derived metadata, such as creation and update information (createdBy, createdAt, updatedAt). We believe information about your data should be easy to discover so you can spend your time on what matters.

Improved Data Integration

If you use our other HPE AI offerings, such as Determined (MLDE), you know how important it is to transfer data efficiently between products. For customers looking to maximize performance and efficiency across Pachyderm and Determined, we now provide a new, more performant data access API, available through the Pachyderm SDK.

This new API, called Common Data Refs (CDRs), improves the performance of workers running outside Pachyderm, such as Determined training nodes. It allows them to download version-controlled data directly from Pachyderm’s underlying object storage bucket, cache that data locally, assemble datums locally, and incrementally update the cached data from one commit to the next. In one testing scenario, downloading 20 GB of data onto a remote worker previously took five minutes per job. After switching to the CDR API, each worker incrementally updated its local cache instead of downloading all input data from scratch, which effectively reduced that time to one minute.
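The incremental update described above can be pictured with a short Python sketch. It is not the CDR API and uses no Pachyderm SDK calls. It assumes, purely for illustration, that each commit can be represented as a mapping from file path to content hash, and the function names are hypothetical. The point is only that a worker with a warm cache fetches the files that changed between commits rather than everything.

```python
def plan_cache_update(cached, target):
    """Compare a local cache with a target commit.

    Both arguments map file path -> content hash (an assumed representation).
    Returns the paths to download and the paths to delete.
    """
    to_download = sorted(p for p, h in target.items() if cached.get(p) != h)
    to_delete = sorted(p for p in cached if p not in target)
    return to_download, to_delete


def apply_update(cached, target, fetch):
    """Bring `cached` in line with `target`, calling `fetch(path)` per download.

    Returns the number of files that had to be downloaded.
    """
    to_download, to_delete = plan_cache_update(cached, target)
    for path in to_delete:
        del cached[path]
    for path in to_download:
        fetch(path)  # placeholder for the real transfer
        cached[path] = target[path]
    return len(to_download)


if __name__ == "__main__":
    commit_1 = {"a.csv": "h1", "b.csv": "h2", "c.csv": "h3"}
    commit_2 = {"a.csv": "h1", "b.csv": "h9", "d.csv": "h4"}

    cache = {}
    fetched = []
    print(apply_update(cache, commit_1, fetched.append))  # cold cache: every file
    print(apply_update(cache, commit_2, fetched.append))  # warm cache: changed files only
```

In this toy run the second update touches only the modified and newly added files, which is the same effect the testing scenario above attributes to the local cache.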

Iron Bank Repository

Iron Bank is a hardened container image repository owned and maintained by the U.S. Department of Defense (DoD) that supports the end-to-end lifecycle for modern software development. It’s part of Platform One, a U.S. Air Force initiative to deliver the benefits of DevSecOps. We understand the importance of heightened security options for our customers and are happy to provide Pachyderm containers on Iron Bank as an alternative to Docker Hub. Each Pachyderm release will be published on Iron Bank at the same time.

Summary

This release focuses on giving you powerful capabilities to work with your data and create pipelines quickly and efficiently. Shorten your time to value with the new Pipeline Templates in Console, or use the expanded metadata support to supercharge your data. Dive into the release notes for a comprehensive list of features, or schedule a personalized demo to see how Pachyderm can benefit your environment.