Version control is the practice of tracking and managing changes to programs, websites, and files in general. In software development, it means recording every modification made to the source code, typically together with a timestamp.
Developers use version control systems to keep track of every change. These systems let teams return to previous versions and compare them with newer ones. Because a faulty change can be reverted quickly, uptime improves and operations become more stable.
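As a rough illustration of the record, compare, and revert cycle described above, the sketch below keeps an in-memory, timestamped history for a single text file. `VersionHistory` and its methods are hypothetical names invented for this example. Real version control systems persist history on disk and track many files at once, so treat this only as a model of the concepts.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    number: int
    content: str
    timestamp: datetime


@dataclass
class VersionHistory:
    """Hypothetical in-memory history for a single file (illustration only)."""
    versions: list[Version] = field(default_factory=list)

    def commit(self, content: str) -> int:
        # Record the new content together with a UTC timestamp.
        number = len(self.versions) + 1
        self.versions.append(Version(number, content, datetime.now(timezone.utc)))
        return number

    def get(self, number: int) -> str:
        # Version numbers start at 1.
        return self.versions[number - 1].content

    def revert(self, number: int) -> int:
        # Reverting does not erase history: the old content is committed as a new version.
        return self.commit(self.get(number))

    def diff(self, old: int, new: int) -> list[str]:
        # Line-by-line comparison of two recorded versions.
        return list(difflib.unified_diff(
            self.get(old).splitlines(),
            self.get(new).splitlines(),
            fromfile=f"v{old}",
            tofile=f"v{new}",
            lineterm="",
        ))


history = VersionHistory()
history.commit("threshold = 0.5\n")
history.commit("threshold = 0.9\n")
print("\n".join(history.diff(1, 2)))
history.revert(1)
print(history.get(3))  # threshold = 0.5
```

Note that `revert` appends the old content as a new version instead of deleting later ones, so the full history is preserved.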
How a version control system operates depends on its type. There are three main types:
Implementing a version control system offers several advantages, such as:
In machine learning, two kinds of information change and are updated at the same time: the code that makes up a machine learning model, and the datasets that the model processes. Version control systems exist for both models and data, and they take a variety of approaches to storing and managing version history.
Related tooling in this space includes metadata stores, model registries, and feature stores.
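To make the code-plus-data point concrete, here is a minimal sketch of one common way to version data: content hashing. It fingerprints every file in a dataset directory with SHA-256, derives a dataset version ID from those fingerprints, and pairs that ID with a code version in a run record. The function names and the run-record format are hypothetical, and the code version string (for example, a Git commit hash) is assumed to be supplied by the caller. This is a generic illustration, not how any particular tool stores its history.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    # SHA-256 of the file's bytes, read in 1 MiB chunks so large files need not fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict[str, str]:
    # Manifest mapping each file's relative path to its content hash.
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def snapshot_id(manifest: dict[str, str]) -> str:
    # Dataset version ID derived from the manifest: identical data gives an identical ID.
    encoded = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:12]


def record_run(code_version: str, data_dir: Path) -> dict[str, str]:
    # Hypothetical run record pairing the code version with the data version it used.
    return {"code_version": code_version, "data_version": snapshot_id(snapshot(data_dir))}


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    (data / "train.csv").write_text("x,y\n1,2\n")
    print(record_run("abc123", data))
    (data / "train.csv").write_text("x,y\n1,3\n")
    print(record_run("abc123", data))  # same code version, new data version
```

Because the IDs are derived from content, rerunning on unchanged data reproduces the same data version, which makes it possible to tell later exactly which code ran against which data.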
Pachyderm handles version control by breaking the jobs for your machine learning model down to the datum level. A datum is the smallest unit of data and code needed to run a complete processing job within your dataset. In a computer vision workload, for example, a datum would be an individual image file plus all of the code needed to process that file through your model.
Because changes are tracked at the datum level, Pachyderm can look at all of your data, detect which datums are new or changed, and process only those. This incremental approach saves processing time and reduces your cloud computing bills.
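The datum-level idea can be sketched in a few lines: remember a hash of each datum from the previous run, and on the next run process only datums whose hash is new or different. Since a datum covers both data and code, the hash below combines the datum's bytes with a code version string. This is a simplified illustration, not Pachyderm's implementation or API; `incremental_run`, `datum_key`, the in-memory `seen` dictionary, and representing each datum as a name mapped to raw bytes are all assumptions made for the example.

```python
import hashlib
from typing import Callable


def datum_key(code_version: str, data: bytes) -> str:
    # Hash the code version together with the data, since a datum covers both.
    h = hashlib.sha256()
    h.update(code_version.encode())
    h.update(b"\0")  # separator between the code version and the data
    h.update(data)
    return h.hexdigest()


def incremental_run(
    datums: dict[str, bytes],
    seen: dict[str, str],
    code_version: str,
    process: Callable[[bytes], object],
) -> dict[str, object]:
    """Process only datums that are new or changed since the previous run.

    `datums` maps a datum name (e.g. an image file name) to its bytes.
    `seen` maps datum names to the key recorded by earlier runs; it is updated in place.
    """
    results: dict[str, object] = {}
    for name, data in datums.items():
        key = datum_key(code_version, data)
        if seen.get(name) == key:
            continue  # unchanged data and code: skip this datum
        results[name] = process(data)
        seen[name] = key
    return results


seen: dict[str, str] = {}
incremental_run({"a.png": b"cat", "b.png": b"dog"}, seen, "v1", len)
run2 = incremental_run({"a.png": b"cat", "b.png": b"dog!", "c.png": b"bird"}, seen, "v1", len)
print(sorted(run2))  # ['b.png', 'c.png']
```

Changing the code version changes every key, so all datums are reprocessed. A real system would also persist the recorded keys between runs and handle deleted datums, which this sketch ignores.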
Pachyderm’s data version control system tracks files automatically and supports complete audits, so you can trace every change made to your data. See our version control in action when you book a demo and tech consultation call for your team.