Dev-kit

What is Managed MLflow

September 21, 2023

What is Managed MLflow?

MLflow is an open-source platform designed to manage the complete machine learning lifecycle. It covers everything from tracking experiments to sharing and deploying models, creating a seamless workflow for data scientists and engineers. However, as you scale your operations, running your own MLflow instance can become cumbersome. That's where Managed MLflow comes in. Managed MLflow is essentially MLflow-as-a-Service, most commonly offered on Databricks. The provider hosts a centralized tracking server that multiple teams can share, which removes the operational work of deploying and maintaining MLflow yourself.

How Does Managed MLflow Work?

Managed MLflow integrates seamlessly with your existing Databricks workspace. The service takes care of the backend, allowing you to focus purely on your machine learning projects. It stores experiment metadata, organizes your projects, and even supports multiple backends for storing your model artifacts.

![image](Diagram depicting how Managed MLflow works within a Databricks workspace)

Experiments in MLflow

Experiments are the core unit of work in MLflow. They let you visualize, search, and compare model runs, and download run artifacts or metadata for analysis in other tools. Experiments are maintained in a Databricks-hosted MLflow tracking server.

Types of Experiments

Workspace Experiment

A Workspace Experiment can be created either through the Databricks Machine Learning UI or via the MLflow API. Importantly, these experiments are not linked to any particular notebook.

Notebook Experiment

Notebook Experiments are tied to a specific notebook. If no active experiment exists, Databricks will automatically create a new notebook experiment when you use mlflow.start_run() to start a new run.

MLflow Experiment Permissions

In terms of permissions, Managed MLflow lets you control who can read, edit, or manage each experiment. Runs inherit the permissions of the experiment that contains them.

MLflow Experiments Tracking

Enabling Tracking

MLflow Tracking is an API and UI for logging the parameters, metrics, and output files of your runs and for querying them later. This makes it easy to compare parameters and results across experiments.

Comparing Parameters and Results

With MLflow Tracking, you can record parameters, metrics, and even snapshots of data to reproduce your models. The UI offers a straightforward way to compare these parameters across multiple runs.

![image](Screenshot showing the tracking UI where experiments are compared)

MLflow Projects

What is an MLflow Project?

An MLflow Project is a packaging format for data science code. It enables code reusability and provides a reproducible way to run your projects.

Reusability and Reproducibility

MLflow provides an API and command-line tools for running projects, which makes it possible to chain projects together into complex workflows.

Supported Environments

MLflow Projects can run in Conda, virtualenv, Docker, or system environments, giving you the flexibility to choose your preferred setup.

MLflow Models

Using Trained Models

Once models are trained and saved in the MLflow Model format, downstream tools can use them for real-time serving through a REST API or for batch inference on Apache Spark.

Model Customization

MLflow's custom Python models (mlflow.pyfunc) let you wrap arbitrary Python code as a model. They come with utilities to save, log, and load that model through the same interface as the built-in flavors.

Built-in Model Flavors

The platform also supports several built-in model flavors, such as Python and R functions, H2O.ai, Keras, MLeap, PyTorch, scikit-learn, Spark MLlib, TensorFlow, and ONNX.

MLOps with MLflow

What is MLOps?

MLOps is the discipline of integrating machine learning with your operational processes. It aims to automate, version, and monitor your machine learning workflows.

Key Features

Automation

MLflow allows you to automate training, parameter tuning, and deployment tasks.

Versioning of Models

The MLflow Model Registry keeps track of multiple versions of each model, which makes it easy to compare versions and roll back to an earlier one.

Reproducibility

MLflow records the parameters, metrics, code version, and artifacts of each run. This gives you the information you need to run an experiment again.

Experiment Tracking

As with MLflow Tracking, MLOps relies on recording experiments in a shared place, so the whole team works from the same history of runs.

Continuous Integration and Deployment

MLflow exposes its functionality through a Python API, a REST API, and a command-line interface, so existing CI/CD pipelines can call it to move models from the lab to production.

Testing

Because MLflow models load through a standard interface, you can test them with your existing testing frameworks before deployment.

Monitoring

Monitoring metrics, logging, and alerting features help you keep track of your deployed models' performance.

![image](Diagram showing MLOps lifecycle including the features like Automation, Versioning, Reproducibility, etc.)

Conclusion

Managed MLflow offers a plethora of features aimed at simplifying the machine learning lifecycle. From experiment tracking to model deployment, it provides a centralized platform to manage all your machine learning needs. As machine learning projects grow in complexity and scale, leveraging a tool like Managed MLflow can significantly aid in maintaining efficiency and productivity.
