What is Managed MLflow
• September 21, 2023
What is Managed MLflow?
MLflow is an open-source platform designed to manage the complete machine learning lifecycle. It encompasses everything from tracking experiments to sharing and deploying models, thereby creating a seamless workflow for data scientists and engineers. However, as you scale your operations, managing your own MLflow instance might become cumbersome. That's where Managed MLflow comes in. Managed MLflow is essentially MLflow-as-a-Service, usually hosted on Databricks, providing a centralized server that multiple teams can access, removing the operational complexities of MLflow.
How Does Managed MLflow Work?
Managed MLflow integrates seamlessly with your existing Databricks workspace. The service takes care of the backend, allowing you to focus purely on your machine learning projects. It stores experiment metadata, organizes your projects, and even supports multiple backends for storing your model artifacts.
![Diagram depicting how Managed MLflow works within a Databricks workspace](image)
Experiments in MLflow
Core Unit of Work: Experiments
Experiments serve as the core unit of work in MLflow. They allow for the visualization, search, and comparison of different model runs, and facilitate the download of artifacts or metadata for analysis using other tools. Experiments are maintained in a Databricks-hosted MLflow tracking server.
Types of Experiments
Workspace Experiment
A Workspace Experiment can be created either through the Databricks Machine Learning UI or via the MLflow API. Importantly, these experiments are not linked to any particular notebook.
Notebook Experiment
Notebook Experiments are tied to a specific notebook. If no active experiment exists, Databricks will automatically create a new notebook experiment when you use mlflow.start_run() to start a new run.
MLflow Experiment Permissions
In terms of permissions, Managed MLflow provides fine-grained control, allowing you to set access at the experiment or individual run level.
MLflow Experiments Tracking
Enabling Tracking
MLflow Tracking is a system for logging and querying experiments. It enables extensive tracking of experiments and allows you to compare various parameters and results easily.
Comparing Parameters and Results
With MLflow Tracking, you can record parameters, metrics, and even snapshots of data to reproduce your models. The UI offers a straightforward way to compare these parameters across multiple runs.
![Screenshot showing the tracking UI where experiments are compared](image)
MLflow Projects
What is an MLflow Project?
An MLflow Project is a packaging format for data science code. It enables code reusability and provides a reproducible way to run your projects.
Reusability and Reproducibility
MLflow projects include APIs and command-line tools for running projects, making it possible to chain together projects into complex workflows.
Supported Environments
MLflow currently supports Conda, Docker, and system environments, allowing you the flexibility to choose your preferred setup.
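A project declares its environment and entry points in an MLproject file at its root. A minimal sketch using a Conda environment (the project name, script, and parameter are hypothetical):

```yaml
name: my_project

# The Conda environment file pins the project's dependencies,
# making runs reproducible across machines.
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      learning_rate: {type: float, default: 0.01}
    command: "python train.py --learning-rate {learning_rate}"
```

Such a project can then be launched with `mlflow run`, passing parameter overrides with `-P learning_rate=0.1`.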
MLflow Models
Using Trained Models
Once trained, models can be served to a variety of downstream tools through REST APIs or used for batch inference on Apache Spark.
Model Customization
MLflow's custom Python models (mlflow.pyfunc) offer utilities for customization and the ability to save and log models.
Built-in Model Flavors
The platform also supports several built-in model flavors, including Python and R functions, H2O.ai, Keras, MLeap, PyTorch, scikit-learn, Spark MLlib, TensorFlow, and ONNX.
MLOps with MLflow
What is MLOps?
MLOps is the discipline of integrating machine learning with your operational processes. It aims to automate, version, and monitor your machine learning workflows.
Key Features
Automation
MLflow allows you to automate training, parameter tuning, and deployment tasks.
Versioning of Models
Keep track of multiple versions of your models, enabling rollback and easy comparison.
Reproducibility
MLflow ensures that your experiments are easily reproducible, providing all the required information to run them again.
Experiment Tracking
Beyond tracking individual runs, MLOps encompasses tracking experiments across teams and projects, ensuring that everyone is working from the same results.
Continuous Integration and Deployment
MLflow integrates with existing CI/CD solutions, helping you get your models from the lab to production smoothly.
Testing
MLflow works alongside standard testing frameworks, allowing you to rigorously validate your models before deployment.
Monitoring
Monitoring metrics, logging, and alerting features help you keep track of your deployed models' performance.
![Diagram showing the MLOps lifecycle, including Automation, Versioning, Reproducibility, and other features](image)
Conclusion
Managed MLflow offers a plethora of features aimed at simplifying the machine learning lifecycle. From experiment tracking to model deployment, it provides a centralized platform to manage all your machine learning needs. As machine learning projects grow in complexity and scale, leveraging a tool like Managed MLflow can significantly aid in maintaining efficiency and productivity.