ModelDB

A system to manage ML models

This project is maintained by mitdbg

ModelDB: A system to manage machine learning models

Companies often build hundreds of models a day (e.g., churn, recommendation, credit default). However, there is no practical way to manage all the models that are built over time. As a result, insights are lost, resources are wasted regenerating old results, and collaboration is difficult. ModelDB is an end-to-end system that tracks models as they are built, extracts and stores relevant metadata (e.g., hyperparameters, data sources) for each model, and makes this data available for easy querying and visualization.

Use Cases

How it works

Use one of the ModelDB native clients (currently spark.ml and scikit-learn) to log modeling data to ModelDB. Adopting the ModelDB client API requires only minimal changes to a modeling workflow.

For example, in spark.ml, it requires the following changes:

  estimator.fit(data) --> estimator.fitSync(data)
  transformer.transform(data) --> transformer.transformSync(data)
  model.predict(data) --> model.predictSync(data)
 

And similarly in scikit-learn:

  model.fit(data) --> model.fit_sync(data)
  preprocessor.transform(data) --> preprocessor.transform_sync(data)
  model.predict(data) --> model.predict_sync(data)
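
To make the idea behind the `_sync` variants concrete, here is a minimal sketch of how such a method could wrap an ordinary call and record metadata as a side effect. It is an illustration only, not the ModelDB client's implementation: `EVENT_LOG`, `synced`, and `MeanRegressor` are hypothetical names, and an in-memory list stands in for the ModelDB server.

```python
import functools
import time

# Hypothetical in-memory stand-in for the ModelDB server (assumption:
# the real client would send these events to the server instead).
EVENT_LOG = []


def synced(event_type):
    """Wrap a method so that each call is recorded after it completes."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, data, *args, **kwargs):
            start = time.time()
            result = method(self, data, *args, **kwargs)
            EVENT_LOG.append({
                "event": event_type,
                "model": type(self).__name__,
                "hyperparameters": dict(getattr(self, "params", {})),
                "num_rows": len(data),
                "seconds": time.time() - start,
            })
            return result
        return wrapper
    return decorator


class MeanRegressor:
    """Toy estimator that predicts a (shrunken) training mean."""

    def __init__(self, shrinkage=0.0):
        self.params = {"shrinkage": shrinkage}
        self.mean_ = None

    def fit(self, data):
        self.mean_ = (1.0 - self.params["shrinkage"]) * sum(data) / len(data)
        return self

    def predict(self, data):
        return [self.mean_ for _ in data]

    # The *_sync variants do the same work and also log an event.
    fit_sync = synced("fit")(fit)
    predict_sync = synced("predict")(predict)
```

Calling `MeanRegressor(shrinkage=0.5).fit_sync([2.0, 4.0])` trains the toy model exactly as `fit` would and appends one `"fit"` record, including the hyperparameters, to `EVENT_LOG`. The workflow itself is unchanged apart from the method name.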
 

Once you run a workflow that has been instrumented with ModelDB, all the relevant modeling data is logged to the server. Now you can use the frontend to query and visualize this data.

Overview Page

Table View of Models

Charting capabilities

Architecture

ModelDB adopts a modular client-server architecture (below). Native clients for different languages (and ML packages) log data to the ModelDB server. All communication takes place through the ModelDB Thrift API. As a result, adding a native client for another language is straightforward. The web frontend surfaces data in the backend for query, visualization and updates.
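
The client-server split described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ModelDB API: JSON stands in for Thrift as the language-neutral wire format, and `FitEvent`, `InMemoryServer`, and `Client` are hypothetical names that do not reflect the actual Thrift schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FitEvent:
    """Hypothetical language-neutral message (not the real Thrift schema)."""
    model_type: str
    hyperparameters: dict
    data_source: str


class InMemoryServer:
    """Stand-in for the ModelDB server: stores whatever messages arrive."""

    def __init__(self):
        self.models = []

    def store_fit_event(self, payload: bytes) -> int:
        record = json.loads(payload.decode("utf-8"))
        self.models.append(record)
        return len(self.models) - 1  # id assigned to the stored model

    def query(self, **filters):
        """Return stored models whose fields equal the given values."""
        return [m for m in self.models
                if all(m.get(k) == v for k, v in filters.items())]


class Client:
    """Stand-in for a native client: serializes events and sends them."""

    def __init__(self, server):
        self.server = server

    def log_fit(self, event: FitEvent) -> int:
        # Only serialized bytes cross the client/server boundary, so a
        # client in another language needs to produce the same message.
        payload = json.dumps(asdict(event)).encode("utf-8")
        return self.server.store_fit_event(payload)
```

Because the server only ever sees serialized messages, supporting a new language or ML package amounts to writing a client that emits the same messages. A frontend would then read from the server through calls such as the hypothetical `query(data_source="churn.csv")`.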

Getting Started

The ModelDB Getting Started Guides for spark.ml and scikit-learn are a good place to start. Please use the ModelDB mailing list or Google Group for questions.

Papers

Contributors

Contact us

Send questions to modeldb _at_ csail.mit.edu