Companies often build hundreds of models a day (e.g., churn, recommendation, credit default). However, there is no practical way to manage all the models that are built over time. This lack of tooling leads to insights being lost, resources wasted on re-generating old results, and difficulty collaborating. ModelDB is an end-to-end system that tracks models as they are built, extracts and stores relevant metadata (e.g., hyperparameters, data sources) for models, and makes this data available for easy querying and visualization.
Use the ModelDB native clients (currently spark.ml and scikit-learn) to log modeling data to ModelDB. Using the ModelDB client API requires minimal changes to a modeling workflow.
For example, a spark.ml workflow needs only a few changed lines, and a scikit-learn workflow is instrumented similarly.
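As a rough illustration of the kind of change involved, the sketch below wraps an ordinary `fit` call so that hyperparameters and the data source are recorded alongside it. This is not the ModelDB client API: `RunLog`, `MeanRegressor`, `fit_logged`, and the file name `toy.csv` are hypothetical names invented for this example, and the log lives in memory rather than on a server.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RunLog:
    """Hypothetical in-memory stand-in for a model-tracking server."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, kind: str, **details: Any) -> None:
        # Store one event with a timestamp and whatever metadata was passed in.
        self.events.append({"kind": kind, "time": time.time(), **details})


class MeanRegressor:
    """Toy estimator with a scikit-learn-like fit/predict/get_params surface."""

    def __init__(self, shrinkage: float = 0.0):
        self.shrinkage = shrinkage
        self.mean_ = None

    def get_params(self) -> Dict[str, Any]:
        return {"shrinkage": self.shrinkage}

    def fit(self, X, y):
        # Predict a shrunken mean of the training targets.
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_logged(model, X, y, log: RunLog, data_source: str):
    """Fit `model` as usual, then record its hyperparameters and data source."""
    fitted = model.fit(X, y)
    log.record(
        "fit",
        model_type=type(model).__name__,
        hyperparameters=model.get_params(),
        data_source=data_source,
        n_rows=len(X),
    )
    return fitted


log = RunLog()
X, y = [[1], [2], [3], [4]], [2.0, 4.0, 6.0, 8.0]

# The uninstrumented call would be: MeanRegressor(shrinkage=0.5).fit(X, y)
model = fit_logged(MeanRegressor(shrinkage=0.5), X, y, log, data_source="toy.csv")
print(log.events[0]["hyperparameters"])
```

The only change to the workflow is the call site: `fit_logged(...)` replaces `.fit(...)`, and the rest of the modeling code stays as it was.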
Once you run a workflow that has been instrumented with ModelDB, all the relevant modeling data is logged to the server. Now you can use the frontend to query and visualize this data.
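To give a feel for what querying logged modeling data can mean, here is a minimal sketch that filters and ranks model records by type and metric. The record layout, the `query` helper, and all values are made up for illustration; they do not reflect ModelDB's actual schema or frontend.

```python
from typing import Any, Callable, Dict, Iterable, List

ModelRecord = Dict[str, Any]

# Hypothetical logged records with made-up values, for illustration only.
records: List[ModelRecord] = [
    {"id": 1, "type": "LogisticRegression",
     "hyperparameters": {"C": 0.1}, "metrics": {"accuracy": 0.81}},
    {"id": 2, "type": "LogisticRegression",
     "hyperparameters": {"C": 1.0}, "metrics": {"accuracy": 0.86}},
    {"id": 3, "type": "RandomForest",
     "hyperparameters": {"n_estimators": 100}, "metrics": {"accuracy": 0.84}},
]


def query(
    rows: Iterable[ModelRecord],
    where: Callable[[ModelRecord], bool],
    sort_by_metric: str,
    descending: bool = True,
) -> List[ModelRecord]:
    """Keep the records matching `where`, ordered by the named metric."""
    matched = [r for r in rows if where(r)]
    return sorted(
        matched,
        key=lambda r: r["metrics"][sort_by_metric],
        reverse=descending,
    )


# All logistic regression models, best accuracy first.
best_lr = query(
    records,
    where=lambda r: r["type"] == "LogisticRegression",
    sort_by_metric="accuracy",
)
for r in best_lr:
    print(r["id"], r["hyperparameters"], r["metrics"]["accuracy"])
```

Because hyperparameters and metrics are stored with each model, a question such as "which regularization setting worked best?" becomes a filter and a sort rather than a re-run of old experiments.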
Table View of Models
ModelDB adopts a modular client-server architecture (below). Native clients for different languages (and ML packages) log data to the ModelDB server. All communication takes place through the ModelDB Thrift API. As a result, adding a native client for another language is straightforward. The web frontend surfaces data in the backend for query, visualization and updates.
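The value of routing everything through one language-neutral API can be sketched as follows. This toy uses JSON messages and an in-process function call as a stand-in for Thrift and a network transport; `ToyServer`, `ToyClient`, and the `store_event` / `list_events` method names are assumptions made for this example, not part of the ModelDB Thrift API.

```python
import json
from typing import Any, Callable, Dict, List


class ToyServer:
    """Hypothetical server: accepts serialized requests and stores events."""

    def __init__(self) -> None:
        self.store: List[Dict[str, Any]] = []

    def handle(self, payload: bytes) -> bytes:
        request = json.loads(payload.decode("utf-8"))
        if request["method"] == "store_event":
            self.store.append(request["event"])
            reply = {"ok": True, "event_id": len(self.store)}
        elif request["method"] == "list_events":
            reply = {"ok": True, "events": self.store}
        else:
            reply = {"ok": False, "error": "unknown method"}
        return json.dumps(reply).encode("utf-8")


class ToyClient:
    """Hypothetical native client: knows only the wire format, not the server."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._send = transport

    def _call(self, method: str, **fields: Any) -> Dict[str, Any]:
        # Serialize the request, send it, and decode the reply.
        payload = json.dumps({"method": method, **fields}).encode("utf-8")
        return json.loads(self._send(payload).decode("utf-8"))

    def store_event(self, event: Dict[str, Any]) -> int:
        return self._call("store_event", event=event)["event_id"]

    def list_events(self) -> List[Dict[str, Any]]:
        return self._call("list_events")["events"]


server = ToyServer()
client = ToyClient(transport=server.handle)

event_id = client.store_event({"kind": "fit", "model_type": "LogisticRegression"})
print(event_id, client.list_events())
```

A client in another language would only need to produce the same messages, which is why a shared interface definition keeps new clients cheap to add.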
Send questions to modeldb _at_ csail.mit.edu