ML.NET, an Open Source Machine Learning Framework for the .NET Ecosystem: Pranav Rastogi Q&A

Earlier this month Microsoft released the first major version of ML.NET, an open source machine learning (ML) framework for the .NET ecosystem.

Originally developed as part of the Microsoft Research initiative, ML.NET allows the development of custom ML models using either C# or F#. These models can be used in scenarios involving sentiment analysis, fraud and spam detection, product and movie recommendation, image classification, and more.

ML.NET was already being used by Microsoft customers in its previous versions. The new version of the framework, however, features a Model Builder for Visual Studio and a tool called Automated Machine Learning (AutoML). AutoML is a feature aimed at developers new to ML: it automatically decides the data scenario (e.g., classification or regression) and the ML algorithm to be used in the model according to the input data. Complementing this feature, Model Builder provides a UI tool (Windows only) for creating ML models inside the Visual Studio environment. Both tools are currently available in preview.

InfoQ spoke with Pranav Rastogi, part of the program management team for ML.NET.

InfoQ: Why is the latest release of ML.NET so important to the .NET machine learning community?

Pranav Rastogi: The latest release of ML.NET makes ML more approachable to developers. As a .NET developer, you can use ML.NET APIs to build scenarios such as sentiment analysis, product recommendation, customer segmentation, and a lot more. However, one of the key challenges developers face is that getting started with ML is still hard. They need to understand which ML trainer to use, and how to customize or optimize it. What we introduce as part of this release is a way of making custom ML models easier to build with AutoML.

We are also introducing different tools for developers to get started with ML. This release includes a CLI-based experience, so a developer can go to the command line and just build the ML models.

We're also enabling a GUI-based experience for Visual Studio users so they can add ML models to their projects. This is possible by using a tool called Model Builder, which allows a developer to directly connect their files and build their custom models. You can right-click a project, say “add machine learning” and choose a scenario. Your data source could be a file or SQL Server, and AutoML will select the best model for you based on your scenario. It's going to try out various models and settings, and it's going to come back to you with a summary of the top five models and a recommendation of the best model.

And the last step of the journey is adding code to your solution. Oftentimes developers look at these tools and don't trust them because they do a lot of magic. But with Model Builder, at the end of the process, you get the exact code that was used to train the models. You can customize that code if you want to. You can start by running it locally and then you can use the cloud to train your model for a longer time. You can easily integrate it with your DevOps tools. You can operationalize your model and build custom machine learning models for any .NET application, so you can build it for a Web application, a mobile application, or a desktop application.

InfoQ: How does ML.NET compare with other ML frameworks, such as TensorFlow?

Rastogi: One of the key value propositions of ML.NET is leveraging the ecosystem of ML libraries and frameworks currently available. ML.NET offers deep integration with popular frameworks like TensorFlow or ONNX for .NET. You can easily include models that were built using these frameworks in scenarios like image classification or object detection, and you can use any of the models such as Inception or ResNet in your .NET application.

InfoQ: How is ML.NET positioned in comparison with other Microsoft products, such as Cognitive Services and Azure Machine Learning?

Rastogi: It depends on what your skill sets are like: if you're a data scientist then you'll probably use a notebook-based environment, and you would use the frameworks of your choice. Azure Machine Learning provides ML as a service that developers or data scientists can use in notebooks, and then train in the cloud effectively. If a user is getting started with ML, Cognitive Services provides an easy-to-use API, with out-of-the-box scenarios. ML.NET is a framework for building custom ML models for .NET developers.

InfoQ: Can you share a few of the most challenging tasks the team encountered while developing ML.NET?

Rastogi: I think that our biggest challenge in the project was how to make ML more approachable and more accessible to developers. Given that ML is new and a lot of users are not familiar with its concepts, we spent a lot of our time figuring out what the APIs should be named so that they would feel familiar to .NET developers, helping to build trust in the ecosystem. We have done a lot of customer research, and we learned that if you come to developers and say “can you build me a binary classification algorithm”, they have no idea what it means. But if you approach developers from a scenario-first perspective, asking them to build sentiment analysis, which analyzes customer reviews, they can do it because it is a known problem in ML that can be solved by classifying data into two categories: A and B. So we have spent a lot of our time making sure that the framework was approachable. The defaults work out of the box, so users don't have to customize them.

We also spent a lot of time making sure that our framework could handle large datasets (terabytes) and that your entire pipeline could be deployed as it is in your production application. That sort of increases the productivity of developers quite a lot.
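As a rough illustration of the scenario-first idea described above, sentiment analysis can be written as a binary classification pipeline with the ML.NET API. The following is a minimal sketch, not code from the interview: the `Review` and `ReviewPrediction` classes, the tiny in-memory dataset, and the choice of the SDCA logistic regression trainer are all assumptions made for illustration, and it presumes the `Microsoft.ML` NuGet package is referenced.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: a review text and whether it is positive.
public class Review
{
    public string Text { get; set; }
    public bool Label { get; set; }
}

// Hypothetical output type: the class predicted by the model.
public class ReviewPrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsPositive { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var ml = new MLContext(seed: 0);

        // Toy training data (illustrative only; a real model needs far more).
        var reviews = new[]
        {
            new Review { Text = "Great product, works perfectly", Label = true },
            new Review { Text = "Really happy with this purchase", Label = true },
            new Review { Text = "Excellent quality and fast delivery", Label = true },
            new Review { Text = "Terrible, it broke after one day", Label = false },
            new Review { Text = "Very disappointed, waste of money", Label = false },
            new Review { Text = "Awful experience, would not recommend", Label = false },
        };
        IDataView data = ml.Data.LoadFromEnumerable(reviews);

        // Turn the text into numeric features, then train a two-class model.
        var pipeline = ml.Transforms.Text
            .FeaturizeText("Features", nameof(Review.Text))
            .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label", featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);

        // Score a single new review with the trained pipeline.
        var engine = ml.Model.CreatePredictionEngine<Review, ReviewPrediction>(model);
        var result = engine.Predict(new Review { Text = "I love it" });
        Console.WriteLine($"Predicted positive: {result.IsPositive}");
    }
}
```

The trainer is left on its default settings here, in the spirit of the "defaults work out of the box" remark; with so little data the prediction itself is not meaningful.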

InfoQ: The AutoML feature is originally a project from Microsoft Research. How was the process of integrating the existing research with ML.NET and shipping it as a product?

Rastogi: This was a big collaborative effort between Microsoft Research and different product teams at Microsoft, all coming together to shape AutoML. AutoML is a key piece of our technology, which offers an easy experience for developers to build custom ML models. They don't have to worry about figuring out which learner to use, what the hyperparameter settings should be, how many learners they should try and for how long, or what the featurization process should be. So a lot of teams at Microsoft came together working on this research project to bring it to market.

InfoQ: Which new features can the .NET development community expect in the future?

Rastogi: What we're working on right now is to bring support for our preview feature in Visual Studio. Some of the features around deep learning and support for TensorFlow are in preview right now. You'll see them come together. You'll see improvements in our Model Builder story, with support for recommendation based on time series. A great way to keep up to date with our roadmap is by checking our GitHub repository.

ML.NET is available for Windows, Linux, and macOS. The Windows release requires Visual Studio 2017 15.6 or later, since Model Builder is available as a Visual Studio extension. There are no prerequisites for macOS and Linux, but on these platforms the ML.NET models are built using a CLI. You can learn more about ML.NET here.