Why we built a tool for managing ML workflows — Deploifai

Utkarsh Goel
Jun 13, 2022

It was late 2020. I was working on multiple machine learning projects on a daily basis, building object detection models for very specific use cases with very custom datasets. And then I realised that I was being super inefficient. I was tired of experimenting with different workflows to get the most out of my time and the resources at hand. One day I decided I had had enough. I wanted something that worked better for me, and I wanted it then. So I started building something that I now call Deploifai.

What was I so picky about?

Looking at it broadly, there were four main issues:

  1. I worked with multiple cloud providers: AWS for some workloads, Azure for others, and even GCP sometimes. At the end of the day, at a startup we had credits for all of them, and we needed them! I was always hopping back and forth between infrastructure for deployments and whatnot.
  2. There was a lot of data: I was storing all of it in storage buckets, sometimes on AWS, sometimes on Azure, and I would have to go looking around for it.
  3. I was creating and deleting complete environments a lot and managing dependencies all the time: once a training job was done, I would delete the VM, and every time I would have to repeat the setup steps again and again! Sometimes, things broke too. And it’s impossible to explain the pain with CUDA and cuDNN.
  4. I was integrating tools with a lot of overhead effort: when I needed to use something new, say MLflow, a whole setup process got added! These were industry standards, yet so hard to introduce into the workflow.

Starting off, I had these issues in mind. Most tools out there solve some of these problems individually. However, this is all part of one workflow for a developer. One tool definitely cannot provide every feature by itself (it would suffer from feature shock), but one platform could seamlessly bring these moving parts together into a single machine, just like a car. Individual parts play their different roles, but together they make the whole car drive seamlessly. That was the intention when I started to “theorise” the idea for Deploifai.

I was looking at DVC for data version control, MLflow for experiment tracking, Jupyter Notebooks for what they do best, Apache Spark for data engineering, and more. This gives me a chance to give a huge shoutout to all the community members of each of these projects! I did not need to reinvent these tools. Instead, what I needed was a platform that just made all of this available to developers without the overhead of setting the tools up and doing each of these integrations manually.

Hence Deploifai is built around a philosophy

It’s been the major driving force behind Deploifai: make ML workflows super simple and reproducible for all developers. Internally, we stress one singular user experience: things should just work out of the box. An ML developer need not concern themselves with the DevOps.

On top of that, the developer must feel like they have all the options open. Want to log metrics on MLflow? Well, here is a 2-line code snippet that will integrate it directly with Deploifai’s experiments. It’s not only an option; we want to encourage developers to use the best tools in the industry.

We care about the experience that developers have when they are trying to bootstrap their ML projects and deal with the chaotic stuff: working with teammates, setting things up on the cloud, setting up pipelines and tools. We find ways to eliminate steps that a developer would otherwise have to do manually, and provide an automated solution that makes the infrastructure come together to work just as the developer would want. Because some of the design is opinionated, Deploifai can generally work for the common developer, even if it does not work for every developer.

Is there really nothing else that does this?

To be completely transparent, I do not know. Is there something? Find me on Twitter (https://twitter.com/javachipd) and let me know. When I started building it, I didn’t really ask myself whether the tool I was building was a clone of something else, even a coincidental one. I knew that as a developer myself, I wanted a product that I knew I had the capability to build. And if a few determined, bright minds got together to design and build a product, we would have something that developers like us would find useful.

What’s the status of development?

We are building things every day! Deploifai can help developers build better ML workflows even today. It is as simple as going to https://deploif.ai and making a project in your dashboard.

We also just completed the integration with MLflow. And it works beautifully. With a simple custom plugin and custom credentials, we have a seamless integration in which MLflow logs experiment metrics directly to the Deploifai dashboard. Check out the blog post about it!

We still have a long way to go. While the platform is free to sign up for and get started with, we work more closely with a limited number of users to understand the experience and improve on it constantly. We are still building the documentation and the support resources, and as a developer-focused product, we know we have to get that experience right! If you or your team are serious about onboarding, we can get on a call and guide you through it as well! And we would be very excited if you joined the group that gives us the valuable feedback we need.

Please check it out

The website is: https://deploif.ai. We have a GitHub where you can check out the stuff we open-source. At the moment it’s mostly our example code and some older projects, but soon enough we will get things together to open-source most of the tech we build. We also have a Discord community that you can join.

Thanks for reading!
