Oct 27, 2022
We regularly invite ML practitioners and industry leaders to share their experiences with our Community. Want to ask our next guest a question? Join the BentoML Community Slack.

We recently invited Josh Bottum, the Kubeflow Community Product Manager. Kubeflow is a popular open-source project that delivers a composable software foundation for teams that need to build and maintain a scalable ML platform with best-in-class KPIs. As Community Product Manager, Josh focuses on machine learning initiatives and engages with research, engineering, and operational teams that need to simplify complex workflows and Kubernetes data management.
Kubeflow is a composable, Kubernetes-native Machine Learning Platform.
It is designed to provide efficient MLOps for workflows that need an IDE (notebooks), distributed model training, model tuning, reproducible pipelines, and model serving.

I was working at Canonical and was interested in open source. I was lucky enough to start joining the Community calls and working with some customers and prospects. Then I attended the Kubeflow Summit at Google and met a lot of the folks in the Community.

Kubeflow was fortunate to have foundations in two large open-source projects, Kubernetes and TensorFlow. I think that base, along with Google's best practices, helped to set a good foundation.
We have done some things pretty well. For example, we have continued to deliver releases on a regular basis (14 releases over 4 years), which has required us to improve and document our release process. We have learned many lessons along the way. The most important is to work on the basics: documentation, installation, tutorials, training, and security.

Kubeflow is designed to be an open community, with the power resting at the working-group level, especially with the contributors. I have made an effort to improve our process for collaborating with other communities, especially as we move toward Kubeflow v1.7. We need to strive for quality, which I think is based on testing and documentation. We also need tutorials that are relevant and easy to set up and follow.
So IMO, the answer is more than BentoML/ZenML for Kubeflow. We need to be open to collaborating with many projects, and we need ground rules for that. On the other side, the collaborators need to think about what they can commit to.

Good question; you might be able to answer it better than me. TFX provides great, well-tested libraries for common ML requirements. It overlaps with Kubeflow in places. KServe is a serving solution that has incorporated technologies from many areas. IBM and Bloomberg have made many contributions, especially lately.

Kubeflow has not collaborated particularly well with third-party communities. I am trying to change that, but I have concerns, especially about quality. My other concern is that we deliver solutions that are sustainable, and to be sustainable, I believe we need users to provide feedback.
A community like BentoML, which has a reputation for good docs and quality testing, has an advantage. I think users and collaborators appreciate that.

Yes, Kubeflow does not have a marketing team; we are all volunteers. Sometimes I am just happy to get through the Community call action items related to getting the release out.
Still, user adoption is key. Kubeflow needs to be more welcoming, its processes need to be easier, and we need more tutorials that are easier to spin up. Arrikto has a nice Kubeflow-as-a-Service option, which is free for two weeks. It's a great way to get hands-on without getting buried in the lower-level stack issues that are inevitable when you are deploying Kubeflow's 40 pods on your home Ubuntu server. Once you run one of the end-to-end tutorials, you can see if Kubeflow fits your workflows. After that, you can decide if you want to work on a customized installation that addresses your security, networking, storage, GPU, and other needs.

We are now using Google infrastructure for Kubeflow Pipelines, but most working groups are using GitHub Actions and AWS for test infra. Setting up the test infra, especially with different dependencies (Kubernetes, Istio, cert-manager, Kustomize, Argo, Tekton, Knative), takes coordination. We now test in the Working Groups, then at the Manifests level, and then by the Distributions. I think these multiple layers of testing help us find issues, but changes (updates) have ripple effects. So developing the code and keeping the test infra going is a job in itself, and then contributors have to work on test plans and docs. The components are designed to work together, so you need test plans not only for the WG features but also for the end-to-end workflows.
On the roadmap, we have started a quality improvement program; this includes adding more processes and people.

Instead of one person for all docs, we are trying to add a Docs Lead to each of the Working Groups. We need Docs Leads, Testing Leads, and coders for new features. We are glad to help people take material and conduct meet-ups on their own. That will get your heart beating, especially when you are doing a live demo!

We have a release handbook. As for what gets into the release, we usually let the Working Groups produce a roadmap, then let users review the issues/PRs and provide comments. In the upcoming release, I have tried to set out some strategic objectives too. We are working on our build process, and we are also starting to improve our security items. There is a question on my part about how much the Community does and when a Distribution becomes responsible for (commercial) support, CVE fixes, SLAs, etc.

Kubeflow provides a different value equation compared to cloud ML platforms. It also gives teams that use Kubernetes more control over modifying it for their workflows. Kubeflow is designed for efficient MLOps: efficient for infrastructure, efficient for operations, and, most importantly, efficient for model development and deployment. That said, I like Kubeflow because of the people and the problems that we get to solve. Some are very transformative and really change things for the better.
* The discussion was lightly edited for better readability.