Oct 27, 2022
We regularly invite ML practitioners and industry leaders to share their experiences with our Community. Want to ask our next guest a question? Join the BentoML Community Slack.

We recently invited Josh Bottum, the Kubeflow Community Product Manager. Kubeflow is a popular open-source project that delivers a composable software foundation for teams that need to build and maintain a scalable ML platform with best-in-class KPIs. As Community Product Manager, Josh focuses on machine learning initiatives and engages with research, engineering, and operational teams that need to simplify complex workflows and Kubernetes data management.
Kubeflow is a composable, Kubernetes-native Machine Learning Platform.
It is designed to provide efficient MLOps for workflows that need an IDE (notebooks), distributed model training, model tuning, reproducible pipelines, and model serving.

I was working at Canonical and was interested in open source. I was lucky enough to start joining the Community calls and working with some customers and prospects. Then I attended the Kubeflow Summit at Google and met a lot of the folks in the Community.

Kubeflow was fortunate to have foundations in two large open-source projects, Kubernetes and TensorFlow. I think that base, along with Google's best practices, helped to set a good foundation.
We have done some things pretty well. For example, we have continued to deliver releases on a regular basis (14 releases over 4 years), which has required us to improve and document our release process. We have learned many lessons along the way. The most important is to work on the basics: documentation, installation, tutorials, training, and security.

Kubeflow is designed to be an open community, with the power resting at the working-group level, especially with the contributors. I have made an effort to improve our process for collaborating with other communities, especially as we move toward Kubeflow v1.7. We need to strive for quality, which I think is based on testing and documentation. We also need tutorials that are relevant and easy to set up and follow.
So IMO, the answer is more than BentoML/ZenML for Kubeflow. We need to be open to collaborating with many projects, and we need ground rules for that. On the other side, the collaborators need to think about what they can commit to.

Good question; you might be able to answer it better than me. TFX provides great, well-tested libraries for common ML requirements. It overlaps with Kubeflow in places. KServe is a serving solution that has incorporated technologies from many areas. IBM and Bloomberg have made many contributions, especially lately.

Kubeflow has not collaborated particularly well with third-party communities. I am trying to change that, but I have concerns, especially about quality. My other concern is that we deliver solutions that are sustainable, and to be sustainable, I believe we need users to provide feedback.
A community like BentoML, which has a reputation for good docs and quality testing, has an advantage. I think users and collaborators appreciate that.

Yes, Kubeflow does not have a marketing team; we are all volunteers. Sometimes I am just happy to get through the Community call action items related to getting the release out.
Still, user adoption is key. Kubeflow needs to be more welcoming, its processes need to be easier, and we need more tutorials that are easier to spin up. Arrikto has a nice Kubeflow-as-a-Service option, which is free for two weeks. It's a great way to get hands-on without getting buried in the lower-level stack issues that are inevitable when you are deploying Kubeflow's 40 pods on your home Ubuntu server. Once you run one of the end-to-end tutorials, you can see if Kubeflow fits your workflows. After that, you can decide if you want to work on a customized installation that addresses your security, networking, storage, GPU, and other needs.

We are now using Google infrastructure for Kubeflow Pipelines, but most working groups are using GitHub Actions and AWS for test infra. Setting up the test infra, especially with different dependencies (Kubernetes, Istio, cert-manager, Kustomize, Argo, Tekton, Knative), takes coordination. We now test in the Working Groups, then at the Manifests level, and then by the Distributions. I think these multiple layers of testing help us find issues, but changes (updates) have ripple effects. So developing the code and keeping the test infra going is a job in itself, and then contributors have to work on test plans and docs. The components are designed to work together, so you need test plans not only for the WG features but also for the end-to-end workflows.
On the roadmap, we have started a quality improvement program; this includes adding more processes and people.

Instead of one person for all docs, we are trying to add a Docs Lead to each of the Working Groups. We need Docs Leads, Testing Leads, and coders for new features. We are glad to help people take material and conduct meet-ups on their own. That will get your heart beating, especially when you are doing a live demo!

We have a release handbook. As for what gets into the release, we usually let the Working Groups produce a roadmap, then let users review the issues/PRs and provide comments. In the upcoming release, I have tried to set out some strategic objectives too. We are working on our build process, and we are also starting to improve our security items. There is a question on my part about how much the Community does and when a Distribution becomes responsible for (commercial) support, CVE fixes, SLAs, etc.

Kubeflow provides a different value equation compared to cloud ML platforms. It also gives teams that use Kubernetes more control over modifying it for their workflows. Kubeflow is designed for efficient MLOps: efficient for infrastructure, efficient for operations, and, most importantly, efficient for model development and deployment. That said, I like Kubeflow because of the people and the problems that we get to solve. Some are very transformative and really change things for the better.
* The discussion was lightly edited for better readability.