Accelerating AI Innovation at Yext with BentoML
Takeaways
- Cutting development time by 70% by standardizing development processes and team collaboration.
- Reducing compute costs by up to 80% with BentoML's multi-cloud, multi-region deployment and efficient autoscaling capability.
- Shipping 2x more models by enabling the Data Science team to self-serve and the Engineering team to work on higher leverage tasks.
“BentoML enables our Data Science and Engineering teams to work independently, without the need for constant coordination. This allows us to build and deploy AI services with incredible efficiency while giving the ML Engineering team the flexibility to refactor when needed. What used to take days, now takes just hours. In the first four months alone, we deployed over 40 models, and now run over 150 in production, thanks to BentoML's standardized platform.”
——— Michael Misiewicz, Director of Data Science at Yext
Background
Yext is the leading digital presence platform, recognized for revolutionizing how multi-location brands worldwide connect with customers. With one central platform, Yext empowers brands to deliver accurate, consistent information across all digital touchpoints. It implements cutting-edge AI solutions to simplify the management of local listings, webpages, reviews, social media, and more.
Yext uses a diverse set of AI architectures that enhance content creation, information retrieval, recommendations, and optimization for both traditional and AI-powered search engines. These services require deploying a variety of AI models, from state-of-the-art generative models to fine-tuned small language models and classical statistical ML models. This comprehensive approach to creating AI products has helped Yext meet the varied requirements of its offerings while driving enhanced engagement and visibility for its customers.
Growing pains in scaling AI workloads
Like many enterprises scaling their AI workloads, Yext encountered common challenges around infrastructure and process coordination. While the team had a strong foundation of innovative AI products, these friction points made it difficult to bring models into production quickly.
- Rigid infrastructure slowed down deployment. With a range of experimental AI products, Yext needed to bring multiple prototypes into production. However, without a flexible infrastructure and a standardized framework, the team’s iteration speed couldn’t keep up with demand.
- Productionizing AI models was time-consuming and prone to error. Managing various AI models required the team to rebuild the same instrumentation, optimizations, and tooling for each service. This duplication extended development timelines, increased the risk of errors, and strained the team’s resources.
- Cross-team dependencies added complexity. Each new model release required the ML Engineering team to build a new service in the production infrastructure. This added workload slowed down iterations and limited the team’s ability to quickly adapt to changing demands or explore new use cases.
“Bringing new services to production took a lot more time than expected, especially when new infrastructure requirements were involved, which is often the case with AI workloads,” said Misiewicz.
Transforming AI operations with a flexible and cost-efficient platform
Faced with these challenges, Yext recognized the need for a standardized platform with the flexibility to support its wide array of model-serving use cases.
“Instead of a restrictive, all-encompassing platform, we wanted a solution that would integrate seamlessly with our existing infrastructure and support our preferred workflows. It also needed to be flexible in GPU deployment options,” Misiewicz added.
After evaluation, Yext selected BentoML for several key benefits:
- Reducing time spent on infrastructure operations. BentoCloud’s BYOC option combines the benefits of a fully managed platform with the control of a secure, customer-controlled cloud account. This deployment architecture minimized infrastructure demands, allowing the Yext team to focus on high-impact work while benefiting from the latest AI infrastructure innovations.
- Streamlining existing processes with integrations. BentoML integrates easily with Yext’s existing model training and experimentation workflows. This was an important requirement for Yext, as it wanted to avoid overhauling its development process to accommodate a new platform.
“After exploring BentoML, we believed it would be a great fit as it offered the integrations, flexibility and scalability that we wanted from an inference platform,” said Misiewicz.
- Shipping AI products faster with a standardized framework. BentoML standardizes how teams collaborate and ship models to production. First, BentoML’s primitives make it much easier for data scientists to create AI services with various models using minimal code. Second, the standardized framework reduces the back-and-forth communication required between data scientists and ML engineers. This frees ML engineers from repetitive deployment tasks, allowing them to focus on more critical initiatives and core business development.
- Saving money with a hybrid cloud architecture. BentoML’s cloud-agnostic approach allows Yext to easily deploy models across various regions to not only comply with local regulations but also optimize for GPU cost and availability. This enables Yext to maximize resource usage while efficiently scaling globally.
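The region-selection idea in the last point can be sketched in a few lines of Python. This is only an illustration of the reasoning, not BentoML's API or Yext's actual logic: the `Region` type, the `pick_region` helper, the region names, the compliance zones, the prices, and the capacity figures are all hypothetical values made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    name: str               # hypothetical region identifier
    zone: str               # hypothetical compliance zone, e.g. "eu" or "us"
    gpu_hourly_usd: float   # made-up on-demand GPU price per hour
    gpus_available: int     # made-up current GPU capacity


def pick_region(
    regions: list[Region], allowed_zones: set[str], gpus_needed: int
) -> Optional[Region]:
    """Return the cheapest compliant region with enough GPUs, or None."""
    candidates = [
        r for r in regions
        if r.zone in allowed_zones and r.gpus_available >= gpus_needed
    ]
    return min(candidates, key=lambda r: r.gpu_hourly_usd, default=None)


# Hypothetical inventory across two clouds and two compliance zones.
REGIONS = [
    Region("cloud-a-us-east", "us", 2.10, 8),
    Region("cloud-b-us-west", "us", 1.60, 0),   # cheapest, but no capacity
    Region("cloud-a-eu-west", "eu", 2.40, 4),
    Region("cloud-b-eu-north", "eu", 1.95, 2),
]

eu_choice = pick_region(REGIONS, allowed_zones={"eu"}, gpus_needed=2)
us_choice = pick_region(REGIONS, allowed_zones={"us"}, gpus_needed=1)
print(eu_choice.name if eu_choice else None)
print(us_choice.name if us_choice else None)
```

The filter applies the compliance and availability constraints first, and price is compared only among the regions that remain, which mirrors the order of priorities described above.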
Once the decision was made, getting the initial projects up and running was fast and intuitive. The Yext Data Science team quickly adapted to the platform and gradually applied it to a range of mission-critical ML and GenAI projects. Throughout this transition, the BentoML team worked closely with Yext, holding weekly check-ins and staying in direct daily contact to ensure the success of the projects.
Faster and more efficient AI products at a global scale
Since adopting BentoML, Yext has seen significant operational improvements, including:
- Cutting development time by 70%. With BentoML’s unified serving framework that standardizes performance optimization and workflow management, models are getting to production in hours rather than days.
- Reducing the reliance on the infrastructure team. The Data Science team can now push, deploy, and scale its models independently using BentoCloud's tooling, which is available through a Python SDK, a CLI, or a dashboard.
- Lowering compute costs by up to 80%. BentoML’s cloud-agnostic approach enables Yext to deploy to regions with the best GPU rates and availability. In addition, efficient autoscaling and scale-to-zero features remove the need to over-provision.
- Shipping 2x more models. Serving a global market requires Yext's products to handle a variety of languages. With BentoCloud, Yext was able to ship twice as many models for customers all over the world. While rolling out new language models, the team deployed over 15 different models in a single week. Today, Yext runs over 150 models in production, all powered by BentoCloud’s scalable and efficient infrastructure.
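To see why autoscaling with scale-to-zero can lower compute costs relative to provisioning for peak load, consider the rough arithmetic below. It is a sketch under stated assumptions, not a model of BentoCloud's autoscaler or of Yext's workloads: the traffic profile, per-replica capacity, GPU price, and the `replicas_needed` and `daily_cost` helpers are all hypothetical, and the resulting saving depends entirely on the shape of the traffic.

```python
import math


def replicas_needed(requests_per_sec: float, capacity_per_replica: float) -> int:
    """Replicas required for the load; zero when there is no traffic."""
    return math.ceil(requests_per_sec / capacity_per_replica)


def daily_cost(replicas_by_hour: list[int], gpu_hourly_usd: float) -> float:
    """Total cost of one day, given the replica count for each hour."""
    return sum(replicas_by_hour) * gpu_hourly_usd


# Hypothetical 24-hour traffic profile (requests per second for each hour).
TRAFFIC = [0] * 8 + [5, 20, 40, 40, 30, 20, 10, 5] + [0] * 8
CAPACITY = 10.0    # hypothetical requests/sec that one replica can serve
GPU_PRICE = 2.0    # hypothetical USD per replica-hour

# Autoscaled: follow the load hour by hour, dropping to zero when idle.
autoscaled = [replicas_needed(rps, CAPACITY) for rps in TRAFFIC]
# Fixed: keep enough replicas for the peak hour running all day.
fixed = [max(autoscaled)] * len(TRAFFIC)

autoscaled_cost = daily_cost(autoscaled, GPU_PRICE)
fixed_cost = daily_cost(fixed, GPU_PRICE)
saving = 1 - autoscaled_cost / fixed_cost

print(f"fixed: ${fixed_cost:.2f}, autoscaled: ${autoscaled_cost:.2f}")
print(f"saving with this made-up profile: {saving:.0%}")
```

The more idle hours and the sharper the peak, the larger the gap between the two figures; a workload with flat, constant traffic would see little or no saving from this mechanism alone.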
“We are thrilled with the results and the changes BentoML has brought, particularly the time saved in our development and deployment processes,” Misiewicz said. “It bridges the gap between our Data Science and Engineering teams, accelerating our iteration cycle.”
“At BentoML, our goal is to help enterprises lead and succeed with AI,” said Chaoyu Yang, CEO and Founder of BentoML. “We provide fast and scalable infrastructure for model inference and advanced AI applications. It’s exciting to see how well BentoML integrates into Yext’s infrastructure and standardizes its workflow.”
Conclusion
BentoML has transformed the Yext workflow for developing and deploying AI models by providing a standardized framework that enhances team collaboration. With infrastructure burdens offloaded, the Yext Data Science team can now explore a wider range of AI use cases to expand the business. This shift lays the groundwork for ongoing growth and success in Yext's AI-driven initiatives.
More resources
- Learn more about Yext
- Sign up for BentoCloud for free! Experience a Unified Inference Platform that simplifies how developers build and scale AI systems in production, with any model, on any cloud or on-premises environment.
- Join our Slack community to get the latest information about BentoML.
- Have questions? Schedule a call with our experts.