Neurolabs Accelerates Time to Market by 9 Months and Saves up to 70% with BentoML

Takeaways

  • Time to market sped up by 9 months with BentoML’s inference platform
  • Shipping new models on a daily basis without any additional infrastructure resources
  • 70% savings in compute costs achieved with efficient auto-scaling and scale-to-zero

BentoML’s infrastructure gave us the platform we needed to launch our initial product and scale it without hiring any infrastructure engineers. As we grew, features like scale-to-zero and BYOC have saved us a considerable amount of money.

—— Patric Fulop, CTO of Neurolabs

Background

Neurolabs works with some of the world’s leading consumer packaged goods (CPG) companies to streamline the collection of in-store performance data. Using Synthetic Computer Vision (SCV) models, Neurolabs leverages simulated image data to accurately identify products on retail shelves, track compliance, and gather valuable retail insights. This enables brands to optimize retail execution, enhance customer experiences, and ultimately drive revenue growth.

In doing this, they have also built the most comprehensive 3D asset library for product recognition in the industry.

Deploying complex AI pipelines was slowing time to market

As Neurolabs transitioned its advanced SCV models into production AI systems, it encountered several challenges in deployment and scaling.

  • Hiring AI engineers to build infrastructure was slow and expensive. As Neurolabs trained more AI models, it needed to ship them to production on a reliable infrastructure. Dedicating internal resources to infrastructure was not feasible, as it would detract from their core business development. This meant Neurolabs had to hire new engineers to build and maintain the infrastructure. However, this was a slow and costly process, requiring time to onboard talent and build the necessary systems from scratch.
  • Deploying multi-model pipelines was complex and highly specialized. As Neurolabs’ customer base grew, it often needed to build highly customized AI systems from different fine-tuned models. With so many live models running, it became increasingly challenging to optimize the workflows. Neurolabs needed a standard that unified its AI deployments, followed best practices, and was easy to scale.
  • Various AI workloads required a flexible infrastructure in order to scale. The expanded client base also meant Neurolabs had to manage varied traffic patterns. Different clients had unique usage demands, which required dynamic scaling: scaling up during peak hours and down to zero when workloads were low to save costs.

Recognizing these challenges, Neurolabs began searching for an established infrastructure platform that could help transition its AI models to production with speed, reliability and scalability.

An inference platform that streamlines AI deployment

“BentoML’s specialized model serving platform proved to be the ideal solution. It complements our expertise in developing advanced Computer Vision pipelines and provides the infrastructure we need to streamline AI deployment,” said Patric Fulop, CTO of Neurolabs.

Key benefits of BentoML to Neurolabs:

  • Streamlining AI infrastructure without hiring an infrastructure team. BentoML equipped Neurolabs with the infrastructure to quickly move prototypes to production, saving on hiring costs and enabling data scientists to focus on optimizing AI models. Setting up the infrastructure was a fast, seamless process: the required components were automatically installed in Neurolabs’ own cloud account, ensuring security and data privacy.

  • Saving development time with BentoML’s standardized framework. BentoML makes it easy for Neurolabs to bring custom models online for its diverse client base. It seamlessly integrates with the training and CI/CD workflows, allowing data scientists to frequently train and update models with minimal friction. This leads to a much faster end-to-end deployment cycle and a shorter time to market.

  • Purpose-built for deploying compound AI systems. BentoML provides the essential building blocks to create and connect multiple AI services. For example, users can run separate services or models on CPU or GPU independently (e.g. isolating data pre-processing tasks from model inference) and configure communication between them as needed.

    “The way BentoML helps build compound AI systems provides new insights for how we approach our internal pipelines,” said Calin Cojocaru, AI Engineer at Neurolabs. “We haven’t fully tapped the potential yet, but it’s certainly making us think more about how we will use it moving forward.”

  • Cost savings with auto-scaling and scale to zero. BentoML automatically manages different traffic patterns with no manual intervention needed. It scales workloads to zero during low-traffic periods and achieves fast startup when traffic surges. This helps Neurolabs maintain optimal performance while minimizing infrastructure costs. Since configuring autoscaling and scaling to zero with BentoML is straightforward, it also reduces operational overhead and saves significant development time.

Gains in compute cost, speed and scale

Since partnering with BentoML, Neurolabs has seen a variety of improvements:

  • Accelerating the transition to productionizing AI systems by 9 months. Over the past 6 months, Neurolabs has seen a 3x increase in deployment speed, allowing it to manage 10 model iterations per week. Deployment via CLI, Python or dashboard makes it easy to develop, ship and scale new services.
  • Avoiding 2 infrastructure hires that would have been required to build and maintain an infrastructure platform. With BentoML providing a standardized framework and infrastructure, there’s no need to hire additional engineers. In addition, Neurolabs stays on the cutting edge of AI infrastructure as advanced features like model caching and cold start optimization are constantly added.
  • Cutting compute costs by 70% with efficient auto-scaling. BentoML's advanced auto-scaling features allow Neurolabs to flexibly handle different client traffic patterns while maintaining peak performance. Fast cold starts ensure minimal latency during and after model launches.

“BentoML has helped us smoothly transition to productionizing our AI systems. It not only meets our current needs but also future-proofs us for more advanced use cases,” Fulop added.

Looking ahead

Neurolabs expects a significant increase in model usage as it continues to develop and deploy new models to meet the growing demands of clients. As it scales, it plans to explore how BentoML’s support for compound AI can unlock more advanced use cases. With BentoML’s robust infrastructure in place, Neurolabs is well-prepared to manage this growth.

More resources