Databricks launched a public preview of GPU and LLM optimization support for Databricks Model Serving. This new feature enables the deployment of various AI models, including LLMs and vision models, on the Lakehouse Platform.
Databricks Model Serving offers automatic optimization for LLM serving, delivering high performance without manual configuration. According to Databricks, it is the first serverless GPU serving product built on a unified data and AI platform, allowing users to create and deploy GenAI applications within a single platform, covering everything from data ingestion to model deployment and monitoring.
Databricks Model Serving simplifies the deployment of AI models, making it easy even for users without deep infrastructure knowledge. Users can deploy a wide range of models, including natural language, vision, audio, tabular, or custom models, regardless of how they were trained (from scratch, from open-source models, or fine-tuned with proprietary data).
Simply log your model with MLflow, and Databricks Model Serving will automatically prepare a production-ready container with GPU libraries like CUDA and deploy it to serverless GPUs. This fully managed service handles everything from managing instances and maintaining version compatibility to patching versions. It also automatically scales instances to match traffic patterns, saving on infrastructure costs while optimizing performance and latency.
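The traffic-based scaling described above can be pictured with a small sketch. This is not Databricks' actual autoscaling logic, which the announcement does not describe; the function name `replicas_needed`, the per-replica capacity, and the replica bounds are all hypothetical values chosen for illustration.

```python
import math


def replicas_needed(requests_per_second: float,
                    capacity_per_replica: float,
                    min_replicas: int = 0,
                    max_replicas: int = 4) -> int:
    """Return how many replicas cover the observed traffic, clamped to bounds.

    Hypothetical helper: a serverless platform would apply its own policy.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if requests_per_second <= 0:
        # No traffic: fall back to the floor (0 means scale to zero).
        return min_replicas
    wanted = math.ceil(requests_per_second / capacity_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Assumed traffic samples (requests per second) over successive intervals.
traffic = [0, 3, 12, 40, 7]
plan = [replicas_needed(rps, capacity_per_replica=5) for rps in traffic]
print(plan)
```

The point of the sketch is only that capacity follows demand: idle periods cost nothing when the floor is zero, and bursts are capped by the configured maximum.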
Databricks Model Serving has introduced optimizations for serving large language models (LLMs) more efficiently, resulting in up to a 3-5x reduction in latency and cost. To use Optimized LLM Serving, you simply provide the model and its weights, and Databricks takes care of the rest, ensuring your model performs optimally.
This streamlines the process, allowing you to focus on integrating the LLM into your application rather than dealing with low-level model optimization. Currently, Databricks Model Serving automatically optimizes MPT and Llama2 models, with plans to support more models in the future.
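On the application side, integration typically comes down to an HTTP call to the deployed endpoint. The sketch below only builds such a request, using Python's standard library, and does not send it. The workspace URL, endpoint name, and token are placeholders, and the payload shape (`inputs` / `params`) is an assumption that depends on how the model was logged; check the endpoint's documentation before relying on it.

```python
import json
import urllib.request


def build_invocation_request(workspace_url: str,
                             endpoint_name: str,
                             token: str,
                             prompt: str,
                             max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a POST request to a model serving endpoint."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Assumed payload shape; the real schema depends on the model's signature.
    payload = {"inputs": {"prompt": [prompt]}, "params": {"max_tokens": max_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
req = build_invocation_request(
    "https://example.cloud.databricks.com/",
    "my-llm-endpoint",
    "dummy-token",
    "Summarize the Lakehouse Platform in one sentence.",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` with a real workspace URL and access token.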