Run ERNIE language models on your own servers and expose them via an OpenAI-compatible API endpoint.
Connect existing apps built for the OpenAI API to a self-hosted inference server without rewriting client code.
Deploy vision-language models that accept both text and image inputs using the ERNIE 4.5 VL architecture.
Requires PaddlePaddle installation and a GPU, primary documentation is in Chinese.
FastDeploy is a Python-based toolkit from PaddlePaddle, Baidu's open-source deep learning platform, that makes it faster and simpler to deploy large language models and vision-language models into production systems. If you have a trained AI model and want to serve it to real users at high speed, this toolkit provides the tooling to do that. The main focus is inference serving: running a model to produce outputs in response to requests, rather than training a model from scratch. FastDeploy handles the details of model loading, batching requests, and returning results quickly. This matters when a deployed service needs to handle many users simultaneously without long wait times. It supports Baidu's ERNIE model family, including ERNIE 4.5 for text tasks and ERNIE 4.5 VL, which handles both text and image inputs. One practical advantage is that FastDeploy exposes an API compatible with the OpenAI format, meaning applications or workflows already written for OpenAI's ChatGPT API can connect to a self-hosted FastDeploy server with minimal code changes. This is useful for teams who want to run models on their own infrastructure rather than depending on a third-party API provider. The repository includes tooling for LLM serving and inference optimization, with topics covering both LLM serving and multimodal model support. The codebase integrates with PaddlePaddle's ecosystem and is geared toward production deployments. Note that the main README in the repository links to a Chinese-language document rather than providing English documentation directly, so non-Chinese readers may need to explore the code and community resources for setup details. FastDeploy is aimed at machine learning engineers and backend developers who need to go from a trained PaddlePaddle model to a running API endpoint. It is not a model training toolkit and it does not cover the full model development lifecycle. If your team works with ERNIE models or wants an OpenAI-compatible serving layer for self-hosted LLMs, this toolkit provides a focused starting point for doing that.
← paddlepaddle on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.