Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes
By: Ratul Ali
Published: 2026-02-01
Category: cs.AI
Abstract
This paper presents a comparative benchmark of FastAPI and Triton Inference Server on Kubernetes for scalable and secure AI inference in healthcare. It addresses deployment challenges in medical AI, where a serving stack must deliver both high performance and robust security. By evaluating these two widely used tools within a Kubernetes environment, the study offers guidance for building reliable and compliant AI systems for diagnostics, personalized medicine, and other healthcare applications, with an emphasis on practical considerations for real-world integration.
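As a rough illustration of what benchmarking two inference servers involves, the sketch below times repeated calls to a request function and summarizes latency and throughput. The abstract does not say which metrics the paper reports, so the choice of p50 latency, p95 latency, and requests per second is an assumption, and the names `benchmark`, `summarize`, and `call` are hypothetical helpers, not part of FastAPI, Triton, or the paper.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(latencies_ms: List[float], elapsed_s: float) -> Dict[str, float]:
    """Reduce raw per-request latencies to a few summary metrics (assumed set)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "throughput_rps": len(latencies_ms) / elapsed_s,
    }


def benchmark(call: Callable[[], object], n_requests: int = 100) -> Dict[str, float]:
    """Issue n_requests sequential calls and time each one.

    `call` is a hypothetical zero-argument function that sends one inference
    request to the service under test and blocks until the response arrives.
    """
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - start
    return summarize(latencies_ms, elapsed_s)
```

To compare two deployments under this sketch, one would pass a separate request function for each service to `benchmark` and compare the returned dictionaries. A sequential loop like this measures single-client latency only; measuring behavior under load would require concurrent clients, which this sketch does not attempt.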