Deploy a Real‑Time Object Detection API with YOLOv8 & FastAPI
🇺🇸 United States · May 11, 2026


Originally published by Dev.to

Why combine YOLOv8 and FastAPI?

Object detection is at the heart of many modern applications—think smart cameras, inventory robots, or AR experiences. YOLOv8 (You Only Look Once) gives you state‑of‑the‑art accuracy while still running fast enough for real‑time use. FastAPI, on the other hand, is a lightweight, async‑first web framework that makes it trivial to expose a model as a REST endpoint.

In this tutorial you’ll walk through:

  1. Preparing a small custom dataset and training a YOLOv8 model.
  2. Wrapping the model in a FastAPI service that accepts images and returns detections.
  3. Docker‑izing the whole stack so it can run anywhere with a single docker compose up.

By the end you’ll have a reproducible, container‑based API that can serve predictions in tens of milliseconds on modest hardware.

Prerequisites

Tool                      Version                      Why
Python                    3.9–3.11                     Compatibility with Ultralytics YOLO
Ultralytics YOLO          pip install ultralytics      Training and inference
FastAPI                   pip install "fastapi[all]"   HTTP server
Docker & Docker Compose   Latest                       Container orchestration
Git                       Any                          Version control (optional)

You’ll also need a modest GPU for training (even a laptop GPU works for a small dataset). If you only want to test inference, CPU‑only mode is fine.

1. Prepare a custom dataset

YOLOv8 expects the classic folder layout:

my_dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/

Each image in train/ or val/ has a corresponding .txt file in the same sub‑folder under labels/. The label file contains one line per object:

<class_id> <x_center> <y_center> <width> <height>

All coordinates are normalized (0‑1). If you already have COCO‑style annotations, the ultralytics package can convert them:

# convert_coco_to_yolo.py
from ultralytics.data.converter import convert_coco

# labels_dir is the folder that holds your COCO *.json annotation files
convert_coco(
    labels_dir="coco_annotations/",
    save_dir="my_dataset",
)
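If you are writing label files yourself, the normalization is the easiest thing to get wrong. Here is a small standalone helper (the function name is illustrative, not part of Ultralytics) that turns a pixel-space box into a YOLO label line:

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space [x1, y1, x2, y2] box to a YOLO label line.

    All four output coordinates are normalized to the 0-1 range.
    """
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200-pixel box with top-left corner (50, 100) in a 640x480 image:
print(to_yolo_line(0, 50, 100, 150, 300, 640, 480))
# -> 0 0.156250 0.416667 0.156250 0.416667
```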

Once you have the folder ready, create a data.yaml that points to it:

# data.yaml
train: ./my_dataset/images/train
val: ./my_dataset/images/val

nc: 3                     # number of classes
names: ['person', 'bicycle', 'dog']
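A mismatch between images/ and labels/ is one of the most common causes of a failed or silently degraded training run. This sanity check (a hypothetical helper, not part of Ultralytics) lists images that have no matching label file:

```python
from pathlib import Path

def find_unlabeled(dataset_dir, split="train"):
    """Return image names under images/<split> missing labels/<split>/<stem>.txt."""
    root = Path(dataset_dir)
    labels = root / "labels" / split
    missing = []
    for img in sorted((root / "images" / split).glob("*")):
        if img.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            if not (labels / f"{img.stem}.txt").exists():
                missing.append(img.name)
    return missing
```

Run it for both train and val before kicking off training; an empty list for each split means every image is labeled.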

2. Train the model

Training with YOLOv8 is a single line:

yolo task=detect mode=train data=data.yaml epochs=50 imgsz=640 batch=16 model=yolov8n.pt
  • yolov8n.pt is the nano version (fastest, smallest). Swap for yolov8s.pt or larger if you need higher accuracy.
  • Adjust epochs, batch, and imgsz to fit your hardware.

After training finishes you’ll find the best checkpoint in runs/detect/train/weights/best.pt. Keep that file; it’s what the API will load.

3. Build the FastAPI inference service

Create a new folder api/ and add the following files.

app/main.py

# app/main.py
import io
import numpy as np
from fastapi import FastAPI, File, UploadFile, HTTPException
from ultralytics import YOLO
from PIL import Image

app = FastAPI(title="YOLOv8 Object Detection API")

# Load the model once at startup
model = YOLO("weights/best.pt")

def pil_to_numpy(img: Image.Image) -> np.ndarray:
    """Convert a Pillow image to a NumPy array that YOLO expects."""
    return np.array(img.convert("RGB"))

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image file and return bounding boxes."""
    if file.content_type not in {"image/jpeg", "image/png"}:
        raise HTTPException(status_code=400, detail="Invalid image type")

    # Read bytes and open with Pillow
    contents = await file.read()
    try:
        img = Image.open(io.BytesIO(contents))
    except Exception:
        raise HTTPException(status_code=400, detail="Corrupt image")

    # Run inference. model() is a blocking call; under heavy load, offload
    # it to a threadpool (e.g. fastapi.concurrency.run_in_threadpool).
    results = model(pil_to_numpy(img))[0]

    # Build a simple JSON response
    detections = []
    for box in results.boxes:
        detections.append({
            "class_id": int(box.cls),
            "class_name": model.names[int(box.cls)],
            "confidence": float(box.conf),
            "bbox": box.xyxy.tolist()[0]  # [x1, y1, x2, y2] in pixel coords
        })

    return {"detections": detections}

app/requirements.txt

fastapi[all]==0.110.*
uvicorn==0.27.*
ultralytics==8.2.*
pillow==10.2.*
numpy==1.26.*

Dockerfile

# Use an official lightweight Python image
FROM python:3.11-slim

# Install system libraries required by OpenCV (a dependency of ultralytics)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Create a non‑root user
RUN useradd -m appuser
WORKDIR /app
COPY --chown=appuser:appuser app/ ./app/
COPY --chown=appuser:appuser weights/best.pt ./weights/best.pt

# Install Python deps
RUN pip install --no-cache-dir -r app/requirements.txt

# Switch to non‑root user
USER appuser

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Tip – Keep the weights/ directory next to the Dockerfile so the model file is added to the image at build time. For larger models you may want to mount the weights as a volume instead.
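Since the build context includes everything next to the Dockerfile, a .dockerignore keeps training artifacts and datasets out of the build. The entries below are typical suggestions; adapt them to your repository:

```
# .dockerignore
runs/
my_dataset/
.git/
__pycache__/
*.pyc
```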

4. Docker‑Compose for one‑click launch

Create a docker-compose.yml at the repository root:

version: "3.9"

services:
  yolo-api:
    build: .
    ports:
      - "8000:8000"
    restart: unless-stopped
    # If you have a GPU and the host has NVIDIA drivers, uncomment:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]

Now run:

docker compose up --build -d

The API will be reachable at http://localhost:8000/detect. FastAPI automatically generates interactive docs at http://localhost:8000/docs, where you can upload an image and see the JSON response instantly.

5. Test the endpoint

A quick curl test:

curl -X POST "http://localhost:8000/detect" \
  -F "file=@test.jpg" \
  -H "Accept: application/json"

You should receive something like:

{
  "detections": [
    {
      "class_id": 0,
      "class_name": "person",
      "confidence": 0.92,
      "bbox": [112, 45, 398, 720]
    },
    {
      "class_id": 2,
      "class_name": "dog",
      "confidence": 0.78,
      "bbox": [410, 300, 620, 540]
    }
  ]
}
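On the client side you will often want to keep only confident hits. A small post-processing sketch over the JSON shape above (the 0.8 threshold is an arbitrary choice):

```python
def filter_detections(payload, min_conf=0.8):
    """Return detections at or above min_conf, highest confidence first."""
    kept = [d for d in payload["detections"] if d["confidence"] >= min_conf]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)

# The JSON response from the /detect endpoint, decoded into a dict:
response = {
    "detections": [
        {"class_id": 0, "class_name": "person", "confidence": 0.92, "bbox": [112, 45, 398, 720]},
        {"class_id": 2, "class_name": "dog", "confidence": 0.78, "bbox": [410, 300, 620, 540]},
    ]
}
print([d["class_name"] for d in filter_detections(response)])  # ['person']
```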

If you prefer a visual output, you can extend the API to return an image with drawn boxes using OpenCV or Pillow. The core logic stays the same; just add a loop of cv2.rectangle (or Pillow ImageDraw) calls and return a StreamingResponse.
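The Pillow variant could look like the sketch below (the function name and wiring are illustrative); the endpoint would call it on the decoded image and the detection list, then wrap the bytes in a StreamingResponse:

```python
import io

from PIL import Image, ImageDraw

def draw_boxes(img: Image.Image, detections) -> bytes:
    """Draw each bounding box and label on a copy of the image; return PNG bytes."""
    out = img.convert("RGB").copy()
    draw = ImageDraw.Draw(out)
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
        draw.text((x1, max(0, y1 - 12)), f'{det["class_name"]} {det["confidence"]:.2f}', fill="red")
    buf = io.BytesIO()
    out.save(buf, format="PNG")
    return buf.getvalue()
```

Inside the endpoint you would then return StreamingResponse(io.BytesIO(draw_boxes(img, detections)), media_type="image/png"), importing StreamingResponse from fastapi.responses.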

6. Scaling considerations

  • GPU acceleration: The Dockerfile above runs on CPU. To enable GPU, switch to an nvidia/cuda base image and give the service a GPU device reservation in docker-compose.yml (the commented deploy block shown earlier). The Ultralytics package automatically detects CUDA.
  • Batch inference: For higher throughput, modify the endpoint to accept a list of images and call model(images) once. Batching amortizes the per-request overhead of preprocessing and GPU transfer; the model itself is loaded only once at startup.
  • Model versioning: Store each trained checkpoint in a separate folder and mount the desired version at runtime (-v ./weights/v2.pt:/app/weights/best.pt). This makes A/B testing painless.

7. Clean up

When you’re done experimenting, stop and remove containers:

docker compose down
docker rmi $(docker images -q your_image_name)  # optional

You can also push the image to a registry (Docker Hub, GitHub Packages, etc.) and deploy it to any cloud provider that supports containers—AWS ECS, GCP Cloud Run, or Azure Container Apps—all with the same docker run command.

Conclusion

You’ve just built a full‑stack, containerized object detection service:

  1. Data → YOLOv8 training → best.pt.
  2. FastAPI wraps the model in a clean HTTP endpoint.
  3. Docker guarantees reproducibility and portability.
  4. Docker‑Compose makes local development and testing a single command.

From here you can experiment with larger YOLO variants, add authentication to the API, or integrate the service into a larger micro‑service architecture. The sky’s the limit!

Key takeaways

  • YOLOv8’s CLI makes custom training fast; a single best.pt file is all you need for inference.
  • FastAPI provides async‑ready, auto‑documented endpoints that pair nicely with YOLO’s Python API.
  • Docker isolates dependencies (Python, OpenCV, CUDA) and ensures the same environment runs everywhere.
  • Using Docker‑Compose you can spin up the API locally, test it with curl or the Swagger UI, and later push the image to any container platform.
