The idea

Side projects deserve reliable LLM access too. Combine three generous free tiers—Aiven-hosted MySQL, a Docker-powered Hugging Face Space, and the open-source gpt-load gateway—to assemble a lightweight backend that rotates API keys and balances rate limits without extra spend.

High-level flow:

Client request
   │
   ├─> Hugging Face Space (gpt-load container)
   │       └─ Reads the key pool from Aiven MySQL
   └─< LLM response after rotation

Step 1: Provision Aiven MySQL

https://aiven.io/

Aiven offers a free 1 vCPU / 1 GB MySQL instance—enough for metadata and key pools.

  1. Sign up and create a MySQL service on the free tier.
  2. From the Service page, note the host, port, username, and SSL settings.

(Screenshots: creating the Aiven MySQL service; noting the connection details.)

Keep these values handy: the DSN built from them goes into the Space secrets as DATABASE_DSN (see Step 3) so the container can read it on startup.
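The DSN format gpt-load expects (shown in Step 3) can be assembled from those parts. A minimal sketch with placeholder values, substitute your own from the Aiven console:

```shell
#!/bin/sh
# Assemble the MySQL DSN for gpt-load from Aiven connection parts.
# All values below are placeholders.
DB_USER=avnadmin
DB_PASS=example-password
DB_HOST=mysql-demo.aivencloud.com
DB_PORT=12345
DB_NAME=defaultdb

DATABASE_DSN="${DB_USER}:${DB_PASS}@tcp(${DB_HOST}:${DB_PORT})/${DB_NAME}?charset=utf8mb4&parseTime=True&loc=Local&tls=skip-verify"
echo "$DATABASE_DSN"
```

Paste the resulting string into the Space secret rather than committing it anywhere.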

Step 2: Deploy gpt-load on Hugging Face

https://huggingface.co/

Project repository: https://github.com/tbphp/gpt-load

  1. Create a Docker Space; the container will extend the published ghcr.io/tbphp/gpt-load:latest image.
  2. Upload a custom Dockerfile and launch script.

(Screenshots: creating the Docker Space; editing the Dockerfile online.)

Dockerfile:

# Build on the upstream gpt-load image
FROM ghcr.io/tbphp/gpt-load:latest
WORKDIR /app
# Ship the shared config and launch script with the image
COPY .env .
COPY start.sh .
RUN chmod +x ./start.sh
# The Space runs the container as a non-root user, so the data
# directory (logs, runtime state) must be writable
RUN mkdir -p data/logs && chmod -R 777 data
# Hugging Face Spaces route traffic to port 7860
EXPOSE 7860
ENTRYPOINT ["./start.sh"]

start.sh keeps things simple:

#!/bin/sh
set -e

# Export every non-comment line from .env. This simple approach assumes
# values contain no spaces; secrets arrive via real env vars from the
# Space dashboard, so they never pass through this step.
if [ -f .env ]; then
  export $(grep -v '^#' .env | xargs)
  echo "env loaded"
fi

echo "starting gpt-load"
# exec replaces the shell so gpt-load receives signals directly
exec /app/gpt-load
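Before pushing, the image can be sanity-checked locally (assuming Docker is installed; tag name and env values below are placeholders):

```shell
# Build the image and run it with throwaway credentials,
# then open http://localhost:7860 to check the dashboard.
docker build -t gpt-load-space .
docker run --rm -p 7860:7860 \
  -e AUTH_KEY=local-test-key \
  -e DATABASE_DSN="user:pass@tcp(host:3306)/db?charset=utf8mb4&parseTime=True" \
  gpt-load-space
```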

Step 3: Configure environment values

Shared settings in .env:

PORT=7860
HOST=0.0.0.0
SERVER_READ_TIMEOUT=60
SERVER_WRITE_TIMEOUT=600
SERVER_IDLE_TIMEOUT=120
SERVER_GRACEFUL_SHUTDOWN_TIMEOUT=10
IS_SLAVE=false
TZ=Asia/Shanghai
ENABLE_CORS=true
ALLOWED_ORIGINS=*
ALLOWED_METHODS=GET,POST,PUT,DELETE,OPTIONS
ALLOWED_HEADERS=*
LOG_LEVEL=info
LOG_FORMAT=text
LOG_ENABLE_FILE=true
LOG_FILE_PATH=./data/logs/app.log

Sensitive data goes into Secrets on the Space dashboard instead of the repository:

AUTH_KEY=<custom login token>
ENCRYPTION_KEY=<optional local encryption key>
DATABASE_DSN=<user>:<password>@tcp(<host>:<port>)/<db>?charset=utf8mb4&parseTime=True&loc=Local&tls=skip-verify
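A small guard near the top of start.sh (a hypothetical addition, not part of gpt-load itself) can fail fast when a secret was never set on the Space:

```shell
#!/bin/sh
# Hypothetical pre-flight check: report any required env vars that are unset.
check_secrets() {
  missing=0
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing required env var: $var" >&2
      missing=1
    fi
  done
  return $missing
}

# Example with both secrets set (demo values only):
AUTH_KEY=demo DATABASE_DSN="user:pass@tcp(host:3306)/db"
export AUTH_KEY DATABASE_DSN
check_secrets AUTH_KEY DATABASE_DSN && echo "all required secrets present"
```

Calling `check_secrets AUTH_KEY DATABASE_DSN || exit 1` before `exec /app/gpt-load` turns a silent misconfiguration into an obvious log line.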

After deployment the Space page shows the public URL and health indicators.

(Screenshots: the Space status page; logs showing the entry point.)

Step 4: Import keys and enable rotation

  1. Visit the Space URL, sign in with AUTH_KEY.
  2. Create a group (say gemini1) and bulk-import provider keys.

(Screenshots: creating a group and importing keys; batch import is supported.)

Each group exposes a proxy endpoint and its own proxy token, for example:

  • Proxy URL: https://your-space.hf.space/proxy/gemini1
  • Proxy key: generated on the dashboard

Replace local calls with the Space endpoint and rotation kicks in automatically:

Before: http://localhost:3001/proxy/gemini1
After:  https://your-space.hf.space/proxy/gemini1
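A request through the group then looks roughly like this, assuming the Gemini-native path and header format (host, group, model name, and key are all placeholders; check the gpt-load dashboard for the exact endpoint of your provider):

```shell
# Placeholder host, group, model, and key; gpt-load forwards the
# provider-native path and swaps in a pooled upstream key.
curl -s "https://your-space.hf.space/proxy/gemini1/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: <proxy key>" \
  -d '{"contents":[{"parts":[{"text":"Hello"}]}]}'
```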

Third-party clients such as Cherry Studio work out of the box:

(Screenshots: configuring the proxy in a third-party client; a successful response after rotation.)

Operational tips

  • Periodically prune expired keys from the MySQL table and rely on gpt-load’s statistics for visibility.
  • Spaces may go idle; schedule a lightweight ping task to prevent the service from sleeping.
  • For easier maintenance, keep .env and start.sh (minus secrets) in version control and automate updates with GitHub Actions.
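For the idle-prevention tip, a crontab entry on any always-on machine (or an equivalent scheduled GitHub Action) is enough; the URL is a placeholder:

```shell
# Crontab sketch: ping the Space every 10 minutes to keep it warm.
*/10 * * * * curl -fsS --max-time 30 https://your-space.hf.space/ > /dev/null 2>&1
```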

Takeaway

This stack delivers a surprisingly robust LLM gateway without touching your wallet. As long as you keep configs tidy and manage keys responsibly, it behaves like a mini platform that can serve multiple clients reliably.