Home LLM + Cloudflare Tunnel Guide

Hands-on Guide

Host your own OpenAI-style API at home with llama.cpp + Cloudflare Tunnel

This article shows you, step by step, how to run a local GGUF model with llama.cpp, expose it safely through Cloudflare Tunnel, protect it with a Node.js API-key gateway, and call it from a simple HTML chat UI.

The simplest way to stand up "your own OpenAI API" at home: copy, paste, and follow along one step at a time.

llama.cpp · Cloudflare Tunnel · Node.js Gateway · CORS & API Keys · Chat UI

0. Quick overview

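If you only want to know what the whole setup does, read this part first.

Step 1: Run a GGUF model locally with llama-server (OpenAI-style API).
Step 2: Build an API gateway in Node.js that checks Authorization: Bearer sk-xxx and handles CORS.
Step 3: Use Cloudflare Tunnel to point https://api.your-domain.com securely at the gateway running in your home.
Step 4: Build the chat UI as a simple index.html that calls your API directly from the browser.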

[Browser / Chat UI] -- HTTPS + Authorization: Bearer sk-xxx
        |
Cloudflare Edge (DNS / TLS / WAF)
        |
cloudflared tunnel
        |
Node.js Gateway @ localhost:8787  (API keys + CORS)
        |
llama-server (llama.cpp) @ localhost:5857  (GGUF model)

What you get

An OpenAI-compatible /v1/chat/completions API running on your own hardware, protected by your own API keys and reachable from the internet.

Who this is for

Junior developers who want a copy-paste friendly way to self-host an LLM with sensible defaults, without reverse-engineering docs all day.

Time to complete

About 30–45 minutes on macOS with decent bandwidth and a 16 GB+ machine.

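1. Prerequisites

Hardware: 16 GB of RAM is recommended for a 20B model (for example gpt-oss-20B). 7B / 8B models can run on smaller machines.
OS: macOS, Linux, and Windows all work. The examples below use macOS.
Tools:
- Homebrew (package manager for macOS)
- Node.js 18+ (built-in fetch)
- A Cloudflare account plus a domain you own whose DNS is already managed by Cloudflare (pointing the nameservers there is enough).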

2. Run llama.cpp locally

We use llama-server to run an OpenAI-compatible HTTP server, so you can call the model through /v1/chat/completions.

# 2.1 Install llama.cpp (macOS)
brew install llama.cpp

# 2.2 Start llama-server
llama-server \
  -hf unsloth/gpt-oss-20b-GGUF:gpt-oss-20b-Q4_K_M.gguf \
  --port 5857 \
  --ctx-size 16384 \
  --threads -1 \
  --jinja \
  --reasoning-format none

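The first run downloads the GGUF model from Hugging Face, which takes a while. After that, the local cache is used directly.
A quick guide to the flags: --port is the HTTP port, --ctx-size is the maximum context length in tokens, and --threads is the number of CPU threads to use.

Run a local test with curl to confirm the model is working: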

curl http://localhost:5857/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Hello from localhost 5857" }
    ]
  }'

If you see an OpenAI-style JSON response, this layer is working.
So far everything lives inside your own computer; nothing is exposed to the outside.

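3. Build a Node.js API-key gateway

This step is the front door of the whole system:
every external request first goes through the gateway, which checks the API key and forwards the request to llama-server only if the key is valid.
CORS is also handled here in one place, so the browser does not block your requests.

3.1 Create the project directory and install dependencies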

mkdir llm-gateway
cd llm-gateway

npm init -y
npm install express dotenv

In package.json, add "type": "module" and a start script:

{
  "name": "llm-gateway",
  "version": "1.0.0",
  "main": "gateway.js",
  "type": "module",
  "scripts": {
    "start": "node gateway.js"
  },
  "dependencies": {
    "dotenv": "^16.4.0",
    "express": "^4.21.0"
  }
}

3.2 .env: API keys, upstream address, and CORS in one place

# .env
LLM_API_KEYS=sk-home-2025-1,sk-home-2025-2
LLM_UPSTREAM=http://127.0.0.1:5857
GATEWAY_PORT=8787
CORS_ORIGIN=http://localhost:7788

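Notes:

LLM_API_KEYS: the allowed API keys (more than one is fine; separate them with commas).
LLM_UPSTREAM: the address of your llama-server.
GATEWAY_PORT: the local port the gateway listens on.
CORS_ORIGIN: the origin that serves the frontend page (http://localhost:7788 during development).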
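
A typo in .env is easy to miss: with an empty LLM_API_KEYS, for example, the gateway would start normally and then reject every request with 401. As a minimal sketch, assuming you want the gateway to fail fast at startup instead, the four settings could be validated like this. The loadConfig name and the exact checks are illustrative assumptions, not part of the original gateway:

```javascript
// Hypothetical helper: turn process.env-style values into a validated config object.
function loadConfig(env) {
  const keys = (env.LLM_API_KEYS || "")
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);

  if (keys.length === 0) {
    // With no keys configured, every request would be rejected with 401.
    throw new Error("LLM_API_KEYS is empty: set at least one key in .env");
  }

  const port = parseInt(env.GATEWAY_PORT || "8787", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`GATEWAY_PORT is not a valid port: ${env.GATEWAY_PORT}`);
  }

  return {
    port,
    // Strip trailing slashes so `${upstream}/v1/...` never contains "//".
    upstream: (env.LLM_UPSTREAM || "http://127.0.0.1:5857").replace(/\/+$/, ""),
    corsOrigin: env.CORS_ORIGIN || "http://localhost:7788",
    validKeys: new Set(keys),
  };
}
```

In a gateway you would call it once at startup, for example const config = loadConfig(process.env), after dotenv has loaded the file.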

3.3 gateway.js: API key validation + CORS + forwarding

This is the core logic. It is kept as simple and direct as possible so beginners can read it.

// gateway.js
import express from "express";
import dotenv from "dotenv";

dotenv.config();

const app = express();

const PORT = parseInt(process.env.GATEWAY_PORT || "8787", 10);
const UPSTREAM_BASE = process.env.LLM_UPSTREAM || "http://127.0.0.1:5857";
const CORS_ORIGIN = process.env.CORS_ORIGIN || "http://localhost:7788";

// Parse the comma-separated API keys into a Set
const rawKeys = (process.env.LLM_API_KEYS || "").split(",");
const VALID_KEYS = new Set(
  rawKeys.map(k => k.trim()).filter(k => k.length > 0)
);

console.log("✅ LLM Gateway config:");
console.log("  - Port:", PORT);
console.log("  - Upstream:", UPSTREAM_BASE);
console.log("  - Valid API keys:", VALID_KEYS.size);

// --- Global CORS middleware ---
app.use((req, res, next) => {
  res.header("Access-Control-Allow-Origin", CORS_ORIGIN);
  res.header(
    "Access-Control-Allow-Headers",
    "Origin, X-Requested-With, Content-Type, Accept, Authorization"
  );
  res.header("Access-Control-Allow-Methods", "GET, POST, OPTIONS");

  if (req.method === "OPTIONS") {
    return res.sendStatus(204); // CORS preflight: answer immediately, no auth needed
  }
  next();
});

// Parse JSON request bodies
app.use(express.json({ limit: "10mb" }));

// --- API key validation middleware (OpenAI style) ---
function authMiddleware(req, res, next) {
  const authHeader = req.headers["authorization"];
  if (!authHeader) {
    return res.status(401).json({
      error: {
        message: "Missing Authorization header. Use: Authorization: Bearer sk-xxx",
        type: "invalid_api_key",
      },
    });
  }

  const parts = authHeader.split(" ");
  if (parts.length !== 2 || parts[0].toLowerCase() !== "bearer") {
    return res.status(401).json({
      error: {
        message: "Invalid Authorization header format. Expected: Bearer sk-xxx",
        type: "invalid_api_key",
      },
    });
  }

  const token = parts[1].trim();
  if (!VALID_KEYS.has(token)) {
    // Log only the last 4 characters so full keys never end up in log files
    console.warn("❌ Invalid API key ending in:", token.slice(-4));
    return res.status(401).json({
      error: {
        message: "Incorrect API key provided.",
        type: "invalid_api_key",
      },
    });
  }

  req.apiKey = token; // available for logging
  next();
}

// Health check (optional)
app.get("/health", (req, res) => {
  res.json({
    ok: true,
    upstream: UPSTREAM_BASE,
    keysConfigured: VALID_KEYS.size,
  });
});

// Chat Completions proxy
app.post("/v1/chat/completions", authMiddleware, async (req, res) => {
  try {
    const upstreamUrl = `${UPSTREAM_BASE}/v1/chat/completions`;
    // Log only the last 4 characters of the key
    console.log("➡️  /v1/chat/completions via key ending in:", req.apiKey.slice(-4));

    const upstreamRes = await fetch(upstreamUrl, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify(req.body),
    });

    res.status(upstreamRes.status);

    // Copy upstream headers, but never let them override our CORS headers
    for (const [key, value] of upstreamRes.headers.entries()) {
      const lowerKey = key.toLowerCase();
      if (lowerKey === "transfer-encoding") continue;
      // fetch() transparently decompresses the body, so these may no longer match what we send
      if (lowerKey === "content-encoding" || lowerKey === "content-length") continue;
      if (lowerKey.startsWith("access-control-")) continue;
      res.setHeader(key, value);
    }

    // Set the CORS headers once more (belt and braces)
    res.setHeader("Access-Control-Allow-Origin", CORS_ORIGIN);
    res.setHeader(
      "Access-Control-Allow-Headers",
      "Origin, X-Requested-With, Content-Type, Accept, Authorization"
    );
    res.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");

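    if (upstreamRes.body) {
      // Built-in fetch returns a web ReadableStream, which has no .pipe();
      // forward it chunk by chunk so streaming responses still work
      for await (const chunk of upstreamRes.body) {
        res.write(chunk);
      }
    }
    res.end();
  } catch (err) {
    console.error("Gateway error:", err);
    if (res.headersSent) {
      // The response has already started, so all we can do is close it
      return res.end();
    }
    res.status(500).json({
      error: {
        message: "Gateway failed to reach llama-server.",
        type: "gateway_error",
      },
    });
  }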
});

app.listen(PORT, () => {
  console.log(`🚀 LLM API Gateway listening on http://localhost:${PORT}`);
});

Test it locally:
1) Make sure llama-server is running on port 5857.
2) Run npm start in the llm-gateway directory.
3) Send a request:

curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-home-2025-1" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Hello via Node.js gateway" }
    ]
  }'

4. Expose it with Cloudflare Tunnel

Now we let the outside world reach your gateway at https://api.your-domain.com, without opening any ports on your router.

4.1 Install and log in

brew install cloudflared

cloudflared tunnel login
# Your browser opens a Cloudflare page; log in and authorize

4.2 Create the tunnel and the DNS record

# Create a tunnel (the name is up to you)
cloudflared tunnel create my-llm

# Map a subdomain to this tunnel (for example api.your-domain.com)
cloudflared tunnel route dns my-llm api.your-domain.com

4.3 Configure ~/.cloudflared/config.yml

tunnel: <YOUR-TUNNEL-UUID>
credentials-file: /Users/<you>/.cloudflared/<YOUR-TUNNEL-UUID>.json

ingress:
  - hostname: api.your-domain.com
    service: http://localhost:8787   # points at the Node.js gateway
  - service: http_status:404         # catch-all rule

4.4 Start the tunnel

cloudflared tunnel run my-llm

You can now test from anywhere (once the DNS record has propagated):

curl https://api.your-domain.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-home-2025-1" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      { "role": "user", "content": "Hello from the internet" }
    ]
  }'
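
The same check can be scripted in Node.js 18+, which is handy if you want to re-run it after every config change. This is only a sketch: buildChatRequest, extractReply, and smokeTest are hypothetical helper names, and the model name simply mirrors the curl example above.

```javascript
// Hypothetical helper: build the URL and fetch() options for one chat completion call.
function buildChatRequest(baseUrl, apiKey, userText) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-oss-20b",
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

// Pull the assistant text out of an OpenAI-style response body.
function extractReply(json) {
  return json?.choices?.[0]?.message?.content ?? "(no content)";
}

// Smoke test: resolves with the reply text, or throws with the HTTP status on failure.
async function smokeTest(baseUrl, apiKey) {
  const { url, init } = buildChatRequest(baseUrl, apiKey, "Hello from Node.js");
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return extractReply(await res.json());
}
```

To run it against your tunnel, call smokeTest("https://api.your-domain.com", "sk-home-2025-1").then(console.log). A 401 points at the key, while a 5xx usually means the gateway or llama-server is not running.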

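5. Simple chat UI

Finally, we build a simple index.html so you can chat with your "home edition OpenAI API" directly in the browser. Only the key parts are shown here:

5.1 Serve the frontend with a static server

# In the frontend directory (the one containing index.html)
python3 -m http.server 7788

# Open in your browser
http://localhost:7788

5.2 The key JS snippet inside the chat UI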

<script>
const API_URL = "https://api.your-domain.com/v1/chat/completions";
const API_KEY = "sk-home-2025-1"; // Keep this private; in production, have a backend inject it instead

async function sendMessage(userText) {
  const body = {
    model: "gpt-oss-20b",
    messages: [
      { role: "user", content: userText }
    ],
    stream: false
  };

  const res = await fetch(API_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + API_KEY
    },
    body: JSON.stringify(body)
  });

  if (!res.ok) {
    throw new Error("HTTP " + res.status);
  }
  const json = await res.json();
  const reply = json.choices?.[0]?.message?.content ?? "(no content)";
  return reply;
}
</script>
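
Note that sendMessage sends only the latest user message, so the model has no memory of earlier turns. One possible way to add that, sketched under the assumption that you keep the history in the browser: createConversation, its system prompt argument, and the maxMessages cap are all hypothetical and not part of the original page.

```javascript
// Hypothetical helper: keep the running conversation and cap how much of it is sent.
function createConversation(systemPrompt, maxMessages = 20) {
  const history = []; // alternating { role: "user" | "assistant", content }

  return {
    addUser(text) {
      history.push({ role: "user", content: text });
    },
    addAssistant(text) {
      history.push({ role: "assistant", content: text });
    },
    // Messages to send: the system prompt plus the newest maxMessages history entries.
    messages() {
      return [
        { role: "system", content: systemPrompt },
        ...history.slice(-maxMessages),
      ];
    },
  };
}
```

In the chat UI you would call addUser before each request, pass messages() as the messages field of the request body, and call addAssistant with the reply. Capping the history keeps long chats within the --ctx-size you gave llama-server.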

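When you actually let other people use this, hard-coding the API key in the frontend is not recommended: anyone who opens the page can read it.
This article is a tutorial: get it running first, then upgrade to a real account system and proper key management later.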
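
The snippet above uses stream: false and waits for the whole reply. If you would rather show text as it is generated, OpenAI-style servers stream the reply as "data: {json}" lines that end with "data: [DONE]". The following is a rough sketch of reading that format in the browser, assuming the gateway forwards the streamed body unchanged. parseSseBuffer and streamMessage are hypothetical names.

```javascript
// Hypothetical helper: consume complete "data: ..." lines from a text buffer.
// Returns the text deltas found, whether [DONE] was seen, and any trailing partial line.
function parseSseBuffer(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // the last element is an incomplete line (or "")
  const deltas = [];
  let done = false;

  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return { deltas, done, rest };
}

// Stream a reply, calling onDelta(text) for each piece as it arrives.
async function streamMessage(apiUrl, apiKey, userText, onDelta) {
  const res = await fetch(apiUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + apiKey
    },
    body: JSON.stringify({
      model: "gpt-oss-20b",
      messages: [{ role: "user", content: userText }],
      stream: true
    })
  });
  if (!res.ok) throw new Error("HTTP " + res.status);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseSseBuffer(buffer);
    buffer = parsed.rest; // keep the partial line for the next chunk
    parsed.deltas.forEach((d) => onDelta(d));
    if (parsed.done) break;
  }
}
```

Network chunks can end in the middle of a line, which is why the parser hands back the unfinished tail instead of trying to parse it.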

6. CORS & common errors

6.1 favicon.ico 404

GET http://localhost:7788/favicon.ico 404 (File not found)

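This is just the browser automatically requesting the site icon; it gets a 404 because none exists, and it has no effect on functionality. You can:

Ignore it, or
add this inside <head>: <link rel="icon" href="data:,">.

6.2 Blocked by CORS (the core problem)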

Access to fetch at 'https://api.your-domain.com/v1/chat/completions'
from origin 'http://localhost:7788' has been blocked by CORS policy:
The 'Access-Control-Allow-Origin' header contains the invalid value ''.

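Errors like this almost always mean the CORS headers the gateway returns to the browser are not set correctly, or were overwritten by the upstream's headers.
Remember two things:

Set Access-Control-Allow-Origin in one place, in the gateway (using CORS_ORIGIN).
When forwarding headers from llama-server, skip the upstream's access-control-* headers so they cannot overwrite yours.

If you follow the gateway.js from this article, you should not get stuck on CORS again.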
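
A single CORS_ORIGIN works while the page only ever lives at http://localhost:7788. Once you also host the UI somewhere else, the header has to match whichever origin made the request, because Access-Control-Allow-Origin accepts one origin, not a list. A minimal sketch of an allow-list variant follows. The resolveCorsOrigin and corsMiddleware names are hypothetical, as is any second origin you put in the set.

```javascript
// Hypothetical helper: return the origin to echo back, or null if it is not allowed.
function resolveCorsOrigin(requestOrigin, allowedOrigins) {
  if (!requestOrigin) return null; // non-browser clients such as curl send no Origin header
  return allowedOrigins.has(requestOrigin) ? requestOrigin : null;
}

// Express-style middleware built on the helper above.
function corsMiddleware(allowedOrigins) {
  return (req, res, next) => {
    const origin = resolveCorsOrigin(req.headers.origin, allowedOrigins);
    if (origin) {
      res.setHeader("Access-Control-Allow-Origin", origin);
      res.setHeader("Vary", "Origin"); // the response differs per Origin, so caches must key on it
      res.setHeader("Access-Control-Allow-Headers", "Content-Type, Authorization");
      res.setHeader("Access-Control-Allow-Methods", "GET, POST, OPTIONS");
    }
    if (req.method === "OPTIONS") {
      res.statusCode = 204; // preflight: answer without touching auth or the upstream
      return res.end();
    }
    next();
  };
}
```

It would replace the global CORS middleware in gateway.js, for example app.use(corsMiddleware(new Set(["http://localhost:7788"]))). Requests from origins outside the set get no CORS headers, so the browser blocks them.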

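7. Security & next steps

Bind llama-server to localhost only; do not expose its port to the outside.
All traffic from the public internet must go through: Cloudflare → Tunnel → Node.js Gateway (API key validation).
Avoid hard-coding real API keys in a public frontend; the examples here are meant for learning and personal use.
Consider adding a simple rate limit in the gateway so that a leaked key cannot be abused to overwhelm your machine.
If needed, extend it later toward multi-model routing and RAG (retrieval-augmented generation).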
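
For the rate-limit point, here is a rough sketch of a fixed-window limiter keyed by API key. It is an illustration under simple assumptions (one gateway process, counters held in memory), and createRateLimiter, rateLimitMiddleware, and the example limits are hypothetical rather than part of the gateway above.

```javascript
// Hypothetical fixed-window limiter: at most `limit` requests per key per `windowMs`.
// `now` is injectable so the logic can be tested without waiting for real time to pass.
function createRateLimiter(limit, windowMs, now = Date.now) {
  const windows = new Map(); // key -> { start, count }

  return function allow(key) {
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      windows.set(key, { start: t, count: 1 }); // first request of a new window
      return true;
    }
    if (w.count >= limit) return false; // window is full
    w.count += 1;
    return true;
  };
}

// Express-style middleware; run it after authMiddleware so req.apiKey is set.
function rateLimitMiddleware(allow) {
  return (req, res, next) => {
    if (allow(req.apiKey)) return next();
    res.status(429).json({
      error: {
        message: "Rate limit exceeded. Try again later.",
        type: "rate_limit_exceeded",
      },
    });
  };
}
```

Wiring it in could look like app.post("/v1/chat/completions", authMiddleware, rateLimitMiddleware(createRateLimiter(30, 60000)), handler), which would allow 30 requests per key per minute. A fixed window permits short bursts at window boundaries, which is usually acceptable for a home setup.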

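At this point you have your own "home edition OpenAI API":
the model runs on your own machine, the API style is very close to OpenAI's, and you have full control over keys and access.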

YAPWEIJUN1996