# Wan22 Video Generation Service
A Wan2.2-TI2V-5B video generation service, built on ahserver + longtasks, that exposes an OpenAI-compatible asynchronous video generation API.
## Architecture
```
HTTP Request → ahserver (port 8079) → submit.dspy → longtasks.submit_task()
↓ (Redis Queue)
Wan22Tasks.process_task()
Wan22.generate() [GPU]
save to /data/ymq/wan22-outputs/
task.dspy ← longtasks.get_status()
```
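- **Serial inference**: a global GPU lock, `_GLOBAL_INFER_LOCK`, ensures that only one task runs at a time
- **Resident model**: the first task loads the Wan2.2 model; later tasks reuse it without reloading
- **Async queue**: longtasks manages the task queue through Redis and supports retry on failure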
## Model Download (Offline Deployment)
Real-ESRGAN x2 super-resolution model, 64 MB.
### Method 1: wget (recommended)
```bash
# Create the model directory
mkdir -p /data/ymq/models
# Download the model
wget -O /data/ymq/models/RealESRGAN_x2plus.pth \
https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth
```
**Download size**: 64 MB
**Download time**: about 10-30 seconds
### Method 2: curl
```bash
curl -L -o /data/ymq/models/RealESRGAN_x2plus.pth \
https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth
```
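### Method 3: Automatic download
On first run, the service downloads the model automatically to `/data/ymq/models/RealESRGAN_x2plus.pth`. This requires network access to GitHub.
### Verify the download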
```bash
ls -lh /data/ymq/models/RealESRGAN_x2plus.pth
# The reported size should be about 64M
```
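
For offline deployments, the same check can be scripted before the service starts. The sketch below is one possible approach: `model_is_present` and `MIN_MODEL_BYTES` are hypothetical names, and the 60 MiB threshold is an assumed loose lower bound derived from the stated 64 MB size, not an exact file size.

```python
import os

# Assumed loose lower bound; this README states the file is about 64 MB.
MIN_MODEL_BYTES = 60 * 1024 * 1024


def model_is_present(path, min_bytes=MIN_MODEL_BYTES):
    """Return True if the file exists and is at least min_bytes long."""
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

A truncated download fails the size check, so `model_is_present("/data/ymq/models/RealESRGAN_x2plus.pth")` returning `False` suggests downloading the file again.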
### Model source
- **GitHub**: https://github.com/xinntao/Real-ESRGAN
- **Release**: v0.2.1
- **License**: BSD 3-Clause
- **Paper**: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
## API
### 1. Submit a video generation task
```
POST /api/submit
Content-Type: application/json
{
"prompt": "A cinematic scene of...", // required, video description
"size": "1280*720", // optional, default 1280*720
"frame_num": 81, // optional, frame count (4n+1, 17~129)
"sample_steps": 50, // optional, number of sampling steps
"sample_guide_scale": 5.0, // optional, guidance scale
"base_seed": 42, // optional, random seed
"task_id": "my_custom_id" // optional, custom task ID
}
```
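
A client may want to check `frame_num` against the documented constraint (4n+1, within 17~129) before submitting. The following is a minimal sketch using only the standard library. The helper names (`is_valid_frame_num`, `build_submit_body`, `submit`) and the `base_url` argument are assumptions for illustration, and whether the server performs the same validation is not stated here.

```python
import json
import urllib.request


def is_valid_frame_num(n):
    """True if n has the form 4k+1 and lies within 17..129."""
    return isinstance(n, int) and 17 <= n <= 129 and n % 4 == 1


def build_submit_body(prompt, frame_num=None, **optional):
    """Build the JSON body for POST /api/submit, checking frame_num first."""
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    if frame_num is not None:
        if not is_valid_frame_num(frame_num):
            raise ValueError("frame_num must be 4n+1 and within 17..129")
        body["frame_num"] = frame_num
    body.update(optional)  # e.g. size, sample_steps, base_seed, task_id
    return json.dumps(body).encode("utf-8")


def submit(base_url, prompt, **kwargs):
    """Send the request; base_url (e.g. http://<server>:8079) is an assumption."""
    req = urllib.request.Request(
        base_url + "/api/submit",
        data=build_submit_body(prompt, **kwargs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `submit("http://<server>:8079", "A cinematic scene of...", frame_num=81)` would send the request, while `frame_num=80` raises `ValueError` locally without contacting the server.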
**Response**:
```json
{
"task_id": "a1b2c3d4e5f6", // used to query task status
"status": "queued",
"prompt": "A cinematic scene...",
"size": "1280*720",
"frame_num": 81,
"message": "task submitted",
"check_url": "/api/task?task_id=a1b2c3d4e5f6"
}
```
### 2. Query task status
```
GET /api/task?task_id=a1b2c3d4e5f6
```
**Response** (PENDING):
```json
{
"status": "PENDING",
"created_at": 1712345678.0,
"started_at": null,
"finished_at": null
}
```
**Response** (SUCCEEDED):
```json
{
"status": "SUCCEEDED",
"task_id": "a1b2c3d4e5f6",
"video_url": "/idfile?path=a1b2c3d4e5f6.mp4",
"video_path": "/data/ymq/wan22-outputs/a1b2c3d4e5f6.mp4",
"size": "1280*720",
"frame_num": 81,
"file_size": 12345678,
"prompt": "A cinematic scene...",
"seed": 42,
"created_at": 1712345678.0,
"started_at": 1712345680.0,
"finished_at": 1712345900.0
}
```
**Response** (FAILED):
```json
{
"status": "FAILED",
"task_id": "a1b2c3d4e5f6",
"error": "CUDA out of memory",
"created_at": 1712345678.0
}
```
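
Because generation is asynchronous, a client typically polls this endpoint until the status becomes `SUCCEEDED` or `FAILED`. The sketch below shows one way to do that. The helper names, the polling interval, and the timeout are assumptions, and any status other than the two terminal ones is simply treated as "still running".

```python
import json
import time
import urllib.parse
import urllib.request

TERMINAL = {"SUCCEEDED", "FAILED"}  # terminal statuses shown in this README


def fetch_status(base_url, task_id):
    """GET /api/task for one task; base_url is an assumption."""
    url = base_url + "/api/task?task_id=" + urllib.parse.quote(task_id)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(fetch, task_id, interval=5.0, timeout=1800.0, sleep=time.sleep):
    """Poll fetch(task_id) until a terminal status or until timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        info = fetch(task_id)
        if info.get("status") in TERMINAL:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

A typical call would be `wait_for_task(lambda tid: fetch_status("http://<server>:8079", tid), "a1b2c3d4e5f6")`. Passing the fetch function as an argument keeps the loop easy to test without a running server.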
### 3. Service status
```
GET /api/status
```
```json
{
"service": "wan22-video-generation",
"model": "Wan2.2-TI2V-5B",
"gpu_id": 2,
"gpus": [
{"id": 0, "util": 23, "mem_used": 5120, "mem_total": 24564},
{"id": 1, "util": 0, "mem_used": 4, "mem_total": 24564},
{"id": 2, "util": 45, "mem_used": 8192, "mem_total": 24564}
]
}
```
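
To find the GPU that the service itself uses, a client can match `gpu_id` against the `gpus` list. This is a small sketch with hypothetical helper names. The units of `mem_used` and `mem_total` are not stated in this README, so the code computes only a ratio.

```python
def service_gpu(status):
    """Return the entry in status["gpus"] whose id equals status["gpu_id"]."""
    for gpu in status.get("gpus", []):
        if gpu.get("id") == status.get("gpu_id"):
            return gpu
    return None  # gpu_id not listed


def mem_fraction(gpu):
    """Fraction of GPU memory in use, independent of the reporting unit."""
    return gpu["mem_used"] / gpu["mem_total"]
```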
## Video Download
Once generation completes, download the video via `video_url`:
```
GET /idfile?path=a1b2c3d4e5f6.mp4
```
Or build the full URL and open it in a browser:
```
http://<server>:8079/idfile?path=a1b2c3d4e5f6.mp4
```
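
The same URL construction can be done in a script. The sketch below joins the server address with the `video_url` returned by the task endpoint and streams the file to disk. The function names and the `base_url` value are assumptions for illustration.

```python
import shutil
import urllib.request
from urllib.parse import urljoin


def video_download_url(base_url, video_url):
    """Join the server base URL with the relative video_url from the API."""
    return urljoin(base_url, video_url)


def download_video(base_url, video_url, dest_path):
    """Stream the video to dest_path without loading it all into memory."""
    with urllib.request.urlopen(video_download_url(base_url, video_url)) as resp:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(resp, out)
    return dest_path
```

For example, `download_video("http://<server>:8079", "/idfile?path=a1b2c3d4e5f6.mp4", "out.mp4")` would save the result locally.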
## Deployment
```bash
# Start
cd ~/wan22-service
WAN22_GPU_ID=2 ./start.sh
# Stop
./stop.sh
# View logs
tail -f wan22-service.log
```
Environment variables:
- `WAN22_GPU_ID`: GPU device index (default 2)
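
How the service reads this variable is not shown here. A minimal sketch of reading it with the documented default, using a hypothetical helper name, could look like this:

```python
import os


def gpu_id_from_env(env=None):
    """Read WAN22_GPU_ID, falling back to the documented default of 2."""
    env = os.environ if env is None else env
    return int(env.get("WAN22_GPU_ID", "2"))
```

Accepting the environment as an argument makes the default easy to verify, for example `gpu_id_from_env({})` falls back to GPU 2.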
## File Structure
```
wan22-service/
├── ah.py                          # Main entry point: ahserver + longtasks initialization
├── app/
│   └── api/
│       ├── submit/index.dspy      # POST /api/submit - submit a task
│       ├── task/index.dspy        # GET /api/task - query task status
│       └── status/index.dspy      # GET /api/status - service status
├── conf/
│   └── config.json                # ahserver configuration (port 8079)
├── workers/
│   ├── generate.py                # Task execution logic (lazy-loads Wan22)
│   └── wan22_wrapper.py           # Wan22 class (OpenAI-style wrapper)
├── repo/                          # Wan2.2 inference code
├── py3/                           # Python venv
├── start.sh / stop.sh
├── skill/                         # Hermes skill docs
├── README.md
└── wan22-service.log
```
## Dependencies
- ahserver (Web framework)
- longtasks (Async task queue via Redis)
- sqlor (Optional, for database operations)
- torch + torchvision (GPU inference)
- wan (Wan2.2 repo, local at `repo/wan/`)