# demucs-service
Vocal/accompaniment separation web service using [Demucs](https://github.com/adefossez/demucs) (htdemucs model).
## Overview
This service provides an async API for separating audio files into vocals and accompaniment tracks using Meta's Demucs neural network model. It follows the ahserver + longtasks + Redis pattern.
## Architecture
- **ahserver**: Async HTTP server framework
- **longtasks**: Background task processing via Redis queues
- **Redis**: Task queue for separation jobs
- **Demucs 4.0.1**: AI-powered source separation model (htdemucs)
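## Model Download (Offline Deployment)
Demucs uses the htdemucs model (PyTorch Hub format). It is downloaded automatically on first run, or it can be pre-downloaded manually.
### Method 1: Automatic Download via PyTorch Hub (Default)
On first run, the service downloads the model automatically via `torch.hub.load("facebookresearch/demucs", "htdemucs")`.
**Download location**: `~/.cache/torch/hub/checkpoints/`
**Download size**: ~80 MB
### Method 2: Manual Pre-download
If the deployment environment has no internet access, download the model on a machine that does, then copy it over: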
```bash
# Run Python on a machine with internet access
python3 << "PYTHON"
import torch
model = torch.hub.load("facebookresearch/demucs", "htdemucs", pretrained=True)
# get_dir() returns the hub cache root; checkpoints live in its checkpoints/ subdirectory
print("Hub cache directory:", torch.hub.get_dir())
PYTHON
# Locate the model file
ls ~/.cache/torch/hub/checkpoints/
# You should see htdemucs-*.pt or a similar file
# Copy it to the deployment server
scp ~/.cache/torch/hub/checkpoints/htdemucs*.pt ymq@opencomputing.net:~/.cache/torch/hub/checkpoints/
```
### Method 3: Download the Model File Directly
```bash
# Create the cache directory
mkdir -p ~/.cache/torch/hub/checkpoints/
# Download the htdemucs model
wget -O ~/.cache/torch/hub/checkpoints/htdemucs_v4.pt \
    https://dl.fbaipublicfiles.com/demucs/v4.0/htdemucs.pth
```
**Download size**: ~80 MB
**Download time**: about 5-15 seconds
### Verify the Download
```bash
# After starting the service, check the log
tail -f /data/ymq/demucs-service/nohup.out | grep -i "model"
# You should see "Model loaded" rather than "Downloading"
```
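The cache directory can also be inspected before the service starts. The sketch below lists non-empty checkpoint-like files in the cache; the helper name `find_cached_checkpoints` and the accepted extensions (`.th`, `.pt`, `.pth`) are assumptions for illustration, not part of the service.

```python
from pathlib import Path

# Default PyTorch Hub checkpoint cache, as given in this README.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"


def find_cached_checkpoints(cache_dir=DEFAULT_CACHE):
    """Return non-empty checkpoint-like files in cache_dir, sorted by name."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    suffixes = {".th", ".pt", ".pth"}  # assumed checkpoint extensions
    return sorted(
        p for p in cache_dir.iterdir()
        if p.is_file() and p.suffix in suffixes and p.stat().st_size > 0
    )


if __name__ == "__main__":
    found = find_cached_checkpoints()
    if found:
        for p in found:
            print(f"{p.name}\t{p.stat().st_size / 1e6:.1f} MB")
    else:
        print("No checkpoint found; the first run will try to download the model.")
```

An empty result on an offline machine suggests the copy step was skipped or targeted a different user's home directory.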
### Model Source
- **GitHub**: https://github.com/facebookresearch/demucs
- **PyTorch Hub**: facebookresearch/demucs
- **License**: MIT
- **Paper**: Hybrid Spectrogram and Waveform Source Separation
## API
### Submit Separation Task
Send a JSON payload to the longtask endpoint:
```json
{
  "task_type": "separate",
  "audio_path": "/path/to/audio.wav",
  "output_dir": "/tmp/demucs_custom_output"
}
```
**Parameters:**
- `audio_path` (required): Absolute path to the input audio file
- `output_dir` (optional): Output directory. Default: `/tmp/demucs_{task_id}`
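
A client can build this payload programmatically. The following is a minimal sketch; the helper name `build_separation_payload` is hypothetical, and the absolute-path check uses POSIX rules on the assumption that the path is resolved on the Linux server.

```python
import json
import posixpath


def build_separation_payload(audio_path, output_dir=None):
    """Build the JSON body for a separation task.

    audio_path must be absolute; output_dir is only included when given,
    so the service can fall back to its default output directory.
    """
    # Assumption: paths are interpreted on the Linux server, so use POSIX rules.
    if not posixpath.isabs(audio_path):
        raise ValueError(f"audio_path must be absolute: {audio_path!r}")
    payload = {"task_type": "separate", "audio_path": audio_path}
    if output_dir is not None:
        payload["output_dir"] = output_dir
    return json.dumps(payload)
```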
**Response:**
```json
{
  "vocals_path": "/tmp/demucs_123/htdemucs/audio/vocals.wav",
  "no_vocals_path": "/tmp/demucs_123/htdemucs/audio/no_vocals.wav",
  "duration": 12.34,
  "output_dir": "/tmp/demucs_123",
  "model": "htdemucs"
}
```
### Health Check
```
GET /app/health.dspy
```
Returns:
```json
{"status":"ok","service":"demucs-service","model":"htdemucs"}
```
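For monitoring scripts, the health endpoint can be polled with the standard library alone. This is a sketch under assumptions: the base URL `http://127.0.0.1:9083` combines localhost with the port from the example configuration, and the helper names `parse_health` and `check_health` are hypothetical.

```python
import json
import urllib.request


def parse_health(body):
    """Return True if the health response body reports status 'ok'."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url="http://127.0.0.1:9083", timeout=5.0):
    """GET /app/health.dspy and report whether the service answers 'ok'.

    base_url is an assumption (localhost plus the example port 9083).
    """
    url = base_url.rstrip("/") + "/app/health.dspy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

Keeping the parsing separate from the network call makes the check easy to test without a running service.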
## Configuration
Config file: `conf/config.json`
```json
{
  "port": 9083,
  "queue": "demucs",
  "filesroot": "/tmp/demucs-outputs",
  "host": "0.0.0.0",
  "debug": false
}
```
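Helper scripts that need the same settings can read the file themselves. The sketch below is a standalone loader, not how ahserver reads its configuration (that is not shown in this README); `load_config` and `DEFAULTS` are hypothetical names, and the defaults simply mirror the example above.

```python
import json
from pathlib import Path

# Defaults mirror the example conf/config.json in this README.
DEFAULTS = {
    "port": 9083,
    "queue": "demucs",
    "filesroot": "/tmp/demucs-outputs",
    "host": "0.0.0.0",
    "debug": False,
}


def load_config(path="conf/config.json"):
    """Load the config file and fill any missing keys from DEFAULTS."""
    config = dict(DEFAULTS)
    config_path = Path(path)
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            config.update(json.load(fh))
    port = config["port"]
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError(f"invalid port: {port!r}")
    return config
```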
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DEMUCS_GPU_ID` | `5` | GPU device ID for CUDA |
| `CUDA_VISIBLE_DEVICES` | `5` | CUDA device visibility |
| `PYTHONPATH` | `/data/ymq/demucs-service` | Python module path |
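
The defaults in the table can be resolved in Python as follows. This is only an illustration of the fallback behavior; where the service actually applies these defaults (for example in `start.sh`) is not shown here, and `resolve_env_settings` is a hypothetical name.

```python
import os


def resolve_env_settings(environ=None):
    """Return the documented variables, falling back to the table's defaults."""
    env = os.environ if environ is None else environ
    return {
        "DEMUCS_GPU_ID": env.get("DEMUCS_GPU_ID", "5"),
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES", "5"),
        "PYTHONPATH": env.get("PYTHONPATH", "/data/ymq/demucs-service"),
    }
```

Passing a plain dictionary instead of `os.environ` makes it easy to preview the effect of an override such as `{"DEMUCS_GPU_ID": "0"}`.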
## Deployment
### Prerequisites
- Python venv at `/data/ymq/demucs_venv` with demucs 4.0.1 and torchcodec
- Redis server running on `127.0.0.1:6379`
- GPU with CUDA support
### Start
```bash
bash start.sh
```
### Stop
```bash
bash stop.sh
```
### Logs
```bash
tail -f nohup.out
```
## Directory Structure
```
demucs-service/
├── ah.py                # Main entry point
├── workers/
│   ├── __init__.py
│   └── separate.py      # Separation worker
├── conf/
│   └── config.json      # Service configuration
├── app/
│   └── health.dspy      # Health check endpoint
├── start.sh             # Start script
├── stop.sh              # Stop script
└── README.md            # This file
```
## Output Format
Demucs outputs to: `{output_dir}/htdemucs/{basename}/`
- `vocals.wav` - Isolated vocal track
- `no_vocals.wav` - Accompaniment (everything except vocals)
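
The layout above can be turned into concrete paths before a task finishes. In this sketch `{basename}` is assumed to be the input file name without its extension, which matches the response example (`audio.wav` yields `htdemucs/audio/`); `expected_outputs` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def expected_outputs(output_dir, audio_path, model="htdemucs"):
    """Return the vocals / no_vocals paths Demucs is expected to write."""
    # Assumption: {basename} is the file name without its extension.
    basename = PurePosixPath(audio_path).stem
    stem_dir = PurePosixPath(output_dir) / model / basename
    return {
        "vocals_path": str(stem_dir / "vocals.wav"),
        "no_vocals_path": str(stem_dir / "no_vocals.wav"),
    }
```

For example, `expected_outputs("/tmp/demucs_123", "/path/to/audio.wav")` yields the two paths shown in the API response example.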
## Troubleshooting
- **GPU OOM**: The htdemucs model requires significant VRAM. Ensure the GPU selected by `DEMUCS_GPU_ID` / `CUDA_VISIBLE_DEVICES` has enough free memory.
- **Process timeout**: Long audio files may exceed the `stuck_seconds` timeout (default: 600 s). Increase it if needed.
- **Missing output files**: Check `nohup.out` for Demucs stderr output to diagnose the failure.
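
To reproduce a failing job outside the service, the separation can be run by hand with a timeout and captured stderr. This sketch assumes the `demucs` command-line tool is on `PATH` and that the worker's behavior is equivalent to a `--two-stems vocals` run; how `workers/separate.py` actually invokes Demucs is not documented here. The `--segment` option is a Demucs CLI flag for shorter processing chunks, which may lower GPU memory use. The function names are hypothetical.

```python
import subprocess


def build_demucs_command(audio_path, output_dir, segment=None):
    """Build a demucs CLI call that writes vocals.wav and no_vocals.wav."""
    cmd = ["demucs", "--two-stems", "vocals", "-n", "htdemucs", "-o", output_dir]
    if segment is not None:
        # Shorter segments trade speed for lower memory use.
        cmd += ["--segment", str(segment)]
    cmd.append(audio_path)
    return cmd


def run_separation(audio_path, output_dir, timeout_s=600, segment=None):
    """Run demucs and return (ok, stderr_text) for diagnosis."""
    cmd = build_demucs_command(audio_path, output_dir, segment)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    except FileNotFoundError:
        return False, "demucs executable not found on PATH"
    return proc.returncode == 0, proc.stderr
```

The 600-second default here only mirrors the `stuck_seconds` default mentioned above; the two settings are independent.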