# demucs-service

Vocal/accompaniment separation web service using [Demucs](https://github.com/adefossez/demucs) (htdemucs model).

## Overview

This service provides an async API for separating audio files into vocals and accompaniment tracks using Meta's Demucs neural network model. It follows the ahserver + longtasks + Redis pattern.

## Architecture

- **ahserver**: Async HTTP server framework
- **longtasks**: Background task processing via Redis queues
- **Redis**: Task queue for separation jobs
- **Demucs 4.0.1**: AI-powered source separation model (htdemucs)

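## Model Download (Offline Deployment)

Demucs uses the htdemucs model (PyTorch Hub format). It is downloaded automatically on first run and can also be pre-downloaded manually.

### Method 1: Automatic Download via PyTorch Hub (Default)

On first run, the service downloads the model automatically via `torch.hub.load("facebookresearch/demucs", "htdemucs")`.

- **Download location**: `~/.cache/torch/hub/checkpoints/`
- **Download size**: ~80MB
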
### Method 2: Manual Pre-download

If the deployment environment has no internet access, download the model on a machine that does, then copy it over:

```bash
# Run Python on a machine with network access
python3 << "PYTHON"
import torch
model = torch.hub.load("facebookresearch/demucs", "htdemucs", pretrained=True)
print("Model downloaded to:", torch.hub.get_dir())
PYTHON

# Locate the model file
ls ~/.cache/torch/hub/checkpoints/
# You should see htdemucs-*.pt or a similar file

# Copy it to the deployment server
scp ~/.cache/torch/hub/checkpoints/htdemucs*.pt ymq@opencomputing.net:~/.cache/torch/hub/checkpoints/
```

### Method 3: Download the Model File Directly

```bash
# Create the cache directory
mkdir -p ~/.cache/torch/hub/checkpoints/

# Download the htdemucs model
wget -O ~/.cache/torch/hub/checkpoints/htdemucs_v4.pt \
  https://dl.fbaipublicfiles.com/demucs/v4.0/htdemucs.pth
```

- **Download size**: ~80MB
- **Download time**: about 5-15 seconds

### Verify the Download

```bash
# After starting the service, check the log
tail -f /data/ymq/demucs-service/nohup.out | grep -i "model"
# You should see "Model loaded" rather than "Downloading"
```

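
Before starting the service on an offline host, it can also help to confirm that a checkpoint file is actually present in the cache directory. The following is a minimal sketch, not part of the service: the function name `find_cached_checkpoints` is hypothetical, and the suffix list is an assumption (`.pt` and `.pth` appear in this README; `.th` is included because some Demucs checkpoints use it).

```python
from pathlib import Path

# Default PyTorch Hub checkpoint directory, as stated in this README.
DEFAULT_CHECKPOINT_DIR = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"

# Assumed checkpoint suffixes; adjust to match the file you actually copied.
CHECKPOINT_SUFFIXES = (".pt", ".pth", ".th")


def find_cached_checkpoints(checkpoint_dir=DEFAULT_CHECKPOINT_DIR):
    """Return checkpoint-like files in checkpoint_dir, sorted by path.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    files = find_cached_checkpoints()
    if files:
        for entry in files:
            print(f"found {entry.name} ({entry.stat().st_size} bytes)")
    else:
        print("no cached checkpoints found; the first run would need network access")
```

An empty result only means nothing was found in the default location; it does not prove the model cannot be loaded from elsewhere.
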
### Model Source

- **GitHub**: https://github.com/facebookresearch/demucs
- **PyTorch Hub**: facebookresearch/demucs
- **License**: MIT
- **Paper**: Hybrid Spectrogram and Waveform Source Separation

## API

### Submit Separation Task

Send a JSON payload to the longtask endpoint:

```json
{
  "task_type": "separate",
  "audio_path": "/path/to/audio.wav",
  "output_dir": "/tmp/demucs_custom_output"
}
```

**Parameters:**
- `audio_path` (required): Absolute path to the input audio file
- `output_dir` (optional): Output directory. Default: `/tmp/demucs_{task_id}`

**Response:**
```json
{
  "vocals_path": "/tmp/demucs_123/htdemucs/audio/vocals.wav",
  "no_vocals_path": "/tmp/demucs_123/htdemucs/audio/no_vocals.wav",
  "duration": 12.34,
  "output_dir": "/tmp/demucs_123",
  "model": "htdemucs"
}
```

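
A client can assemble the request body programmatically. The sketch below only builds and validates the payload described above; it does not send it, because the longtask endpoint URL is not specified in this README. The helper name `build_separation_payload` is hypothetical.

```python
import json
import posixpath


def build_separation_payload(audio_path, output_dir=None):
    """Build the JSON-serializable payload for a separation task.

    audio_path must be absolute; it is checked with POSIX rules because the
    path is resolved on the server, not on the client.
    """
    if not posixpath.isabs(audio_path):
        raise ValueError(f"audio_path must be absolute: {audio_path!r}")
    payload = {"task_type": "separate", "audio_path": audio_path}
    # output_dir is optional; when omitted the service picks its own default.
    if output_dir is not None:
        payload["output_dir"] = output_dir
    return payload


if __name__ == "__main__":
    body = build_separation_payload("/path/to/audio.wav", "/tmp/demucs_custom_output")
    print(json.dumps(body, indent=2))
```

Leaving `output_dir` out of the dictionary, rather than sending `null`, keeps the request identical to the documented minimal form.
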
### Health Check

```
GET /app/health.dspy
```

Returns:
```json
{"status":"ok","service":"demucs-service","model":"htdemucs"}
```

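
For monitoring scripts, the health response can be checked as follows. This is a rough sketch under assumptions: `fetch_health` and `is_healthy` are hypothetical helper names, and the base URL passed in the usage note (host and port) is taken from the sample configuration in this README rather than from any guaranteed deployment.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True if body is a JSON object whose "status" field is "ok"."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        # Not a string/bytes value, or not valid JSON.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def fetch_health(base_url, timeout=5.0):
    """GET {base_url}/app/health.dspy and return the raw response body."""
    url = f"{base_url}/app/health.dspy"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running locally, `is_healthy(fetch_health("http://127.0.0.1:9083"))` would be one way to use it, assuming the port from `conf/config.json`.
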
## Configuration

Config file: `conf/config.json`

```json
{
  "port": 9083,
  "queue": "demucs",
  "filesroot": "/tmp/demucs-outputs",
  "host": "0.0.0.0",
  "debug": false
}
```

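
A quick sanity check of the config file before starting the service might look like the sketch below. It is not how the service itself reads its configuration (that is handled by the framework and is not shown in this README); `load_config` is a hypothetical helper, and treating the sample values as fallback defaults is an assumption.

```python
import json

# Values mirror the sample conf/config.json above; using them as fallback
# defaults is an assumption made for this sketch.
SAMPLE_DEFAULTS = {
    "port": 9083,
    "queue": "demucs",
    "filesroot": "/tmp/demucs-outputs",
    "host": "0.0.0.0",
    "debug": False,
}


def load_config(path="conf/config.json"):
    """Load the JSON config, fill in missing keys, and sanity-check types."""
    with open(path, encoding="utf-8") as handle:
        loaded = json.load(handle)
    if not isinstance(loaded, dict):
        raise ValueError("config root must be a JSON object")
    config = {**SAMPLE_DEFAULTS, **loaded}
    port = config["port"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    if not isinstance(config["debug"], bool):
        raise ValueError("debug must be true or false")
    return config
```
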
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DEMUCS_GPU_ID` | `5` | GPU device ID for CUDA |
| `CUDA_VISIBLE_DEVICES` | `5` | CUDA device visibility |
| `PYTHONPATH` | `/data/ymq/demucs-service` | Python module path |

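
The table above can be mirrored in code when debugging which values a process would see. This is an illustrative sketch only: `resolve_env` and `gpu_id` are hypothetical helpers, and how the service actually consumes these variables is not shown in this README.

```python
import os

# Defaults copied from the table above.
ENV_DEFAULTS = {
    "DEMUCS_GPU_ID": "5",
    "CUDA_VISIBLE_DEVICES": "5",
    "PYTHONPATH": "/data/ymq/demucs-service",
}


def resolve_env(environ=None):
    """Return the effective value of each variable, falling back to defaults."""
    source = os.environ if environ is None else environ
    return {name: source.get(name, default) for name, default in ENV_DEFAULTS.items()}


def gpu_id(environ=None):
    """Parse DEMUCS_GPU_ID as a non-negative integer."""
    value = int(resolve_env(environ)["DEMUCS_GPU_ID"])
    if value < 0:
        raise ValueError(f"DEMUCS_GPU_ID must be non-negative, got {value}")
    return value
```

Passing an explicit dictionary instead of reading `os.environ` makes the helpers easy to exercise without touching the real environment.
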
## Deployment

### Prerequisites

- Python venv at `/data/ymq/demucs_venv` with demucs 4.0.1 and torchcodec
- Redis server running on `127.0.0.1:6379`
- GPU with CUDA support

### Start

```bash
bash start.sh
```

### Stop

```bash
bash stop.sh
```

### Logs

```bash
tail -f nohup.out
```

## Directory Structure

```
demucs-service/
├── ah.py                 # Main entry point
├── workers/
│   ├── __init__.py
│   └── separate.py       # Separation worker
├── conf/
│   └── config.json       # Service configuration
├── app/
│   └── health.dspy       # Health check endpoint
├── start.sh              # Start script
├── stop.sh               # Stop script
└── README.md             # This file
```

## Output Format

Demucs outputs to: `{output_dir}/htdemucs/{basename}/`
- `vocals.wav` - Isolated vocal track
- `no_vocals.wav` - Accompaniment (everything except vocals)

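
The output layout can be computed ahead of time, for example to poll for results. The sketch below assumes that `{basename}` is the input file name without its final extension, which matches the sample response in this README (`/path/to/audio.wav` yields `.../htdemucs/audio/`). The helper names are hypothetical.

```python
from pathlib import PurePosixPath


def default_output_dir(task_id):
    """Default output directory documented for the API: /tmp/demucs_{task_id}."""
    return f"/tmp/demucs_{task_id}"


def expected_output_paths(audio_path, output_dir, model="htdemucs"):
    """Return the expected vocals/accompaniment paths for one input file.

    Assumes {basename} is the file name with its last extension removed.
    """
    stem_dir = PurePosixPath(output_dir) / model / PurePosixPath(audio_path).stem
    return {
        "vocals_path": str(stem_dir / "vocals.wav"),
        "no_vocals_path": str(stem_dir / "no_vocals.wav"),
    }


if __name__ == "__main__":
    paths = expected_output_paths("/path/to/audio.wav", default_output_dir(123))
    for name, path in paths.items():
        print(f"{name}: {path}")
```

`PurePosixPath` is used so the result follows the server's path rules regardless of the client's operating system.
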
## Troubleshooting

- **GPU OOM**: The htdemucs model requires significant VRAM. Ensure the assigned GPU has enough memory.
- **Process timeout**: Long audio files may exceed the `stuck_seconds` timeout (default: 600s). Increase it if needed.
- **Missing output files**: Check `nohup.out` for demucs stderr output to diagnose issues.