
# demucs-service

Vocal/accompaniment separation web service using [Demucs](https://github.com/adefossez/demucs) (htdemucs model).

## Overview

This service provides an async API for separating audio files into vocals and accompaniment tracks using Meta's Demucs neural network model. It follows the ahserver + longtasks + Redis pattern.

## Architecture

- **ahserver**: Async HTTP server framework
- **longtasks**: Background task processing via Redis queues
- **Redis**: Task queue for separation jobs
- **Demucs 4.0.1**: AI-powered source separation model (htdemucs)

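## Model Download (Offline Deployment)

Demucs uses the htdemucs model (PyTorch Hub format). The model is downloaded automatically on first run, or it can be pre-downloaded manually.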

### Method 1: PyTorch Hub Automatic Download (Default)

On first run, the service downloads the model automatically via `torch.hub.load("facebookresearch/demucs", "htdemucs")`.

- **Download location**: `~/.cache/torch/hub/checkpoints/`
- **Download size**: ~80MB

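### Method 2: Manual Pre-download

If the deployment environment has no internet access, download the model on a machine that has network access, then copy it over: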

```bash
# Run Python on a machine with network access
python3 << "PYTHON"
import torch
model = torch.hub.load("facebookresearch/demucs", "htdemucs", pretrained=True)
print("Model downloaded to:", torch.hub.get_dir())
PYTHON

# Locate the model file
ls ~/.cache/torch/hub/checkpoints/
# Expect to see htdemucs-*.pt or a similarly named file

# Copy it to the deployment server
scp ~/.cache/torch/hub/checkpoints/htdemucs*.pt ymq@opencomputing.net:~/.cache/torch/hub/checkpoints/
```

### Method 3: Direct Model File Download

```bash
# Create the cache directory
mkdir -p ~/.cache/torch/hub/checkpoints/

# Download the htdemucs model
wget -O ~/.cache/torch/hub/checkpoints/htdemucs_v4.pt \
  https://dl.fbaipublicfiles.com/demucs/v4.0/htdemucs.pth
```

- **Download size**: ~80MB
- **Download time**: about 5-15 seconds

### Verify the Download

```bash
# After starting the service, check the log
tail -f /data/ymq/demucs-service/nohup.out | grep -i "model"
# Expect to see "Model loaded" rather than "Downloading"
```
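
Before starting the service on an offline host, it can help to confirm that a checkpoint is already in the cache directory. The following is a minimal sketch, not part of the service: `find_htdemucs_checkpoints` is a hypothetical helper, and matching on the `htdemucs` file name prefix is an assumption based on the file names mentioned above.

```python
from pathlib import Path

# Default PyTorch Hub checkpoint directory named in this README.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"


def find_htdemucs_checkpoints(cache_dir=DEFAULT_CACHE):
    """Return files in cache_dir whose names start with 'htdemucs' (hypothetical helper)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    # The exact file name depends on how the model was downloaded,
    # so only the prefix is matched here (an assumption).
    return sorted(p for p in cache_dir.glob("htdemucs*") if p.is_file())


if __name__ == "__main__":
    found = find_htdemucs_checkpoints()
    if found:
        for path in found:
            print(f"{path} ({path.stat().st_size / 1e6:.1f} MB)")
    else:
        print("No htdemucs checkpoint found; the service would need to download it.")
```

An empty result only means no matching file was found in that directory; the service log remains the authoritative check.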

### Model Source

- **GitHub**: https://github.com/facebookresearch/demucs
- **PyTorch Hub**: facebookresearch/demucs
- **License**: MIT
- **Paper**: Hybrid Spectrogram and Waveform Source Separation


## API

### Submit Separation Task

Send a JSON payload to the longtask endpoint:

```json
{
    "task_type": "separate",
    "audio_path": "/path/to/audio.wav",
    "output_dir": "/tmp/demucs_custom_output"
}
```

**Parameters:**
- `task_type`: Task type; `separate` in the payload above
- `audio_path` (required): Absolute path to the input audio file
- `output_dir` (optional): Output directory. Default: `/tmp/demucs_{task_id}`

**Response:**
```json
{
    "vocals_path": "/tmp/demucs_123/htdemucs/audio/vocals.wav",
    "no_vocals_path": "/tmp/demucs_123/htdemucs/audio/no_vocals.wav",
    "duration": 12.34,
    "output_dir": "/tmp/demucs_123",
    "model": "htdemucs"
}
```
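
As an illustration of the request and response shapes above, here is a minimal Python sketch that builds the JSON payload and reads the two track paths from a result. Both function names are hypothetical, and the sketch deliberately stops short of sending anything, since the longtask endpoint URL is not specified in this README.

```python
import json
import posixpath


def build_separation_payload(audio_path, output_dir=None):
    """Build the JSON body for a separation task (hypothetical helper)."""
    # The service expects an absolute path on the server's file system.
    if not posixpath.isabs(audio_path):
        raise ValueError("audio_path must be an absolute path")
    payload = {"task_type": "separate", "audio_path": audio_path}
    # output_dir is optional; leave it out to get the service default.
    if output_dir is not None:
        payload["output_dir"] = output_dir
    return json.dumps(payload)


def parse_separation_result(body):
    """Return (vocals_path, no_vocals_path) from a result shaped like the example."""
    result = json.loads(body)
    return result["vocals_path"], result["no_vocals_path"]
```

Omitting `output_dir` rather than sending `null` keeps the payload identical to the documented form when the default directory is wanted.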

### Health Check

```
GET /app/health.dspy
```

Returns:
```json
{"status":"ok","service":"demucs-service","model":"htdemucs"}
```
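
A health probe could be scripted roughly as follows. This is a sketch using only the standard library; the default base URL `http://127.0.0.1:9083` is an assumption derived from the sample config port, and both function names are hypothetical.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True if a health response body reports status 'ok'."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (or not decodable): treat as unhealthy.
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url="http://127.0.0.1:9083", timeout=5.0):
    """Query /app/health.dspy; base_url is an assumed address, adjust as needed."""
    url = base_url.rstrip("/") + "/app/health.dspy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read())
    except OSError:
        # urllib's URLError and socket timeouts are OSError subclasses.
        return False
```

Separating `is_healthy` from the network call makes the response check easy to exercise without a running service.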

## Configuration

Config file: `conf/config.json`

```json
{
    "port": 9083,
    "queue": "demucs",
    "filesroot": "/tmp/demucs-outputs",
    "host": "0.0.0.0",
    "debug": false
}
```
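
For scripts that need to read the same file, a loader along these lines may be useful. It is a sketch only: `load_config` and `DEFAULT_CONFIG` are hypothetical names, the defaults simply mirror the sample above, and the way the service itself parses the file is not shown in this README.

```python
import json

# Defaults mirror the sample conf/config.json shown above (an assumption).
DEFAULT_CONFIG = {
    "port": 9083,
    "queue": "demucs",
    "filesroot": "/tmp/demucs-outputs",
    "host": "0.0.0.0",
    "debug": False,
}


def load_config(path="conf/config.json"):
    """Load the config file and fill missing keys from DEFAULT_CONFIG."""
    with open(path, encoding="utf-8") as fh:
        loaded = json.load(fh)
    # Keys present in the file override the defaults.
    config = {**DEFAULT_CONFIG, **loaded}
    port = config["port"]
    if isinstance(port, bool) or not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError(f"invalid port: {port!r}")
    return config
```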

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DEMUCS_GPU_ID` | `5` | GPU device ID for CUDA |
| `CUDA_VISIBLE_DEVICES` | `5` | CUDA device visibility |
| `PYTHONPATH` | `/data/ymq/demucs-service` | Python module path |
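
The table can be read as a set of lookups with fallbacks. The sketch below shows that pattern in Python; `resolve_env_settings` is a hypothetical helper, and how the service actually consumes these variables is an assumption.

```python
import os


def resolve_env_settings(environ=None):
    """Return the three settings from the table, applying the documented defaults."""
    # Accept a mapping for testing; fall back to the real process environment.
    environ = os.environ if environ is None else environ
    return {
        "DEMUCS_GPU_ID": environ.get("DEMUCS_GPU_ID", "5"),
        "CUDA_VISIBLE_DEVICES": environ.get("CUDA_VISIBLE_DEVICES", "5"),
        "PYTHONPATH": environ.get("PYTHONPATH", "/data/ymq/demucs-service"),
    }


if __name__ == "__main__":
    for name, value in resolve_env_settings().items():
        print(f"{name}={value}")
```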

## Deployment

### Prerequisites

- Python venv at `/data/ymq/demucs_venv` with demucs 4.0.1 and torchcodec
- Redis server running on `127.0.0.1:6379`
- GPU with CUDA support

### Start

```bash
bash start.sh
```

### Stop

```bash
bash stop.sh
```

### Logs

```bash
tail -f nohup.out
```

## Directory Structure

```
demucs-service/
├── ah.py                  # Main entry point
├── workers/
│   ├── __init__.py
│   └── separate.py        # Separation worker
├── conf/
│   └── config.json        # Service configuration
├── app/
│   └── health.dspy        # Health check endpoint
├── start.sh               # Start script
├── stop.sh                # Stop script
└── README.md              # This file
```

## Output Format

Demucs outputs to: `{output_dir}/htdemucs/{basename}/`
- `vocals.wav` - Isolated vocal track
- `no_vocals.wav` - Accompaniment (everything except vocals)
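
The layout above can be expressed as a small path computation, which is handy when checking for missing files. This sketch assumes `{basename}` is the input file name without its extension, which matches the response example (`audio.wav` gives `audio`); both function names are hypothetical.

```python
from pathlib import Path


def expected_outputs(output_dir, audio_path, model="htdemucs"):
    """Return the expected track paths under {output_dir}/{model}/{basename}/."""
    # Assumption: basename is the input file name without its extension.
    basename = Path(audio_path).stem
    track_dir = Path(output_dir) / model / basename
    return {
        "vocals": track_dir / "vocals.wav",
        "no_vocals": track_dir / "no_vocals.wav",
    }


def missing_outputs(output_dir, audio_path):
    """List the track names whose files do not exist yet."""
    paths = expected_outputs(output_dir, audio_path)
    return [name for name, path in paths.items() if not path.is_file()]
```

After a task finishes, a non-empty list from `missing_outputs` would suggest looking at the service log for Demucs errors.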

## Troubleshooting

- **GPU OOM**: The htdemucs model requires significant VRAM. Ensure the assigned GPU has enough memory.
- **Process timeout**: Long audio files may exceed the `stuck_seconds` timeout (default: 600s). Increase it if needed.
- **Missing output files**: Check `nohup.out` for Demucs stderr output to diagnose issues.