# demucs-service
Vocal/accompaniment separation web service using [Demucs](https://github.com/adefossez/demucs) (htdemucs model).
## Overview
This service provides an async API for separating audio files into vocals and accompaniment tracks using Meta's Demucs neural network model. It follows the ahserver + longtasks + Redis pattern.
## Architecture
- **ahserver**: Async HTTP server framework
- **longtasks**: Background task processing via Redis queues
- **Redis**: Task queue for separation jobs
- **Demucs 4.0.1**: AI-powered source separation model (htdemucs)
## API
### Submit Separation Task
Send a JSON payload to the longtask endpoint:
```json
{
  "task_type": "separate",
  "audio_path": "/path/to/audio.wav",
  "output_dir": "/tmp/demucs_custom_output"
}
```
**Parameters:**
- `task_type`: Task to run. The example above uses `"separate"`.
- `audio_path` (required): Absolute path to the input audio file
- `output_dir` (optional): Output directory. Default: `/tmp/demucs_{task_id}`

**Response:**
```json
{
  "vocals_path": "/tmp/demucs_123/htdemucs/audio/vocals.wav",
  "no_vocals_path": "/tmp/demucs_123/htdemucs/audio/no_vocals.wav",
  "duration": 12.34,
  "output_dir": "/tmp/demucs_123",
  "model": "htdemucs"
}
```
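As a rough illustration of the request format above, a client could assemble and validate the payload before submitting it. This is a minimal sketch: `build_separation_payload` is a hypothetical helper, not part of the service, and because the URL of the longtask endpoint is not given here, the actual HTTP call is left out.

```python
from pathlib import PurePosixPath
from typing import Optional


def build_separation_payload(audio_path: str, output_dir: Optional[str] = None) -> dict:
    """Build the JSON-serializable payload for a separation task (hypothetical helper)."""
    # The API documents audio_path as an absolute path, so reject relative ones early.
    if not PurePosixPath(audio_path).is_absolute():
        raise ValueError(f"audio_path must be absolute: {audio_path!r}")
    payload = {"task_type": "separate", "audio_path": audio_path}
    # output_dir is optional; when omitted, the service default applies.
    if output_dir is not None:
        payload["output_dir"] = output_dir
    return payload
```

The resulting dict can be passed to `json.dumps` and sent with any HTTP client once the endpoint URL for your deployment is known.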
### Health Check
```
GET /app/health.dspy
```
Returns:
```json
{"status":"ok","service":"demucs-service","model":"htdemucs"}
```
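A health probe for this endpoint might look like the sketch below. It assumes the service is reachable at `http://127.0.0.1:9083` (the port from the sample configuration); `is_healthy` and `check_health` are illustrative names, not part of the service.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a health response body reports status "ok"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str = "http://127.0.0.1:9083", timeout: float = 5.0) -> bool:
    """Fetch /app/health.dspy and interpret the result (base_url is an assumption)."""
    try:
        with urllib.request.urlopen(f"{base_url}/app/health.dspy", timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # Connection refused, timeout, or HTTP error: treat the service as unhealthy.
        return False
```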
## Configuration
Config file: `conf/config.json`
```json
{
  "port": 9083,
  "queue": "demucs",
  "filesroot": "/tmp/demucs-outputs",
  "host": "0.0.0.0",
  "debug": false
}
```
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DEMUCS_GPU_ID` | `5` | GPU device ID for CUDA |
| `CUDA_VISIBLE_DEVICES` | `5` | CUDA device visibility |
| `PYTHONPATH` | `/data/ymq/demucs-service` | Python module path |
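
The fallback behaviour described in the table can be sketched as follows. How the service itself reads these variables is not shown in this README, so `resolve_settings` is a hypothetical helper that only mirrors the documented defaults.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the table above.
DEFAULTS = {
    "DEMUCS_GPU_ID": "5",
    "CUDA_VISIBLE_DEVICES": "5",
    "PYTHONPATH": "/data/ymq/demucs-service",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return each variable's value, falling back to the documented default."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```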
## Deployment
### Prerequisites
- Python venv at `/data/ymq/demucs_venv` with demucs 4.0.1 and torchcodec
- Redis server running on `127.0.0.1:6379`
- GPU with CUDA support
### Start
```bash
bash start.sh
```
### Stop
```bash
bash stop.sh
```
### Logs
```bash
tail -f nohup.out
```
## Directory Structure
```
demucs-service/
├── ah.py                 # Main entry point
├── workers/
│   ├── __init__.py
│   └── separate.py       # Separation worker
├── conf/
│   └── config.json       # Service configuration
├── app/
│   └── health.dspy       # Health check endpoint
├── start.sh              # Start script
├── stop.sh               # Stop script
└── README.md             # This file
```
## Output Format
Demucs outputs to: `{output_dir}/htdemucs/{basename}/`
- `vocals.wav` - Isolated vocal track
- `no_vocals.wav` - Accompaniment (everything except vocals)
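
The layout above makes it possible to predict the result paths from the inputs. The sketch below assumes `{basename}` is the input file name without its extension, which matches the sample response (`audio.wav` maps to `.../htdemucs/audio/`); `expected_outputs` is an illustrative name, not a function of the service.

```python
from pathlib import PurePosixPath


def expected_outputs(audio_path: str, output_dir: str, model: str = "htdemucs") -> dict:
    """Compute where the two stems should appear for a given input file."""
    # Assumption: basename is the file name with its extension removed.
    basename = PurePosixPath(audio_path).stem
    stem_dir = PurePosixPath(output_dir) / model / basename
    return {
        "vocals_path": str(stem_dir / "vocals.wav"),
        "no_vocals_path": str(stem_dir / "no_vocals.wav"),
    }
```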
## Troubleshooting
- **GPU OOM**: The htdemucs model requires significant VRAM. Ensure the GPU selected through `CUDA_VISIBLE_DEVICES` has enough free memory.
- **Process timeout**: Long audio files may exceed the `stuck_seconds` timeout (default: 600 s). Increase it if needed.
- **Missing output files**: Check `nohup.out` for Demucs stderr output to diagnose the failure.
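
To reproduce a timeout or a missing-output failure outside the service, it can help to run the command by hand with an explicit time limit and inspect stderr. The sketch below is not the worker's actual code; `run_with_timeout` is a hypothetical helper, and the 600-second default only mirrors the `stuck_seconds` value mentioned above.

```python
import subprocess
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float = 600.0) -> Tuple[int, str]:
    """Run cmd and return (exit_code, stderr); exit_code is -1 on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; stderr captured so far may be bytes or None.
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode("utf-8", errors="replace")
        return -1, stderr
    return proc.returncode, proc.stderr
```

For example, assuming the worker invokes the Demucs CLI in two-stem mode (an assumption based on the `no_vocals.wav` output), a manual run could be `run_with_timeout(["demucs", "--two-stems=vocals", "-n", "htdemucs", "-o", "/tmp/demucs_debug", "/path/to/audio.wav"])`.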