
# ASR Service

Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.

## Architecture

```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                      |
                                      v
                                 faster-whisper (GPU)
                                      |
                                      v
                                 Result (JSON)
```

- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
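
The worker stage of the diagram can be illustrated with a short sketch. This is not the contents of `workers/transcribe.py`; the function name `transcribe_task` and the way `text` is assembled are assumptions. It relies only on the faster-whisper `transcribe` call, which returns a segment generator and an info object.

```python
import time


def transcribe_task(model, payload):
    """Run one payload through a faster-whisper model and build a result dict."""
    started = time.monotonic()
    segments, info = model.transcribe(
        payload["audio_path"],
        language=payload.get("language", "zh"),
        word_timestamps=payload.get("word_timestamps", True),
        vad_filter=payload.get("vad_filter", True),
    )
    out_segments = []
    # `segments` is a generator, so decoding happens while it is iterated.
    for seg in segments:
        words = [
            {"word": w.word, "start": w.start, "end": w.end,
             "probability": w.probability}
            for w in (seg.words or [])
        ]
        out_segments.append(
            {"text": seg.text, "start": seg.start, "end": seg.end, "words": words}
        )
    return {
        "status": "ok",
        # Assumption: the full text is the concatenation of segment texts.
        "text": "".join(s["text"] for s in out_segments).strip(),
        "language": info.language,
        "language_probability": info.language_probability,
        "duration": info.duration,
        "segments": out_segments,
        "processing_time": time.monotonic() - started,
        "audio_path": payload["audio_path"],
    }
```

Passing the model in as an argument keeps the function independent of how the worker loads or caches it.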

## Model

- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)

The model is lazy-loaded on the first transcription request and then stays in GPU memory for subsequent requests.
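
A minimal sketch of this lazy-loading pattern, assuming a module-level cache guarded by a lock. The names `get_model` and `load_whisper_model` are hypothetical; only the `WhisperModel(path, device=..., compute_type=...)` constructor comes from faster-whisper.

```python
import threading

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"

_model = None
_model_lock = threading.Lock()


def load_whisper_model():
    """Build the faster-whisper model; imported here so startup stays cheap."""
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


def get_model(loader=load_whisper_model):
    """Return the shared model, creating it on the first call only."""
    global _model
    if _model is None:
        with _model_lock:
            # Re-check inside the lock so concurrent first requests load once.
            if _model is None:
                _model = loader()
    return _model
```

The `loader` parameter exists only so the caching logic can be exercised without a GPU; a worker would call `get_model()` with no arguments.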

## Model Download (Offline Deployment)

faster-whisper-large-v3-turbo-ct2 is hosted on Hugging Face and must be downloaded before the service is deployed.

### Method 1: huggingface-cli (recommended)

```bash
# Install huggingface-cli (provided by the huggingface_hub package)
pip install huggingface_hub

# Download the model into the target directory
huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir-use-symlinks False
```

- **Download size**: ~1.6 GB
- **Download time**: depends on network speed (roughly 3-10 minutes)

### Method 2: git-lfs

```bash
# Enable Git LFS for this user (the git-lfs package must already be installed)
git lfs install

# Clone the model repository
cd /data/ymq/models
mkdir -p deepdml
cd deepdml
git clone https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
```

### Method 3: wget/curl (individual files)

If only the core files are needed, they can be downloaded directly:

```bash
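# Create the target directory first so that cd does not fail on a fresh host
mkdir -p /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
cd /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2

# Download the model files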
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/model.bin
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/tokenizer.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/vocabulary.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/config.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/preprocessor_config.json
```

### Verify the Download

```bash
ls -lh /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2/
# Expect model.bin (about 1.6 GB) plus tokenizer.json, vocabulary.json, and config.json
```
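
The same check can be scripted, for example as a pre-start guard. The sketch below assumes the four file names listed in the verification step are sufficient; `missing_model_files` is a hypothetical helper, not part of the service.

```python
from pathlib import Path

# File names taken from the verification step above.
REQUIRED_FILES = ("model.bin", "tokenizer.json", "vocabulary.json", "config.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    model_dir = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"
    missing = missing_model_files(model_dir)
    print("model directory looks complete" if not missing else f"missing: {missing}")
```

An empty file is treated as missing because an interrupted download can leave a zero-byte placeholder behind.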

### Model Source

- **HuggingFace**: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
- **Base Model**: openai/whisper-large-v3-turbo (converted to CTranslate2 format)
- **License**: MIT
- **Optimization**: CTranslate2 format; up to 4x faster than the original Whisper implementation, with lower memory usage


## Deployment

### Prerequisites

- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU

### Start

```bash
cd /data/ymq/asr-service
bash start.sh
```

### Stop

```bash
cd /data/ymq/asr-service
bash stop.sh
```

### Health Check

```bash
curl http://localhost:9925/health
```

Returns:
```json
{
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
}
```
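
For monitoring scripts, the same check can be done from Python with the standard library. This is a sketch that assumes only the `status` field shown above matters; `is_healthy` and `check_health` are hypothetical names.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:9925/health"


def is_healthy(body):
    """Return True when a /health response body reports status "ok"."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(url=HEALTH_URL, timeout=5.0):
    """Fetch the health endpoint and evaluate the response body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read())
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```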

## API Usage

Tasks are submitted via Redis, following the same pattern as wan22-service.

### Submit a Transcription Task

```python
import redis
import json
import uuid

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}

# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```

### Check Task Status

```python
# Reuses `r` and `task_id` from the submission example.
# Task status is stored in Redis by longtasks; `get` returns None for a missing key.
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
```
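
Because a task runs asynchronously, a client usually has to poll. The sketch below waits for the `asr:result:{task_id}` key used above and assumes the stored value is a JSON document; that encoding, and the helper name `wait_for_result`, are assumptions rather than documented longtasks behavior.

```python
import json
import time


def wait_for_result(client, task_id, timeout=300.0, interval=1.0):
    """Poll the result key until it appears or the timeout elapses.

    Returns the decoded result dict, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    key = f"asr:result:{task_id}"
    while True:
        raw = client.get(key)
        if raw is not None:
            # Assumption: the worker stores the result as a JSON string.
            return json.loads(raw)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

`client` can be the `redis.Redis` instance from the submission example, for instance `wait_for_result(r, task_id, timeout=600)`.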

## Task Payload Format

| Field            | Type   | Required | Default | Description                          |
|------------------|--------|----------|---------|--------------------------------------|
| task_type        | string | Yes      | -       | Must be `"transcribe"`               |
| audio_path       | string | Yes      | -       | Path to input audio file             |
| language         | string | No       | `"zh"`  | Language code (zh, en, ja, etc.)     |
| word_timestamps  | bool   | No       | `True`  | Enable word-level timestamps         |
| vad_filter       | bool   | No       | `True`  | Enable voice activity detection      |
| output_path      | string | No       | -       | If set, save result JSON to this path|
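
The defaults in this table can be captured in a small builder so that callers only pass what differs. `build_payload` is a hypothetical helper; it also adds the `task_id` field used in the submission example, which the table does not list.

```python
import uuid


def build_payload(audio_path, language="zh", word_timestamps=True,
                  vad_filter=True, output_path=None, task_id=None):
    """Build a transcription payload using the defaults from the table."""
    if not audio_path:
        raise ValueError("audio_path is required")
    payload = {
        "task_id": task_id or str(uuid.uuid4()),
        "task_type": "transcribe",
        "audio_path": audio_path,
        "language": language,
        "word_timestamps": word_timestamps,
        "vad_filter": vad_filter,
    }
    # output_path has no default, so include it only when the caller sets one.
    if output_path is not None:
        payload["output_path"] = output_path
    return payload
```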

## Output Format

```json
{
    "status": "ok",
    "text": "Full transcription text...",
    "language": "zh",
    "language_probability": 0.9876,
    "duration": 125.340,
    "segments": [
        {
            "text": "Segment text",
            "start": 0.000,
            "end": 5.120,
            "words": [
                {
                    "word": "你好",
                    "start": 0.000,
                    "end": 0.800,
                    "probability": 0.9523
                }
            ]
        }
    ],
    "processing_time": 3.45,
    "audio_path": "/path/to/audio.wav"
}
```
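
As an illustration of consuming this structure, the sketch below flattens the nested word entries and picks out low-confidence words. The helper names and the 0.5 threshold are hypothetical, and the code assumes `words` may be missing when `word_timestamps` is disabled.

```python
def iter_words(result):
    """Yield (word, start, end, probability) for every word in every segment."""
    for segment in result.get("segments", []):
        # Assumption: "words" may be absent or null without word timestamps.
        for word in segment.get("words") or []:
            yield word["word"], word["start"], word["end"], word["probability"]


def low_confidence_words(result, threshold=0.5):
    """Return the words whose probability falls below the threshold."""
    return [w for w, _, _, p in iter_words(result) if p < threshold]
```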

## Configuration

Config file: `conf/config.json`

| Setting               | Value                        | Description                    |
|-----------------------|------------------------------|--------------------------------|
| website.port          | 9925                         | HTTP listen port               |
| website.host          | 0.0.0.0                      | Bind address                   |
| session_redis         | 127.0.0.1:6379 db=1          | Session storage                |
| password_key          | ASRService2026Key            | Auth key                       |
| filesroot             | /tmp/asr-outputs             | Output files directory         |
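
The dotted names in this table suggest nested JSON objects. Assuming that layout (the actual shape of `conf/config.json` is not shown here), a setting can be read with a small lookup helper; `get_setting` and `load_config` are hypothetical names.

```python
import json


def load_config(path="conf/config.json"):
    """Load the service configuration file as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def get_setting(config, dotted_key, default=None):
    """Look up a dotted key such as "website.port" in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

For example, `get_setting(load_config(), "website.port", 9925)` would fall back to the documented port if the key were absent.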

### Environment Variables

| Variable             | Default | Description                           |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID           | 5       | GPU device ID (for logging)           |
| CUDA_VISIBLE_DEVICES | 5       | CUDA device isolation                 |
| PYTHONPATH           | .       | Python module search path             |
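
A sketch of reading these variables with the documented defaults; `gpu_settings` is a hypothetical helper. Note that `CUDA_VISIBLE_DEVICES` must be set before any CUDA library initializes, and that once it is set to a single ID the process sees that GPU as device 0.

```python
import os


def gpu_settings(env=None):
    """Read the GPU-related variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "gpu_id": env.get("ASR_GPU_ID", "5"),
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "5"),
    }
```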

## File Structure

```
asr-service/
├── ah.py                  # Main entry point
├── start.sh               # Start script
├── stop.sh                # Stop script
├── conf/
│   └── config.json        # Service configuration
├── app/
│   └── health.dspy        # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py      # Transcription worker
└── README.md
```