# ASR Service

Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.

## Architecture

```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                        |
                                        v
                                faster-whisper (GPU)
                                        |
                                        v
                                   Result (JSON)
```

- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.

## Model

- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)

The model is lazy-loaded on the first transcription request and stays in GPU memory for subsequent requests.

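The lazy-loading behaviour described above could be sketched roughly as follows. This is an illustration only, not the service's actual worker code: `LazyModel` and `load_whisper_model` are hypothetical names, and the only library call assumed is the `WhisperModel` constructor from faster-whisper with the path, device, and compute type listed in this section.

```python
import threading
from typing import Any, Callable, Optional

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"


def load_whisper_model() -> Any:
    """Default loader: builds the faster-whisper model on the GPU."""
    # Imported here so that importing this module does not need a GPU.
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


class LazyModel:
    """Hypothetical helper: creates the model on first use, then reuses it."""

    def __init__(self, loader: Callable[[], Any] = load_whisper_model) -> None:
        self._loader = loader
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: only the first caller pays the load cost,
        # and concurrent first callers do not load the model twice.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

A worker would keep one `LazyModel()` at module level and call `.get().transcribe(audio_path, word_timestamps=True)` per task, so the first request is slow and later ones reuse the model already in GPU memory.
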
## Model Download (Offline Deployment)

faster-whisper-large-v3-turbo-ct2 is hosted on HuggingFace and must be downloaded before the service is deployed.

### Method 1: huggingface-cli (recommended)

```bash
# Install huggingface-cli
pip install huggingface_hub

# Download the model to the target directory
huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir-use-symlinks False
```

**Download size**: ~1.6 GB

**Download time**: depends on network speed (roughly 3-10 minutes)

### Method 2: git-lfs

```bash
# Enable git-lfs (the git-lfs package itself must already be installed)
git lfs install

# Clone the model repository
mkdir -p /data/ymq/models/deepdml
cd /data/ymq/models/deepdml
git clone https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
```

### Method 3: wget/curl (individual files)

If only the core files are needed, they can be downloaded directly:

```bash
# Create the target directory if it does not exist yet
mkdir -p /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
cd /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2

# Download the model files
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/model.bin
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/tokenizer.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/vocabulary.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/config.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/preprocessor_config.json
```

### Verify the Download

```bash
ls -lh /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2/
# Expected: model.bin (~1.6 GB) plus tokenizer.json, vocabulary.json, and config.json
```

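The same check can be scripted, for example as a pre-start guard. The sketch below is an assumption-laden illustration: `missing_model_files` is a hypothetical helper, the file list is taken from the verification comment above, and the 1 GB lower bound for `model.bin` is an arbitrary threshold chosen to catch truncated downloads or un-fetched git-lfs pointer files.

```python
from pathlib import Path
from typing import List

MODEL_DIR = Path("/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2")
REQUIRED_FILES = ("model.bin", "tokenizer.json", "vocabulary.json", "config.json")
MIN_MODEL_BYTES = 1_000_000_000  # assumed lower bound for the ~1.6 GB model.bin


def missing_model_files(model_dir: Path,
                        min_model_bytes: int = MIN_MODEL_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the download looks complete."""
    problems = []
    for name in REQUIRED_FILES:
        path = model_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {name}")
    # A tiny model.bin usually means a truncated download or an LFS pointer file.
    model_bin = model_dir / "model.bin"
    if model_bin.is_file() and 0 < model_bin.stat().st_size < min_model_bytes:
        problems.append("model.bin is smaller than expected")
    return problems
```

Calling `missing_model_files(MODEL_DIR)` before starting the service returns an empty list when all four files are present and `model.bin` has a plausible size.
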
### Model Source

- **HuggingFace**: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
- **Base Model**: openai/whisper-large-v3-turbo (CTranslate2-optimized conversion)
- **License**: MIT
- **Optimization**: CTranslate2 format, 4x faster than the original Whisper with lower memory use

## Deployment

### Prerequisites

- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU

### Start

```bash
cd /data/ymq/asr-service
bash start.sh
```

### Stop

```bash
cd /data/ymq/asr-service
bash stop.sh
```

### Health Check

```bash
curl http://localhost:9925/health
```

Returns:

```json
{
  "status": "ok",
  "service": "asr-service",
  "model": "faster-whisper-large-v3-turbo-ct2"
}
```

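For scripted monitoring, the health response can be fetched and validated from Python. This is a minimal sketch using only the standard library; `fetch_health` and `is_healthy` are hypothetical helper names, and the check assumes the response always has the shape shown above.

```python
import json
import urllib.request
from typing import Any, Dict

HEALTH_URL = "http://localhost:9925/health"


def is_healthy(payload: Dict[str, Any]) -> bool:
    """True when the payload matches the documented healthy response."""
    return payload.get("status") == "ok" and payload.get("service") == "asr-service"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `is_healthy(fetch_health())` is expected to be true. A connection error from `fetch_health` means the HTTP server is not reachable on port 9925.
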
## API Usage

Tasks are submitted through Redis, following the same pattern as wan22-service.

### Submit a Transcription Task

```python
import json
import uuid

import redis

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}

# Push to the Redis queue. The Architecture section names the queue `asr`;
# confirm the exact key your longtasks deployment reads from.
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```

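### Check Task Status

```python
# Reuses `r` and `task_id` from the submission example.
# Task status and result are stored in Redis by longtasks.
status = r.get(f'asr:status:{task_id}')  # bytes, or None if the key does not exist
result = r.get(f'asr:result:{task_id}')  # bytes, or None if the key does not exist
```
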
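Since transcription is asynchronous, a client usually polls until the result key appears. The helper below is a sketch under assumptions, not part of the service: `wait_for_result` is a hypothetical name, it assumes the value under `asr:result:<task_id>` is a JSON string, and it only needs a client object with a redis-py style `get` method.

```python
import json
import time
from typing import Any, Dict, Optional


def wait_for_result(client: Any, task_id: str, timeout: float = 600.0,
                    interval: float = 1.0) -> Optional[Dict[str, Any]]:
    """Poll asr:result:<task_id> until it appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        raw = client.get(f"asr:result:{task_id}")
        if raw is not None:
            # redis-py returns bytes unless decode_responses=True is set.
            text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
            return json.loads(text)  # assumes the result is stored as JSON
        if time.monotonic() >= deadline:
            return None  # timed out without a result
        time.sleep(interval)
```

With the `r` client from the submission example, `wait_for_result(r, task_id)` returns the decoded result dictionary, or `None` on timeout.
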
## Task Payload Format

| Field            | Type   | Required | Default | Description                           |
|------------------|--------|----------|---------|---------------------------------------|
| task_type        | string | Yes      | -       | Must be `"transcribe"`                |
| audio_path       | string | Yes      | -       | Path to input audio file              |
| language         | string | No       | `"zh"`  | Language code (zh, en, ja, etc.)      |
| word_timestamps  | bool   | No       | `True`  | Enable word-level timestamps          |
| vad_filter       | bool   | No       | `True`  | Enable voice activity detection       |
| output_path      | string | No       | -       | If set, save result JSON to this path |

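A small builder keeps client payloads consistent with the defaults in the table. This is only a sketch: `build_transcribe_payload` is a hypothetical helper, and it adds a `task_id` field because the submission example includes one, even though the table does not list it.

```python
import json
import uuid
from typing import Any, Dict, Optional


def build_transcribe_payload(audio_path: str, language: str = "zh",
                             word_timestamps: bool = True,
                             vad_filter: bool = True,
                             output_path: Optional[str] = None) -> Dict[str, Any]:
    """Build a task payload using the documented field names and defaults."""
    if not audio_path:
        raise ValueError("audio_path is required")
    payload: Dict[str, Any] = {
        "task_id": str(uuid.uuid4()),  # taken from the submission example
        "task_type": "transcribe",
        "audio_path": audio_path,
        "language": language,
        "word_timestamps": bool(word_timestamps),
        "vad_filter": bool(vad_filter),
    }
    # output_path is optional; omit it rather than sending null.
    if output_path is not None:
        payload["output_path"] = output_path
    return payload
```

The returned dictionary can be serialized with `json.dumps` and pushed to the queue exactly as in the submission example.
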
## Output Format

```json
{
  "status": "ok",
  "text": "Full transcription text...",
  "language": "zh",
  "language_probability": 0.9876,
  "duration": 125.340,
  "segments": [
    {
      "text": "Segment text",
      "start": 0.000,
      "end": 5.120,
      "words": [
        {
          "word": "你好",
          "start": 0.000,
          "end": 0.800,
          "probability": 0.9523
        }
      ]
    }
  ],
  "processing_time": 3.45,
  "audio_path": "/path/to/audio.wav"
}
```

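As an example of consuming this structure, the segment timings can be turned into SRT subtitles. The sketch below relies only on the `segments`, `start`, `end`, and `text` fields shown above; `format_timestamp` and `result_to_srt` are hypothetical helpers, not part of the service.

```python
from typing import Any, Dict


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def result_to_srt(result: Dict[str, Any]) -> str:
    """Render each segment of a transcription result as one SRT cue."""
    blocks = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)
```

Loading the file written to `output_path` with `json.load` and passing it to `result_to_srt` yields subtitle text that can be saved next to the audio. The same loop could iterate over `words` instead for word-level cues.
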
## Configuration

Config file: `conf/config.json`

| Setting               | Value                        | Description                    |
|-----------------------|------------------------------|--------------------------------|
| website.port          | 9925                         | HTTP listen port               |
| website.host          | 0.0.0.0                      | Bind address                   |
| session_redis         | 127.0.0.1:6379 db=1          | Session storage                |
| password_key          | ASRService2026Key            | Auth key                       |
| filesroot             | /tmp/asr-outputs             | Output files directory         |

### Environment Variables

| Variable             | Default | Description                           |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID           | 5       | GPU device ID (for logging)           |
| CUDA_VISIBLE_DEVICES | 5       | CUDA device isolation                 |
| PYTHONPATH           | .       | Python module search path             |

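One consequence of `CUDA_VISIBLE_DEVICES` isolation is easy to miss: inside the process, CUDA renumbers the visible devices from 0, so physical GPU 5 appears as device index 0. The sketch below illustrates this with the defaults from the table; `resolve_gpu_env` and `in_process_device_index` are hypothetical helpers, and how `start.sh` actually sets these variables is not shown in this README.

```python
from typing import Dict, Mapping


def resolve_gpu_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Apply the documented defaults to a mapping of environment variables."""
    return {
        "ASR_GPU_ID": env.get("ASR_GPU_ID", "5"),
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES", "5"),
        "PYTHONPATH": env.get("PYTHONPATH", "."),
    }


def in_process_device_index(visible_devices: str, physical_id: str) -> int:
    """Index CUDA assigns in-process to a GPU listed in CUDA_VISIBLE_DEVICES."""
    ids = [part.strip() for part in visible_devices.split(",") if part.strip()]
    # Visible devices are enumerated in the order they are listed.
    return ids.index(physical_id)  # raises ValueError if the GPU is not visible
```

Passing `os.environ` to `resolve_gpu_env` gives the effective settings. With the default `CUDA_VISIBLE_DEVICES=5`, `in_process_device_index("5", "5")` is 0, which is why `ASR_GPU_ID` is useful for logging the physical GPU.
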
## File Structure

```
asr-service/
├── ah.py                 # Main entry point
├── start.sh              # Start script
├── stop.sh               # Stop script
├── conf/
│   └── config.json       # Service configuration
├── app/
│   └── health.dspy       # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py     # Transcription worker
└── README.md
```