ASR Service
Speech-to-text service powered by faster-whisper (CTranslate2 backend). Uses the large-v3-turbo model for fast, high-quality transcription with word-level timestamps.
Architecture
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                       |
                                       v
                              faster-whisper (GPU)
                                       |
                                       v
                                 Result (JSON)
- ahserver: Web framework serving HTTP on port 9925
- longtasks: Redis-backed async task queue with worker management
- Redis: Task queue broker (queue name: asr)
- faster-whisper: ASR engine running on GPU (CUDA, float16)
The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
Model
- Model: faster-whisper-large-v3-turbo-ct2
- Path: /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
- Device: CUDA (float16)
- GPU: Isolated via CUDA_VISIBLE_DEVICES (default GPU 5)
The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
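The lazy-loading behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in workers/transcribe.py: the `LazyModel` class and `load_whisper` function are hypothetical names, and only the `WhisperModel(...)` constructor call is the real faster-whisper API.

```python
import threading

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"


class LazyModel:
    """Hypothetical helper: build the model on first use, then reuse it."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking so concurrent first requests load only once.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


def load_whisper():
    # Imported here so that importing this module does not touch the GPU.
    from faster_whisper import WhisperModel
    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


# Nothing is loaded until asr_model.get() is called for the first time.
asr_model = LazyModel(load_whisper)
```

With this shape, the first transcription request pays the model load time, and later requests reuse the instance already resident in GPU memory.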
Deployment
Prerequisites
- Python venv with faster-whisper 1.2.1: /data/ymq/demucs_venv
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU
Start
cd /data/ymq/asr-service
bash start.sh
Stop
cd /data/ymq/asr-service
bash stop.sh
Health Check
curl http://localhost:9925/health
Returns:
{
"status": "ok",
"service": "asr-service",
"model": "faster-whisper-large-v3-turbo-ct2"
}
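The same check can be scripted, for example in a deployment probe. The sketch below uses only the standard library; `parse_health` and `check_health` are illustrative names, and treating `"status": "ok"` as the only healthy state is an assumption based on the sample response above.

```python
import json
import urllib.request


def parse_health(body):
    """Return True if the response body reports status "ok"."""
    data = json.loads(body)
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(url="http://localhost:9925/health", timeout=5.0):
    """Return True if the endpoint answers 200 with a healthy body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, ValueError):
        # Connection errors, HTTP errors, timeouts, or malformed JSON.
        return False
```

Calling `check_health()` returns False rather than raising when the service is down, which is convenient in start-up scripts that poll until the service is ready.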
API Usage
Tasks are submitted via Redis, following the same pattern as wan22-service.
Submit a Transcription Task
import redis
import json
import uuid
r = redis.Redis(host='127.0.0.1', port=6379)
task_id = str(uuid.uuid4())
payload = {
"task_id": task_id,
"task_type": "transcribe",
"audio_path": "/path/to/audio.wav",
"language": "zh",
"word_timestamps": True,
"vad_filter": True,
"output_path": "/tmp/asr-outputs/result.json"
}
# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
Check Task Status
# Task status and result are stored in Redis by longtasks.
# r.get() returns bytes, or None if the key has not been written yet.
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
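Since a task runs asynchronously, a client usually has to poll until the result key appears. Below is a minimal polling sketch. `wait_for_result` is a hypothetical helper, and it assumes the value under `asr:result:{task_id}` is a JSON string; neither is confirmed by the longtasks documentation.

```python
import json
import time


def wait_for_result(client, task_id, timeout=300.0, interval=1.0):
    """Poll Redis until the result key exists, then return the decoded JSON.

    `client` is any object with a redis-py style get(key) method.
    Raises TimeoutError if no result appears within `timeout` seconds.
    """
    key = f"asr:result:{task_id}"
    deadline = time.monotonic() + timeout
    while True:
        raw = client.get(key)
        if raw is not None:
            # json.loads accepts both bytes and str.
            return json.loads(raw)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for task {task_id} after {timeout}s")
        time.sleep(interval)
```

Usage would look like `result = wait_for_result(r, task_id)`, where `r` is the `redis.Redis` client created when submitting the task.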
Task Payload Format
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| task_type | string | Yes | - | Must be "transcribe" |
| audio_path | string | Yes | - | Path to input audio file |
| language | string | No | "zh" | Language code (zh, en, ja, etc.) |
| word_timestamps | bool | No | True | Enable word-level timestamps |
| vad_filter | bool | No | True | Enable voice activity detection |
| output_path | string | No | - | If set, save result JSON to this path |
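To avoid repeating the defaults from the table in every client, a payload can be assembled by a small helper. This is only a sketch: `build_payload` is a hypothetical function, and including a generated `task_id` follows the submission example rather than the table.

```python
import uuid

# Defaults taken from the payload table above.
DEFAULTS = {"language": "zh", "word_timestamps": True, "vad_filter": True}


def build_payload(audio_path, output_path=None, **overrides):
    """Build a transcription payload, filling optional fields with defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown payload fields: {sorted(unknown)}")
    payload = {
        "task_id": str(uuid.uuid4()),
        "task_type": "transcribe",
        "audio_path": audio_path,
        **DEFAULTS,
        **overrides,
    }
    # output_path is optional and has no default, so include it only if given.
    if output_path is not None:
        payload["output_path"] = output_path
    return payload
```

For example, `build_payload("/path/to/audio.wav", language="en")` yields a payload with English selected and the other options left at their defaults.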
Output Format
{
"status": "ok",
"text": "Full transcription text...",
"language": "zh",
"language_probability": 0.9876,
"duration": 125.340,
"segments": [
{
"text": "Segment text",
"start": 0.000,
"end": 5.120,
"words": [
{
"word": "你好",
"start": 0.000,
"end": 0.800,
"probability": 0.9523
}
]
}
],
"processing_time": 3.45,
"audio_path": "/path/to/audio.wav"
}
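As an illustration of consuming this structure, the segment timings can be turned into SRT subtitles. The helpers below are a sketch and not part of the service; they rely only on the `segments`, `start`, `end`, and `text` fields shown above.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to the SRT form HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(result):
    """Render the "segments" list of a result dict as SRT text."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Blocks already end with a newline; joining adds the blank separator line.
    return "\n".join(blocks)
```

The word-level entries under `words` could be processed the same way when finer-grained timing is needed, for example for karaoke-style highlighting.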
Configuration
Config file: conf/config.json
| Setting | Value | Description |
|---|---|---|
| website.port | 9925 | HTTP listen port |
| website.host | 0.0.0.0 | Bind address |
| session_redis | 127.0.0.1:6379 db=1 | Session storage |
| password_key | ASRService2026Key | Auth key |
| filesroot | /tmp/asr-outputs | Output files directory |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| ASR_GPU_ID | 5 | GPU device ID (for logging) |
| CUDA_VISIBLE_DEVICES | 5 | CUDA device isolation |
| PYTHONPATH | . | Python module search path |
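The two GPU variables play different roles: CUDA_VISIBLE_DEVICES restricts which physical GPU the process can see, while ASR_GPU_ID is only reported in logs. The sketch below reads both with the documented defaults and flags a mismatch. `gpu_settings` is a hypothetical helper; how start.sh actually sets these variables is not shown here.

```python
import os


def gpu_settings(env=None):
    """Read the GPU-related variables, applying the documented defaults."""
    env = os.environ if env is None else env
    gpu_id = env.get("ASR_GPU_ID", "5")
    visible = env.get("CUDA_VISIBLE_DEVICES", "5")
    return {
        "gpu_id": gpu_id,        # value used in log messages
        "visible": visible,      # physical GPU(s) exposed to CUDA
        "consistent": gpu_id == visible,
    }
```

Note that once CUDA_VISIBLE_DEVICES is set to a single GPU, that GPU appears inside the process as CUDA device 0, so the logged ID is the only place the physical index shows up.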
File Structure
asr-service/
├── ah.py # Main entry point
├── start.sh # Start script
├── stop.sh # Stop script
├── conf/
│ └── config.json # Service configuration
├── app/
│ └── health.dspy # Health check endpoint
├── workers/
│ ├── __init__.py
│ └── transcribe.py # Transcription worker
└── README.md