ASR Service

Speech-to-text service powered by faster-whisper (CTranslate2 backend). Uses the large-v3-turbo model for fast, high-quality transcription with word-level timestamps.

Architecture

Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                      |
                                      v
                                 faster-whisper (GPU)
                                      |
                                      v
                                 Result (JSON)

Components:

  • ahserver: Web framework serving HTTP on port 9925
  • longtasks: Redis-backed async task queue with worker management
  • Redis: Task queue broker (queue name: asr)
  • faster-whisper: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
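
As a rough illustration of the worker step in the diagram, the sketch below maps a task payload onto a faster-whisper call and builds a result dictionary shaped like the one documented under Output Format. The function name run_transcription is hypothetical and this is not the code in workers/transcribe.py; only the model.transcribe() call and the segment, word, and info attributes come from faster-whisper itself.

```python
import time


def run_transcription(model, payload):
    """Hypothetical worker step: transcribe one payload and build the result dict."""
    started = time.monotonic()
    # faster-whisper returns a lazy generator of segments plus an info object.
    segments, info = model.transcribe(
        payload["audio_path"],
        language=payload.get("language", "zh"),
        word_timestamps=payload.get("word_timestamps", True),
        vad_filter=payload.get("vad_filter", True),
    )
    out_segments = []
    for seg in segments:  # decoding actually happens while iterating
        words = [
            {"word": w.word, "start": w.start, "end": w.end,
             "probability": w.probability}
            for w in (seg.words or [])  # words is None without word_timestamps
        ]
        out_segments.append({"text": seg.text, "start": seg.start,
                             "end": seg.end, "words": words})
    return {
        "status": "ok",
        "text": "".join(s["text"] for s in out_segments),
        "language": info.language,
        "language_probability": info.language_probability,
        "duration": info.duration,
        "segments": out_segments,
        "processing_time": time.monotonic() - started,
        "audio_path": payload["audio_path"],
    }
```

Because the segments object is a generator, the processing time is only meaningful if it is measured after the loop has consumed every segment, as above.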

Model

  • Model: faster-whisper-large-v3-turbo-ct2
  • Path: /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
  • Device: CUDA (float16)
  • GPU: Isolated via CUDA_VISIBLE_DEVICES (default GPU 5)

The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
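
One common way to get this load-once behaviour is a module-level cache guarded by a lock. The sketch below is an assumption about how that could look, not the service's actual code: get_model and _load_whisper are hypothetical names, while the WhisperModel constructor arguments (path, device, compute_type) are faster-whisper's own.

```python
import threading

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"

_model = None
_model_lock = threading.Lock()


def _load_whisper():
    # Imported here so that importing this module does not touch the GPU.
    from faster_whisper import WhisperModel
    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


def get_model(loader=_load_whisper):
    """Return the shared model, calling the loader on the first request only."""
    global _model
    if _model is None:
        with _model_lock:
            if _model is None:  # re-check after acquiring the lock
                _model = loader()
    return _model
```

The loader parameter exists only so the caching logic can be exercised without a GPU; a worker would simply call get_model().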

Deployment

Prerequisites

  • Python venv with faster-whisper 1.2.1: /data/ymq/demucs_venv
  • Redis server running on 127.0.0.1:6379
  • CUDA-capable GPU

Start

cd /data/ymq/asr-service
bash start.sh

Stop

cd /data/ymq/asr-service
bash stop.sh

Health Check

curl http://localhost:9925/health

Returns:

{
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
}
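
For scripted monitoring, the same check can be done from Python with the standard library. This is a minimal sketch that treats only "status": "ok" as healthy; the helper names parse_health and check_health are hypothetical.

```python
import json
import urllib.request


def parse_health(body):
    """Return True when a /health response body reports status "ok"."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(url="http://localhost:9925/health", timeout=3.0):
    """Fetch the health endpoint; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
```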

API Usage

Tasks are submitted through Redis, following the same pattern as wan22-service.

Submit a Transcription Task

import redis
import json
import uuid

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}

# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")

Check Task Status

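# Task status and result are stored in Redis by longtasks.
# Reuses `r` and `task_id` from the submission snippet; r.get() returns bytes, or None if the key does not exist.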
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
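
Since tasks run asynchronously, a client usually has to poll until the result key appears. The helper below is a sketch under two assumptions that this README does not confirm: that the value stored at asr:result:<task_id> is a JSON string, and that a missing key means the task has not finished. The name wait_for_result is hypothetical.

```python
import json
import time


def wait_for_result(client, task_id, timeout=300.0, interval=1.0):
    """Poll asr:result:<task_id> until it appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    key = f"asr:result:{task_id}"
    while True:
        raw = client.get(key)  # bytes, or None while the key is absent
        if raw is not None:
            return json.loads(raw)  # assumes the result is stored as JSON
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for task {task_id} after {timeout}s")
        time.sleep(interval)
```

It can be called as wait_for_result(r, task_id) with the redis.Redis client from the submission snippet; any object with a compatible get() method also works.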

Task Payload Format

Field            Type    Required  Default  Description
task_type        string  Yes       -        Must be "transcribe"
audio_path       string  Yes       -        Path to input audio file
language         string  No        "zh"     Language code (zh, en, ja, etc.)
word_timestamps  bool    No        True     Enable word-level timestamps
vad_filter       bool    No        True     Enable voice activity detection
output_path      string  No        -        If set, save result JSON to this path
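
A small builder function can apply these defaults in one place so callers only pass what differs. This is an illustrative sketch: build_payload is a hypothetical helper, and the task_id field is taken from the submission example above rather than from this table.

```python
import uuid


def build_payload(audio_path, language="zh", word_timestamps=True,
                  vad_filter=True, output_path=None, task_id=None):
    """Build a transcription payload using the documented defaults."""
    if not audio_path:
        raise ValueError("audio_path is required")
    payload = {
        "task_id": task_id or str(uuid.uuid4()),
        "task_type": "transcribe",  # the only documented task type
        "audio_path": audio_path,
        "language": language,
        "word_timestamps": word_timestamps,
        "vad_filter": vad_filter,
    }
    # output_path is optional; leave it out entirely when not requested.
    if output_path is not None:
        payload["output_path"] = output_path
    return payload
```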

Output Format

{
    "status": "ok",
    "text": "Full transcription text...",
    "language": "zh",
    "language_probability": 0.9876,
    "duration": 125.340,
    "segments": [
        {
            "text": "Segment text",
            "start": 0.000,
            "end": 5.120,
            "words": [
                {
                    "word": "你好",
                    "start": 0.000,
                    "end": 0.800,
                    "probability": 0.9523
                }
            ]
        }
    ],
    "processing_time": 3.45,
    "audio_path": "/path/to/audio.wav"
}
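
To show how the segment timings can be consumed, the sketch below converts a result dictionary of this shape into SRT subtitle text. Both function names are hypothetical, and SRT export is not a feature of the service itself.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(result):
    """Render result["segments"] as numbered SRT blocks."""
    blocks = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```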

Configuration

Config file: conf/config.json

Setting        Value                Description
website.port   9925                 HTTP listen port
website.host   0.0.0.0              Bind address
session_redis  127.0.0.1:6379 db=1  Session storage
password_key   ASRService2026Key    Auth key
filesroot      /tmp/asr-outputs     Output files directory

Environment Variables

Variable              Default  Description
ASR_GPU_ID            5        GPU device ID (for logging)
CUDA_VISIBLE_DEVICES  5        CUDA device isolation
PYTHONPATH            .        Python module search path
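
GPU isolation through CUDA_VISIBLE_DEVICES only takes effect if the variable is set before CUDA is initialized in the process; once it is set, the selected GPU is visible to the process as device 0. The sketch below shows one way a launcher could apply the defaults. The configure_gpu helper is hypothetical, and falling back to ASR_GPU_ID when CUDA_VISIBLE_DEVICES is unset is an assumption, since this README describes ASR_GPU_ID as used for logging only.

```python
import os


def configure_gpu(environ=os.environ, default_gpu="5"):
    """Ensure CUDA_VISIBLE_DEVICES is set, without overriding an existing value."""
    # Assumption: ASR_GPU_ID names the same physical GPU the service should use.
    gpu_id = environ.get("ASR_GPU_ID", default_gpu)
    environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_id)
    return environ["CUDA_VISIBLE_DEVICES"]
```

Call it before importing any CUDA-backed library such as faster_whisper.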

File Structure

asr-service/
├── ah.py                  # Main entry point
├── start.sh               # Start script
├── stop.sh                # Stop script
├── conf/
│   └── config.json        # Service configuration
├── app/
│   └── health.dspy        # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py      # Transcription worker
└── README.md