# ASR Service

Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.

## Architecture

```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                        |
                                        v
                                  faster-whisper (GPU)
                                        |
                                        v
                                  Result (JSON)
```

- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.

## Model

- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)

The model is lazy-loaded on the first transcription request and stays in GPU memory for subsequent requests.
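The lazy-loading behaviour described above can be sketched as follows. This is a minimal illustration, not the service's actual worker code: `LazyModel`, `load_whisper`, and `asr_model` are hypothetical names, and the `WhisperModel` arguments simply mirror the path, device, and precision listed in the Model section.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")

# Path taken from the Model section above.
MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"


class LazyModel(Generic[T]):
    """Builds the wrapped object on first use, then reuses it (hypothetical helper)."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: concurrent first requests trigger only one load.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


def load_whisper():
    # Imported inside the function so that importing this module
    # does not load faster-whisper or touch the GPU.
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


# Nothing is loaded here; the first asr_model.get() call pays the load cost.
asr_model = LazyModel(load_whisper)
```

With this shape, the first request is slower because it includes the model load, while later calls to `asr_model.get()` return the same in-memory instance.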
## Model Download (Offline Deployment)

faster-whisper-large-v3-turbo-ct2 is hosted on HuggingFace and must be downloaded before deployment.

### Method 1: huggingface-cli (recommended)

```bash
# Install huggingface-cli
pip install huggingface_hub

# Download the model to the target directory
huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir-use-symlinks False
```

**Download size**: ~1.6GB
**Download time**: depends on network speed (roughly 3-10 minutes)

### Method 2: git-lfs

```bash
# Enable git-lfs
git lfs install

# Clone the model repository
cd /data/ymq/models
mkdir -p deepdml
cd deepdml
git clone https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
```

### Method 3: wget/curl (individual files)

If you only need the core files, download them directly:

```bash
# Create the target directory if it does not exist yet
mkdir -p /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
cd /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2

# Download the model files
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/model.bin
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/tokenizer.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/vocabulary.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/config.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/preprocessor_config.json
```

### Verify the Download

```bash
ls -lh /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2/
# Expect model.bin (~1.6GB) plus tokenizer.json, vocabulary.json, and config.json
```

### Model Source

- **HuggingFace**: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
- **Base Model**: openai/whisper-large-v3-turbo (CTranslate2-optimized build)
- **License**: MIT
- **Optimization**: CTranslate2 format, 4x faster than the original Whisper with lower memory usage

## Deployment

### Prerequisites

- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU

### Start

```bash
cd /data/ymq/asr-service
bash start.sh
```

### Stop

```bash
cd /data/ymq/asr-service
bash stop.sh
```

### Health Check

```bash
curl http://localhost:9925/health
```

Returns:

```json
{
  "status": "ok",
  "service": "asr-service",
"model": "faster-whisper-large-v3-turbo-ct2" } ``` ## API Usage Tasks are submitted via Redis, same pattern as wan22-service. ### Submit a Transcription Task ```python import redis import json import uuid r = redis.Redis(host='127.0.0.1', port=6379) task_id = str(uuid.uuid4()) payload = { "task_id": task_id, "task_type": "transcribe", "audio_path": "/path/to/audio.wav", "language": "zh", "word_timestamps": True, "vad_filter": True, "output_path": "/tmp/asr-outputs/result.json" } # Push to the Redis queue r.lpush('asr:queue', json.dumps(payload)) print(f"Task submitted: {task_id}") ``` ### Check Task Status ```python # Task status is stored in Redis by longtasks status = r.get(f'asr:status:{task_id}') result = r.get(f'asr:result:{task_id}') ``` ## Task Payload Format | Field | Type | Required | Default | Description | |------------------|--------|----------|---------|--------------------------------------| | task_type | string | Yes | - | Must be `"transcribe"` | | audio_path | string | Yes | - | Path to input audio file | | language | string | No | `"zh"` | Language code (zh, en, ja, etc.) 
| word_timestamps  | bool   | No       | `True`  | Enable word-level timestamps         |
| vad_filter       | bool   | No       | `True`  | Enable voice activity detection      |
| output_path      | string | No       | -       | If set, save result JSON to this path|

## Output Format

```json
{
  "status": "ok",
  "text": "Full transcription text...",
  "language": "zh",
  "language_probability": 0.9876,
  "duration": 125.340,
  "segments": [
    {
      "text": "Segment text",
      "start": 0.000,
      "end": 5.120,
      "words": [
        {
          "word": "你好",
          "start": 0.000,
          "end": 0.800,
          "probability": 0.9523
        }
      ]
    }
  ],
  "processing_time": 3.45,
  "audio_path": "/path/to/audio.wav"
}
```

## Configuration

Config file: `conf/config.json`

| Setting               | Value                        | Description                    |
|-----------------------|------------------------------|--------------------------------|
| website.port          | 9925                         | HTTP listen port               |
| website.host          | 0.0.0.0                      | Bind address                   |
| session_redis         | 127.0.0.1:6379 db=1          | Session storage                |
| password_key          | ASRService2026Key            | Auth key                       |
| filesroot             | /tmp/asr-outputs             | Output files directory         |

### Environment Variables

| Variable             | Default | Description                           |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID           | 5       | GPU device ID (for logging)           |
| CUDA_VISIBLE_DEVICES | 5       | CUDA device isolation                 |
| PYTHONPATH           | .       | Python module search path             |

## File Structure

```
asr-service/
├── ah.py              # Main entry point
├── start.sh           # Start script
├── stop.sh            # Stop script
├── conf/
│   └── config.json    # Service configuration
├── app/
│   └── health.dspy    # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py  # Transcription worker
└── README.md
```
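
## Example: Building a Payload

The defaults in the Task Payload Format table can also be applied on the client side before a task is submitted. The following is a minimal sketch under stated assumptions: `build_payload` and `DEFAULTS` are hypothetical names that are not part of the service, and since this README does not say whether the worker fills in omitted optional fields itself, the helper sends them explicitly.

```python
import json
import uuid
from typing import Any, Dict, Optional

# Defaults copied from the Task Payload Format table.
DEFAULTS: Dict[str, Any] = {
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
}


def build_payload(
    audio_path: str,
    output_path: Optional[str] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Return a transcription payload with the documented defaults applied.

    Hypothetical client-side helper; the service itself does not ship it.
    """
    if not audio_path:
        raise ValueError("audio_path is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")

    payload: Dict[str, Any] = {
        "task_id": str(uuid.uuid4()),
        "task_type": "transcribe",  # the only task type listed in the table
        "audio_path": audio_path,
        **DEFAULTS,
        **overrides,
    }
    # output_path is optional; include it only when the caller sets it.
    if output_path is not None:
        payload["output_path"] = output_path
    return payload


payload = build_payload("/path/to/audio.wav", language="en")
message = json.dumps(payload)  # string to push, e.g. r.lpush('asr:queue', message)
```

Rejecting unknown fields early keeps typos such as `word_timestamp` from reaching the worker unnoticed.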