# ASR Service
Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.
## Architecture
```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                                |
                                                v
                                      faster-whisper (GPU)
                                                |
                                                v
                                          Result (JSON)
```
- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
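
To make the worker step in the diagram concrete, here is a minimal sketch of how a worker might turn one task payload into the result JSON described under Output Format. It relies on faster-whisper's `WhisperModel.transcribe()`, which returns a lazy generator of segments plus an info object carrying `language`, `language_probability` and `duration`. The function name `run_transcription`, the rounding precision, and joining segment texts with no separator are assumptions for illustration; this is not the actual code in `workers/transcribe.py`.

```python
import json
import os
import time
from typing import Any, Dict


def run_transcription(model: Any, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical worker step: transcribe one task and build the result dict.

    `model` is expected to behave like faster_whisper.WhisperModel.
    """
    started = time.perf_counter()
    segments, info = model.transcribe(
        payload["audio_path"],
        language=payload.get("language", "zh"),
        word_timestamps=payload.get("word_timestamps", True),
        vad_filter=payload.get("vad_filter", True),
    )

    seg_list = []
    for seg in segments:  # decoding actually happens while iterating this generator
        entry: Dict[str, Any] = {
            "text": seg.text,
            "start": round(seg.start, 3),
            "end": round(seg.end, 3),
        }
        if seg.words:  # faster-whisper sets words=None when word_timestamps=False
            entry["words"] = [
                {
                    "word": w.word,
                    "start": round(w.start, 3),
                    "end": round(w.end, 3),
                    "probability": round(w.probability, 4),
                }
                for w in seg.words
            ]
        seg_list.append(entry)

    result = {
        "status": "ok",
        # Assumption: segments are concatenated without a separator (suits zh).
        "text": "".join(s["text"] for s in seg_list).strip(),
        "language": info.language,
        "language_probability": round(info.language_probability, 4),
        "duration": round(info.duration, 3),
        "segments": seg_list,
        "processing_time": round(time.perf_counter() - started, 2),
        "audio_path": payload["audio_path"],
    }

    output_path = payload.get("output_path")
    if output_path:  # optionally persist the result JSON to the requested path
        os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
    return result
```

Taking the model as a parameter keeps the function testable with a stand-in object and leaves model loading (and GPU selection) to the caller.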
## Model
- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)

The model is lazy-loaded on the first transcription request and then stays in GPU memory for subsequent requests.
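
The lazy-loading behaviour described above can be sketched as a small thread-safe cache around faster-whisper's `WhisperModel(path, device="cuda", compute_type="float16")`. `LazyModel` and `load_whisper` are hypothetical names, and the lock assumes the worker might request the model from more than one thread, which this README does not state.

```python
import threading
from typing import Any, Callable, Optional

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"


class LazyModel:
    """Create the model on first use, then return the cached instance (hypothetical)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._model: Optional[Any] = None
        self._lock = threading.Lock()
        self.load_count = 0

    def get(self) -> Any:
        if self._model is None:
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._factory()
                    self.load_count += 1
        return self._model


def load_whisper() -> Any:
    # Imported here so the module can be imported without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


model_cache = LazyModel(load_whisper)  # first model_cache.get() loads onto the GPU
```

Keeping the factory separate from the cache means the first request pays the load cost once, and later calls to `model_cache.get()` return the same in-memory model.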
## Deployment
### Prerequisites
- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU
### Start
```bash
cd /data/ymq/asr-service
bash start.sh
```
### Stop
```bash
cd /data/ymq/asr-service
bash stop.sh
```
### Health Check
```bash
curl http://localhost:9925/health
```
Returns:
```json
{
  "status": "ok",
  "service": "asr-service",
  "model": "faster-whisper-large-v3-turbo-ct2"
}
```
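
For scripted monitoring, the same check can be done from Python with only the standard library. This is a sketch: `check_health` and `parse_health` are hypothetical helpers, and treating anything other than an HTTP 200 response with `"status": "ok"` as unhealthy is an assumption.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:9925/health"


def parse_health(body: bytes) -> bool:
    """Return True if the body looks like the documented health response."""
    data = json.loads(body)
    return (
        isinstance(data, dict)
        and data.get("status") == "ok"
        and data.get("service") == "asr-service"
    )


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """GET the health endpoint; any network or parse failure counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, ValueError):  # URLError/timeouts are OSError; bad JSON is ValueError
        return False
```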
## API Usage
Tasks are submitted through Redis, following the same pattern as wan22-service.
### Submit a Transcription Task
```python
import json
import uuid

import redis

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json",
}

# Push onto the Redis list backing the "asr" queue.
# NOTE: this key must match the one longtasks reads for the "asr" queue.
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```
### Check Task Status
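```python
# Task status and result are stored in Redis by longtasks.
# Reuses `r` and `task_id` from the submission example above.
status = r.get(f'asr:status:{task_id}')  # bytes, or None if not set yet
result = r.get(f'asr:result:{task_id}')  # bytes, or None until the task finishes
```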
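
Because transcription runs asynchronously, a client usually polls until the result key appears. The sketch below assumes, beyond what is documented here, that the value stored at `asr:result:{task_id}` is a JSON document in the Output Format; the key name is copied from the snippet above, and `wait_for_result` is a hypothetical helper. It accepts any `get`-style callable, so it works with `r.get` or with a plain dict in tests.

```python
import json
import time
from typing import Any, Callable, Dict, Optional


def wait_for_result(
    get: Callable[[str], Optional[bytes]],
    task_id: str,
    timeout: float = 300.0,
    interval: float = 1.0,
) -> Dict[str, Any]:
    """Poll `asr:result:{task_id}` until it is set, then decode it as JSON."""
    key = f"asr:result:{task_id}"
    deadline = time.monotonic() + timeout
    while True:
        raw = get(key)
        if raw is not None:
            return json.loads(raw)  # json.loads accepts bytes or str
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for task {task_id} after {timeout}s")
        time.sleep(interval)
```

Usage with the client from the submission example: `result = wait_for_result(r.get, task_id)`.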
## Task Payload Format
| Field | Type | Required | Default | Description |
|------------------|--------|----------|---------|--------------------------------------|
| task_type | string | Yes | - | Must be `"transcribe"` |
| audio_path | string | Yes | - | Path to input audio file |
| language | string | No | `"zh"` | Language code (zh, en, ja, etc.) |
| word_timestamps | bool | No | `True` | Enable word-level timestamps |
| vad_filter | bool | No | `True` | Enable voice activity detection |
| output_path | string | No | - | If set, save result JSON to this path|
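
The table above can be read as a validation-and-defaults step. The sketch below mirrors it; `normalize_payload` is a hypothetical helper, and the worker's real validation may be stricter or looser (for example in how it treats `task_id`, which appears in the submission example but not in this table).

```python
from typing import Any, Dict

# Defaults copied from the Task Payload Format table.
DEFAULTS: Dict[str, Any] = {
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": None,
}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check required fields and fill in documented defaults (hypothetical helper)."""
    if payload.get("task_type") != "transcribe":
        raise ValueError('task_type must be "transcribe"')
    audio_path = payload.get("audio_path")
    if not isinstance(audio_path, str) or not audio_path:
        raise ValueError("audio_path is required and must be a non-empty string")
    for key in ("word_timestamps", "vad_filter"):
        if key in payload and not isinstance(payload[key], bool):
            raise TypeError(f"{key} must be a bool")
    # Explicit values in the payload override the defaults.
    return {**DEFAULTS, **payload}
```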
## Output Format
```json
{
  "status": "ok",
  "text": "Full transcription text...",
  "language": "zh",
  "language_probability": 0.9876,
  "duration": 125.340,
  "segments": [
    {
      "text": "Segment text",
      "start": 0.000,
      "end": 5.120,
      "words": [
        {
          "word": "你好",
          "start": 0.000,
          "end": 0.800,
          "probability": 0.9523
        }
      ]
    }
  ],
  "processing_time": 3.45,
  "audio_path": "/path/to/audio.wav"
}
```
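
A consumer of this result typically walks `segments` and their `words`. The helpers below are a hypothetical sketch; they assume `words` may be absent or null when `word_timestamps` is disabled, which the format above does not specify.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_words(result: Dict[str, Any]) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, word) for every word in every segment."""
    for seg in result.get("segments", []):
        for w in seg.get("words") or []:  # tolerate a missing or null "words"
            yield w["start"], w["end"], w["word"]


def segment_lines(result: Dict[str, Any]) -> List[str]:
    """Render each segment as '[start-end] text', with times in seconds."""
    return [
        f"[{s['start']:.2f}-{s['end']:.2f}] {s['text']}"
        for s in result.get("segments", [])
    ]
```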
## Configuration
Config file: `conf/config.json`
| Setting | Value | Description |
|-----------------------|------------------------------|--------------------------------|
| website.port | 9925 | HTTP listen port |
| website.host | 0.0.0.0 | Bind address |
| session_redis | 127.0.0.1:6379 db=1 | Session storage |
| password_key | ASRService2026Key | Auth key |
| filesroot | /tmp/asr-outputs | Output files directory |
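
Setting names such as `website.port` suggest nested JSON objects in `conf/config.json`. Assuming that layout (the file itself is not shown here), a setting can be looked up with a small dotted-path helper; `load_config` and `get_setting` are hypothetical names.

```python
import json
from typing import Any, Dict


def load_config(path: str = "conf/config.json") -> Dict[str, Any]:
    """Read the service configuration file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_setting(config: Dict[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Resolve a name like 'website.port' by walking nested dicts (assumed layout)."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

Usage: `port = get_setting(load_config(), "website.port", 9925)`.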
### Environment Variables
| Variable | Default | Description |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID | 5 | GPU device ID (for logging) |
| CUDA_VISIBLE_DEVICES | 5 | CUDA device isolation |
| PYTHONPATH | . | Python module search path |
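
`CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialised (that is, before the model is first loaded), and inside the process the selected GPU is then numbered 0. The sketch below shows one way a Python entry point could derive it from `ASR_GPU_ID`; `configure_gpu` is hypothetical, and in this service `start.sh` may well set both variables instead.

```python
import os


def configure_gpu(default_gpu: str = "5") -> str:
    """Pin the process to one GPU; call before importing or loading faster-whisper."""
    gpu_id = os.environ.get("ASR_GPU_ID", default_gpu)
    # Respect an explicit CUDA_VISIBLE_DEVICES if the launcher already set one.
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", gpu_id)
    return gpu_id
```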
## File Structure
```
asr-service/
├── ah.py # Main entry point
├── start.sh # Start script
├── stop.sh # Stop script
├── conf/
│ └── config.json # Service configuration
├── app/
│ └── health.dspy # Health check endpoint
├── workers/
│ ├── __init__.py
│ └── transcribe.py # Transcription worker
└── README.md
```