# ASR Service

Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.

## Architecture

```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                        |
                                        v
                              faster-whisper (GPU)
                                        |
                                        v
                                  Result (JSON)
```

- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)

The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.

## Model

- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)

The model is lazy-loaded on the first transcription request and stays in GPU memory for subsequent requests.

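
The lazy-loading behavior described in the Model section could look roughly like the sketch below. It is an illustration, not the service's actual code: `LazyModel` and `load_whisper` are hypothetical names introduced here, while `WhisperModel(path, device=..., compute_type=...)` is the faster-whisper constructor.

```python
import threading

MODEL_PATH = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"


def load_whisper():
    """Load the model on GPU in float16 (imported here so startup stays cheap)."""
    from faster_whisper import WhisperModel
    return WhisperModel(MODEL_PATH, device="cuda", compute_type="float16")


class LazyModel:
    """Hypothetical helper: builds the model on first use, then reuses it."""

    def __init__(self, loader=load_whisper):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking so concurrent first requests load only once.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model
```

Passing the loader as a parameter keeps the GPU dependency out of unit tests; a worker would typically hold one module-level `LazyModel()` and call `.get()` per task.
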
## Deployment

### Prerequisites

- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU

### Start

```bash
cd /data/ymq/asr-service
bash start.sh
```

### Stop

```bash
cd /data/ymq/asr-service
bash stop.sh
```

### Health Check

```bash
curl http://localhost:9925/health
```

Returns:

```json
{
  "status": "ok",
  "service": "asr-service",
  "model": "faster-whisper-large-v3-turbo-ct2"
}
```

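
For scripts that need to wait for the service before submitting tasks, the same health check can be done from Python. This is a minimal sketch using only the standard library; `is_healthy` is a hypothetical helper name, and the `opener` parameter exists only so the function can be exercised without a running service.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:9925/health"


def is_healthy(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with {"status": "ok", ...}."""
    try:
        with opener(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as unhealthy.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```
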
## API Usage

Tasks are submitted via Redis, following the same pattern as wan22-service.

### Submit a Transcription Task

```python
import redis
import json
import uuid

r = redis.Redis(host='127.0.0.1', port=6379)

task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}

# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```

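### Check Task Status

```python
# Task status and result are stored in Redis by longtasks.
# Reuses `r` and `task_id` from the submission example above;
# `get` returns bytes, or None if the key does not exist yet.
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
```
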
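
Since transcription runs asynchronously, a client usually needs to poll until the result key appears. The sketch below shows one way to do that. It assumes the value under `asr:result:{task_id}` is a JSON string, which this README does not confirm, and `wait_for_result` is a hypothetical helper name.

```python
import json
import time


def wait_for_result(r, task_id, timeout=300.0, interval=1.0):
    """Poll Redis until a result for task_id appears or the timeout expires.

    `r` is a redis.Redis client (anything with a compatible .get() works).
    Assumes the result value is a JSON string; adjust if longtasks stores
    it differently.
    """
    deadline = time.monotonic() + timeout
    while True:
        raw = r.get(f"asr:result:{task_id}")
        if raw is not None:
            return json.loads(raw)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for task {task_id} after {timeout}s")
        time.sleep(interval)
```

Usage, with the client and ID from the submission example: `result = wait_for_result(r, task_id, timeout=600)`.
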
## Task Payload Format

| Field           | Type   | Required | Default | Description                           |
|-----------------|--------|----------|---------|---------------------------------------|
| task_type       | string | Yes      | -       | Must be `"transcribe"`                |
| audio_path      | string | Yes      | -       | Path to input audio file              |
| language        | string | No       | `"zh"`  | Language code (zh, en, ja, etc.)      |
| word_timestamps | bool   | No       | `True`  | Enable word-level timestamps          |
| vad_filter      | bool   | No       | `True`  | Enable voice activity detection       |
| output_path     | string | No       | -       | If set, save result JSON to this path |

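
The required fields and defaults in the table can be captured in a small client-side helper. This is a sketch only: `build_payload` is a hypothetical function, and including `task_id` follows the submission example rather than the table.

```python
import uuid

# Defaults taken from the payload table.
DEFAULTS = {"language": "zh", "word_timestamps": True, "vad_filter": True}


def build_payload(audio_path, output_path=None, **options):
    """Build a transcription payload, filling optional fields with defaults."""
    if not audio_path:
        raise ValueError("audio_path is required")
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {
        "task_id": str(uuid.uuid4()),
        "task_type": "transcribe",
        "audio_path": audio_path,
        **DEFAULTS,
        **options,  # caller-supplied values override the defaults
    }
    if output_path is not None:
        payload["output_path"] = output_path
    return payload
```

For example, `build_payload("/path/to/audio.wav", language="en")` yields a payload with English selected and both boolean options left at `True`.
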
## Output Format

```json
{
  "status": "ok",
  "text": "Full transcription text...",
  "language": "zh",
  "language_probability": 0.9876,
  "duration": 125.340,
  "segments": [
    {
      "text": "Segment text",
      "start": 0.000,
      "end": 5.120,
      "words": [
        {
          "word": "你好",
          "start": 0.000,
          "end": 0.800,
          "probability": 0.9523
        }
      ]
    }
  ],
  "processing_time": 3.45,
  "audio_path": "/path/to/audio.wav"
}
```

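
As an example of consuming this structure, the sketch below converts the `segments` list into SRT subtitle text. It relies only on the `text`, `start`, and `end` fields shown in the output format; the function names are hypothetical and not part of the service.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def result_to_srt(result):
    """Turn the `segments` list of a result dict into SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    # Blocks are separated by a blank line, as SRT expects.
    return "\n".join(blocks)
```
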
## Configuration

Config file: `conf/config.json`

| Setting       | Value               | Description            |
|---------------|---------------------|------------------------|
| website.port  | 9925                | HTTP listen port       |
| website.host  | 0.0.0.0             | Bind address           |
| session_redis | 127.0.0.1:6379 db=1 | Session storage        |
| password_key  | ASRService2026Key   | Auth key               |
| filesroot     | /tmp/asr-outputs    | Output files directory |

### Environment Variables

| Variable             | Default | Description                 |
|----------------------|---------|-----------------------------|
| ASR_GPU_ID           | 5       | GPU device ID (for logging) |
| CUDA_VISIBLE_DEVICES | 5       | CUDA device isolation       |
| PYTHONPATH           | .       | Python module search path   |

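
A sketch of how a process might read the GPU-related variables with the documented defaults follows. `gpu_settings` is a hypothetical helper; how `start.sh` and the worker actually consume these variables is not shown in this README.

```python
import os


def gpu_settings(env=os.environ):
    """Read the GPU-related variables, falling back to the documented defaults."""
    return {
        # Used for logging only, per the environment variable table.
        "gpu_id": int(env.get("ASR_GPU_ID", "5")),
        # Kept as a string: CUDA accepts comma-separated lists here.
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "5"),
    }
```

Note that `CUDA_VISIBLE_DEVICES` must be set before any CUDA library initializes, and that a process restricted to a single GPU sees that GPU as device index 0.
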
## File Structure

```
asr-service/
├── ah.py                 # Main entry point
├── start.sh              # Start script
├── stop.sh               # Stop script
├── conf/
│   └── config.json       # Service configuration
├── app/
│   └── health.dspy       # Health check endpoint
├── workers/
│   ├── __init__.py
│   └── transcribe.py     # Transcription worker
└── README.md
```
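
To illustrate what a transcription worker such as `workers/transcribe.py` has to do, the sketch below maps a faster-whisper transcription onto the documented output format. It is not the actual worker code: `transcribe_task` is a hypothetical function, and the longtasks integration is omitted. The `transcribe()` keyword arguments and the `Segment`, `Word`, and `TranscriptionInfo` attributes used are those of faster-whisper.

```python
import time


def transcribe_task(model, payload):
    """Run one transcription task and return a dict in the documented output shape.

    `model` is expected to behave like faster_whisper.WhisperModel.
    """
    started = time.monotonic()
    segments, info = model.transcribe(
        payload["audio_path"],
        language=payload.get("language", "zh"),
        word_timestamps=payload.get("word_timestamps", True),
        vad_filter=payload.get("vad_filter", True),
    )
    out_segments = []
    for seg in segments:  # faster-whisper yields segments lazily
        words = [
            {"word": w.word, "start": w.start, "end": w.end,
             "probability": w.probability}
            for w in (seg.words or [])  # words is None without word timestamps
        ]
        out_segments.append(
            {"text": seg.text, "start": seg.start, "end": seg.end, "words": words}
        )
    return {
        "status": "ok",
        "text": "".join(s["text"] for s in out_segments),
        "language": info.language,
        "language_probability": info.language_probability,
        "duration": info.duration,
        "segments": out_segments,
        "processing_time": round(time.monotonic() - started, 2),
        "audio_path": payload["audio_path"],
    }
```

Because the segments come from a generator, the actual decoding happens inside the loop, so `processing_time` is measured after iteration finishes.
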