Initial: faster-whisper ASR HTTP service (ahserver+longtasks+Redis)

Hermes Agent 2026-06-14 14:46:20 +08:00
commit e18aac6595
9 changed files with 412 additions and 0 deletions

6
.gitignore vendored Normal file

@ -0,0 +1,6 @@
__pycache__/
*.pyc
nohup*.out
*.egg-info
.env
py3/

182
README.md Normal file

@ -0,0 +1,182 @@
# ASR Service
Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.
## Architecture
```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                                |
                                                v
                                      faster-whisper (GPU)
                                                |
                                                v
                                          Result (JSON)
```
- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)
The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
## Model
- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)
The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
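The lazy-loading behaviour can be illustrated without a GPU. The following is a minimal sketch, not the service's actual code: `_expensive_load()` is a hypothetical stand-in for constructing the real model, and `load_count` exists only to make the effect visible. It shows the double-checked `asyncio.Lock` pattern, so that several concurrent first requests trigger a single load.

```python
import asyncio

_model = None
_lock = None
load_count = 0  # how many times the stand-in loader actually ran


def _expensive_load():
    """Hypothetical stand-in for constructing the real model object."""
    global load_count
    load_count += 1
    return object()


async def get_model():
    """Return the cached model, loading it at most once."""
    global _model, _lock
    if _model is not None:
        return _model
    if _lock is None:
        _lock = asyncio.Lock()  # created lazily, inside the running loop
    async with _lock:
        if _model is None:  # re-check: another task may have loaded it
            # Run the blocking loader in a worker thread
            _model = await asyncio.get_running_loop().run_in_executor(
                None, _expensive_load
            )
    return _model


async def _demo():
    # Five concurrent "first requests" share a single load
    return await asyncio.gather(*(get_model() for _ in range(5)))


models = asyncio.run(_demo())
```

Every entry in `models` is the same object, and the loader runs once. Running the loader in an executor is a design choice of this sketch; it keeps the event loop responsive while the model loads.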
## Deployment
### Prerequisites
- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU
### Start
```bash
cd /data/ymq/asr-service
bash start.sh
```
### Stop
```bash
cd /data/ymq/asr-service
bash stop.sh
```
### Health Check
```bash
curl http://localhost:9925/health
```
Returns:
```json
{
  "status": "ok",
  "service": "asr-service",
  "model": "faster-whisper-large-v3-turbo-ct2"
}
```
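For scripted monitoring, the same check can be done from Python. This is a minimal sketch using only the standard library; the helper names `is_healthy` and `check_health` are illustrative and not part of the service.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:9925/health"  # port taken from this README


def is_healthy(body: bytes) -> bool:
    """Return True if the health response body reports status "ok"."""
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Fetch the health endpoint; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

`check_health()` returns `False` rather than raising when the service is down, which makes it convenient in a retry loop.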
## API Usage
Tasks are submitted by pushing a JSON payload onto the Redis queue, following the same pattern as wan22-service.
### Submit a Transcription Task
```python
import redis
import json
import uuid
r = redis.Redis(host='127.0.0.1', port=6379)
task_id = str(uuid.uuid4())
payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
}
# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```
### Check Task Status
```python
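# Task status and result are stored in Redis by longtasks.
# Reuses `r` and `task_id` from the submission example; redis-py returns
# bytes by default, or None if the key does not exist yet.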
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
```
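A client usually wants to wait until the result appears. The helper below is a minimal sketch under two assumptions: the result is stored at the `asr:result:{task_id}` key shown above, and its value is the JSON document described under Output Format. The name `wait_for_result` is illustrative only.

```python
import json
import time


def wait_for_result(client, task_id, timeout=600.0, interval=1.0):
    """Poll `asr:result:{task_id}` until a value appears or `timeout` elapses.

    `client` is any object with a Redis-style `get(key)` method, for example
    `redis.Redis(...)`. Returns the decoded JSON result, or None on timeout.
    """
    key = f"asr:result:{task_id}"  # key layout assumed from this README
    deadline = time.monotonic() + timeout
    while True:
        raw = client.get(key)
        if raw is not None:
            return json.loads(raw)
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Usage with the client from the submission example: `result = wait_for_result(r, task_id, timeout=120)`.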
## Task Payload Format
| Field | Type | Required | Default | Description |
|------------------|--------|----------|---------|--------------------------------------|
| task_type | string | Yes | - | Must be `"transcribe"` |
| audio_path | string | Yes | - | Path to input audio file |
| language | string | No | `"zh"` | Language code (zh, en, ja, etc.) |
| word_timestamps | bool | No | `True` | Enable word-level timestamps |
| vad_filter | bool | No | `True` | Enable voice activity detection |
| output_path | string | No | - | If set, save result JSON to this path|
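The table can be expressed as a small client-side check that runs before a task is queued. This is a sketch only; `normalize_payload` is a hypothetical helper, and the defaults are the ones listed above.

```python
def normalize_payload(payload: dict) -> dict:
    """Validate a task payload and fill in the documented defaults."""
    if payload.get("task_type") != "transcribe":
        raise ValueError('task_type must be "transcribe"')
    if not payload.get("audio_path"):
        raise ValueError("audio_path is required")
    normalized = dict(payload)  # leave the caller's dict untouched
    normalized.setdefault("language", "zh")
    normalized.setdefault("word_timestamps", True)
    normalized.setdefault("vad_filter", True)
    normalized.setdefault("output_path", None)
    return normalized
```

Validating on the client side surfaces a missing `audio_path` immediately instead of as a failed task later.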
## Output Format
```json
{
  "status": "ok",
  "text": "Full transcription text...",
  "language": "zh",
  "language_probability": 0.9876,
  "duration": 125.340,
  "segments": [
    {
      "text": "Segment text",
      "start": 0.000,
      "end": 5.120,
      "words": [
        {
          "word": "你好",
          "start": 0.000,
          "end": 0.800,
          "probability": 0.9523
        }
      ]
    }
  ],
  "processing_time": 3.45,
  "audio_path": "/path/to/audio.wav"
}
```
## Configuration
Config file: `conf/config.json`
| Setting | Value | Description |
|-----------------------|------------------------------|--------------------------------|
| website.port | 9925 | HTTP listen port |
| website.host | 0.0.0.0 | Bind address |
| session_redis | 127.0.0.1:6379 db=1 | Session storage |
| password_key | ASRService2026Key | Auth key |
| filesroot | /tmp/asr-outputs | Output files directory |
### Environment Variables
| Variable | Default | Description |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID | 5 | GPU device ID (for logging) |
| CUDA_VISIBLE_DEVICES | 5 | CUDA device isolation |
| PYTHONPATH           | /data/ymq/asr-service | Python module search path (set by `start.sh`) |
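`CUDA_VISIBLE_DEVICES` renumbers GPUs inside the process: the first listed device becomes index 0, which is why the worker can always open CUDA device 0 while `ASR_GPU_ID` only records the physical ID for logs. A small sketch of that mapping (`visible_to_physical` is a hypothetical helper for illustration):

```python
def visible_to_physical(env: dict) -> dict:
    """Map in-process CUDA device indexes to the IDs listed in
    CUDA_VISIBLE_DEVICES.

    An empty mapping means the variable is unset or empty, so no
    remapping is described.
    """
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    # CUDA enumerates visible devices in the order they are listed
    return dict(enumerate(ids))
```

With the defaults above, `visible_to_physical({"CUDA_VISIBLE_DEVICES": "5"})` gives `{0: "5"}`: in-process device 0 is physical GPU 5.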
## File Structure
```
asr-service/
├── ah.py # Main entry point
├── start.sh # Start script
├── stop.sh # Stop script
├── conf/
│ └── config.json # Service configuration
├── app/
│ └── health.dspy # Health check endpoint
├── workers/
│ ├── __init__.py
│ └── transcribe.py # Transcription worker
└── README.md
```

43
ah.py Normal file
View File

@ -0,0 +1,43 @@
import os
from ahserver.webapp import webapp
from ahserver.serverenv import ServerEnv
from ahserver.configuredServer import add_startup
from longtasks.longtasks import LongTasks, schedule_once
from appPublic.log import debug


class ASRTasks(LongTasks):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gpu_id = int(os.environ.get('ASR_GPU_ID', '5'))

    async def process_task(self, payload, workid=None):
        import json
        if isinstance(payload, str):
            payload = json.loads(payload)
        task_type = payload.get('task_type', '')
        if task_type == 'transcribe':
            from workers.transcribe import run_transcribe
            return await run_transcribe(self, payload)
        raise ValueError(f'Unknown task_type: {task_type}')


async def on_app_built(app):
    env = ServerEnv()
    lt = env.longtasks
    if lt:
        schedule_once(0.1, lt.run)
        debug(f'ASR longtasks worker started, GPU: {lt.gpu_id}')


def init():
    env = ServerEnv()
    env.longtasks = ASRTasks(
        'redis://127.0.0.1:6379', 'asr',
        worker_cnt=1, stuck_seconds=600, max_age_hours=24
    )
    add_startup(on_app_built)


if __name__ == '__main__':
    webapp(init)

5
app/health.dspy Normal file
View File

@ -0,0 +1,5 @@
{{
"status": "ok",
"service": "asr-service",
"model": "faster-whisper-large-v3-turbo-ct2"
}}

1
conf/config.json Normal file
View File

@ -0,0 +1 @@
{"password_key":"ASRService2026Key","databases":{},"session_redis":{"host":"127.0.0.1","port":6379,"db":1},"website":{"paths":[["$[workdir]$/app",""]],"host":"0.0.0.0","port":9925,"coding":"utf-8","indexes":["index.html","index.dspy"],"processors":[[".dspy","dspy"]],"startswiths":[{"leading":"/idfile","registerfunction":"idfile"}]},"hot_reload":false,"filesroot":"/tmp/asr-outputs"}

7
start.sh Executable file
View File

@ -0,0 +1,7 @@
#!/bin/bash
cd /data/ymq/asr-service
export ASR_GPU_ID=5
export CUDA_VISIBLE_DEVICES=5
export PYTHONPATH=/data/ymq/asr-service
nohup /data/ymq/demucs_venv/bin/python ah.py > nohup.out 2>&1 &
echo "asr-service started, PID: $!, GPU: $ASR_GPU_ID"

24
stop.sh Executable file
View File

@ -0,0 +1,24 @@
#!/bin/bash
# Stop the asr-service
PID=$(pgrep -f "python ah.py" | head -1)
if [ -z "$PID" ]; then
    echo "asr-service is not running"
    exit 0
fi
echo "Stopping asr-service (PID: $PID)..."
kill "$PID"
# Wait up to 10 seconds for graceful shutdown
for i in $(seq 1 10); do
    if ! kill -0 "$PID" 2>/dev/null; then
        echo "asr-service stopped"
        exit 0
    fi
    sleep 1
done
# Force kill if still running
echo "Force killing asr-service (PID: $PID)..."
kill -9 "$PID"
echo "asr-service killed"

0
workers/__init__.py Normal file

144
workers/transcribe.py Normal file

@ -0,0 +1,144 @@
"""
ASR Transcription Worker using faster-whisper.
Lazy-loads the model on first use and keeps it in GPU memory.
Processes transcription tasks from the Redis queue.
"""
import os
import json
import asyncio
import time
from appPublic.log import debug, error
# Module-level model cache (lazy-loaded, stays in memory)
_model = None
_model_lock = None
MODEL_PATH = '/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2'


def _get_lock():
    """Get or create the async lock for model loading."""
    global _model_lock
    if _model_lock is None:
        _model_lock = asyncio.Lock()
    return _model_lock


async def load_model():
    """Lazy-load the faster-whisper model once.

    Guarded by an asyncio.Lock, so concurrent coroutines on the same event
    loop trigger only one load. Note that an asyncio.Lock does not protect
    against callers in other threads.
    """
    global _model
    if _model is not None:
        return _model
    async with _get_lock():
        # Double-check after acquiring lock
        if _model is not None:
            return _model
        debug(f'Loading faster-whisper model from {MODEL_PATH}...')
        t0 = time.time()
        from faster_whisper import WhisperModel
        # CUDA device 0: CUDA_VISIBLE_DEVICES already isolates the GPU
        _model = WhisperModel(
            MODEL_PATH,
            device='cuda',
            device_index=0,
            compute_type='float16',
            num_workers=1,
        )
        elapsed = time.time() - t0
        debug(f'faster-whisper model loaded in {elapsed:.1f}s')
        return _model


async def run_transcribe(tasks, payload):
    """
    Run transcription on an audio file.

    Payload fields:
        audio_path (str): Path to the audio file (required)
        language (str): Language code, default 'zh'
        word_timestamps (bool): Enable word-level timestamps, default True
        vad_filter (bool): Enable VAD filter, default True
        output_path (str): Optional path to save result JSON
    Returns:
        dict with segments, language, duration, etc.
    """
    audio_path = payload.get('audio_path')
    if not audio_path:
        raise ValueError('audio_path is required')
    if not os.path.exists(audio_path):
        raise FileNotFoundError(f'Audio file not found: {audio_path}')
    language = payload.get('language', 'zh')
    word_timestamps = payload.get('word_timestamps', True)
    vad_filter = payload.get('vad_filter', True)
    output_path = payload.get('output_path')
    debug(f'Transcribing: {audio_path} (lang={language}, vad={vad_filter}, words={word_timestamps})')
    t0 = time.time()
    model = await load_model()

    def _transcribe_sync():
        # model.transcribe() returns a lazy generator: decoding only happens
        # while iterating, so the segments must be consumed in this thread.
        segments_gen, info = model.transcribe(
            audio_path,
            language=language,
            word_timestamps=word_timestamps,
            vad_filter=vad_filter,
        )
        segments = []
        for seg in segments_gen:
            seg_data = {
                'text': seg.text,
                'start': round(seg.start, 3),
                'end': round(seg.end, 3),
            }
            if word_timestamps and seg.words:
                seg_data['words'] = [
                    {
                        'word': w.word,
                        'start': round(w.start, 3),
                        'end': round(w.end, 3),
                        'probability': round(w.probability, 4),
                    }
                    for w in seg.words
                ]
            segments.append(seg_data)
        return segments, info

    # Run the synchronous transcription in a thread to not block the event loop
    loop = asyncio.get_running_loop()
    segments, info = await loop.run_in_executor(None, _transcribe_sync)
    elapsed = time.time() - t0
    result = {
        'status': 'ok',
        'text': ' '.join(s['text'] for s in segments),
        'language': info.language,
        'language_probability': round(info.language_probability, 4),
        'duration': round(info.duration, 3),
        'segments': segments,
        'processing_time': round(elapsed, 2),
        'audio_path': audio_path,
    }
    debug(f'Transcription done in {elapsed:.1f}s: {len(segments)} segments, '
          f'duration={info.duration:.1f}s, lang={info.language}')
    # Save result if output_path specified
    if output_path:
        out_dir = os.path.dirname(output_path)
        if out_dir:  # a bare filename has no directory to create
            os.makedirs(out_dir, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
        debug(f'Result saved to {output_path}')
    return result