Initial: faster-whisper ASR HTTP service (ahserver+longtasks+Redis)

2026-06-14 14:46:20 +08:00 · e18aac6595
commit e18aac6595
9 changed files with 412 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,6 @@
 __pycache__/
 *.pyc
 nohup*.out
 *.egg-info
 .env
 py3/
--- a/README.md
+++ b/README.md
@@ -0,0 +1,182 @@
 # ASR Service
 Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.
 ## Architecture
 ```
 Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                      |
                                      v
                                 faster-whisper (GPU)
                                      |
                                      v
                                 Result (JSON)
 ```
 - **ahserver**: Web framework serving HTTP on port 9925
 - **longtasks**: Redis-backed async task queue with worker management
 - **Redis**: Task queue broker (queue name: `asr`)
 - **faster-whisper**: ASR engine running on GPU (CUDA, float16)
 The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
 ## Model
 - **Model**: faster-whisper-large-v3-turbo-ct2
 - **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
 - **Device**: CUDA (float16)
 - **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)
 The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
 ## Deployment
 ### Prerequisites
 - Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
 - Redis server running on 127.0.0.1:6379
 - CUDA-capable GPU
 ### Start
 ```bash
 cd /data/ymq/asr-service
 bash start.sh
 ```
 ### Stop
 ```bash
 cd /data/ymq/asr-service
 bash stop.sh
 ```
 ### Health Check
 ```bash
 curl http://localhost:9925/health
 ```
 Returns:
 ```json
 {
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
 }
 ```
 ## API Usage
 Tasks are submitted through Redis, following the same pattern as wan22-service.
 ### Submit a Transcription Task
 ```python
 import redis
 import json
 import uuid
 r = redis.Redis(host='127.0.0.1', port=6379)
 task_id = str(uuid.uuid4())
 payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
 }
 # Push to the Redis queue
 r.lpush('asr:queue', json.dumps(payload))
 print(f"Task submitted: {task_id}")
 ```
 ### Check Task Status
 ```python
 # Task status is stored in Redis by longtasks
 status = r.get(f'asr:status:{task_id}')
 result = r.get(f'asr:result:{task_id}')
 ```
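
A caller usually has to wait until the worker has finished. The following is a minimal polling sketch, assuming the result appears as a JSON string under the `asr:result:{task_id}` key used in the snippet above; the key layout that longtasks actually writes is not confirmed here. `wait_for_result` and its parameters are hypothetical names, and the Redis getter is passed in so that `r.get` or a test double can be used.

```python
import json
import time


def wait_for_result(get, task_id, timeout=600.0, interval=1.0, sleep=time.sleep):
    """Poll until a result is stored for task_id, then return it as a dict.

    `get` is any callable mapping a key to bytes/str/None, e.g. `r.get`.
    ASSUMPTION: the result is a JSON string under 'asr:result:<task_id>'.
    """
    key = f'asr:result:{task_id}'
    deadline = time.monotonic() + timeout
    while True:
        raw = get(key)
        if raw is not None:
            # redis-py returns bytes unless decode_responses=True is set
            if isinstance(raw, bytes):
                raw = raw.decode('utf-8')
            return json.loads(raw)
        if time.monotonic() >= deadline:
            raise TimeoutError(f'No result for task {task_id} after {timeout}s')
        sleep(interval)
```

With the client from the submission example, this would be called as `result = wait_for_result(r.get, task_id)`.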
 ## Task Payload Format
 | Field            | Type   | Required | Default | Description                          |
 |------------------|--------|----------|---------|--------------------------------------|
 | task_type        | string | Yes      | -       | Must be `"transcribe"`               |
 | audio_path       | string | Yes      | -       | Path to input audio file             |
 | language         | string | No       | `"zh"`  | Language code (zh, en, ja, etc.)     |
 | word_timestamps  | bool   | No       | `True`  | Enable word-level timestamps         |
 | vad_filter       | bool   | No       | `True`  | Enable voice activity detection      |
 | output_path      | string | No       | -       | If set, save result JSON to this path|
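
The table above can be turned into a small payload builder that applies the documented defaults and rejects unknown options. This is only a sketch: `build_payload` is a hypothetical helper that is not part of the service, and the `task_id` field simply mirrors the submission example.

```python
import json
import uuid

# Defaults copied from the payload table above
DEFAULTS = {
    'language': 'zh',
    'word_timestamps': True,
    'vad_filter': True,
}


def build_payload(audio_path, **options):
    """Return a JSON string describing one transcription task.

    Hypothetical helper: `task_id` is generated as a UUID4 string,
    as in the submission example.
    """
    if not audio_path:
        raise ValueError('audio_path is required')
    allowed = set(DEFAULTS) | {'output_path'}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f'Unknown option(s): {sorted(unknown)}')
    payload = {
        'task_id': str(uuid.uuid4()),
        'task_type': 'transcribe',
        'audio_path': audio_path,
        **DEFAULTS,
        **options,  # caller-supplied values override the defaults
    }
    return json.dumps(payload, ensure_ascii=False)
```

For example, `build_payload('/path/to/audio.wav', language='en')` keeps `word_timestamps` and `vad_filter` at `True` and leaves `output_path` unset.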
 ## Output Format
 ```json
 {
    "status": "ok",
    "text": "Full transcription text...",
    "language": "zh",
    "language_probability": 0.9876,
    "duration": 125.340,
    "segments": [
        {
            "text": "Segment text",
            "start": 0.000,
            "end": 5.120,
            "words": [
                {
                    "word": "你好",
                    "start": 0.000,
                    "end": 0.800,
                    "probability": 0.9523
                }
            ]
        }
    ],
    "processing_time": 3.45,
    "audio_path": "/path/to/audio.wav"
 }
 ```
 ## Configuration
 Config file: `conf/config.json`
 | Setting               | Value                        | Description                    |
 |-----------------------|------------------------------|--------------------------------|
 | website.port          | 9925                         | HTTP listen port               |
 | website.host          | 0.0.0.0                      | Bind address                   |
 | session_redis         | 127.0.0.1:6379 db=1          | Session storage                |
 | password_key          | ASRService2026Key            | Auth key                       |
 | filesroot             | /tmp/asr-outputs             | Output files directory         |
 ### Environment Variables
 | Variable             | Default | Description                           |
 |----------------------|---------|---------------------------------------|
 | ASR_GPU_ID           | 5       | GPU device ID (for logging)           |
 | CUDA_VISIBLE_DEVICES | 5       | CUDA device isolation                 |
 | PYTHONPATH           | .       | Python module search path             |
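
With `CUDA_VISIBLE_DEVICES=5` the process sees a single GPU, which CUDA enumerates as device 0; this is why the worker passes `device_index=0` while `ASR_GPU_ID` is kept only for logging. The remapping can be sketched as follows. `physical_gpu` is a hypothetical helper for illustration, and it handles plain integer IDs only, not GPU UUIDs.

```python
def physical_gpu(device_index, cuda_visible_devices):
    """Map a logical CUDA device index to the physical GPU ID.

    `cuda_visible_devices` is the raw value of CUDA_VISIBLE_DEVICES,
    for example '5' or '2,5'.
    """
    visible = [int(part) for part in cuda_visible_devices.split(',') if part.strip()]
    if not 0 <= device_index < len(visible):
        raise IndexError(f'device_index {device_index} is not visible')
    return visible[device_index]
```

For the values set in `start.sh`, `physical_gpu(0, '5')` resolves to physical GPU 5.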
 ## File Structure
 ```
 asr-service/
 ├── ah.py                  # Main entry point
 ├── start.sh               # Start script
 ├── stop.sh                # Stop script
 ├── conf/
 │   └── config.json        # Service configuration
 ├── app/
 │   └── health.dspy        # Health check endpoint
 ├── workers/
 │   ├── __init__.py
 │   └── transcribe.py      # Transcription worker
 └── README.md
 ```
--- a/ah.py
+++ b/ah.py
@@ -0,0 +1,43 @@
 import os
 from ahserver.webapp import webapp
 from ahserver.serverenv import ServerEnv
 from ahserver.configuredServer import add_startup
 from longtasks.longtasks import LongTasks, schedule_once
 from appPublic.log import debug
 class ASRTasks(LongTasks):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gpu_id = int(os.environ.get('ASR_GPU_ID', '5'))
    async def process_task(self, payload, workid=None):
        import json
        if isinstance(payload, str):
            payload = json.loads(payload)
        task_type = payload.get('task_type', '')
        if task_type == 'transcribe':
            from workers.transcribe import run_transcribe
            return await run_transcribe(self, payload)
        raise ValueError(f'Unknown task_type: {task_type}')
 async def on_app_built(app):
    env = ServerEnv()
    lt = env.longtasks
    if lt:
        schedule_once(0.1, lt.run)
        debug(f'ASR longtasks worker started, GPU: {lt.gpu_id}')
 def init():
    env = ServerEnv()
    env.longtasks = ASRTasks(
        'redis://127.0.0.1:6379', 'asr',
        worker_cnt=1, stuck_seconds=600, max_age_hours=24
    )
    add_startup(on_app_built)
 if __name__ == '__main__':
    webapp(init)
--- a/app/health.dspy
+++ b/app/health.dspy
@@ -0,0 +1,5 @@
 {{
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
 }}
--- a/conf/config.json
+++ b/conf/config.json
@@ -0,0 +1 @@
 {"password_key":"ASRService2026Key","databases":{},"session_redis":{"host":"127.0.0.1","port":6379,"db":1},"website":{"paths":[["$[workdir]$/app",""]],"host":"0.0.0.0","port":9925,"coding":"utf-8","indexes":["index.html","index.dspy"],"processors":[[".dspy","dspy"]],"startswiths":[{"leading":"/idfile","registerfunction":"idfile"}]},"hot_reload":false,"filesroot":"/tmp/asr-outputs"}
--- a/start.sh
+++ b/start.sh
@@ -0,0 +1,7 @@
 #!/bin/bash
 cd /data/ymq/asr-service
 export ASR_GPU_ID=5
 export CUDA_VISIBLE_DEVICES=5
 export PYTHONPATH=/data/ymq/asr-service
 nohup /data/ymq/demucs_venv/bin/python ah.py > nohup.out 2>&1 &
 echo "asr-service started, PID: $!, GPU: $ASR_GPU_ID"
--- a/stop.sh
+++ b/stop.sh
@@ -0,0 +1,24 @@
 #!/bin/bash
 # Stop the asr-service
 PID=$(pgrep -f "python ah.py" | head -1)
 if [ -z "$PID" ]; then
    echo "asr-service is not running"
    exit 0
 fi
 echo "Stopping asr-service (PID: $PID)..."
 kill "$PID"
 # Wait up to 10 seconds for graceful shutdown
 for i in $(seq 1 10); do
    if ! kill -0 "$PID" 2>/dev/null; then
        echo "asr-service stopped"
        exit 0
    fi
    sleep 1
 done
 # Force kill if still running
 echo "Force killing asr-service (PID: $PID)..."
 kill -9 "$PID"
 echo "asr-service killed"
--- a/workers/__init__.py
+++ b/workers/__init__.py
--- a/workers/transcribe.py
+++ b/workers/transcribe.py
@@ -0,0 +1,144 @@
 """
 ASR Transcription Worker using faster-whisper.
 Lazy-loads the model on first use and keeps it in GPU memory.
 Processes transcription tasks from the Redis queue.
 """
 import os
 import json
 import asyncio
 import time
 from appPublic.log import debug, error
 # Module-level model cache (lazy-loaded, stays in memory)
 _model = None
 _model_lock = None
 MODEL_PATH = '/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2'
 def _get_lock():
    """Get or create the async lock for model loading."""
    global _model_lock
    if _model_lock is None:
        _model_lock = asyncio.Lock()
    return _model_lock
 async def load_model():
    """Lazy-load the faster-whisper model. Thread-safe, loads once."""
    global _model
    if _model is not None:
        return _model
    async with _get_lock():
        # Double-check after acquiring lock
        if _model is not None:
            return _model
        debug(f'Loading faster-whisper model from {MODEL_PATH}...')
        t0 = time.time()
        from faster_whisper import WhisperModel
        # CUDA device 0 — CUDA_VISIBLE_DEVICES already isolates the GPU
        _model = WhisperModel(
            MODEL_PATH,
            device='cuda',
            device_index=0,
            compute_type='float16',
            num_workers=1,
        )
        elapsed = time.time() - t0
        debug(f'faster-whisper model loaded in {elapsed:.1f}s')
        return _model
 async def run_transcribe(tasks, payload):
    """
    Run transcription on an audio file.
    Payload fields:
        audio_path (str):      Path to the audio file (required)
        language (str):        Language code, default 'zh'
        word_timestamps (bool): Enable word-level timestamps, default True
        vad_filter (bool):     Enable VAD filter, default True
        output_path (str):     Optional path to save result JSON
    Returns:
        dict with segments, language, duration, etc.
    """
    audio_path = payload.get('audio_path')
    if not audio_path:
        raise ValueError('audio_path is required')
    if not os.path.exists(audio_path):
        raise FileNotFoundError(f'Audio file not found: {audio_path}')
    language = payload.get('language', 'zh')
    word_timestamps = payload.get('word_timestamps', True)
    vad_filter = payload.get('vad_filter', True)
    output_path = payload.get('output_path')
    debug(f'Transcribing: {audio_path} (lang={language}, vad={vad_filter}, words={word_timestamps})')
    t0 = time.time()
    model = await load_model()
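    # faster-whisper's transcribe() returns a lazy generator: decoding happens
    # while it is iterated, so consume it inside the worker thread to avoid
    # blocking the event loop
    def _transcribe():
        segments_iter, info = model.transcribe(
            audio_path,
            language=language,
            word_timestamps=word_timestamps,
            vad_filter=vad_filter,
        )
        return list(segments_iter), info
    loop = asyncio.get_running_loop()
    segments_gen, info = await loop.run_in_executor(None, _transcribe)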
    # Collect segments
    segments = []
    for seg in segments_gen:
        seg_data = {
            'text': seg.text,
            'start': round(seg.start, 3),
            'end': round(seg.end, 3),
        }
        if word_timestamps and seg.words:
            seg_data['words'] = [
                {
                    'word': w.word,
                    'start': round(w.start, 3),
                    'end': round(w.end, 3),
                    'probability': round(w.probability, 4),
                }
                for w in seg.words
            ]
        segments.append(seg_data)
    elapsed = time.time() - t0
    result = {
        'status': 'ok',
        'text': ' '.join(s['text'] for s in segments),
        'language': info.language,
        'language_probability': round(info.language_probability, 4),
        'duration': round(info.duration, 3),
        'segments': segments,
        'processing_time': round(elapsed, 2),
        'audio_path': audio_path,
    }
    debug(f'Transcription done in {elapsed:.1f}s: {len(segments)} segments, '
          f'duration={info.duration:.1f}s, lang={info.language}')
    # Save result if output_path specified
    if output_path:
        out_dir = os.path.dirname(output_path)
        if out_dir:  # dirname is '' for a bare filename; makedirs('') would raise
            os.makedirs(out_dir, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
        debug(f'Result saved to {output_path}')
    return result
		`@ -0,0 +1 @@`
							`{"password_key":"ASRService2026Key","databases":{},"session_redis":{"host":"127.0.0.1","port":6379,"db":1},"website":{"paths":[["$[workdir]$/app",""]],"host":"0.0.0.0","port":9925,"coding":"utf-8","indexes":["index.html","index.dspy"],"processors":[[".dspy","dspy"]],"startswiths":[{"leading":"/idfile","registerfunction":"idfile"}]},"hot_reload":false,"filesroot":"/tmp/asr-outputs"}`