13 changed files with 722 additions and 1 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,6 @@
 __pycache__/
 *.pyc
 nohup*.out
 *.egg-info
 .env
 py3/
--- a/README.md
+++ b/README.md
@@ -1,2 +1,244 @@
-# asr-service
+# ASR Service
 Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.
 ## Architecture
 ```
 Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                      |
                                      v
                                 faster-whisper (GPU)
                                      |
                                      v
                                 Result (JSON)
 ```
 - **ahserver**: Web framework serving HTTP on port 9925
 - **longtasks**: Redis-backed async task queue with worker management
 - **Redis**: Task queue broker (queue name: `asr`)
 - **faster-whisper**: ASR engine running on GPU (CUDA, float16)
 The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
 ## Model
 - **Model**: faster-whisper-large-v3-turbo-ct2
 - **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
 - **Device**: CUDA (float16)
 - **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (`start.sh` pins GPU 6; `ASR_GPU_ID` falls back to 5 when unset)
 The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
 ## Model Download (Offline Deployment)
 faster-whisper-large-v3-turbo-ct2 is hosted on HuggingFace and must be downloaded before the service is deployed.
 ### Method 1: huggingface-cli (recommended)
 ```bash
 # Install huggingface-cli
 pip install huggingface_hub
 # Download the model into the target directory
 huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2 \
  --local-dir-use-symlinks False
 ```
 **Download size**: ~1.6GB  
 **Download time**: depends on network speed (roughly 3-10 minutes)
 ### Method 2: git-lfs
 ```bash
 # Enable git-lfs
 git lfs install
 # Clone the model repository
 cd /data/ymq/models
 mkdir -p deepdml
 cd deepdml
 git clone https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
 ```
 ### Method 3: wget/curl (individual files)
 If only the core files are needed, they can be downloaded directly:
 ```bash
 # Create the target directory if it does not exist yet
 mkdir -p /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
 cd /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
 # Download the model files
 wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/model.bin
 wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/tokenizer.json
 wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/vocabulary.json
 wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/config.json
 wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/preprocessor_config.json
 ```
 ### Verify the Download
 ```bash
 ls -lh /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2/
 # Expect model.bin (about 1.6GB) + tokenizer.json + vocabulary.json + config.json
 ```
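The same check can be scripted before the service is started. The following is a minimal sketch, not part of the repository: `missing_model_files` and `REQUIRED_FILES` are hypothetical names, and the file list is taken from the verification step above.

```python
import os

# File names taken from the verification step; adjust if the model layout changes.
REQUIRED_FILES = ("model.bin", "tokenizer.json", "vocabulary.json", "config.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

Calling `missing_model_files("/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2")` returns an empty list when all four files are present.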
 ### Model Source
 - **HuggingFace**: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
 - **Base Model**: openai/whisper-large-v3-turbo (CTranslate2-optimized build)
 - **License**: MIT
 - **Optimization**: CTranslate2 format; up to 4x faster than the original Whisper implementation, with lower memory use
 ## Deployment
 ### Prerequisites
 - Python venv with faster-whisper 1.2.1 (`start.sh` and `build.sh` use `/data/ymq/wan22-service/py3`)
 - Redis server running on 127.0.0.1:6379
 - CUDA-capable GPU
 ### Start
 ```bash
 cd /data/ymq/asr-service
 bash start.sh
 ```
 ### Stop
 ```bash
 cd /data/ymq/asr-service
 bash stop.sh
 ```
 ### Health Check
 ```bash
 curl http://localhost:9925/health
 ```
 Returns:
 ```json
 {
    "status": "ok",
    "service": "asr-service",
    "model": "faster-whisper-large-v3-turbo-ct2"
 }
 ```
 ## API Usage
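 Tasks are processed through the Redis-backed longtasks queue, the same pattern as wan22-service. They can be pushed to Redis directly, as shown here, or submitted over HTTP with `POST /api/submit` and polled with `GET /api/task?task_id=...`.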
 ### Submit a Transcription Task
 ```python
 import redis
 import json
 import uuid
 r = redis.Redis(host='127.0.0.1', port=6379)
 task_id = str(uuid.uuid4())
 payload = {
    "task_id": task_id,
    "task_type": "transcribe",
    "audio_path": "/path/to/audio.wav",
    "language": "zh",
    "word_timestamps": True,
    "vad_filter": True,
    "output_path": "/tmp/asr-outputs/result.json"
 }
 # Push to the Redis queue
 r.lpush('asr:queue', json.dumps(payload))
 print(f"Task submitted: {task_id}")
 ```
 ### Check Task Status
 ```python
 # Task status is stored in Redis by longtasks
 status = r.get(f'asr:status:{task_id}')
 result = r.get(f'asr:result:{task_id}')
 ```
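The HTTP endpoints added in `app/api/` offer an alternative to pushing to Redis by hand. The sketch below builds the requests with the standard library only. It assumes that ahserver maps a JSON request body onto `params_kw` (the endpoint's own usage message says "POST with JSON body"); `build_submit_request` and `task_status_url` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9925"  # port taken from conf/config.json


def build_submit_request(audio_path, language="auto", beam_size=5, base_url=BASE_URL):
    """Build a POST request for /api/submit carrying a JSON body."""
    body = json.dumps({
        "audio_path": audio_path,
        "language": language,
        "beam_size": beam_size,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/submit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def task_status_url(task_id, base_url=BASE_URL):
    """Return the polling URL for a submitted task."""
    return f"{base_url}/api/task?task_id={urllib.parse.quote(task_id)}"
```

Passing the request to `urllib.request.urlopen` sends it. The JSON response carries either a `task_id` to poll with `task_status_url`, or an `error` field.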
 ## Task Payload Format
 | Field            | Type   | Required | Default | Description                          |
 |------------------|--------|----------|---------|--------------------------------------|
 | task_type        | string | Yes      | -       | Must be `"transcribe"`               |
 | audio_path       | string | Yes      | -       | Path to input audio file             |
 | language         | string | No       | `"zh"`  | Language code (zh, en, ja, etc.)     |
 | word_timestamps  | bool   | No       | `True`  | Enable word-level timestamps         |
 | vad_filter       | bool   | No       | `True`  | Enable voice activity detection      |
 | output_path      | string | No       | -       | If set, save result JSON to this path|
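A client can apply the required-field checks and defaults from this table before queuing a task. This is a minimal sketch: `normalize_payload` is a hypothetical helper, not part of the service, and the defaults are copied from the table.

```python
def normalize_payload(payload):
    """Validate a task payload and fill in the documented defaults."""
    if payload.get("task_type") != "transcribe":
        raise ValueError(f"Unknown task_type: {payload.get('task_type')!r}")
    if not payload.get("audio_path"):
        raise ValueError("audio_path is required")
    # Defaults from the payload table; explicit values in payload win.
    defaults = {"language": "zh", "word_timestamps": True, "vad_filter": True}
    return {**defaults, **payload}
```

Validating on the client side surfaces a malformed payload immediately instead of as a failed task later.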
 ## Output Format
 ```json
 {
    "status": "ok",
    "text": "Full transcription text...",
    "language": "zh",
    "language_probability": 0.9876,
    "duration": 125.340,
    "segments": [
        {
            "text": "Segment text",
            "start": 0.000,
            "end": 5.120,
            "words": [
                {
                    "word": "你好",
                    "start": 0.000,
                    "end": 0.800,
                    "probability": 0.9523
                }
            ]
        }
    ],
    "processing_time": 3.45,
    "audio_path": "/path/to/audio.wav"
 }
 ```
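Because every segment carries `start` and `end` times in seconds, the result converts readily into subtitle formats. The sketch below renders the `segments` list as SRT; `srt_timestamp` and `to_srt` are hypothetical helpers that assume the output structure shown above.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(result):
    """Render the 'segments' list of a transcription result as SRT text."""
    blocks = []
    for index, seg in enumerate(result["segments"], start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```

Working in whole milliseconds avoids floating-point drift when the timestamp is split into hours, minutes, and seconds.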
 ## Configuration
 Config file: `conf/config.json`
 | Setting               | Value                        | Description                    |
 |-----------------------|------------------------------|--------------------------------|
 | website.port          | 9925                         | HTTP listen port               |
 | website.host          | 0.0.0.0                      | Bind address                   |
 | session_redis         | 127.0.0.1:6379 db=1          | Session storage                |
 | password_key          | ASRService2026Key            | Auth key                       |
 | filesroot             | /tmp/asr-outputs             | Output files directory         |
 ### Environment Variables
 | Variable             | Default                    | Description                           |
 |----------------------|----------------------------|---------------------------------------|
 | ASR_GPU_ID           | 5 (`start.sh` sets 6)      | GPU device ID (for logging)           |
 | CUDA_VISIBLE_DEVICES | unset (`start.sh` sets 6)  | CUDA device isolation                 |
 | PYTHONPATH           | /data/ymq/asr-service      | Python module search path             |
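When `CUDA_VISIBLE_DEVICES` names a single GPU, the process sees that GPU as device 0, which is why the worker passes `device_index=0` whatever physical GPU is selected. The sketch below illustrates the renumbering; `visible_device_map` is a hypothetical helper for illustration only.

```python
def visible_device_map(cuda_visible_devices):
    """Map in-process CUDA device indices to the IDs listed in CUDA_VISIBLE_DEVICES."""
    ids = [part.strip() for part in cuda_visible_devices.split(",") if part.strip()]
    # CUDA renumbers the visible devices from 0 in the order they are listed.
    return dict(enumerate(ids))
```

With `CUDA_VISIBLE_DEVICES=6`, as set in `start.sh`, the map has a single entry from index `0` to `"6"`.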
 ## File Structure
 ```
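 asr-service/
 ├── ah.py                  # Main entry point
 ├── build.sh               # One-step deploy script (deploy/update/stop/start/status)
 ├── start.sh               # Start script
 ├── stop.sh                # Stop script
 ├── conf/
 │   └── config.json        # Service configuration
 ├── app/
 │   ├── health.dspy        # Health check endpoint
 │   └── api/
 │       ├── status/index.dspy   # GET /api/status
 │       ├── submit/index.dspy   # POST /api/submit
 │       └── task/index.dspy     # GET /api/task
 ├── workers/
 │   ├── __init__.py
 │   └── transcribe.py      # Transcription worker
 └── README.md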
 ```
--- a/ah.py
+++ b/ah.py
@@ -0,0 +1,43 @@
 import os
 from ahserver.webapp import webapp
 from ahserver.serverenv import ServerEnv
 from ahserver.configuredServer import add_startup
 from longtasks.longtasks import LongTasks, schedule_once
 from appPublic.log import debug
 class ASRTasks(LongTasks):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gpu_id = int(os.environ.get('ASR_GPU_ID', '5'))
    async def process_task(self, payload, workid=None):
        import json
        if isinstance(payload, str):
            payload = json.loads(payload)
        task_type = payload.get('task_type', '')
        if task_type == 'transcribe':
            from workers.transcribe import run_transcribe
            return await run_transcribe(self, payload)
        raise ValueError(f'Unknown task_type: {task_type}')
 async def on_app_built(app):
    env = ServerEnv()
    lt = env.longtasks
    if lt:
        schedule_once(0.1, lt.run)
        debug(f'ASR longtasks worker started, GPU: {lt.gpu_id}')
 def init():
    env = ServerEnv()
    env.longtasks = ASRTasks(
        'redis://127.0.0.1:6379', 'asr',
        worker_cnt=1, stuck_seconds=600, max_age_hours=24
    )
    add_startup(on_app_built)
 if __name__ == '__main__':
    webapp(init)
--- a/app/api/status/index.dspy
+++ b/app/api/status/index.dspy
@@ -0,0 +1,31 @@
 # -*- coding:utf-8 -*-
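 # GET /api/status - ASR service status
 import os
 import subprocess
 import json
 result = {
    'service': 'asr-transcription',
    'model': 'faster-whisper-large-v3-turbo-ct2',
    # Report the GPU the worker was started with instead of a hard-coded ID
    'gpu_id': int(os.environ.get('ASR_GPU_ID', '5')),
    'gpus': []
 }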
 try:
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,utilization.gpu,memory.used,memory.total',
         '--format=csv,noheader,nounits'],
        timeout=5
    ).decode().strip()
    for line in out.split('\n'):
        parts = [p.strip() for p in line.split(',')]
        result['gpus'].append({
            'id': int(parts[0]),
            'util': int(parts[1]),
            'mem_used': int(parts[2]),
            'mem_total': int(parts[3])
        })
 except Exception:
    pass
 return json.dumps(result)
--- a/app/api/submit/index.dspy
+++ b/app/api/submit/index.dspy
@@ -0,0 +1,53 @@
 # -*- coding:utf-8 -*-
 # POST /api/submit - submit an ASR transcription task
 import json
 import uuid
 from ahserver.serverenv import ServerEnv
 method = request.method
 if method == 'POST':
    audio_path = params_kw.get('audio_path', '')
    if not audio_path:
        return json.dumps({'error': 'audio_path is required'}, ensure_ascii=False)
    task_id = params_kw.get('task_id', str(uuid.uuid4()).replace("-", "")[:12])
    language = params_kw.get('language', 'auto')
    beam_size = params_kw.get('beam_size', 5)
    payload = {
        'task_type': 'transcribe',
        'task_id': task_id,
        'audio_path': audio_path,
        'language': language,
        'beam_size': int(beam_size)
    }
    env = ServerEnv()
    longtasks = env.longtasks
    if longtasks is None:
        return json.dumps({'error': 'service not ready'}, ensure_ascii=False)
    result = await longtasks.submit_task(payload)
    real_task_id = result.get('task_id', str(result)) if isinstance(result, dict) else str(result)
    return json.dumps({
        'task_id': real_task_id,
        'status': 'queued',
        'audio_path': audio_path,
        'language': language,
        'message': 'task submitted',
        'check_url': f'/api/task?task_id={real_task_id}'
    }, ensure_ascii=False)
 else:
    return json.dumps({
        'usage': 'POST with JSON body',
        'params': {
            'audio_path': 'string (required, server path to audio file)',
            'language': 'string (default auto, or zh/en/ja/ko etc)',
            'beam_size': 'int (default 5)',
            'task_id': 'string (optional, auto-generated)',
        }
    }, ensure_ascii=False)
--- a/app/api/task/index.dspy
+++ b/app/api/task/index.dspy
@@ -0,0 +1,17 @@
 # -*- coding:utf-8 -*-
 # GET /api/task?task_id=xxx - query task status
 import json
 from ahserver.serverenv import ServerEnv
 task_id = params_kw.get('task_id', '')
 if not task_id:
    return json.dumps({'error': 'task_id is required'}, ensure_ascii=False)
 env = ServerEnv()
 longtasks = env.longtasks
 if longtasks is None:
    return json.dumps({'error': 'service not ready'}, ensure_ascii=False)
 status = await longtasks.get_status(task_id)
 return json.dumps(status, ensure_ascii=False)
--- a/app/health.dspy
+++ b/app/health.dspy
@@ -0,0 +1,3 @@
 import json
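 # Match the response documented in README.md and return it like the other .dspy endpoints
 result = {"status": "ok", "service": "asr-service", "model": "faster-whisper-large-v3-turbo-ct2"}
 return json.dumps(result)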
--- a/build.sh
+++ b/build.sh
@@ -0,0 +1,150 @@
 #!/bin/bash
 # One-step deployment script template
 # Usage: ./build.sh [deploy|update|stop|start|status]
 set -e
 SERVICE_NAME="asr-service"
 GIT_REPO="git@git.opencomputing.cn:yumoqing/asr-service.git"
 SERVICE_PORT=9925
 DEPLOY_DIR="/data/ymq/$SERVICE_NAME"
 VENV_PATH="/data/ymq/wan22-service/py3"
 GPU_ID="6"
 # Colored output
 RED='\033[0;31m'
 GREEN='\033[0;32m'
 YELLOW='\033[1;33m'
 NC='\033[0m'
 log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
 log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
 log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
 check_deps() {
    command -v git >/dev/null || { log_error "git not found"; exit 1; }
    [ -f "$VENV_PATH/bin/python" ] || { log_error "Python venv not found: $VENV_PATH"; exit 1; }
 }
 deploy() {
    log_info "Deploying $SERVICE_NAME..."
    # Check dependencies
    check_deps
    # Clone or update the code
    if [ -d "$DEPLOY_DIR/.git" ]; then
        log_info "Updating existing deployment..."
        cd "$DEPLOY_DIR"
        git fetch origin
        git reset --hard origin/master
    else
        log_info "Cloning repository..."
        cd /data/ymq
        git clone "$GIT_REPO" "$SERVICE_NAME"
        cd "$DEPLOY_DIR"
    fi
    # Create the required directories
    mkdir -p "$DEPLOY_DIR/app/api/status"
    mkdir -p "$DEPLOY_DIR/app/api/submit"
    mkdir -p "$DEPLOY_DIR/app/api/task"
    # Set permissions
    chmod +x start.sh stop.sh 2>/dev/null || true
    # Start the service
    start_service
 }
 start_service() {
    log_info "Starting $SERVICE_NAME on port $SERVICE_PORT..."
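    # start.sh and stop.sh are referenced by relative path, so run from the deploy dir
    cd "$DEPLOY_DIR"
    # Stop the old process
    if [ -f stop.sh ]; then
        bash stop.sh 2>/dev/null || true
        sleep 2
    fi
    # Start the new process
    bash start.sh
    # Wait for startup
    sleep 3
    # Verify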
    if ss -tlnp | grep -q ":$SERVICE_PORT "; then
        log_info "✓ Service started successfully"
        verify_api
    else
        log_error "✗ Service failed to start"
        log_error "Check logs: $DEPLOY_DIR/nohup.out"
        exit 1
    fi
 }
 verify_api() {
    log_info "Verifying API endpoints..."
    # Check the status endpoint
    if curl -s "http://127.0.0.1:$SERVICE_PORT/api/status" | grep -q "service"; then
        log_info "✓ /api/status OK"
    else
        log_warn "✗ /api/status failed"
    fi
 }
 stop_service() {
    log_info "Stopping $SERVICE_NAME..."
    if [ -f "$DEPLOY_DIR/stop.sh" ]; then
        cd "$DEPLOY_DIR"
        bash stop.sh
        log_info "✓ Service stopped"
    else
        log_warn "stop.sh not found"
    fi
 }
 show_status() {
    echo "=== $SERVICE_NAME Status ==="
    echo "Port: $SERVICE_PORT"
    echo "Deploy Dir: $DEPLOY_DIR"
    echo ""
    # Check the process
    if ss -tlnp | grep -q ":$SERVICE_PORT "; then
        echo -e "Status: ${GREEN}RUNNING${NC}"
        PID=$(ss -tlnp | grep ":$SERVICE_PORT " | grep -oP 'pid=\K[0-9]+')
        echo "PID: $PID"
    else
        echo -e "Status: ${RED}STOPPED${NC}"
    fi
    echo ""
    # Check the API
    echo "API Endpoints:"
    curl -s "http://127.0.0.1:$SERVICE_PORT/api/status" 2>/dev/null | python3 -m json.tool 2>/dev/null || echo "  (not responding)"
 }
 # Main entry point
 case "${1:-deploy}" in
    deploy|install)
        deploy
        ;;
    update|upgrade)
        deploy
        ;;
    stop)
        stop_service
        ;;
    start)
        start_service
        ;;
    status)
        show_status
        ;;
    *)
        echo "Usage: $0 {deploy|update|stop|start|status}"
        exit 1
        ;;
 esac
--- a/conf/config.json
+++ b/conf/config.json
@@ -0,0 +1 @@
 {"password_key":"ASRService2026Key","databases":{},"session_redis":{"host":"127.0.0.1","port":6379,"db":1},"website":{"paths":[["$[workdir]$/app",""]],"host":"0.0.0.0","port":9925,"coding":"utf-8","indexes":["index.html","index.dspy"],"processors":[[".dspy","dspy"]],"startswiths":[{"leading":"/idfile","registerfunction":"idfile"}]},"hot_reload":false,"filesroot":"/tmp/asr-outputs"}
--- a/start.sh
+++ b/start.sh
@@ -0,0 +1,7 @@
 #!/bin/bash
 cd /data/ymq/asr-service
 export ASR_GPU_ID=6
 export CUDA_VISIBLE_DEVICES=6
 export PYTHONPATH=/data/ymq/asr-service
 nohup /data/ymq/wan22-service/py3/bin/python ah.py > nohup.out 2>&1 &
 echo "asr-service started, PID: $!, GPU: $ASR_GPU_ID"
--- a/stop.sh
+++ b/stop.sh
@@ -0,0 +1,24 @@
 #!/bin/bash
 # Stop the asr-service
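 # Sibling services also run "python ah.py", so only match the process whose
 # working directory is this service's directory (start.sh cd's there first).
 SERVICE_DIR=$(readlink -f /data/ymq/asr-service)
 PID=""
 for p in $(pgrep -f "python ah.py"); do
    if [ "$(readlink -f "/proc/$p/cwd")" = "$SERVICE_DIR" ]; then
        PID=$p
        break
    fi
 done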
 if [ -z "$PID" ]; then
    echo "asr-service is not running"
    exit 0
 fi
 echo "Stopping asr-service (PID: $PID)..."
 kill "$PID"
 # Wait up to 10 seconds for graceful shutdown
 for i in $(seq 1 10); do
    if ! kill -0 "$PID" 2>/dev/null; then
        echo "asr-service stopped"
        exit 0
    fi
    sleep 1
 done
 # Force kill if still running
 echo "Force killing asr-service (PID: $PID)..."
 kill -9 "$PID"
 echo "asr-service killed"
--- a/workers/__init__.py
+++ b/workers/__init__.py
--- a/workers/transcribe.py
+++ b/workers/transcribe.py
@@ -0,0 +1,144 @@
 """
 ASR Transcription Worker using faster-whisper.
 Lazy-loads the model on first use and keeps it in GPU memory.
 Processes transcription tasks from the Redis queue.
 """
 import os
 import json
 import asyncio
 import time
 from appPublic.log import debug, error
 # Module-level model cache (lazy-loaded, stays in memory)
 _model = None
 _model_lock = None
 MODEL_PATH = '/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2'
 def _get_lock():
    """Get or create the async lock for model loading."""
    global _model_lock
    if _model_lock is None:
        _model_lock = asyncio.Lock()
    return _model_lock
 async def load_model():
    """Lazy-load the faster-whisper model. Guarded by an asyncio lock so concurrent tasks load it only once."""
    global _model
    if _model is not None:
        return _model
    async with _get_lock():
        # Double-check after acquiring lock
        if _model is not None:
            return _model
        debug(f'Loading faster-whisper model from {MODEL_PATH}...')
        t0 = time.time()
        from faster_whisper import WhisperModel
        # CUDA device 0 — CUDA_VISIBLE_DEVICES already isolates the GPU
        _model = WhisperModel(
            MODEL_PATH,
            device='cuda',
            device_index=0,
            compute_type='float16',
            num_workers=1,
        )
        elapsed = time.time() - t0
        debug(f'faster-whisper model loaded in {elapsed:.1f}s')
        return _model
 async def run_transcribe(tasks, payload):
    """
    Run transcription on an audio file.
    Payload fields:
        audio_path (str):      Path to the audio file (required)
        language (str):        Language code, default 'zh'
        word_timestamps (bool): Enable word-level timestamps, default True
        vad_filter (bool):     Enable VAD filter, default True
        output_path (str):     Optional path to save result JSON
    Returns:
        dict with segments, language, duration, etc.
    """
    audio_path = payload.get('audio_path')
    if not audio_path:
        raise ValueError('audio_path is required')
    if not os.path.exists(audio_path):
        raise FileNotFoundError(f'Audio file not found: {audio_path}')
    language = payload.get('language', 'zh')
    # faster-whisper auto-detects the language when it is None; /api/submit sends 'auto' by default
    if language in ('', 'auto'):
        language = None
    word_timestamps = payload.get('word_timestamps', True)
    vad_filter = payload.get('vad_filter', True)
    beam_size = int(payload.get('beam_size', 5))
    output_path = payload.get('output_path')
    debug(f'Transcribing: {audio_path} (lang={language}, vad={vad_filter}, words={word_timestamps})')
    t0 = time.time()
    model = await load_model()
    def _transcribe():
        # model.transcribe() returns a lazy generator: decoding happens while it is
        # iterated, so consume it here, inside the worker thread.
        segments_gen, info = model.transcribe(
            audio_path,
            language=language,
            beam_size=beam_size,
            word_timestamps=word_timestamps,
            vad_filter=vad_filter,
        )
        segments = []
        for seg in segments_gen:
            seg_data = {
                'text': seg.text,
                'start': round(seg.start, 3),
                'end': round(seg.end, 3),
            }
            if word_timestamps and seg.words:
                seg_data['words'] = [
                    {
                        'word': w.word,
                        'start': round(w.start, 3),
                        'end': round(w.end, 3),
                        'probability': round(w.probability, 4),
                    }
                    for w in seg.words
                ]
            segments.append(seg_data)
        return segments, info
    # Run the blocking transcription in a thread so the event loop stays responsive
    loop = asyncio.get_running_loop()
    segments, info = await loop.run_in_executor(None, _transcribe)
    elapsed = time.time() - t0
    result = {
        'status': 'ok',
        'text': ' '.join(s['text'] for s in segments),
        'language': info.language,
        'language_probability': round(info.language_probability, 4),
        'duration': round(info.duration, 3),
        'segments': segments,
        'processing_time': round(elapsed, 2),
        'audio_path': audio_path,
    }
    debug(f'Transcription done in {elapsed:.1f}s: {len(segments)} segments, '
          f'duration={info.duration:.1f}s, lang={info.language}')
    # Save result if output_path specified
    if output_path:
        # dirname is empty for a bare file name, and os.makedirs('') raises
        out_dir = os.path.dirname(output_path)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
        debug(f'Result saved to {output_path}')
    return result