Compare commits

No commits in common. "master" and "main" have entirely different histories.
master ... main

13 changed files with 1 additions and 722 deletions

6
.gitignore vendored
@@ -1,6 +0,0 @@
__pycache__/
*.pyc
nohup*.out
*.egg-info
.env
py3/

244
README.md
@@ -1,244 +1,2 @@
# ASR Service
# asr-service
Speech-to-text service powered by [faster-whisper](https://github.com/SYSTRAN/faster-whisper) (CTranslate2 backend). Uses the `large-v3-turbo` model for fast, high-quality transcription with word-level timestamps.
## Architecture
```
Client --> Redis Queue ("asr") --> ASRTasks (LongTasks worker)
                                        |
                                        v
                                faster-whisper (GPU)
                                        |
                                        v
                                  Result (JSON)
```
- **ahserver**: Web framework serving HTTP on port 9925
- **longtasks**: Redis-backed async task queue with worker management
- **Redis**: Task queue broker (queue name: `asr`)
- **faster-whisper**: ASR engine running on GPU (CUDA, float16)
The service follows the same ahserver+longtasks pattern as wan22-service and realesrgan-service.
## Model
- **Model**: faster-whisper-large-v3-turbo-ct2
- **Path**: `/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2`
- **Device**: CUDA (float16)
- **GPU**: Isolated via `CUDA_VISIBLE_DEVICES` (default GPU 5)
The model is lazy-loaded on first transcription request and stays in GPU memory for subsequent requests.
## Model Download (Offline Deployment)
faster-whisper-large-v3-turbo-ct2 is hosted on HuggingFace and must be downloaded before the service is deployed.
### Method 1: huggingface-cli (recommended)
```bash
# Install huggingface-cli
pip install huggingface_hub
# Download the model to the target directory
huggingface-cli download deepdml/faster-whisper-large-v3-turbo-ct2 \
--local-dir /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2 \
--local-dir-use-symlinks False
```
**Download size**: ~1.6GB
**Download time**: about 3-10 minutes, depending on network speed
### Method 2: git-lfs
```bash
# Install git-lfs
git lfs install
# Clone the model repository
cd /data/ymq/models
mkdir -p deepdml
cd deepdml
git clone https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
```
### Method 3: wget/curl (individual files)
If only the core files are needed, download them directly:
```bash
cd /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2
# Download the model files
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/model.bin
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/tokenizer.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/vocabulary.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/config.json
wget https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2/resolve/main/preprocessor_config.json
```
### Verify the Download
```bash
ls -lh /data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2/
# Expect model.bin (about 1.6GB) + tokenizer.json + vocabulary.json + config.json
```
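The same check can be scripted before the service starts. The sketch below is a minimal example, assuming the five file names from the single-file download list are the complete required set; the helper name `missing_model_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the single-file download list; treating them as the
# full required set is an assumption.
REQUIRED_FILES = (
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "config.json",
    "preprocessor_config.json",
)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in required:
        path = root / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    model_dir = "/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2"
    missing = missing_model_files(model_dir)
    print("missing files:", missing if missing else "none")
```

An empty list means every expected file is present and non-empty; it does not verify checksums.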
### Model Source
- **HuggingFace**: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
- **Base Model**: openai/whisper-large-v3-turbo (CTranslate2-optimized build)
- **License**: MIT
- **Optimization**: CTranslate2 format, up to 4x faster than the original Whisper with lower memory use
## Deployment
### Prerequisites
- Python venv with faster-whisper 1.2.1: `/data/ymq/demucs_venv`
- Redis server running on 127.0.0.1:6379
- CUDA-capable GPU
### Start
```bash
cd /data/ymq/asr-service
bash start.sh
```
### Stop
```bash
cd /data/ymq/asr-service
bash stop.sh
```
### Health Check
```bash
curl http://localhost:9925/health
```
Returns:
```json
{
"status": "ok",
"service": "asr-service",
"model": "faster-whisper-large-v3-turbo-ct2"
}
```
## API Usage
Tasks are submitted via Redis, same pattern as wan22-service.
### Submit a Transcription Task
```python
import redis
import json
import uuid
r = redis.Redis(host='127.0.0.1', port=6379)
task_id = str(uuid.uuid4())
payload = {
"task_id": task_id,
"task_type": "transcribe",
"audio_path": "/path/to/audio.wav",
"language": "zh",
"word_timestamps": True,
"vad_filter": True,
"output_path": "/tmp/asr-outputs/result.json"
}
# Push to the Redis queue
r.lpush('asr:queue', json.dumps(payload))
print(f"Task submitted: {task_id}")
```
### Check Task Status
```python
# Task status is stored in Redis by longtasks
status = r.get(f'asr:status:{task_id}')
result = r.get(f'asr:result:{task_id}')
```
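A single `get` only reports the state at that moment, so a client usually polls. The sketch below is one way to do that, assuming the `asr:result:{task_id}` key shown above is where the result appears; that key layout is taken from this README and is not verified against longtasks. The function name `wait_for_result` and its default timings are hypothetical.

```python
import time


def wait_for_result(client, task_id, timeout=600.0, interval=1.0):
    """Poll Redis until the task has a result or the timeout expires.

    `client` is any object with a redis-py style `get(key)` method.
    The key name follows the snippet above (an assumption).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = client.get(f"asr:result:{task_id}")
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout}s")
        time.sleep(interval)
```

With redis-py this would be called as `wait_for_result(r, task_id)`; the returned bytes can then be passed to `json.loads`.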
## Task Payload Format
| Field | Type | Required | Default | Description |
|------------------|--------|----------|---------|--------------------------------------|
| task_type | string | Yes | - | Must be `"transcribe"` |
| audio_path | string | Yes | - | Path to input audio file |
| language | string | No | `"zh"` | Language code (zh, en, ja, etc.) |
| word_timestamps | bool | No | `True` | Enable word-level timestamps |
| vad_filter | bool | No | `True` | Enable voice activity detection |
| output_path | string | No | - | If set, save result JSON to this path|
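
The table can be turned into a small client-side check before a payload is queued. This is a minimal sketch of the rules listed above; the function name `normalize_payload` is hypothetical, and keeping unlisted keys such as `task_id` untouched is an assumption.

```python
def normalize_payload(payload):
    """Validate a task payload and fill in the documented defaults."""
    if payload.get("task_type") != "transcribe":
        raise ValueError('task_type must be "transcribe"')
    if not payload.get("audio_path"):
        raise ValueError("audio_path is required")
    normalized = dict(payload)  # keep extra keys such as task_id
    normalized.setdefault("language", "zh")
    normalized.setdefault("word_timestamps", True)
    normalized.setdefault("vad_filter", True)
    return normalized
```

`output_path` has no default, so it is only present in the result when the caller supplied it.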
## Output Format
```json
{
"status": "ok",
"text": "Full transcription text...",
"language": "zh",
"language_probability": 0.9876,
"duration": 125.340,
"segments": [
{
"text": "Segment text",
"start": 0.000,
"end": 5.120,
"words": [
{
"word": "你好",
"start": 0.000,
"end": 0.800,
"probability": 0.9523
}
]
}
],
"processing_time": 3.45,
"audio_path": "/path/to/audio.wav"
}
```
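As an illustration of how a consumer might read this structure, the sketch below computes a real-time factor (`processing_time / duration`) and collects words whose `probability` falls below a threshold. The function name `summarize_result` and the 0.5 threshold are hypothetical choices, not part of the service.

```python
def summarize_result(result, min_probability=0.5):
    """Summarize a transcription result dict in the format shown above."""
    segments = result.get("segments", [])
    duration = result.get("duration") or 0.0
    # Real-time factor: values below 1.0 mean faster than real time.
    rtf = result["processing_time"] / duration if duration else None
    low_confidence = [
        word
        for segment in segments
        for word in segment.get("words", [])  # "words" is absent without word timestamps
        if word["probability"] < min_probability
    ]
    return {
        "segment_count": len(segments),
        "real_time_factor": rtf,
        "low_confidence_words": low_confidence,
    }
```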
## Configuration
Config file: `conf/config.json`
| Setting | Value | Description |
|-----------------------|------------------------------|--------------------------------|
| website.port | 9925 | HTTP listen port |
| website.host | 0.0.0.0 | Bind address |
| session_redis | 127.0.0.1:6379 db=1 | Session storage |
| password_key | ASRService2026Key | Auth key |
| filesroot | /tmp/asr-outputs | Output files directory |
### Environment Variables
| Variable | Default | Description |
|----------------------|---------|---------------------------------------|
| ASR_GPU_ID | 5 | GPU device ID (for logging) |
| CUDA_VISIBLE_DEVICES | 5 | CUDA device isolation |
| PYTHONPATH | . | Python module search path |
## File Structure
```
asr-service/
├── ah.py # Main entry point
├── start.sh # Start script
├── stop.sh # Stop script
├── conf/
│ └── config.json # Service configuration
├── app/
│ └── health.dspy # Health check endpoint
├── workers/
│ ├── __init__.py
│ └── transcribe.py # Transcription worker
└── README.md
```

43
ah.py
@@ -1,43 +0,0 @@
import os
from ahserver.webapp import webapp
from ahserver.serverenv import ServerEnv
from ahserver.configuredServer import add_startup
from longtasks.longtasks import LongTasks, schedule_once
from appPublic.log import debug

class ASRTasks(LongTasks):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gpu_id = int(os.environ.get('ASR_GPU_ID', '5'))

    async def process_task(self, payload, workid=None):
        import json
        if isinstance(payload, str):
            payload = json.loads(payload)
        task_type = payload.get('task_type', '')
        if task_type == 'transcribe':
            from workers.transcribe import run_transcribe
            return await run_transcribe(self, payload)
        raise ValueError(f'Unknown task_type: {task_type}')

async def on_app_built(app):
    env = ServerEnv()
    lt = env.longtasks
    if lt:
        schedule_once(0.1, lt.run)
        debug(f'ASR longtasks worker started, GPU: {lt.gpu_id}')

def init():
    env = ServerEnv()
    env.longtasks = ASRTasks(
        'redis://127.0.0.1:6379', 'asr',
        worker_cnt=1, stuck_seconds=600, max_age_hours=24
    )
    add_startup(on_app_built)

if __name__ == '__main__':
    webapp(init)

@@ -1,31 +0,0 @@
# -*- coding:utf-8 -*-
# GET /api/status - ASR service status
import subprocess
import json

result = {
    'service': 'asr-transcription',
    'model': 'faster-whisper-large-v3-turbo-ct2',
    'gpu_id': 6,
    'gpus': []
}
try:
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=index,utilization.gpu,memory.used,memory.total',
         '--format=csv,noheader,nounits'],
        timeout=5
    ).decode().strip()
    for line in out.split('\n'):
        parts = [p.strip() for p in line.split(',')]
        result['gpus'].append({
            'id': int(parts[0]),
            'util': int(parts[1]),
            'mem_used': int(parts[2]),
            'mem_total': int(parts[3])
        })
except Exception:
    pass
return json.dumps(result)
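
In the handler above, one malformed row raises inside the loop, and the blanket `except` then silently drops every remaining GPU. A possible alternative, sketched here under the assumption that the CSV columns are exactly the four requested in the query, is to parse row by row and skip only the bad ones. The function name `parse_gpu_rows` is hypothetical.

```python
def parse_gpu_rows(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        try:
            index, util, mem_used, mem_total = (int(p) for p in parts)
        except ValueError:
            continue  # skip rows with a non-numeric field such as "[N/A]"
        gpus.append(
            {"id": index, "util": util, "mem_used": mem_used, "mem_total": mem_total}
        )
    return gpus
```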

@@ -1,53 +0,0 @@
# -*- coding:utf-8 -*-
# POST /api/submit - submit an ASR transcription task
import json
import uuid
from ahserver.serverenv import ServerEnv

method = request.method
if method == 'POST':
    audio_path = params_kw.get('audio_path', '')
    if not audio_path:
        return json.dumps({'error': 'audio_path is required'}, ensure_ascii=False)
    task_id = params_kw.get('task_id', str(uuid.uuid4()).replace("-", "")[:12])
    language = params_kw.get('language', 'auto')
    beam_size = params_kw.get('beam_size', 5)
    payload = {
        'task_type': 'transcribe',
        'task_id': task_id,
        'audio_path': audio_path,
        'language': language,
        'beam_size': int(beam_size)
    }
    env = ServerEnv()
    longtasks = env.longtasks
    if longtasks is None:
        return json.dumps({'error': 'service not ready'}, ensure_ascii=False)
    result = await longtasks.submit_task(payload)
    real_task_id = result.get('task_id', str(result)) if isinstance(result, dict) else str(result)
    return json.dumps({
        'task_id': real_task_id,
        'status': 'queued',
        'audio_path': audio_path,
        'language': language,
        'message': 'task submitted',
        'check_url': f'/api/task?task_id={real_task_id}'
    }, ensure_ascii=False)
else:
    return json.dumps({
        'usage': 'POST with JSON body',
        'params': {
            'audio_path': 'string (required, server path to audio file)',
            'language': 'string (default auto, or zh/en/ja/ko etc)',
            'beam_size': 'int (default 5)',
            'task_id': 'string (optional, auto-generated)',
        }
    }, ensure_ascii=False)

@@ -1,17 +0,0 @@
# -*- coding:utf-8 -*-
# GET /api/task?task_id=xxx - query task status
import json
from ahserver.serverenv import ServerEnv

task_id = params_kw.get('task_id', '')
if not task_id:
    return json.dumps({'error': 'task_id is required'}, ensure_ascii=False)
env = ServerEnv()
longtasks = env.longtasks
if longtasks is None:
    return json.dumps({'error': 'service not ready'}, ensure_ascii=False)
status = await longtasks.get_status(task_id)
return json.dumps(status)

@@ -1,3 +0,0 @@
import json
result = {"status": "ok", "service": "$svc"}
print(json.dumps(result))

150
build.sh
@@ -1,150 +0,0 @@
#!/bin/bash
# One-click deployment script template
# Usage: ./build.sh [deploy|update|stop|start|status]
set -e
SERVICE_NAME="asr-service"
GIT_REPO="git@git.opencomputing.cn:yumoqing/asr-service.git"
SERVICE_PORT=9925
DEPLOY_DIR="/data/ymq/$SERVICE_NAME"
VENV_PATH="/data/ymq/wan22-service/py3"
GPU_ID="6"
# Colored output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
check_deps() {
command -v git >/dev/null || { log_error "git not found"; exit 1; }
[ -f "$VENV_PATH/bin/python" ] || { log_error "Python venv not found: $VENV_PATH"; exit 1; }
}
deploy() {
log_info "Deploying $SERVICE_NAME..."
# Check dependencies
check_deps
# Clone or update the code
if [ -d "$DEPLOY_DIR/.git" ]; then
log_info "Updating existing deployment..."
cd "$DEPLOY_DIR"
git fetch origin
git reset --hard origin/master
else
log_info "Cloning repository..."
cd /data/ymq
git clone "$GIT_REPO" "$SERVICE_NAME"
cd "$DEPLOY_DIR"
fi
# Create required directories
mkdir -p "$DEPLOY_DIR/app/api/status"
mkdir -p "$DEPLOY_DIR/app/api/submit"
mkdir -p "$DEPLOY_DIR/app/api/task"
# Set permissions
chmod +x start.sh stop.sh 2>/dev/null || true
# Start the service
start_service
}
start_service() {
log_info "Starting $SERVICE_NAME on port $SERVICE_PORT..."
# Stop the old process
if [ -f stop.sh ]; then
bash stop.sh 2>/dev/null || true
sleep 2
fi
# Start the new process
bash start.sh
# Wait for startup
sleep 3
# Verify
if ss -tlnp | grep -q ":$SERVICE_PORT "; then
log_info "✓ Service started successfully"
verify_api
else
log_error "✗ Service failed to start"
log_error "Check logs: $DEPLOY_DIR/nohup.out"
exit 1
fi
}
verify_api() {
log_info "Verifying API endpoints..."
# Check the status endpoint
if curl -s "http://127.0.0.1:$SERVICE_PORT/api/status" | grep -q "service"; then
log_info "✓ /api/status OK"
else
log_warn "✗ /api/status failed"
fi
}
stop_service() {
log_info "Stopping $SERVICE_NAME..."
if [ -f "$DEPLOY_DIR/stop.sh" ]; then
cd "$DEPLOY_DIR"
bash stop.sh
log_info "✓ Service stopped"
else
log_warn "stop.sh not found"
fi
}
show_status() {
echo "=== $SERVICE_NAME Status ==="
echo "Port: $SERVICE_PORT"
echo "Deploy Dir: $DEPLOY_DIR"
echo ""
# Check the process
if ss -tlnp | grep -q ":$SERVICE_PORT "; then
echo -e "Status: ${GREEN}RUNNING${NC}"
PID=$(ss -tlnp | grep ":$SERVICE_PORT " | grep -oP 'pid=\K[0-9]+')
echo "PID: $PID"
else
echo -e "Status: ${RED}STOPPED${NC}"
fi
echo ""
# Check the API
echo "API Endpoints:"
curl -s "http://127.0.0.1:$SERVICE_PORT/api/status" 2>/dev/null | python3 -m json.tool 2>/dev/null || echo " (not responding)"
}
# Main entry point
case "${1:-deploy}" in
deploy|install)
deploy
;;
update|upgrade)
deploy
;;
stop)
stop_service
;;
start)
start_service
;;
status)
show_status
;;
*)
echo "Usage: $0 {deploy|update|stop|start|status}"
exit 1
;;
esac

@@ -1 +0,0 @@
{"password_key":"ASRService2026Key","databases":{},"session_redis":{"host":"127.0.0.1","port":6379,"db":1},"website":{"paths":[["$[workdir]$/app",""]],"host":"0.0.0.0","port":9925,"coding":"utf-8","indexes":["index.html","index.dspy"],"processors":[[".dspy","dspy"]],"startswiths":[{"leading":"/idfile","registerfunction":"idfile"}]},"hot_reload":false,"filesroot":"/tmp/asr-outputs"}

@@ -1,7 +0,0 @@
#!/bin/bash
cd /data/ymq/asr-service
export ASR_GPU_ID=6
export CUDA_VISIBLE_DEVICES=6
export PYTHONPATH=/data/ymq/asr-service
nohup /data/ymq/wan22-service/py3/bin/python ah.py > nohup.out 2>&1 &
echo "asr-service started, PID: $!, GPU: $ASR_GPU_ID"

24
stop.sh
@@ -1,24 +0,0 @@
#!/bin/bash
# Stop the asr-service
PID=$(pgrep -f "python ah.py" | head -1)
if [ -z "$PID" ]; then
echo "asr-service is not running"
exit 0
fi
echo "Stopping asr-service (PID: $PID)..."
kill "$PID"
# Wait up to 10 seconds for graceful shutdown
for i in $(seq 1 10); do
if ! kill -0 "$PID" 2>/dev/null; then
echo "asr-service stopped"
exit 0
fi
sleep 1
done
# Force kill if still running
echo "Force killing asr-service (PID: $PID)..."
kill -9 "$PID"
echo "asr-service killed"

@@ -1,144 +0,0 @@
"""
ASR Transcription Worker using faster-whisper.
Lazy-loads the model on first use and keeps it in GPU memory.
Processes transcription tasks from the Redis queue.
"""
import os
import json
import asyncio
import time
from appPublic.log import debug, error

# Module-level model cache (lazy-loaded, stays in memory)
_model = None
_model_lock = None
MODEL_PATH = '/data/ymq/models/deepdml/faster-whisper-large-v3-turbo-ct2'

def _get_lock():
    """Get or create the async lock for model loading."""
    global _model_lock
    if _model_lock is None:
        _model_lock = asyncio.Lock()
    return _model_lock

async def load_model():
    """Lazy-load the faster-whisper model. Thread-safe, loads once."""
    global _model
    if _model is not None:
        return _model
    async with _get_lock():
        # Double-check after acquiring lock
        if _model is not None:
            return _model
        debug(f'Loading faster-whisper model from {MODEL_PATH}...')
        t0 = time.time()
        from faster_whisper import WhisperModel
        # CUDA device 0: CUDA_VISIBLE_DEVICES already isolates the GPU
        _model = WhisperModel(
            MODEL_PATH,
            device='cuda',
            device_index=0,
            compute_type='float16',
            num_workers=1,
        )
        elapsed = time.time() - t0
        debug(f'faster-whisper model loaded in {elapsed:.1f}s')
    return _model

async def run_transcribe(tasks, payload):
    """
    Run transcription on an audio file.
    Payload fields:
        audio_path (str): Path to the audio file (required)
        language (str): Language code, default 'zh'
        word_timestamps (bool): Enable word-level timestamps, default True
        vad_filter (bool): Enable VAD filter, default True
        output_path (str): Optional path to save result JSON
    Returns:
        dict with segments, language, duration, etc.
    """
    audio_path = payload.get('audio_path')
    if not audio_path:
        raise ValueError('audio_path is required')
    if not os.path.exists(audio_path):
        raise FileNotFoundError(f'Audio file not found: {audio_path}')
    language = payload.get('language', 'zh')
    word_timestamps = payload.get('word_timestamps', True)
    vad_filter = payload.get('vad_filter', True)
    output_path = payload.get('output_path')
    debug(f'Transcribing: {audio_path} (lang={language}, vad={vad_filter}, words={word_timestamps})')
    t0 = time.time()
    model = await load_model()
    # Run the synchronous transcription in a thread to not block the event loop
    loop = asyncio.get_event_loop()
    segments_gen, info = await loop.run_in_executor(
        None,
        lambda: model.transcribe(
            audio_path,
            language=language,
            word_timestamps=word_timestamps,
            vad_filter=vad_filter,
        )
    )
    # Collect segments
    segments = []
    for seg in segments_gen:
        seg_data = {
            'text': seg.text,
            'start': round(seg.start, 3),
            'end': round(seg.end, 3),
        }
        if word_timestamps and seg.words:
            seg_data['words'] = [
                {
                    'word': w.word,
                    'start': round(w.start, 3),
                    'end': round(w.end, 3),
                    'probability': round(w.probability, 4),
                }
                for w in seg.words
            ]
        segments.append(seg_data)
    elapsed = time.time() - t0
    result = {
        'status': 'ok',
        'text': ' '.join(s['text'] for s in segments),
        'language': info.language,
        'language_probability': round(info.language_probability, 4),
        'duration': round(info.duration, 3),
        'segments': segments,
        'processing_time': round(elapsed, 2),
        'audio_path': audio_path,
    }
    debug(f'Transcription done in {elapsed:.1f}s: {len(segments)} segments, '
          f'duration={info.duration:.1f}s, lang={info.language}')
    # Save result if output_path specified
    if output_path:
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(result, f, ensure_ascii=False, indent=2)
        debug(f'Result saved to {output_path}')
    return result
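
One caveat about the worker above: in faster-whisper, `WhisperModel.transcribe` returns a lazy generator, so decoding only starts when the segments are iterated. Because the `for seg in segments_gen` loop runs outside `run_in_executor`, most of the work still happens on the event loop thread. The sketch below shows one way to move the iteration into the worker thread; the helper names `_transcribe_blocking` and `transcribe_in_thread` are hypothetical and not part of the repository.

```python
import asyncio


def _transcribe_blocking(model, audio_path, **options):
    """Run transcription and consume the segment generator in this thread."""
    segments_gen, info = model.transcribe(audio_path, **options)
    # list() forces the lazy generator, so decoding happens here.
    return list(segments_gen), info


async def transcribe_in_thread(model, audio_path, **options):
    """Await a full transcription without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, lambda: _transcribe_blocking(model, audio_path, **options)
    )
```

The returned list can then be converted to dicts on the event loop, since that step is cheap compared with decoding.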