Hermes Agent ccb2c5cca6 Initial: KTV/MTV synthesis HTTP service (ahserver+longtasks+Redis)

2026-06-14 14:46:26 +08:00

5.8 KiB

KTV Synth Service

KTV/MTV video synthesis service using FFmpeg. Creates karaoke videos with dual audio tracks (accompaniment + original) and synchronized ASS subtitles.

Overview

This service processes video clips, audio tracks, and subtitles to produce:

MTV (Music Television): Single audio track with original vocals and subtitles
KTV (Karaoke Television): Dual audio tracks - accompaniment (default) and original vocals

Architecture

Framework: ahserver + longtasks + Redis
Port: 9084
Queue: ktv_synth
Worker: FFmpeg subprocess (no GPU required)

Features

Two-step FFmpeg synthesis pipeline
ASS subtitle rendering with karaoke effects
Dual audio track support with proper metadata
Configurable video looping for scene clips
1920x1080 output resolution with Lanczos scaling
Automatic duration calculation

Installation

Prerequisites

Python 3.8+
FFmpeg with libx264 and AAC support
Redis server

Setup

# Change to the service directory
cd /data/ymq/ktv-synth-service

# Ensure FFmpeg is installed
ffmpeg -version

# Ensure Redis is running
redis-cli ping

Usage

Starting the Service

./start.sh

The service will start on port 9084 and begin processing tasks from the Redis queue.

Stopping the Service

./stop.sh

Health Check

Request http://localhost:9084/app/health.dspy to confirm that the service is running.
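
The health URL can also be polled from a script. The following is a minimal sketch, assuming the endpoint answers with HTTP 200 when the service is healthy; the response body is not documented here, so it is ignored. The function name is_healthy is hypothetical.

```python
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:9084/app/health.dspy"  # URL from this README


def is_healthy(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200.

    Assumption: a 200 status means "healthy"; the body is not inspected.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling is_healthy() before submitting tasks gives an early signal that the service is down, rather than leaving tasks waiting in the queue.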

API

Task Payload

Submit tasks to the Redis queue ktv_synth:

{
    "task_type": "synthesize",
    "video_files": [
        "/path/to/scene1.mp4",
        "/path/to/scene2.mp4",
        "/path/to/scene3.mp4"
    ],
    "original_audio": "/path/to/original.wav",
    "accompaniment": "/path/to/no_vocals.wav",
    "subtitle_path": "/path/to/subtitles.ass",
    "output_dir": "/tmp/ktv-synth-outputs",
    "title": "SongName",
    "duration": 240.5,
    "loops": 3,
    "output_modes": ["mtv", "ktv"]
}

Parameters

video_files (required): List of video file paths (scene clips to loop)
original_audio (required): Path to original full audio with vocals
accompaniment (required for KTV): Path to accompaniment track (no vocals)
subtitle_path (required): Path to ASS subtitle file
output_dir (optional): Output directory (default: /tmp/ktv-synth-outputs)
title (optional): Song title for output naming (default: output)
duration (optional): Target duration in seconds (auto-calculated if not provided)
loops (optional): Number of video loops (auto-calculated if not provided)
output_modes (optional): List of outputs to generate: ["mtv"], ["ktv"], or ["mtv", "ktv"]
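
The rules in the parameter list can be checked on the client side before a task is queued. This is a sketch only: validate_payload is a hypothetical helper, not part of the service, and because the default for output_modes is not stated here, the accompaniment rule is applied only when "ktv" is requested explicitly.

```python
from typing import Any, Dict, List

VALID_MODES = ("mtv", "ktv")


def validate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    errors: List[str] = []

    videos = payload.get("video_files")
    if not isinstance(videos, list) or not videos:
        errors.append("video_files must be a non-empty list of paths")

    for key in ("original_audio", "subtitle_path"):
        if not payload.get(key):
            errors.append(f"{key} is required")

    # The default for output_modes is not documented, so the KTV rule
    # below only fires when "ktv" is listed explicitly.
    modes = payload.get("output_modes") or []
    for mode in modes:
        if mode not in VALID_MODES:
            errors.append(f"unknown output mode: {mode!r}")
    if "ktv" in modes and not payload.get("accompaniment"):
        errors.append("accompaniment is required when 'ktv' is requested")

    duration = payload.get("duration")
    if duration is not None and not (
        isinstance(duration, (int, float)) and duration > 0
    ):
        errors.append("duration must be a positive number of seconds")

    return errors
```

Returning every problem at once, rather than raising on the first, lets a caller report all of them in a single pass.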

Response

{
    "mtv_path": "/tmp/ktv-synth-outputs/SongName_MTV.mp4",
    "ktv_path": "/tmp/ktv-synth-outputs/SongName_KTV.mp4",
    "mtv_size_mb": 125.45,
    "ktv_size_mb": 145.67,
    "duration": 240.5
}
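
The output names in the response appear to follow the pattern {title}_MTV.mp4 and {title}_KTV.mp4 inside output_dir. The sketch below reproduces that pattern and a size figure. Both helpers are hypothetical, and the MB convention (1 MB = 1024 * 1024 bytes, rounded to two decimals) is an assumption, since the service's exact calculation is not shown here.

```python
import os
import posixpath
from typing import Dict


def output_paths(output_dir: str = "/tmp/ktv-synth-outputs",
                 title: str = "output") -> Dict[str, str]:
    """Build the MTV/KTV output paths using the naming seen in the response."""
    return {
        "mtv_path": posixpath.join(output_dir, f"{title}_MTV.mp4"),
        "ktv_path": posixpath.join(output_dir, f"{title}_KTV.mp4"),
    }


def size_mb(path: str) -> float:
    """File size in MB (assumed here to mean 1024 * 1024 bytes)."""
    return round(os.path.getsize(path) / (1024 * 1024), 2)
```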

Technical Details

Two-Step Synthesis Process

Step 1: Create Silent Looped Video Track

Concatenates the scene clips, loops the resulting sequence, and trims it to the target duration, dropping any audio (-an):

ffmpeg -y -f concat -safe 0 -stream_loop {loops} -i {concat_list} \
  -t {duration} -an -c:v libx264 -preset fast -crf 23 {temp_video}
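
The inputs to this command can be prepared as sketched below. The helper names are hypothetical. Two points are worth noting: the concat demuxer expects one file '<path>' line per clip, and FFmpeg documents -stream_loop 0 as "no loop", so the value counts additional repetitions. Whether the service's loops field follows the same convention is not stated here.

```python
import math
from typing import List


def concat_list_text(video_files: List[str]) -> str:
    """Render a concat demuxer list: one `file '<path>'` line per clip."""
    lines = []
    for path in video_files:
        # Inside single quotes, a quote is written as: close, \', reopen.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def stream_loop_count(duration: float, clips_total: float) -> int:
    """Extra repetitions needed so the clip sequence covers `duration`."""
    if duration <= 0 or clips_total <= 0:
        raise ValueError("duration and clips_total must be positive")
    plays = math.ceil(duration / clips_total)
    return max(plays - 1, 0)


def step1_args(concat_list: str, loops: int, duration: float,
               temp_video: str) -> List[str]:
    """Argument list mirroring the Step 1 command above."""
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-stream_loop", str(loops), "-i", concat_list,
        "-t", str(duration), "-an",
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        temp_video,
    ]
```

For example, three clips totalling 30 seconds and a 240.5-second target need nine plays of the sequence, so stream_loop_count(240.5, 30.0) returns 8; the -t option then trims the surplus.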

Step 2a: MTV Synthesis (Single Track)

Combines video with original audio and ASS subtitles:

ffmpeg -y -i {temp_video} -i {original_audio} \
  -map 0:v -map 1:a \
  -vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
  -c:v libx264 -preset fast -crf 23 \
  -c:a aac -b:a 192k \
  {mtv_output}
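
A worker might assemble and run this command roughly as follows. This is a sketch, not the service's actual code: mtv_args and run_command are hypothetical names, and the subtitle path is inserted into the filter string unescaped, so paths containing characters that are special in FFmpeg filter syntax (such as ':' or ',') would need escaping first.

```python
import subprocess
from typing import List


def mtv_args(temp_video: str, original_audio: str,
             subtitle_path: str, mtv_output: str) -> List[str]:
    """Argument list mirroring the Step 2a command above."""
    # Assumption: subtitle_path has no filter-special characters.
    vf = f"ass={subtitle_path},scale=1920:1080:flags=lanczos"
    return [
        "ffmpeg", "-y", "-i", temp_video, "-i", original_audio,
        "-map", "0:v", "-map", "1:a",
        "-vf", vf,
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a", "aac", "-b:a", "192k",
        mtv_output,
    ]


def run_command(args: List[str]) -> None:
    """Run a command; raise with the tail of stderr if it fails."""
    proc = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    if proc.returncode != 0:
        tail = proc.stderr[-2000:]
        raise RuntimeError(f"{args[0]} exited with {proc.returncode}: {tail}")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.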

Step 2b: KTV Synthesis (Dual Track)

Muxes two audio tracks into the output, with the accompaniment marked as the default track:

ffmpeg -y -i {temp_video} -i {accompaniment} -i {original_audio} \
  -map 0:v -map 1:a -map 2:a \
  -vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
  -c:v libx264 -preset fast -crf 23 \
  -c:a:0 aac -b:a:0 192k -metadata:s:a:0 handler_name="伴奏(Accompaniment)" \
  -c:a:1 aac -b:a:1 192k -metadata:s:a:1 handler_name="原唱(Original)" \
  -disposition:a:0 default -disposition:a:1 0 \
  {ktv_output}
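
The per-track options in this command follow a regular pattern, which the sketch below generates from an ordered list of (input path, label) pairs. ktv_args is a hypothetical helper; the option names and values are the ones used in the command above, and the same caveat about unescaped subtitle paths applies.

```python
from typing import List


def ktv_args(temp_video: str, accompaniment: str, original_audio: str,
             subtitle_path: str, ktv_output: str) -> List[str]:
    """Argument list mirroring the Step 2b command above."""
    # Order matters: the first audio input becomes the default track.
    audio = [
        (accompaniment, "伴奏(Accompaniment)"),
        (original_audio, "原唱(Original)"),
    ]
    args = ["ffmpeg", "-y", "-i", temp_video]
    for path, _label in audio:
        args += ["-i", path]
    args += ["-map", "0:v"]
    for idx in range(len(audio)):
        args += ["-map", f"{idx + 1}:a"]  # input 0 is the video
    args += [
        "-vf", f"ass={subtitle_path},scale=1920:1080:flags=lanczos",
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
    ]
    for idx, (_path, label) in enumerate(audio):
        args += [
            f"-c:a:{idx}", "aac", f"-b:a:{idx}", "192k",
            f"-metadata:s:a:{idx}", f"handler_name={label}",
        ]
    for idx in range(len(audio)):
        args += [f"-disposition:a:{idx}", "default" if idx == 0 else "0"]
    args.append(ktv_output)
    return args
```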

Video Encoding Settings

Codec: H.264 (libx264)
Preset: fast
CRF: 23 (balanced quality/size)
Resolution: 1920x1080
Scaling: Lanczos (high quality)

Audio Encoding Settings

Codec: AAC
Bitrate: 192 kbps
Tracks: 1 (MTV) or 2 (KTV)

KTV Audio Track Metadata

Track 0: Accompaniment (default playback)
- Handler: "伴奏(Accompaniment)"
- Disposition: default
Track 1: Original with vocals
- Handler: "原唱(Original)"
- Disposition: 0 (not default)

Configuration

Edit conf/config.json:

{
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24
}
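
A configuration file with this shape can be read and sanity-checked as sketched below. load_config is a hypothetical helper, and treating the values shown above as fallbacks for missing keys is an assumption; the service's own loader may behave differently.

```python
import json
from typing import Any, Dict

# Values copied from the example above; using them as defaults is an assumption.
DEFAULTS: Dict[str, Any] = {
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24,
}


def load_config(path: str) -> Dict[str, Any]:
    """Read a JSON config file and fill in any missing keys."""
    with open(path, encoding="utf-8") as fh:
        user = json.load(fh)
    if not isinstance(user, dict):
        raise ValueError("config must be a JSON object")
    cfg = {**DEFAULTS, **user}
    if not isinstance(cfg["worker_cnt"], int) or cfg["worker_cnt"] < 1:
        raise ValueError("worker_cnt must be an integer >= 1")
    return cfg
```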

Troubleshooting

FFmpeg Errors

Check FFmpeg installation and codec support:

ffmpeg -codecs | grep libx264
ffmpeg -codecs | grep aac
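
The same check can be scripted. This sketch uses ffmpeg -hide_banner -encoders rather than -codecs, because the encoder listing places the encoder name in the second column of each row, which is easier to match exactly. The helper names are hypothetical.

```python
import subprocess


def has_encoder(listing: str, name: str) -> bool:
    """True if `name` appears as an encoder name in `ffmpeg -encoders` output."""
    for line in listing.splitlines():
        parts = line.split()
        # Rows look like: " V....D libx264   libx264 H.264 / AVC ..."
        if len(parts) >= 2 and parts[1] == name:
            return True
    return False


def list_encoders() -> str:
    """Return the encoder listing; raises if ffmpeg is missing or fails."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        text=True, check=True,
    )
    return proc.stdout
```

A startup script could call has_encoder(list_encoders(), "libx264") and has_encoder(list_encoders(), "aac") and refuse to start if either is False.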

Redis Connection

Verify Redis is running:

redis-cli ping

Permission Issues

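Ensure the service user has write access to the output directory. Mode 755 grants write access to the owner only, so the directory must be owned by the user that runs the service: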

chmod 755 /tmp/ktv-synth-outputs
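
Write access can also be verified directly by creating a temporary file in the directory, which tests the effective permissions of the current user rather than the mode bits alone. can_write is a hypothetical helper and should be run as the same user as the service.

```python
import os
import tempfile


def can_write(directory: str) -> bool:
    """True if the current user can create files in `directory`."""
    try:
        os.makedirs(directory, exist_ok=True)
        # The temporary file is deleted automatically when closed.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False
```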

High Memory Usage

Reduce worker count in conf/config.json:

{
    "worker_cnt": 1
}

Performance

MTV Generation: ~2-3x faster than real time (a 240 s video takes ~80-120 s)
KTV Generation: ~2-3x faster than real time
Concurrent Tasks: Limited by worker_cnt (default: 1)
Memory: ~500MB-1GB per worker (depends on video resolution)
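
The speed figures translate into a rough per-output time estimate: processing time is the output duration divided by the speed factor. The sketch below applies the 2-3x range quoted above; estimate_seconds is a hypothetical helper, and actual times will depend on hardware and input material.

```python
from typing import Tuple


def estimate_seconds(duration: float, slow: float = 2.0,
                     fast: float = 3.0) -> Tuple[float, float]:
    """Return (best case, worst case) processing time in seconds."""
    if duration <= 0 or slow <= 0 or fast <= 0:
        raise ValueError("all arguments must be positive")
    return (duration / fast, duration / slow)


best, worst = estimate_seconds(240)  # 80.0 and 120.0, matching the figures above
```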

Integration

This service integrates with:

demucs-service: Audio source separation (provides accompaniment tracks)
whisper-service: Subtitle generation (provides ASS files)
wan22-service: Video generation (provides scene clips)

License

Internal use only.

Support

For issues or questions, contact the development team.
