KTV Synth Service

KTV/MTV video synthesis service using FFmpeg. Creates karaoke videos with dual audio tracks (accompaniment + original) and synchronized ASS subtitles.

Overview

This service processes video clips, audio tracks, and subtitles to produce:

  • MTV (Music Television): Single audio track with original vocals and subtitles
  • KTV (Karaoke Television): Dual audio tracks - accompaniment (default) and original vocals

Architecture

  • Framework: ahserver + longtasks + Redis
  • Port: 9084
  • Queue: ktv_synth
  • Worker: FFmpeg subprocess (no GPU required)

Features

  • Two-step FFmpeg synthesis pipeline
  • ASS subtitle rendering with karaoke effects
  • Dual audio track support with proper metadata
  • Configurable video looping for scene clips
  • 1920x1080 output resolution with Lanczos scaling
  • Automatic duration calculation

Installation

Prerequisites

  • Python 3.8+
  • FFmpeg with libx264 and AAC support
  • Redis server

Setup

# Change to the repository directory
cd /data/ymq/ktv-synth-service

# Ensure FFmpeg is installed
ffmpeg -version

# Ensure Redis is running
redis-cli ping

Usage

Starting the Service

./start.sh

The service will start on port 9084 and begin processing tasks from the Redis queue.

Stopping the Service

./stop.sh

Health Check

Request http://localhost:9084/app/health.dspy (in a browser or with curl) to confirm the service is running.

API

Task Payload

Submit tasks to the Redis queue ktv_synth:

{
    "task_type": "synthesize",
    "video_files": [
        "/path/to/scene1.mp4",
        "/path/to/scene2.mp4",
        "/path/to/scene3.mp4"
    ],
    "original_audio": "/path/to/original.wav",
    "accompaniment": "/path/to/no_vocals.wav",
    "subtitle_path": "/path/to/subtitles.ass",
    "output_dir": "/tmp/ktv-synth-outputs",
    "title": "SongName",
    "duration": 240.5,
    "loops": 3,
    "output_modes": ["mtv", "ktv"]
}

Parameters

  • video_files (required): List of video file paths (scene clips to loop)
  • original_audio (required): Path to original full audio with vocals
  • accompaniment (required for KTV): Path to accompaniment track (no vocals)
  • subtitle_path (required): Path to ASS subtitle file
  • output_dir (optional): Output directory (default: /tmp/ktv-synth-outputs)
  • title (optional): Song title for output naming (default: output)
  • duration (optional): Target duration in seconds (auto-calculated if not provided)
  • loops (optional): Number of video loops (auto-calculated if not provided)
  • output_modes (optional): List of outputs to generate: ["mtv"], ["ktv"], or ["mtv", "ktv"]
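The required/optional rules above can be checked before a task is queued. The following is a minimal sketch, not the service's own validation: the function name validate_payload is hypothetical, and treating a missing output_modes as "both outputs" is an assumption, since no default is documented here.

```python
from typing import Any, Dict, List

REQUIRED_ALWAYS = ("video_files", "original_audio", "subtitle_path")
VALID_MODES = {"mtv", "ktv"}


def validate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors: List[str] = []
    for key in REQUIRED_ALWAYS:
        if not payload.get(key):
            errors.append(f"missing required field: {key}")

    # Assumption: when output_modes is omitted, both outputs are requested.
    modes = payload.get("output_modes", ["mtv", "ktv"])
    unknown = sorted(set(modes) - VALID_MODES)
    if unknown:
        errors.append(f"unknown output_modes: {unknown}")

    # The accompaniment track is only needed when a KTV output is requested.
    if "ktv" in modes and not payload.get("accompaniment"):
        errors.append("accompaniment is required for ktv output")

    # duration and loops are optional, but must be positive when given.
    for key in ("duration", "loops"):
        value = payload.get(key)
        if value is not None and value <= 0:
            errors.append(f"{key} must be positive")
    return errors
```

An MTV-only task passes without an accompaniment path, while the same payload with "ktv" in output_modes is rejected.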

Response

{
    "mtv_path": "/tmp/ktv-synth-outputs/SongName_MTV.mp4",
    "ktv_path": "/tmp/ktv-synth-outputs/SongName_KTV.mp4",
    "mtv_size_mb": 125.45,
    "ktv_size_mb": 145.67,
    "duration": 240.5
}
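The response fields follow from the title and output_dir parameters. As a rough sketch, the paths and sizes could be derived as below; the helper names are hypothetical, and the {title}_MTV.mp4 / {title}_KTV.mp4 pattern and the two-decimal MiB rounding are inferred from the example response rather than confirmed by the service code.

```python
import posixpath
from typing import Dict, Sequence


def output_paths(title: str = "output",
                 output_dir: str = "/tmp/ktv-synth-outputs",
                 modes: Sequence[str] = ("mtv", "ktv")) -> Dict[str, str]:
    """Map each requested mode to its response key and output file path."""
    return {
        f"{mode}_path": posixpath.join(output_dir, f"{title}_{mode.upper()}.mp4")
        for mode in modes
    }


def bytes_to_mb(size_bytes: int) -> float:
    """Convert a byte count to megabytes (assumed to mean 1024 * 1024 bytes)."""
    return round(size_bytes / (1024 * 1024), 2)
```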

Technical Details

Two-Step Synthesis Process

Step 1: Create Silent Looped Video Track

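Concatenates the scene clips and loops them until they cover the target duration; -t then trims the result to exactly {duration} seconds and -an drops any audio. Note that FFmpeg's -stream_loop counts repetitions after the first pass, so a value of N plays the clip list N+1 times: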

ffmpeg -y -f concat -safe 0 -stream_loop {loops} -i {concat_list} \
  -t {duration} -an -c:v libx264 -preset fast -crf 23 {temp_video}
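To make Step 1 concrete, here is a sketch of how the concat list and the command line might be assembled. All three function names are hypothetical, and the loop calculation is only one plausible way to auto-calculate loops; the service's actual rule is not documented here.

```python
import math
from typing import List, Sequence


def concat_list_text(video_files: Sequence[str]) -> str:
    """Render the list file read by FFmpeg's concat demuxer."""
    lines = []
    for path in video_files:
        # Inside a single-quoted entry, a literal ' is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def extra_loops(target_duration: float, clip_durations: Sequence[float]) -> int:
    """Passes needed after the first one (the -stream_loop value) to cover the target."""
    one_pass = sum(clip_durations)
    if one_pass <= 0:
        raise ValueError("clip durations must sum to a positive value")
    return max(0, math.ceil(target_duration / one_pass) - 1)


def step1_command(concat_list: str, loops: int, duration: float,
                  temp_video: str) -> List[str]:
    """Build the Step 1 argv; a list avoids shell quoting issues."""
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-stream_loop", str(loops), "-i", concat_list,
        "-t", str(duration), "-an",
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        temp_video,
    ]
```

For three 10-second clips and a 240.5-second song, one pass lasts 30 seconds, so nine passes are needed and extra_loops returns 8.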

Step 2a: MTV Synthesis (Single Track)

Combines video with original audio and ASS subtitles:

ffmpeg -y -i {temp_video} -i {original_audio} \
  -map 0:v -map 1:a \
  -vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
  -c:v libx264 -preset fast -crf 23 \
  -c:a aac -b:a 192k \
  {mtv_output}

Step 2b: KTV Synthesis (Dual Track)

Creates dual audio tracks with accompaniment as default:

ffmpeg -y -i {temp_video} -i {accompaniment} -i {original_audio} \
  -map 0:v -map 1:a -map 2:a \
  -vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
  -c:v libx264 -preset fast -crf 23 \
  -c:a:0 aac -b:a:0 192k -metadata:s:a:0 handler_name="伴奏(Accompaniment)" \
  -c:a:1 aac -b:a:1 192k -metadata:s:a:1 handler_name="原唱(Original)" \
  -disposition:a:0 default -disposition:a:1 0 \
  {ktv_output}
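Steps 2a and 2b differ only in their inputs and audio options, so one builder can cover both. This is a sketch with a hypothetical function name, mirroring the commands above. It does not escape the subtitle path, so paths containing filtergraph special characters (such as ":" or ",") would need extra escaping.

```python
from typing import List, Optional

VIDEO_ARGS = ["-c:v", "libx264", "-preset", "fast", "-crf", "23"]


def synth_command(mode: str, temp_video: str, original_audio: str,
                  subtitle_path: str, output: str,
                  accompaniment: Optional[str] = None) -> List[str]:
    """Build the Step 2 argv for "mtv" (one track) or "ktv" (two tracks)."""
    # Subtitles are burned in first, then the frame is scaled to 1080p.
    vf = f"ass={subtitle_path},scale=1920:1080:flags=lanczos"
    if mode == "mtv":
        return [
            "ffmpeg", "-y", "-i", temp_video, "-i", original_audio,
            "-map", "0:v", "-map", "1:a", "-vf", vf, *VIDEO_ARGS,
            "-c:a", "aac", "-b:a", "192k", output,
        ]
    if mode == "ktv":
        if not accompaniment:
            raise ValueError("ktv mode requires an accompaniment track")
        return [
            # Accompaniment is input 1 so it becomes audio track 0.
            "ffmpeg", "-y", "-i", temp_video, "-i", accompaniment,
            "-i", original_audio,
            "-map", "0:v", "-map", "1:a", "-map", "2:a",
            "-vf", vf, *VIDEO_ARGS,
            "-c:a:0", "aac", "-b:a:0", "192k",
            "-metadata:s:a:0", "handler_name=伴奏(Accompaniment)",
            "-c:a:1", "aac", "-b:a:1", "192k",
            "-metadata:s:a:1", "handler_name=原唱(Original)",
            "-disposition:a:0", "default", "-disposition:a:1", "0",
            output,
        ]
    raise ValueError(f"unknown mode: {mode}")
```

The resulting list can be passed directly to subprocess.run without a shell, which keeps the non-ASCII handler names and any spaces in paths intact.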

Video Encoding Settings

  • Codec: H.264 (libx264)
  • Preset: fast
  • CRF: 23 (balanced quality/size)
  • Resolution: 1920x1080
  • Scaling: Lanczos (high quality)

Audio Encoding Settings

  • Codec: AAC
  • Bitrate: 192 kbps
  • Tracks: 1 (MTV) or 2 (KTV)

KTV Audio Track Metadata

  • Track 0: Accompaniment (default playback)
    • Handler: "伴奏(Accompaniment)"
    • Disposition: default
  • Track 1: Original with vocals
    • Handler: "原唱(Original)"
    • Disposition: 0 (not default)
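The track layout above can be verified on a finished file with ffprobe. The sketch below is an illustrative check, not part of the service; the helper names are hypothetical, and it relies on ffprobe's JSON output exposing a "disposition" object and a "tags" object per stream.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional


def probe_audio_streams(path: str) -> List[Dict[str, Any]]:
    """Return ffprobe's description of every audio stream in the file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_streams", "-of", "json", path],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("streams", [])


def default_flags(streams: List[Dict[str, Any]]) -> List[bool]:
    """True for each stream whose default disposition is set."""
    return [s.get("disposition", {}).get("default", 0) == 1 for s in streams]


def handler_names(streams: List[Dict[str, Any]]) -> List[Optional[str]]:
    """The handler_name tag of each stream, or None when absent."""
    return [s.get("tags", {}).get("handler_name") for s in streams]


def looks_like_ktv(streams: List[Dict[str, Any]]) -> bool:
    """Two audio tracks, with only the first marked as default."""
    return default_flags(streams) == [True, False]
```

Calling looks_like_ktv(probe_audio_streams(path)) on a KTV output should return True if the dispositions were written as described.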

Configuration

Edit conf/config.json:

{
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24
}
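As an illustration of how these settings might be consumed, the sketch below overlays a config file on the values shown above. It is an assumption that the service falls back to these values when a key is missing; load_config is a hypothetical helper, not the ahserver loader.

```python
import json
from typing import Any, Dict

# Values copied from the example conf/config.json above.
DEFAULTS: Dict[str, Any] = {
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24,
}


def load_config(text: str) -> Dict[str, Any]:
    """Parse config JSON and fill in any missing keys from DEFAULTS."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    if int(config["worker_cnt"]) < 1:
        raise ValueError("worker_cnt must be at least 1")
    return config
```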

Troubleshooting

FFmpeg Errors

Check FFmpeg installation and codec support:

ffmpeg -codecs | grep libx264
ffmpeg -codecs | grep aac
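The same check can be scripted, for example as a startup guard. This sketch parses the output of ffmpeg -encoders and assumes the layout used by current FFmpeg builds, where each encoder row starts with a flags column followed by the encoder name; both function names are hypothetical.

```python
import subprocess
from typing import Set


def parse_encoders(listing: str) -> Set[str]:
    """Extract encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        # Encoder rows start with a flags token whose first letter is
        # V (video), A (audio) or S (subtitle), followed by the name.
        if len(parts) >= 2 and parts[0][0] in "VAS":
            names.add(parts[1])
    # Legend rows such as " V..... = Video" yield "=" as the name.
    names.discard("=")
    return names


def available_encoders() -> Set[str]:
    """Run ffmpeg and return the set of encoders it reports."""
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        check=True, capture_output=True, text=True,
    )
    return parse_encoders(result.stdout)
```

A guard such as {"libx264", "aac"} <= available_encoders() then confirms both encoders used by the pipeline are present.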

Redis Connection

Verify Redis is running:

redis-cli ping

Permission Issues

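Ensure the user running the service can write to the output directory. Mode 755 grants write access to the directory owner only, so the directory should be owned by that user: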

chmod 755 /tmp/ktv-synth-outputs

High Memory Usage

Reduce worker count in conf/config.json:

{
    "worker_cnt": 1
}

Performance

  • MTV Generation: ~2-3x faster than real time (a 240 s video takes ~80-120 s)
  • KTV Generation: ~2-3x faster than real time
  • Concurrent Tasks: Limited by worker_cnt (default: 1)
  • Memory: ~500MB-1GB per worker (depends on video resolution)

Integration

This service integrates with:

  • demucs-service: Audio source separation (provides accompaniment tracks)
  • whisper-service: Subtitle generation (provides ASS files)
  • wan22-service: Video generation (provides scene clips)

License

Internal use only.

Support

For issues or questions, contact the development team.