KTV Synth Service
KTV/MTV video synthesis service using FFmpeg. Creates karaoke videos with dual audio tracks (accompaniment + original) and synchronized ASS subtitles.
Overview
This service processes video clips, audio tracks, and subtitles to produce:
- MTV (Music Television): Single audio track with original vocals and subtitles
- KTV (Karaoke Television): Dual audio tracks - accompaniment (default) and original vocals
Architecture
- Framework: ahserver + longtasks + Redis
- Port: 9084
- Queue: ktv_synth
- Worker: FFmpeg subprocess (no GPU required)
Features
- Two-step FFmpeg synthesis pipeline
- ASS subtitle rendering with karaoke effects
- Dual audio track support with proper metadata
- Configurable video looping for scene clips
- 1920x1080 output resolution with Lanczos scaling
- Automatic duration calculation
Installation
Prerequisites
- Python 3.8+
- FFmpeg with libx264 and AAC support
- Redis server
Setup
# Change to the service directory
cd /data/ymq/ktv-synth-service
# Ensure FFmpeg is installed
ffmpeg -version
# Ensure Redis is running
redis-cli ping
Usage
Starting the Service
./start.sh
The service will start on port 9084 and begin processing tasks from the Redis queue.
Stopping the Service
./stop.sh
Health Check
Visit http://localhost:9084/app/health.dspy or check the service status.
API
Task Payload
Submit tasks to the Redis queue ktv_synth:
{
"task_type": "synthesize",
"video_files": [
"/path/to/scene1.mp4",
"/path/to/scene2.mp4",
"/path/to/scene3.mp4"
],
"original_audio": "/path/to/original.wav",
"accompaniment": "/path/to/no_vocals.wav",
"subtitle_path": "/path/to/subtitles.ass",
"output_dir": "/tmp/ktv-synth-outputs",
"title": "SongName",
"duration": 240.5,
"loops": 3,
"output_modes": ["mtv", "ktv"]
}
Parameters
- video_files (required): List of video file paths (scene clips to loop)
- original_audio (required): Path to original full audio with vocals
- accompaniment (required for KTV): Path to accompaniment track (no vocals)
- subtitle_path (required): Path to ASS subtitle file
- output_dir (optional): Output directory (default: /tmp/ktv-synth-outputs)
- title (optional): Song title for output naming (default: output)
- duration (optional): Target duration in seconds (auto-calculated if not provided)
- loops (optional): Number of video loops (auto-calculated if not provided)
- output_modes (optional): List of outputs to generate: ["mtv"], ["ktv"], or ["mtv", "ktv"]
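The parameter rules above can be checked on the client side before a task is queued. The following is a minimal sketch, not the service's own validation code: the function name normalize_payload is hypothetical, and treating a missing output_modes as a request for both outputs is an assumption, since no default is stated for that field.

```python
from typing import Any, Dict, List

# Defaults taken from the parameter list above.
DEFAULT_OUTPUT_DIR = "/tmp/ktv-synth-outputs"
DEFAULT_TITLE = "output"
VALID_MODES = {"mtv", "ktv"}


def normalize_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with defaults applied, or raise ValueError."""
    task = dict(payload)

    # Fields that are required for every task.
    for key in ("video_files", "original_audio", "subtitle_path"):
        if not task.get(key):
            raise ValueError(f"missing required field: {key}")

    # ASSUMPTION: when output_modes is omitted, both outputs are requested.
    modes: List[str] = task.get("output_modes") or ["mtv", "ktv"]
    unknown = set(modes) - VALID_MODES
    if unknown:
        raise ValueError(f"unknown output mode(s): {sorted(unknown)}")
    task["output_modes"] = modes

    # The accompaniment track is only needed for the KTV output.
    if "ktv" in modes and not task.get("accompaniment"):
        raise ValueError("accompaniment is required when 'ktv' is requested")

    task.setdefault("output_dir", DEFAULT_OUTPUT_DIR)
    task.setdefault("title", DEFAULT_TITLE)
    return task
```

duration and loops are left untouched here, because the service calculates them when they are absent.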
Response
{
"mtv_path": "/tmp/ktv-synth-outputs/SongName_MTV.mp4",
"ktv_path": "/tmp/ktv-synth-outputs/SongName_KTV.mp4",
"mtv_size_mb": 125.45,
"ktv_size_mb": 145.67,
"duration": 240.5
}
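As an illustration of how a response of this shape could be assembled, here is a small sketch. The helper names file_size_mb and build_response are hypothetical. Two details are assumptions that the response example does not confirm: that 1 MB means 1024 * 1024 bytes, and that the keys for a mode that was not requested are simply omitted.

```python
import os
from typing import Dict, Optional


def file_size_mb(path: str) -> float:
    """File size in MB, rounded to two decimals (ASSUMPTION: 1 MB = 1024 * 1024 bytes)."""
    return round(os.path.getsize(path) / (1024 * 1024), 2)


def build_response(
    duration: float,
    mtv_path: Optional[str] = None,
    ktv_path: Optional[str] = None,
) -> Dict[str, object]:
    """Build a result dict with the same keys as the response example above."""
    result: Dict[str, object] = {"duration": duration}
    # ASSUMPTION: keys for outputs that were not generated are left out.
    if mtv_path:
        result["mtv_path"] = mtv_path
        result["mtv_size_mb"] = file_size_mb(mtv_path)
    if ktv_path:
        result["ktv_path"] = ktv_path
        result["ktv_size_mb"] = file_size_mb(ktv_path)
    return result
```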
Technical Details
Two-Step Synthesis Process
Step 1: Create Silent Looped Video Track
Concatenates and loops scene clips to match target duration:
ffmpeg -y -f concat -safe 0 -stream_loop {loops} -i {concat_list} \
-t {duration} -an -c:v libx264 -preset fast -crf 23 {temp_video}
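The inputs to this command can be prepared roughly as follows. This is a sketch under stated assumptions, not the service's implementation: the helper names are hypothetical, and the loop formula is one plausible way to auto-calculate the value. It relies on FFmpeg's definition of -stream_loop, where 0 means the input plays once with no repetition, so the value counts additional passes.

```python
import math
from typing import List


def concat_list_text(video_files: List[str]) -> str:
    """Contents of a concat demuxer list file, one "file '...'" line per clip."""
    lines = []
    for path in video_files:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def stream_loop_count(duration: float, clips_total: float) -> int:
    """ASSUMPTION: extra passes needed so the looped clips cover the duration."""
    if clips_total <= 0:
        raise ValueError("clips_total must be positive")
    return max(0, math.ceil(duration / clips_total) - 1)


def step1_command(concat_list: str, temp_video: str, duration: float, loops: int) -> List[str]:
    """Argument list mirroring the Step 1 command above."""
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-stream_loop", str(loops), "-i", concat_list,
        "-t", str(duration), "-an",
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        temp_video,
    ]
```

The -t option trims the result to the target duration, so overshooting by part of a pass is harmless.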
Step 2a: MTV Synthesis (Single Track)
Combines video with original audio and ASS subtitles:
ffmpeg -y -i {temp_video} -i {original_audio} \
-map 0:v -map 1:a \
-vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
-c:v libx264 -preset fast -crf 23 \
-c:a aac -b:a 192k \
{mtv_output}
Step 2b: KTV Synthesis (Dual Track)
Creates dual audio tracks with accompaniment as default:
ffmpeg -y -i {temp_video} -i {accompaniment} -i {original_audio} \
-map 0:v -map 1:a -map 2:a \
-vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
-c:v libx264 -preset fast -crf 23 \
-c:a:0 aac -b:a:0 192k -metadata:s:a:0 handler_name="伴奏(Accompaniment)" \
-c:a:1 aac -b:a:1 192k -metadata:s:a:1 handler_name="原唱(Original)" \
-disposition:a:0 default -disposition:a:1 0 \
{ktv_output}
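When the KTV command is launched from Python, building it as an argument list avoids shell quoting of paths and of the parentheses in the handler names. The sketch below mirrors the command above. The function name ktv_command is hypothetical, and the sketch assumes the subtitle path contains no characters that are special in FFmpeg filter syntax (such as ':' , ',' or quotes), since it applies no filter escaping.

```python
from typing import List

# Same filter chain as the commands above: render subtitles, then scale.
VIDEO_FILTER = "ass={subtitle},scale=1920:1080:flags=lanczos"


def ktv_command(
    temp_video: str,
    accompaniment: str,
    original_audio: str,
    subtitle_path: str,
    output: str,
) -> List[str]:
    """Argument list for the dual-track KTV synthesis step."""
    return [
        "ffmpeg", "-y",
        "-i", temp_video, "-i", accompaniment, "-i", original_audio,
        # Input 1 becomes audio track 0, input 2 becomes audio track 1.
        "-map", "0:v", "-map", "1:a", "-map", "2:a",
        "-vf", VIDEO_FILTER.format(subtitle=subtitle_path),
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a:0", "aac", "-b:a:0", "192k",
        "-metadata:s:a:0", "handler_name=伴奏(Accompaniment)",
        "-c:a:1", "aac", "-b:a:1", "192k",
        "-metadata:s:a:1", "handler_name=原唱(Original)",
        # Accompaniment plays by default; the original track is selectable.
        "-disposition:a:0", "default", "-disposition:a:1", "0",
        output,
    ]
```

The list can be passed to subprocess.run(cmd, check=True), which raises an exception if FFmpeg exits with a non-zero status.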
Video Encoding Settings
- Codec: H.264 (libx264)
- Preset: fast
- CRF: 23 (balanced quality/size)
- Resolution: 1920x1080
- Scaling: Lanczos (high quality)
Audio Encoding Settings
- Codec: AAC
- Bitrate: 192 kbps
- Tracks: 1 (MTV) or 2 (KTV)
KTV Audio Track Metadata
- Track 0: Accompaniment (default playback)
  - Handler: "伴奏(Accompaniment)"
  - Disposition: default
- Track 1: Original with vocals
  - Handler: "原唱(Original)"
  - Disposition: 0 (not default)
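One way to confirm this metadata on a finished file is to inspect it with ffprobe -v error -show_streams -of json {ktv_output} and summarize the audio streams. The parser below is an illustrative sketch: the function name audio_track_summary is hypothetical, and it only reads the codec_type, tags.handler_name and disposition.default fields of ffprobe's JSON output.

```python
import json
from typing import Dict, List


def audio_track_summary(ffprobe_json: str) -> List[Dict[str, object]]:
    """Extract handler name and default flag for each audio stream."""
    data = json.loads(ffprobe_json)
    tracks: List[Dict[str, object]] = []
    for stream in data.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue  # skip the video stream
        tracks.append({
            "handler": stream.get("tags", {}).get("handler_name"),
            # ffprobe reports dispositions as 0/1 integers.
            "default": stream.get("disposition", {}).get("default") == 1,
        })
    return tracks
```

For a correct KTV file the first entry should carry the accompaniment handler with default set to True, and the second the original-vocals handler with default set to False.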
Configuration
Edit conf/config.json:
{
"port": 9084,
"queue": "ktv_synth",
"filesroot": "/tmp/ktv-synth-outputs",
"redis_url": "redis://127.0.0.1:6379",
"worker_cnt": 1,
"stuck_seconds": 1800,
"max_age_hours": 24
}
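For scripts that need to read the same settings, a loader along these lines may be useful. It is a sketch only: load_config is a hypothetical helper, the service's own loading logic is not shown in this document, and using the values from the example above as fallbacks for missing keys is an assumption.

```python
import json
from typing import Any, Dict

# ASSUMPTION: the example values above are reused as fallbacks for missing keys.
DEFAULTS: Dict[str, Any] = {
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24,
}


def load_config(path: str = "conf/config.json") -> Dict[str, Any]:
    """Read the JSON config and overlay it on the fallback values."""
    with open(path, encoding="utf-8") as fh:
        loaded = json.load(fh)
    config = dict(DEFAULTS)
    config.update(loaded)
    # A worker count below one would leave the queue unprocessed.
    if config["worker_cnt"] < 1:
        raise ValueError("worker_cnt must be at least 1")
    return config
```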
Troubleshooting
FFmpeg Errors
Check FFmpeg installation and codec support:
ffmpeg -codecs | grep libx264
ffmpeg -codecs | grep aac
Redis Connection
Verify Redis is running:
redis-cli ping
Permission Issues
Ensure the service has write access to output directories:
chmod 755 /tmp/ktv-synth-outputs
High Memory Usage
Reduce worker count in conf/config.json:
{
"worker_cnt": 1
}
Performance
- MTV Generation: ~2-3x faster than real time (a 240s video takes ~80-120s)
- KTV Generation: ~2-3x faster than real time
- Concurrent Tasks: Limited by worker_cnt (default: 1)
- Memory: ~500MB-1GB per worker (depends on video resolution)
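The throughput figures above translate into a rough time estimate per output. The helper below is a back-of-the-envelope sketch with a hypothetical name. It simply divides the target duration by the 2x and 3x speed factors quoted above, and actual times will vary with hardware and source clips.

```python
from typing import Tuple


def estimated_seconds(duration: float) -> Tuple[float, float]:
    """Return (best_case, worst_case) processing time for one output."""
    fastest_speed = 3.0  # upper end of the quoted 2-3x range
    slowest_speed = 2.0  # lower end of the quoted 2-3x range
    return duration / fastest_speed, duration / slowest_speed
```

With worker_cnt set to 1, tasks are processed one at a time, so queued tasks wait for the ones ahead of them.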
Integration
This service integrates with:
- demucs-service: Audio source separation (provides accompaniment tracks)
- whisper-service: Subtitle generation (provides ASS files)
- wan22-service: Video generation (provides scene clips)
License
Internal use only.
Support
For issues or questions, contact the development team.