# KTV Synth Service
KTV/MTV video synthesis service using FFmpeg. Creates karaoke videos with dual audio tracks (accompaniment + original) and synchronized ASS subtitles.
## Overview
This service processes video clips, audio tracks, and subtitles to produce:
- **MTV (Music Television)**: Single audio track with original vocals and subtitles
- **KTV (Karaoke Television)**: Dual audio tracks - accompaniment (default) and original vocals
## Architecture
- **Framework**: ahserver + longtasks + Redis
- **Port**: 9084
- **Queue**: `ktv_synth`
- **Worker**: FFmpeg subprocess (no GPU required)
## Features
- Two-step FFmpeg synthesis pipeline
- ASS subtitle rendering with karaoke effects
- Dual audio track support with proper metadata
- Configurable video looping for scene clips
- 1920x1080 output resolution with Lanczos scaling
- Automatic duration calculation
## Installation
### Prerequisites
- Python 3.8+
- FFmpeg with libx264 and AAC support
- Redis server
### Setup
```bash
# Change to the service directory
cd /data/ymq/ktv-synth-service
# Ensure FFmpeg is installed
ffmpeg -version
# Ensure Redis is running
redis-cli ping
```
## Usage
### Starting the Service
```bash
./start.sh
```
The service starts on port 9084 and begins processing tasks from the Redis queue.
### Stopping the Service
```bash
./stop.sh
```
### Health Check
Request `http://localhost:9084/app/health.dspy` to confirm that the service is up.
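For scripted monitoring, the same check can be sketched in Python. This is a minimal sketch that assumes an HTTP 200 response means "healthy"; the response body of `health.dspy` is not documented here, so it is not inspected. The helper names `health_url` and `is_healthy` are illustrative, not part of the service.

```python
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 9084) -> str:
    """Build the health-check URL documented above."""
    return f"http://{host}:{port}/app/health.dspy"


def is_healthy(host: str = "localhost", port: int = 9084, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```

Calling `is_healthy()` from a cron job or a deployment script gives a simple up/down signal without parsing the response.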
## API
### Task Payload
Submit tasks to the Redis queue `ktv_synth`:
```json
{
"task_type": "synthesize",
"video_files": [
"/path/to/scene1.mp4",
"/path/to/scene2.mp4",
"/path/to/scene3.mp4"
],
"original_audio": "/path/to/original.wav",
"accompaniment": "/path/to/no_vocals.wav",
"subtitle_path": "/path/to/subtitles.ass",
"output_dir": "/tmp/ktv-synth-outputs",
"title": "SongName",
"duration": 240.5,
"loops": 3,
"output_modes": ["mtv", "ktv"]
}
```
### Parameters
- `video_files` (required): List of video file paths (scene clips to loop)
- `original_audio` (required): Path to original full audio with vocals
- `accompaniment` (required for KTV): Path to accompaniment track (no vocals)
- `subtitle_path` (required): Path to ASS subtitle file
- `output_dir` (optional): Output directory (default: `/tmp/ktv-synth-outputs`)
- `title` (optional): Song title for output naming (default: `output`)
- `duration` (optional): Target duration in seconds (auto-calculated if not provided)
- `loops` (optional): Number of video loops (auto-calculated if not provided)
- `output_modes` (optional): List of outputs to generate: `["mtv"]`, `["ktv"]`, or `["mtv", "ktv"]`
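The rules in the parameter list can be checked on the client side before a task is queued. The following is a minimal sketch, not the service's own validation: `validate_payload` is a hypothetical helper, and because the default for `output_modes` is not documented, the accompaniment check only runs when `output_modes` is given explicitly.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("video_files", "original_audio", "subtitle_path")
# Defaults taken from the parameter list above.
DOCUMENTED_DEFAULTS = {"output_dir": "/tmp/ktv-synth-outputs", "title": "output"}
KNOWN_MODES = {"mtv", "ktv"}


def validate_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with documented defaults applied.

    Raises ValueError if a documented requirement is not met.
    """
    missing = [key for key in REQUIRED_FIELDS if not payload.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    task = {**DOCUMENTED_DEFAULTS, **payload}

    modes = task.get("output_modes")
    if modes is not None:
        unknown = set(modes) - KNOWN_MODES
        if unknown:
            raise ValueError(f"unknown output modes: {sorted(unknown)}")
        if "ktv" in modes and not task.get("accompaniment"):
            raise ValueError("accompaniment is required for KTV output")
    return task
```

Failing early on the client avoids spending a worker slot on a task that cannot succeed.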
### Response
```json
{
"mtv_path": "/tmp/ktv-synth-outputs/SongName_MTV.mp4",
"ktv_path": "/tmp/ktv-synth-outputs/SongName_KTV.mp4",
"mtv_size_mb": 125.45,
"ktv_size_mb": 145.67,
"duration": 240.5
}
```
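A response of this shape could be assembled as follows. This is only a sketch: `size_mb` and `build_response` are hypothetical helpers, and it assumes that the `*_size_mb` fields are the file size divided by 1024 × 1024 and rounded to two decimals, which the service may or may not do.

```python
import os
from typing import Any, Dict


def size_mb(path: str) -> float:
    """File size in MiB, rounded to two decimals (assumed unit for *_size_mb)."""
    return round(os.path.getsize(path) / (1024 * 1024), 2)


def build_response(outputs: Dict[str, str], duration: float) -> Dict[str, Any]:
    """Build a response dict from a mapping of mode ("mtv"/"ktv") to output path."""
    response: Dict[str, Any] = {}
    for mode, path in outputs.items():
        response[f"{mode}_path"] = path
        response[f"{mode}_size_mb"] = size_mb(path)
    response["duration"] = duration
    return response
```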
## Technical Details
### Two-Step Synthesis Process
#### Step 1: Create Silent Looped Video Track
Concatenates the scene clips, loops the sequence, and trims the result to the target duration with `-t`:
```bash
ffmpeg -y -f concat -safe 0 -stream_loop {loops} -i {concat_list} \
-t {duration} -an -c:v libx264 -preset fast -crf 23 {temp_video}
```
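The inputs to this command can be prepared with a short helper. The sketch below assumes the list file uses the concat demuxer's `file '<path>'` directives and that the loop count is derived from the clip lengths; the service's actual auto-calculation is not documented here, so `stream_loop_count` is illustrative. Note that FFmpeg's `-stream_loop N` repeats the input N additional times, so the clips play N + 1 times in total.

```python
import math
from typing import List


def concat_list_text(video_files: List[str]) -> str:
    """Render the body of a concat-demuxer list file, one `file` directive per clip."""
    lines = []
    for path in video_files:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def stream_loop_count(duration: float, clip_durations: List[float]) -> int:
    """Extra repetitions needed so the looped clip sequence covers `duration`."""
    total = sum(clip_durations)
    if total <= 0:
        raise ValueError("clip durations must sum to a positive value")
    # ceil(duration / total) full passes are needed; the first pass is not a loop.
    return max(0, math.ceil(duration / total) - 1)
```

For example, three 10-second clips and a 240.5-second target need nine passes through the sequence, which corresponds to `-stream_loop 8`.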
#### Step 2a: MTV Synthesis (Single Track)
Combines video with original audio and ASS subtitles:
```bash
ffmpeg -y -i {temp_video} -i {original_audio} \
-map 0:v -map 1:a \
-vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
-c:v libx264 -preset fast -crf 23 \
-c:a aac -b:a 192k \
{mtv_output}
```
#### Step 2b: KTV Synthesis (Dual Track)
Creates dual audio tracks with accompaniment as default:
```bash
ffmpeg -y -i {temp_video} -i {accompaniment} -i {original_audio} \
-map 0:v -map 1:a -map 2:a \
-vf "ass={subtitle_path},scale=1920:1080:flags=lanczos" \
-c:v libx264 -preset fast -crf 23 \
-c:a:0 aac -b:a:0 192k -metadata:s:a:0 handler_name="伴奏(Accompaniment)" \
-c:a:1 aac -b:a:1 192k -metadata:s:a:1 handler_name="原唱(Original)" \
-disposition:a:0 default -disposition:a:1 0 \
{ktv_output}
```
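When the command is launched from Python, passing the arguments as a list avoids shell quoting problems with the non-ASCII handler names. The following is a minimal sketch of the KTV command above as an argument list; `ktv_command` is a hypothetical helper, not the service's own function. It does not escape the subtitle path for the filter graph, so paths containing `:`, `,` or `'` would need additional escaping.

```python
from typing import List


def ktv_command(
    temp_video: str,
    accompaniment: str,
    original_audio: str,
    subtitle_path: str,
    output: str,
) -> List[str]:
    """Build the dual-track KTV FFmpeg command as an argument list."""
    return [
        "ffmpeg", "-y",
        "-i", temp_video, "-i", accompaniment, "-i", original_audio,
        # Video from input 0, accompaniment as audio 0, original as audio 1.
        "-map", "0:v", "-map", "1:a", "-map", "2:a",
        "-vf", f"ass={subtitle_path},scale=1920:1080:flags=lanczos",
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a:0", "aac", "-b:a:0", "192k",
        "-metadata:s:a:0", "handler_name=伴奏(Accompaniment)",
        "-c:a:1", "aac", "-b:a:1", "192k",
        "-metadata:s:a:1", "handler_name=原唱(Original)",
        # Only the accompaniment track is flagged as the default.
        "-disposition:a:0", "default", "-disposition:a:1", "0",
        output,
    ]
```

The list can be run with `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if FFmpeg exits with a non-zero status.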
### Video Encoding Settings
- **Codec**: H.264 (libx264)
- **Preset**: fast
- **CRF**: 23 (balanced quality/size)
- **Resolution**: 1920x1080
- **Scaling**: Lanczos (high quality)
### Audio Encoding Settings
- **Codec**: AAC
- **Bitrate**: 192 kbps
- **Tracks**: 1 (MTV) or 2 (KTV)
### KTV Audio Track Metadata
- **Track 0**: Accompaniment (default playback)
  - Handler: "伴奏(Accompaniment)"
  - Disposition: default
- **Track 1**: Original with vocals
  - Handler: "原唱(Original)"
  - Disposition: 0 (not default)
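The track layout of a finished KTV file can be checked with `ffprobe`. The sketch below parses the JSON that `ffprobe -v error -select_streams a -show_streams -of json <file>` prints; `summarize_audio_tracks` is a hypothetical helper, and the exact tags present depend on the container and the FFmpeg build.

```python
import json
from typing import Any, Dict, List

# Arguments for ffprobe; append the path of the file to inspect.
FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-select_streams", "a",
    "-show_streams", "-of", "json",
]


def summarize_audio_tracks(ffprobe_json: str) -> List[Dict[str, Any]]:
    """Extract handler name and default flag for each audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return [
        {
            "handler": stream.get("tags", {}).get("handler_name"),
            "default": bool(stream.get("disposition", {}).get("default", 0)),
        }
        for stream in streams
    ]
```

A correctly built KTV file should report two entries, with `default` set only on the first (accompaniment) track.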
## Configuration
Edit `conf/config.json`:
```json
{
"port": 9084,
"queue": "ktv_synth",
"filesroot": "/tmp/ktv-synth-outputs",
"redis_url": "redis://127.0.0.1:6379",
"worker_cnt": 1,
"stuck_seconds": 1800,
"max_age_hours": 24
}
```
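A script that needs the same settings could read this file as sketched below. This is an assumption about usage, not the service's loader: `load_config` is a hypothetical helper that falls back to the values shown above when the file or a key is missing.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Fallback values copied from the example configuration above.
DEFAULT_CONFIG: Dict[str, Any] = {
    "port": 9084,
    "queue": "ktv_synth",
    "filesroot": "/tmp/ktv-synth-outputs",
    "redis_url": "redis://127.0.0.1:6379",
    "worker_cnt": 1,
    "stuck_seconds": 1800,
    "max_age_hours": 24,
}


def load_config(path: str = "conf/config.json") -> Dict[str, Any]:
    """Load the config file, filling in any missing keys from DEFAULT_CONFIG."""
    config = dict(DEFAULT_CONFIG)
    config_path = Path(path)
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    if int(config["worker_cnt"]) < 1:
        raise ValueError("worker_cnt must be at least 1")
    return config
```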
## Troubleshooting
### FFmpeg Errors
Check FFmpeg installation and codec support:
```bash
ffmpeg -codecs | grep libx264
ffmpeg -codecs | grep aac
```
### Redis Connection
Verify Redis is running:
```bash
redis-cli ping
```
### Permission Issues
Ensure the service has write access to output directories:
```bash
chmod 755 /tmp/ktv-synth-outputs
```
### High Memory Usage
Reduce worker count in `conf/config.json`:
```json
{
"worker_cnt": 1
}
```
## Performance
- **MTV Generation**: ~2-3x faster than real time (a 240 s video takes ~80-120 s)
- **KTV Generation**: ~2-3x faster than real time
- **Concurrent Tasks**: Limited by `worker_cnt` (default: 1)
- **Memory**: ~500MB-1GB per worker (depends on video resolution)
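The throughput figures above translate into a rough processing-time estimate. This is a back-of-the-envelope sketch; `estimated_seconds` is a hypothetical helper, and the 2x and 3x speed factors are simply the bounds quoted above, not measurements.

```python
from typing import Tuple


def estimated_seconds(
    duration: float, slow_factor: float = 2.0, fast_factor: float = 3.0
) -> Tuple[float, float]:
    """Return (best case, worst case) processing time for one output, in seconds."""
    return (duration / fast_factor, duration / slow_factor)


best, worst = estimated_seconds(240)
print(f"240 s video: roughly {best:.0f}-{worst:.0f} s per output")
```

With `worker_cnt` set to 1, tasks run one at a time, so queue wait time adds to this estimate.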
## Integration
This service integrates with:
- **demucs-service**: Audio source separation (provides accompaniment tracks)
- **whisper-service**: Subtitle generation (provides ASS files)
- **wan22-service**: Video generation (provides scene clips)
## License
Internal use only.
## Support
For issues or questions, contact the development team.