AutoKara
An experiment in automatic karaoke timing.
Installation
Requirements
- FFmpeg
- Python >= 3.8 (tested up to latest 3.11 version)
Optional :
- MKVToolnix (at least the CLI utils) : required for the preprocessing scripts
All other python modules can be installed directly through PIP, see next section.
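A quick way to confirm the requirements above before installing is a small check script. This is only a sketch: the helper names are made up for illustration, and `mkvmerge` is assumed to be a reasonable stand-in for "the MKVToolnix CLI utils" (the preprocessing scripts may rely on a different executable from that suite).

```python
import shutil
import sys

# Executables to look for on PATH. "mkvmerge" is an assumption: it stands in
# for the MKVToolnix CLI utilities, which are only needed for preprocessing.
REQUIRED_TOOLS = ["ffmpeg"]
OPTIONAL_TOOLS = ["mkvmerge"]
MIN_PYTHON = (3, 8)


def find_missing(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def check_environment(version=sys.version_info, which=shutil.which):
    """Return (errors, warnings) as two lists of human-readable messages."""
    errors = []
    warnings = []
    if tuple(version[:2]) < MIN_PYTHON:
        errors.append("Python >= 3.8 is required")
    for tool in find_missing(REQUIRED_TOOLS, which):
        errors.append(f"{tool} not found on PATH")
    for tool in find_missing(OPTIONAL_TOOLS, which):
        warnings.append(f"{tool} not found on PATH (preprocessing scripts unavailable)")
    return errors, warnings


# Report the state of the current machine.
found_errors, found_warnings = check_environment()
for message in found_errors:
    print("error:", message)
for message in found_warnings:
    print("warning:", message)
```

The `version` and `which` parameters exist only so the check can be exercised without touching the real system.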
Install
Using a virtual environment is strongly recommended (but not mandatory if you know what you're doing) :
# create the virtual environment, do it once
python -m venv env
# activate the virtual environment
source env/bin/activate
# To exit the virtual environment
deactivate
The simplest way to install Autokara is through PIP.
# Using HTTPS
pip install git+https://git.iiens.net/bakaclub/autokara.git
# Or SSH
pip install git+ssh://git@git.iiens.net:bakaclub/autokara.git
Or you can clone the repo and use pip install <repo_directory> if you prefer.
To use the custom phonetic mappings for Japanese Romaji and other non-English languages, you need (for now) to manually update the g2p DB from within the venv :
autokara-gen-lang
Configuration
Autokara comes with a default config file in autokara/default.conf.
If you want to tweak some values (enable CUDA, for example), you should add them to a new config file in your personal config directory : ~/.config/autokara/autokara.conf.
This new file has priority over the default one, which is used only as fallback.
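The "user file overrides default, default is the fallback" behaviour can be illustrated with Python's standard `configparser`. This is a sketch under assumptions, not Autokara's actual loading code: it assumes the `.conf` files are INI-style, and the section and key names in the demo are hypothetical.

```python
import configparser
import tempfile
from pathlib import Path

# Both paths come from the text above. The default path is relative to the
# installed package, so adjust it to wherever autokara lives on your system.
DEFAULT_CONF = Path("autokara/default.conf")
USER_CONF = Path.home() / ".config" / "autokara" / "autokara.conf"


def load_config(default_path=DEFAULT_CONF, user_path=USER_CONF):
    """Read the default file first, then the user file, so user values win."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, and values from later
    # files override values from earlier ones.
    parser.read([default_path, user_path])
    return parser


# Demo with throwaway files. The section and key names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    default_file = Path(tmp) / "default.conf"
    user_file = Path(tmp) / "autokara.conf"
    default_file.write_text("[example]\ncuda = false\nthreshold = 0.5\n")
    user_file.write_text("[example]\ncuda = true\n")
    demo = load_config(default_file, user_file)
    print(demo["example"]["cuda"])       # value taken from the user file
    print(demo["example"]["threshold"])  # value falls back to the default file
```

In practice this means the personal config file only needs to list the values you want to change.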
Use
Autokara
To use Autokara, you need :
- A media file of the song (video, or pre-extracted vocals)
- An ASS file with the lyrics, split by syllable (you can use the Auto-Split in Aegisub, but doing it manually may yield better results)
To execute AutoKara on a MKV video file and an ASS file containing the lyrics (ASS will be overwritten):
autokara video.mkv lyrics.ass
To output to a different file (and keep the original) :
autokara video.mkv lyrics.ass -o output.ass
To execute AutoKara on a (pre-extracted) WAV (or OGG, MP3, ...) vocals file, pass the --vocals flag :
autokara vocals.wav lyrics.ass --vocals
To use a phonetic transcription optimized for a specific language, use --lang (or -l). Default is Japanese Romaji.
You can also set a separate language for uppercase words with --uppercase-lang (default is set in your config file) :
# Use french transcription
autokara video.mkv lyrics.ass --lang fr
# Use english transcription, but treat all uppercase words as french :
autokara video.mkv lyrics.ass --lang en --uppercase-lang fr
Available language options are :
jp : Japanese Romaji (base default)
en : English (uppercase default)
fr : French
fi : Finnish
da : Danish
Full help for all options is available with :
autokara -h
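When several songs need timing, the commands above can be scripted. The following is a minimal sketch that uses only the options shown in this section. The helper names, the pairing of `song.mkv` with `song.ass`, and the `.timed.ass` output suffix are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Language codes listed above.
LANGUAGES = {"jp", "en", "fr", "fi", "da"}


def build_command(media, lyrics, output=None, lang=None,
                  uppercase_lang=None, vocals=False):
    """Build an autokara argument list using only the options shown above."""
    for code in (lang, uppercase_lang):
        if code is not None and code not in LANGUAGES:
            raise ValueError(f"unknown language code: {code}")
    command = ["autokara", str(media), str(lyrics)]
    if output is not None:
        command += ["-o", str(output)]
    if lang is not None:
        command += ["--lang", lang]
    if uppercase_lang is not None:
        command += ["--uppercase-lang", uppercase_lang]
    if vocals:
        command.append("--vocals")
    return command


def time_folder(folder, lang=None, dry_run=True):
    """Build (and optionally run) one command per .mkv with a matching .ass."""
    commands = []
    for media in sorted(Path(folder).glob("*.mkv")):
        lyrics = media.with_suffix(".ass")
        if not lyrics.exists():
            continue
        # Write to a separate file so the original lyrics are kept.
        output = media.with_name(media.stem + ".timed.ass")
        command = build_command(media, lyrics, output=output, lang=lang)
        commands.append(command)
        if not dry_run:
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` (the default here) nothing is executed, so the returned lists can be inspected before running anything for real.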
Useful scripts
Manual preprocessing
Use autokara-preprocess if you want to manually preprocess video/lyrics in advance :
# Extract vocals from video :
autokara-preprocess --vocals video_file output_folder/
# Extract ASS file from a MKV containing a subtitle track :
autokara-preprocess --lyrics video_file output_file.ass
# Do both at once :
autokara-preprocess --full video_file output_folder/
Then you can use Autokara on the extracted files with the --vocals flag.
Sound and onsets plotting
A visualization tool, mainly intended for debugging or for the curious.
It does the same as autokara, but instead of writing to a file, it plots a graph with syllable onset times, spectrogram, probability curves, ...
It does not work on video files, only on separated vocals audio files :
autokara-plot vocals.wav lyrics.ass
Documentation and References
This section is mainly intended for people who would like to contribute and/or are curious about how this stuff works
Extracting vocals from music
Syllable segmentation
Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)
Using CNNs on spectrogram images (Schlüter, Böck, 2014) :