# AutoKara

An experiment in automatic karaoke timing.

# Installation

## Requirements

- MKVToolnix (at least the CLI utils)
- FFmpeg
- Python >= 3.8

All other Python modules can be installed directly through pip, see the next section.

## Install

### Linux

Using a virtual environment is strongly recommended (but not mandatory if you know what you're doing):

```bash
$ python -m venv env # create the virtual environment, do it once
$ source env/bin/activate # use the virtual environment
# To exit the virtual environment
$ deactivate
```

The simplest way to install Autokara is through pip.

```bash
# Using HTTPS
$ pip install git+https://git.iiens.net/bakaclub/autokara.git
# Or SSH
$ pip install git+ssh://git@git.iiens.net:bakaclub/autokara.git
```

Or you can clone the repo and use `pip install <repo_directory>` if you prefer.

To use the custom phonetic mappings for Japanese Romaji and other non-English languages, you need to manually update the g2p DB (for now), from within the venv:

```bash
$ autokara-gen-lang
```

## Configuration

Autokara comes with a default config file in `autokara/default.conf`.

If you want to tweak some values (to enable CUDA, for example), add them to a new config file in your personal config directory: `~/.config/autokara/autokara.conf`.

This new file has priority over the default one, which is used only as a fallback.

# Use

## Autokara

To use Autokara, you need:

- A media file of the song (video, or pre-extracted vocals)
- An ASS file with the lyrics, split by syllable

To execute AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):

```bash
$ autokara video.mkv lyrics.ass
```

To output to a different file (and keep the original):

```bash
$ autokara video.mkv lyrics.ass -o output.ass
```

To execute AutoKara on a (pre-extracted) WAV (or OGG, MP3, ...) vocals file, pass the `--vocals` flag:

```bash
$ autokara vocals.wav output.ass --vocals
```

To use a phonetic transcription optimized for a specific language, use `--lang` (or `-l`):

```bash
$ autokara vocals.wav output.ass --lang jp
```

Available language options are:

```
jp : Japanese Romaji (default)
en : English
fr : French
fi : Finnish
da : Danish
```

Full help for all options is available with:

```bash
$ autokara -h
```

## Useful scripts

To only extract the .wav audio from an MKV file:

```bash
$ ./extractWav.sh source_video output_audio
```

To only extract the .ass subtitle file from an MKV file:

```bash
$ ./extractAss.sh source_video output_subs
```

To only separate vocals from instruments in an audio file:

```bash
$ demucs --two-stems=vocals -o output_folder audio_file.wav
```

Batch preprocessing (vocals + ASS extraction) of all videos in a directory:

```bash
$ ./preprocess_media.sh video_folder output_folder
```

A visualization tool, mainly intended for debugging. It does the same as autokara.py, but instead of writing to a file, it plots a graph with onset times, spectrogram, probability curves, etc. It does not work on video files, only on separated vocals audio files:

```bash
$ autokara-plot vocals.wav lyrics.ass
```

# Documentation and References

This section is mainly intended for people who would like to contribute and/or are curious about how this stuff works.

## Extracting vocals from music

- https://github.com/facebookresearch/demucs

## Syllable segmentation

[Aligning lyrics to song](https://github.com/jhuang448/LyricsAlignment-MTL) (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

[Using CNNs on spectrogram images](https://www.ofai.at/~jan.schlueter/pubs/2014_icassp.pdf) (Schlüter, Böck, 2014):

- [MADMOM implementation](https://madmom.readthedocs.io/en/v0.16/modules/features/onsets.html)
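
Returning to the command-line usage described under "Use": when timing many songs, it can be convenient to assemble the `autokara` invocation from a script. The following is a minimal sketch, not part of Autokara itself. The helper name `build_autokara_command` is hypothetical, and it assumes that the documented `-o`, `--vocals` and `--lang` options can be combined in a single call, which this README does not state explicitly.

```python
import shlex
from typing import List, Optional

# Language codes listed in this README for --lang.
SUPPORTED_LANGS = ("jp", "en", "fr", "fi", "da")


def build_autokara_command(
    media: str,
    lyrics: str,
    output: Optional[str] = None,
    vocals: bool = False,
    lang: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the `autokara` CLI (hypothetical helper).

    Only the options documented in this README are used. Combining them
    in one call is an assumption.
    """
    if lang is not None and lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")

    cmd = ["autokara", media, lyrics]
    if output is not None:
        # Write to a separate file instead of overwriting the lyrics file.
        cmd += ["-o", output]
    if vocals:
        # The media file is a pre-extracted vocals track, not a video.
        cmd.append("--vocals")
    if lang is not None:
        cmd += ["--lang", lang]
    return cmd


if __name__ == "__main__":
    cmd = build_autokara_command(
        "vocals.wav", "lyrics.ass", output="output.ass", vocals=True, lang="jp"
    )
    # Show the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from within the virtual environment where Autokara is installed. Building a list rather than a single string avoids quoting problems with file names that contain spaces.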