AutoKara
Experiment in automatic karaoke timing.
Some documentation first
Having a clean Python environment:
An introduction to neural networks and deep learning:
Extracting vocals from music
Syllable segmentation
Symbolic methods
Machine Learning & Deep Learning methods
Using CNNs on spectrogram images (Schlüter, Böck, 2014):
Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)
Other methods
Other stuff goes here
Syllable recognition
If we ever want to use an AI to identify syllables without a reference lyrics file
Installation
Requirements
- MKVToolNix (at least the CLI utilities)
- Python >= 3.8
Optional:
- PyTorch, for custom model training: follow the instructions here
All other Python modules can be installed directly through pip (see below).
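Before installing, you can quickly check the two hard requirements from Python. This is a minimal sketch: check_requirements is a hypothetical helper, and it assumes the MKVToolNix CLI tools you need are mkvextract and mkvmerge, which may not match exactly what the project's scripts call.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8), tools=("mkvextract", "mkvmerge")):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: the tool names are an assumption about which
    MKVToolNix CLI utilities the scripts rely on.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH (install MKVToolNix)")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("Missing requirement:", issue)
    if not issues:
        print("All requirements found.")
```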
This project requires at least Python 3.8, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:
$ python -m venv env # create the virtual environment (only once)
$ source env/bin/activate # activate the virtual environment
# Install the required python modules
$ pip install -r requirements.txt
# To exit the virtual environment
$ deactivate
Having a CUDA-capable GPU is optional, but can greatly reduce processing time in some situations.
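To see whether PyTorch (if installed) can actually use your GPU, a quick check like the one below can help. It is a sketch with a hypothetical pick_device helper: it only reports what PyTorch sees and does not tell you whether a given AutoKara step will run on the GPU.

```python
import importlib.util


def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed: fall back to the CPU
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```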
Use
AutoKara
To use AutoKara, you need:
- A media file of the song (video, or pre-extracted vocals)
- An ASS file with the lyrics, split by syllable
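For reference, here is what a syllable-split ASS line typically looks like, assuming the standard ASS karaoke tags ({\k}, {\K}, {\kf}, {\ko}), whose durations are in centiseconds; those durations are presumably the timing information AutoKara fills in. The sketch below (hypothetical helpers parse_karaoke_line and syllable_start_times) reads such a line back. It assumes one karaoke tag per override block and does not interpret any other override tags.

```python
import re

# One override block containing a karaoke tag (\k, \K, \kf or \ko) followed by
# the syllable text, up to the next "{". Other tags in the block are skipped.
KARAOKE_SYL = re.compile(r"\{[^}]*?\\(?:kf|ko|k|K)(\d+)[^}]*\}([^{]*)")


def parse_karaoke_line(line):
    """Return [(syllable, duration_cs), ...] for one ASS "Dialogue:" line."""
    _, _, fields = line.partition(":")
    # A Dialogue line has 10 comma-separated fields; Text is the last one and
    # may itself contain commas, hence maxsplit=9.
    text = fields.split(",", 9)[9]
    return [(syl, int(cs)) for cs, syl in KARAOKE_SYL.findall(text)]


def syllable_start_times(line_start_s, syllables):
    """Absolute start time (seconds) of each syllable; durations are centiseconds."""
    starts, t = [], line_start_s
    for _, cs in syllables:
        starts.append(round(t, 2))
        t += cs / 100
    return starts


if __name__ == "__main__":
    line = r"Dialogue: 0,0:00:10.00,0:00:12.00,Default,,0,0,0,,{\k50}ka{\k30}ra{\k40}o{\k60}ke"
    syls = parse_karaoke_line(line)
    print(syls)  # [('ka', 50), ('ra', 30), ('o', 40), ('ke', 60)]
    print(syllable_start_times(10.0, syls))  # [10.0, 10.5, 10.8, 11.2]
```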
To run AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):
$ python autokara.py video.mkv lyrics.ass
To write the output to a different file (and keep the original):
$ python autokara.py video.mkv lyrics.ass -o output.ass
To run AutoKara on a pre-extracted vocals file (WAV, OGG, MP3, ...), pass the --vocals flag:
$ python autokara.py vocals.wav output.ass --vocals
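To call AutoKara from another Python script (for example to time several songs in a row), a small wrapper around the command line can help. This is a sketch: autokara_command and run_autokara are hypothetical helper names, it only uses the arguments shown above (-o and --vocals), and it assumes it is run from the project directory with the virtual environment active.

```python
import subprocess
import sys


def autokara_command(media, ass, output=None, vocals=False):
    """Build the argument list for autokara.py from the documented options."""
    # sys.executable is the current interpreter, e.g. the one from the venv
    cmd = [sys.executable, "autokara.py", str(media), str(ass)]
    if output is not None:
        cmd += ["-o", str(output)]
    if vocals:
        cmd.append("--vocals")
    return cmd


def run_autokara(media, ass, **kwargs):
    """Run autokara.py; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(autokara_command(media, ass, **kwargs), check=True)
```

For instance, run_autokara("vocals.wav", "lyrics.ass", output="output.ass", vocals=True) is equivalent to the last command above, except that it keeps lyrics.ass untouched.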
Useful scripts
To extract only the .wav audio from an MKV file:
$ ./extractWav.sh source_video output_audio
To extract only the .ass subtitle file from an MKV file:
$ ./extractAss.sh source_video output_subs
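The internals of extractAss.sh are not described here. As a hedged sketch of one way to do the same thing with the MKVToolNix CLI only, the code below identifies the tracks with mkvmerge -J (JSON output), picks the first subtitle track whose codec ID is S_TEXT/ASS or S_TEXT/SSA, and extracts it with mkvextract. find_ass_track and extract_ass are hypothetical names. If a file has several ASS tracks, the first one may not be the lyrics.

```python
import json
import subprocess

ASS_CODEC_IDS = {"S_TEXT/ASS", "S_TEXT/SSA"}


def find_ass_track(identification):
    """Return the ID of the first ASS/SSA subtitle track in mkvmerge -J output, or None."""
    for track in identification.get("tracks", []):
        codec_id = track.get("properties", {}).get("codec_id")
        if track.get("type") == "subtitles" and codec_id in ASS_CODEC_IDS:
            return track["id"]
    return None


def extract_ass(video, output):
    """Identify the tracks of `video`, then extract its first ASS track to `output`."""
    result = subprocess.run(["mkvmerge", "-J", str(video)],
                            capture_output=True, text=True, check=True)
    track_id = find_ass_track(json.loads(result.stdout))
    if track_id is None:
        raise ValueError(f"no ASS subtitle track found in {video}")
    # mkvextract <source> tracks <track ID>:<destination>
    subprocess.run(["mkvextract", str(video), "tracks", f"{track_id}:{output}"], check=True)
```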
To separate only the vocals from the instruments in an audio file:
$ demucs --two-stems=vocals -o output_folder audio_file.wav
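Demucs writes its results into subfolders rather than directly into output_folder. Assuming Demucs v4 and its default output layout (output_folder/model name/track name/vocals.wav, alongside no_vocals.wav), the sketch below runs the same separation from Python and returns where the vocals stem should end up. separate_vocals is a hypothetical helper. It passes the model name explicitly with -n (htdemucs, the Demucs v4 default) so that the returned path does not depend on the installed version's default model.

```python
import subprocess
from pathlib import Path


def separate_vocals(audio_file, output_folder, model="htdemucs", run=True):
    """Run Demucs in two-stem mode and return the expected vocals path.

    Assumes the layout <output_folder>/<model>/<track name>/vocals.wav.
    """
    audio_file = Path(audio_file)
    cmd = ["demucs", "--two-stems=vocals", "-n", model,
           "-o", str(output_folder), str(audio_file)]
    if run:
        subprocess.run(cmd, check=True)
    return Path(output_folder) / model / audio_file.stem / "vocals.wav"
```

The returned path can then be passed to autokara.py together with the --vocals flag.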
Batch preprocessing (vocals + ASS extraction) of all videos in a directory:
$ ./preprocess_media.sh video_folder output_folder
A visualization tool, mainly intended for debugging: it does the same as autokara.py, but instead of writing to a file, it plots a graph with onset times, the spectrogram, probability curves, etc. It does not work on video files, only on separated vocals audio files:
$ python plot_syls.py vocals.wav lyrics.ass
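To give an idea of what an onset curve is, the sketch below computes one classic hand-crafted kind: half-wave rectified spectral flux on a log-compressed magnitude spectrogram, followed by simple peak picking. It is not the CNN method cited in the documentation list, and not necessarily what plot_syls.py or autokara.py compute. The function name and its parameters (n_fft, hop, threshold) are illustrative assumptions.

```python
import numpy as np


def spectral_flux_onsets(signal, sr, n_fft=1024, hop=256, threshold=0.3):
    """Return onset times in seconds using half-wave rectified spectral flux."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        return []
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))      # magnitude spectrogram (frames x bins)
    log_mag = np.log1p(100 * mag)                  # log compression
    # Keep only increases in energy, summed over frequency bins
    flux = np.maximum(np.diff(log_mag, axis=0), 0).sum(axis=1)
    flux = flux / (flux.max() + 1e-12)             # normalise to [0, 1]
    # Peak picking: local maxima above a fixed threshold
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    # flux[i] compares frame i+1 with frame i: report the centre of frame i+1
    return [((i + 1) * hop + n_fft / 2) / sr for i in peaks]
```

The time resolution is limited by the hop (here 16 ms at 16 kHz) and the window length, so detected onsets can be a few tens of milliseconds away from the true ones.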