AutoKara
Experiment in automatic karaoke timing.
Some documentation first:
- Having a clean Python environment:
- An introduction to neural networks and deep learning:
Extracting vocals from music
Syllable segmentation
Symbolic methods
Machine Learning & Deep Learning methods
Using CNNs on spectrogram images (Schlüter, Böck, 2014):
- MADMOM implementation
- Python implementation for Taiko rhythm games: https://github.com/seiichiinoue/odcnn
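The CNN approach above learns onset probabilities directly from spectrogram excerpts. For intuition, a much simpler classical baseline detects onsets from spectral flux (the frame-to-frame increase in spectral magnitude). The sketch below is pure NumPy; the function name, frame sizes, and threshold are arbitrary illustrative choices, not part of AutoKara:

```python
# Illustrative baseline, NOT the CNN from the paper: spectral-flux onset
# strength on a magnitude spectrogram, followed by naive peak picking.
import numpy as np

def spectral_flux_onsets(signal, frame_len=1024, hop=512, threshold=0.1):
    # frame the signal and compute windowed magnitude spectra
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    spectra = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])
    # positive spectral difference between consecutive frames, summed over bins
    diff = np.diff(spectra, axis=0)
    flux = np.maximum(diff, 0.0).sum(axis=1)
    if flux.max() > 0:
        flux /= flux.max()
    # a frame is an onset if its flux is a local maximum above the threshold
    onsets = [
        i + 1 for i in range(1, len(flux) - 1)
        if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]
    ]
    return flux, onsets
```

A real system would add mel scaling, adaptive thresholding, and smoothing; the CNN method replaces the hand-crafted flux with a learned onset function.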
Other methods
Other stuff goes here
Syllable recognition
This would be needed if we ever want to use an AI to identify syllables without a reference lyrics file.
Installation
Requirements
- MKVToolnix (at least the CLI utils)
- Python >= 3.8
Optional:
- PyTorch for custom model training: follow the instructions here
All other Python modules can be installed directly through pip; see below.
This project requires at least Python 3.8, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:
$ python -m venv env # create the virtual environment, do it once
$ source env/bin/activate # use the virtual environment
# Install the required python modules
$ pip install -r requirements.txt
# To exit the virtual environment
$ deactivate
Having a CUDA-capable GPU is optional, but can greatly reduce processing time, notably for vocal separation and model training.
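A quick way to check whether the GPU will be used is the sketch below, assuming the optional PyTorch install (the `device` variable is my own name, not AutoKara's):

```python
# Pick the compute device at runtime: GPU when CUDA is available, else CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    # PyTorch is optional (see Requirements); everything still runs on the CPU
    device = "cpu"
```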
Use
Inference
To execute AutoKara on an MKV video file:
$ python autokara.py video.mkv output.ass
To execute AutoKara on a (pre-extracted) WAV vocals file:
$ python autokara.py vocals.wav output.ass --vocals
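To process a whole folder of videos, the single-file command above can be scripted. A hypothetical helper, building one invocation per `.mkv` file (the folder layout and function name are assumptions, only the `autokara.py` CLI comes from this README):

```python
# Build one "python autokara.py <video> <output>" command per .mkv in a folder.
from pathlib import Path

def batch_commands(folder):
    cmds = []
    for video in sorted(Path(folder).glob("*.mkv")):
        # name the output .ass after the source video
        output = video.with_suffix(".ass")
        cmds.append(["python", "autokara.py", str(video), str(output)])
    return cmds
```

Each returned list can then be passed to `subprocess.run` to launch the timing runs one by one.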
To extract only the .wav audio from an MKV file:
$ ./extractWav.sh source_video output_audio
To separate only the vocals from the instruments in an audio file:
$ demucs --two-stems=vocals -o output_folder audio_file.wav
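The same separation step can be driven from Python by shelling out to the demucs CLI. A minimal sketch, where the helper names are my own and only the flags come from the command above:

```python
# Hypothetical wrapper around the demucs CLI shown above.
import shutil
import subprocess

def demucs_command(audio_file, output_folder):
    # same flags as the shell invocation: keep vocals vs. everything else
    return ["demucs", "--two-stems=vocals", "-o", output_folder, audio_file]

def separate_vocals(audio_file, output_folder):
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs not found on PATH; install it via pip")
    subprocess.run(demucs_command(audio_file, output_folder), check=True)
```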