AutoKara

Experiment in automatic karaoke timing.

Some documentation first

Having a clean Python environment:

An introduction to neural networks and deep learning:

Extracting vocals from music

Syllable segmentation

Symbolic methods

Machine Learning & Deep Learning methods

Using CNNs on spectrogram images (Schlüter, Böck, 2014):

Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

Other methods

Other stuff goes here

Syllable recognition

If we ever want to use an AI to identify syllables without a reference lyrics file

Installation

Requirements

  • MKVToolNix (at least the CLI utilities)
  • Python >= 3.8

Optional:

  • PyTorch, for custom model training: follow the official PyTorch installation instructions

All other Python modules can be installed directly through pip, as described below.

This project requires Python 3.8 or later, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:

$ python -m venv env        # create the virtual environment (only needed once)
$ source env/bin/activate   # activate the virtual environment

# Install the required Python modules
$ pip install -r requirements.txt

# To exit the virtual environment
$ deactivate
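
As a quick sanity check before installing the dependencies, the two conditions above (Python 3.8 or later, and an active virtual environment) can be verified from Python itself. This is only a minimal sketch: the function names are hypothetical and not part of AutoKara.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements


def python_is_recent_enough(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def in_virtual_environment():
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python >= 3.8:", python_is_recent_enough())
    print("Inside a virtual environment:", in_virtual_environment())
```

If the second line prints False, the environment was probably not activated with `source env/bin/activate`.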

Having a CUDA-capable GPU is optional, but can greatly reduce processing time in some situations.
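
To find out whether a CUDA device is visible, one option is to ask PyTorch. This sketch assumes that GPU work goes through PyTorch, which this README only lists as optional, so the import is guarded; the function name is hypothetical.

```python
def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # optional dependency, may not be installed
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA available:", cuda_available())
```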

Usage

AutoKara

To run AutoKara from scratch on an MKV video file:

$ python autokara.py video.mkv output.ass

To run AutoKara with existing syllable splits and line timings:

$ python autokara.py video.mkv output.ass --ref reference.ass

To run AutoKara on a (pre-extracted) WAV vocals file:

$ python autokara.py vocals.wav output.ass --vocals
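
When calling AutoKara from another Python script, the three invocations above can be assembled as argument lists. This is a minimal sketch: `build_autokara_command` is a hypothetical helper, not part of the project, and it only uses the positional arguments and the `--ref` and `--vocals` options shown above. Whether the two options can be combined is not stated here, so the sketch does not forbid it.

```python
import sys


def build_autokara_command(source, output, ref=None, vocals=False):
    """Build the argument list for autokara.py (hypothetical helper)."""
    # sys.executable keeps the call inside the active virtual environment
    cmd = [sys.executable, "autokara.py", source, output]
    if ref is not None:
        cmd += ["--ref", ref]  # reuse existing syllable splits and line timings
    if vocals:
        cmd.append("--vocals")  # source is a pre-extracted vocals WAV file
    return cmd


if __name__ == "__main__":
    print(build_autokara_command("video.mkv", "output.ass", ref="reference.ass"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the project directory.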

Useful scripts

To extract only the .wav audio from an MKV file:

$ ./extractWav.sh source_video output_audio

To extract only the .ass subtitle file from an MKV file:

$ ./extractAss.sh source_video output_subs

To separate only the vocals from the instruments in an audio file:

$ demucs --two-stems=vocals -o output_folder audio_file.wav
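
To apply the demucs command above to several audio files, the same arguments can be generated for each WAV file in a folder. This is a minimal sketch under stated assumptions: `demucs_commands` is a hypothetical helper, it only looks at files ending in `.wav` directly inside the folder, and it is not a description of what the project's own scripts do.

```python
from pathlib import Path


def demucs_commands(audio_dir, output_folder):
    """Return one demucs argument list per .wav file in audio_dir (hypothetical helper)."""
    wav_files = sorted(Path(audio_dir).glob("*.wav"))  # sorted for a stable order
    return [
        ["demucs", "--two-stems=vocals", "-o", str(output_folder), str(wav)]
        for wav in wav_files
    ]


if __name__ == "__main__":
    for cmd in demucs_commands(".", "output_folder"):
        print(" ".join(cmd))
```

Each list can then be run with `subprocess.run(cmd, check=True)`, provided demucs is installed in the active environment.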

Batch preprocessing (vocals + ASS extraction) of all videos in a directory:

$ ./preprocess_media.sh video_folder output_folder