
AutoKara

Experiment in automatic karaoke timing.

Some documentation first

Having a clean python environment:

An introduction to neural networks and deep learning:

Extracting vocals from music

Syllable segmentation

Symbolic methods

Machine Learning & Deep Learning methods

Using CNNs on spectrogram images (Schlüter, Böck, 2014):

Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

Other methods

Other stuff goes here

Syllable recognition

If we ever want to use an AI to identify syllables without a reference lyrics file

Installation

Requirements

  • MKVToolNix (at least the CLI utilities)
  • Python >= 3.8

Optional:

  • PyTorch for custom model training: follow the instructions here

All other Python modules can be installed directly through pip (see below).
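A quick way to check the two hard requirements before going further is a small Python script like the one below. It is a sketch, not part of the repository; the assumption that `mkvmerge` and `mkvextract` are the MKVToolNix tools the project scripts need is ours.

```python
import shutil
import sys

# Minimum Python version stated in this README.
MIN_PYTHON = (3, 8)
# MKVToolNix command-line tools (assumption: these are the ones the scripts call).
MKVTOOLNIX_TOOLS = ("mkvmerge", "mkvextract")


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_tools(tools=MKVTOOLNIX_TOOLS) -> list:
    """Return the names of the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_tools()
    print("Missing MKVToolNix tools:", ", ".join(missing) if missing else "none")
```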

This project requires at least Python 3.8, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:

$ python -m venv env      # create the virtual environment (only needed once)
$ source env/bin/activate # activate the virtual environment

# Install the required Python modules
$ pip install -r requirements.txt

# To exit the virtual environment
$ deactivate
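To make sure the virtual environment is actually active before running the project scripts, you can compare `sys.prefix` with `sys.base_prefix`: they differ inside an environment created with `python -m venv`. A minimal sketch, not shipped with the project:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base interpreter installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; run `source env/bin/activate` first.")
```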

Having a CUDA-capable GPU is optional, but can greatly reduce processing time in some situations.
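Whether a GPU gets used depends on PyTorch detecting it. A sketch of how a script could pick the device, assuming PyTorch is the only GPU backend involved (how `autokara.py` itself chooses is not documented here):

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch  # optional dependency, see Requirements
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```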

Use

AutoKara

To use AutoKara, you need:

  • A media file of the song (video, or pre-extracted vocals)
  • An ASS file with the lyrics, split by syllable
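For reference, karaoke-timed ASS lines conventionally mark each syllable with a `{\k<duration>}` override tag, the duration being in centiseconds. The sketch below splits such a line into syllables. It assumes the input uses plain `\k`-style tags (`\k`, `\K`, `\kf`, `\ko`) with nothing else in the same override block, which this README does not spell out; `split_syllables` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches a standalone karaoke tag such as {\k20}, {\K20}, {\kf20} or {\ko20}.
# The captured number is the syllable duration in centiseconds.
KARAOKE_TAG = re.compile(r"\{\\[kK][fo]?(\d+)\}")


def split_syllables(text: str) -> List[Tuple[int, str]]:
    """Split the text of an ASS Dialogue line into (duration_cs, syllable) pairs."""
    parts = KARAOKE_TAG.split(text)
    # re.split with one capture group yields:
    # [text_before_first_tag, dur1, syl1, dur2, syl2, ...]
    # Any text before the first tag is ignored.
    return [(int(parts[i]), parts[i + 1]) for i in range(1, len(parts), 2)]


line = "{\\k20}Ka{\\k15}ra{\\k30}o{\\k25}ke"
print(split_syllables(line))  # → [(20, 'Ka'), (15, 'ra'), (30, 'o'), (25, 'ke')]
```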

To run AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):

$ python autokara.py video.mkv lyrics.ass

To write the result to a different file (and keep the original):

$ python autokara.py video.mkv lyrics.ass -o output.ass

To run AutoKara on a pre-extracted vocals file (WAV, OGG, MP3, ...), pass the --vocals flag:

$ python autokara.py vocals.wav output.ass --vocals
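When processing several songs from a script, the documented options can be assembled programmatically. `autokara_command` and `run_autokara` below are hypothetical helpers, not part of the project; they only use the arguments shown above (`-o` and `--vocals`).

```python
import subprocess
import sys
from typing import List, Optional


def autokara_command(media: str, ass: str, output: Optional[str] = None,
                     vocals: bool = False) -> List[str]:
    """Build the argument list for autokara.py from the documented options."""
    cmd = [sys.executable, "autokara.py", media, ass]
    if output is not None:
        cmd += ["-o", output]  # write to another file, keep the original ASS
    if vocals:
        cmd.append("--vocals")  # media is an already separated vocals file
    return cmd


def run_autokara(*args, **kwargs) -> None:
    """Run autokara.py; must be called from the project directory."""
    subprocess.run(autokara_command(*args, **kwargs), check=True)
```

For example, `run_autokara("video.mkv", "lyrics.ass", output="output.ass")` corresponds to `python autokara.py video.mkv lyrics.ass -o output.ass`, run with the current interpreter (ideally the one from the virtual environment). Combining `-o` with `--vocals` is assumed to work but is not shown above.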

Useful scripts

To only extract the .wav audio from an MKV file:

$ ./extractWav.sh source_video output_audio

To only extract the .ass subtitle file from an MKV file:

$ ./extractAss.sh source_video output_subs
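This README does not show how `extractAss.sh` picks the subtitle track. One way to do it with the MKVToolNix CLI tools is to read the JSON identification output of `mkvmerge -J` and pass the first ASS/SSA track ID to `mkvextract`. The sketch below illustrates that approach; it is not the script's actual contents, and it assumes a recent MKVToolNix release that accepts the `mkvextract <file> tracks <ID>:<output>` argument order.

```python
import json
import subprocess
from typing import Optional

# Matroska codec IDs for ASS and SSA text subtitles.
ASS_CODEC_IDS = {"S_TEXT/ASS", "S_TEXT/SSA"}


def find_ass_track(identify: dict) -> Optional[int]:
    """Return the ID of the first ASS/SSA subtitle track in `mkvmerge -J` output."""
    for track in identify.get("tracks", []):
        if track.get("type") != "subtitles":
            continue
        if track.get("properties", {}).get("codec_id") in ASS_CODEC_IDS:
            return track["id"]
    return None


def extract_ass(video: str, output: str) -> None:
    """Extract the first ASS subtitle track of `video` into `output`."""
    result = subprocess.run(["mkvmerge", "-J", video], check=True,
                            capture_output=True, text=True)
    track_id = find_ass_track(json.loads(result.stdout))
    if track_id is None:
        raise ValueError(f"no ASS subtitle track in {video}")
    # mkvextract <source> tracks <track ID>:<output file>
    subprocess.run(["mkvextract", video, "tracks", f"{track_id}:{output}"], check=True)
```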

To only separate the vocals from the instruments in an audio file:

$ demucs --two-stems=vocals -o output_folder audio_file.wav
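By default, Demucs writes each stem to `<output_folder>/<model name>/<track name>/<stem>.wav`, where the model name depends on the Demucs version and the track name is the input file name without its extension. A small hypothetical helper to find the separated vocals afterwards, assuming that default layout and WAV output:

```python
from pathlib import Path
from typing import Optional


def find_demucs_vocals(output_folder: str, audio_file: str) -> Optional[Path]:
    """Locate vocals.wav produced by `demucs --two-stems=vocals -o output_folder audio_file`.

    Assumes the default layout <output_folder>/<model name>/<track name>/vocals.wav.
    The model directory is matched with a wildcard since its name varies by version.
    """
    track = Path(audio_file).stem
    matches = sorted(Path(output_folder).glob(f"*/{track}/vocals.wav"))
    return matches[0] if matches else None
```

The returned path can then be given to `autokara.py` together with the `--vocals` flag.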

Batch preprocessing (vocals + ASS extraction) of all videos in a directory:

$ ./preprocess_media.sh video_folder output_folder

A visualization tool, mainly intended for debugging. It does the same as autokara.py, but instead of writing to a file, it plots a graph with onset times, the spectrogram, probability curves, etc. It does not work on video files, only on separated vocals audio files:

$ python plot_syls.py vocals.wav lyrics.ass
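Besides the CNN approach referenced in the documentation section (Schlüter, Böck, 2014), a classical way to detect onsets is spectral flux: sum the positive frame-to-frame increases of a magnitude spectrogram and pick peaks in the resulting curve. The NumPy sketch below implements that baseline. It is not what AutoKara uses, and the frame size, hop and threshold are arbitrary choices, but it can help interpret the onset times plotted by `plot_syls.py`.

```python
import numpy as np


def spectral_flux_onsets(signal, sr, frame=1024, hop=512, delta=0.1):
    """Return onset times in seconds using half-wave rectified spectral flux.

    `signal` is a mono waveform with at least `frame` samples, `sr` its sample rate.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))                 # magnitude spectrogram
    flux = np.maximum(np.diff(mag, axis=0), 0.0).sum(axis=1)  # keep increases only
    flux = np.concatenate([[0.0], flux])                      # align with frame indices
    if flux.max() > 0:
        flux /= flux.max()                                    # normalize to [0, 1]
    # Peak picking: local maxima of the flux curve above a fixed threshold.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > delta and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]
    return np.array(peaks) * hop / sr
```

For example, a 16 kHz signal that is silent for 0.5 s and then holds a 440 Hz tone should yield an onset close to 0.5 s; it is reported slightly early because the analysis frames overlapping the start of the tone already see the energy increase.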