
AutoKara

Experiment in automatic karaoke timing.

Some documentation first

Having a clean Python environment:

An introduction to neural networks and deep learning:

Extracting vocals from music

Syllable segmentation

Symbolic methods

Machine Learning & Deep Learning methods

Using CNNs on spectrogram images (Schlüter, Böck, 2014):
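As a rough illustration of what such a CNN consumes, here is a minimal NumPy sketch that turns a mono signal into a log-magnitude spectrogram image (time on one axis, frequency bins on the other). The function name and parameter values are illustrative only, not part of AutoKara:

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=256):
    """Frame a mono signal with a Hann window and return a
    log-magnitude spectrogram (freq_bins x time) — the kind of
    2D "image" a CNN onset/syllable detector takes as input."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)
    return np.log1p(mag).T                     # (freq_bins, n_frames)

# toy example: 1 second of a 440 Hz tone sampled at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

In the published approach the spectrogram is typically mel-scaled and stacked at several window sizes before being fed to the network; the sketch above only shows the basic framing + FFT step.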

Other methods

Other stuff goes here

Syllable recognition

If we ever want to use an AI to identify syllables without a reference lyrics file.

Installation

Requirements

  • MKVToolnix (at least the CLI utils)
  • Python >= 3.8

Optional:

  • PyTorch for custom model training: follow the instructions here

All other Python modules can be installed directly through pip; see below.

This project requires at least Python 3.8, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:

$ python -m venv env      # create the virtual environment (only needed once)
$ source env/bin/activate # activate the virtual environment

# Install the required python modules
$ pip install -r requirements.txt

# To exit the virtual environment
$ deactivate              

Having a CUDA-capable GPU is optional, but can greatly reduce processing time in some situations.
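If you want to check whether your setup will actually use the GPU, a small hedged sketch (the helper name is illustrative; it relies only on the standard PyTorch call `torch.cuda.is_available()` and degrades gracefully when PyTorch is not installed):

```python
def pick_device():
    """Return 'cuda' when a CUDA-capable GPU is usable via PyTorch,
    'cpu' otherwise (including when PyTorch itself is absent)."""
    try:
        import torch  # optional dependency, see Requirements
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())
```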

Use

Inference

To run AutoKara on an MKV video file:

$ python autokara.py video.mkv output.ass

To run AutoKara on a (pre-extracted) WAV vocals file:

$ python autokara.py vocals.wav output.ass --vocals

To extract only the .wav audio from an MKV file:

$ ./extractWav.sh source_video output_audio

To separate only the vocals from the instruments in an audio file:

demucs --two-stems=vocals -o output_folder audio_file.wav
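If you need to trigger this separation step from Python rather than the shell, a minimal sketch wrapping the same Demucs CLI call via `subprocess` (the helper names are illustrative, not part of AutoKara):

```python
import shutil
import subprocess

def demucs_cmd(audio_file, output_folder):
    """Build the Demucs two-stem invocation shown above."""
    return ["demucs", "--two-stems=vocals", "-o", output_folder, audio_file]

def separate_vocals(audio_file, output_folder):
    """Run Demucs; raises if the CLI is not installed."""
    if shutil.which("demucs") is None:
        raise RuntimeError("demucs not found on PATH (pip install demucs)")
    subprocess.run(demucs_cmd(audio_file, output_folder), check=True)
```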