AutoKara

Experiment in automatic karaoke timing.

Some documentation first

Having a clean Python environment:

An introduction to neural networks and deep learning:

Extracting vocals from music

Syllable segmentation

Symbolic methods

Machine Learning & Deep Learning methods

Using CNNs on spectrogram images (Schlüter & Böck, 2014):
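
The core idea, roughly: compute a log-mel spectrogram of the vocals track and let a small CNN predict, for each frame, the probability that a syllable onset occurs there. A minimal sketch, assuming librosa for the spectrogram and PyTorch for the model (the layer sizes below are placeholders, not the paper's exact architecture):

import librosa
import torch
import torch.nn as nn

class OnsetCNN(nn.Module):
    def __init__(self, n_mels=80):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((3, 1)),   # pool over frequency only, keep time resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((3, 1)),
        )
        # collapse the remaining frequency bins into one onset score per frame
        self.head = nn.Conv2d(32, 1, kernel_size=(n_mels // 9, 1))

    def forward(self, x):                      # x: (batch, 1, n_mels, n_frames)
        y = self.head(self.conv(x))            # (batch, 1, 1, n_frames)
        return torch.sigmoid(y).squeeze(1).squeeze(1)   # (batch, n_frames)

# Per-frame onset probabilities for a separated vocals track (the model is
# untrained here, so the output is meaningless until trained on annotated onsets)
y, sr = librosa.load("vocals.wav", sr=22050)
logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80))
x = torch.from_numpy(logmel).float()[None, None]    # (1, 1, 80, n_frames)
onset_prob = OnsetCNN()(x)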

Other methods

Other stuff goes here

Syllable recognition

If we ever want to use an AI to identify syllables without a reference lyrics file.

Installation

Requirements

  • MKVToolnix (at least the CLI utils)
  • Python >= 3.8

Having a CUDA-capable GPU is optional, but can greatly reduce processing time.
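
A quick way to check whether a GPU is visible (a small sketch, assuming PyTorch, which Demucs runs on; if this prints True, vocal separation can run on the GPU):

import torch
print(torch.cuda.is_available())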

Setup

This project requires Python 3.8 or later, and using a virtual environment is strongly recommended. To install the dependencies, run the following in the project directory:

$ python -m venv env        # create the virtual environment (only needed once)
$ source env/bin/activate   # activate the virtual environment

# Install Demucs (the vocal separation tool) and librosa
$ pip install -U demucs
$ pip install librosa

# To exit the virtual environment
$ deactivate

Use

To run AutoKara on an MKV video file:

$ python autokara.py video.mkv output.ass

To run AutoKara on a (pre-extracted) WAV vocals file:

$ python autokara.py vocals.wav output.ass --vocals

To only extract WAV audio from an MKV file:

$ ./extractWav.sh source_video output_audio

To only separate vocals from instruments in an audio file:

$ demucs --two-stems=vocals -o output_folder audio_file.wav