
AutoKara

An experiment in automatic karaoke timing.

Installation

Requirements

Optional:

  • MKVToolnix (at least the CLI utils): required for the preprocessing scripts

All other Python modules can be installed directly with pip; see the next section.

Install

Using a virtual environment is strongly recommended (though not mandatory if you know what you're doing):

# create the virtual environment (only needed once)
python -m venv env

# activate the virtual environment
source env/bin/activate

# To exit the virtual environment
deactivate              

The simplest way to install Autokara is with pip.

# Using HTTPS
pip install git+https://git.iiens.net/bakaclub/autokara.git

# Or SSH
pip install git+ssh://git@git.iiens.net/bakaclub/autokara.git

Alternatively, clone the repository and run pip install <repo_directory>.

To use the custom phonetic mappings for Japanese Romaji and other non-English languages, you currently need to update the g2p database manually (from within the venv):

autokara-gen-lang

Configuration

Autokara comes with a default config file in autokara/default.conf.

If you want to tweak some values (to enable CUDA, for example), add them to a new config file in your personal config directory: ~/.config/autokara/autokara.conf.

This new file takes priority over the default one, which is used only as a fallback.
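As an illustration, an override file might look like the fragment below. The section and option names here are invented for the example — copy the real ones from autokara/default.conf:

```ini
# ~/.config/autokara/autokara.conf
# Overrides autokara/default.conf; any key not set here falls back to the default.
# (section and key names are illustrative -- check default.conf for the real ones)
[Segment]
force_cuda = True
```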

Use

Autokara

To use Autokara, you need:

  • A media file of the song (video, or pre-extracted vocals)
  • An ASS file with the lyrics, split by syllable (you can use Aegisub's Auto-Split, but doing it manually may yield better results)

To run AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):

autokara video.mkv lyrics.ass

To write the result to a different file (and keep the original):

autokara video.mkv lyrics.ass -o output.ass

To run AutoKara on a pre-extracted WAV (or OGG, MP3, ...) vocals file, pass the --vocals flag:

autokara vocals.wav lyrics.ass --vocals

To use a phonetic transcription optimized for a specific language, use --lang (or -l); the default is Japanese Romaji. You can also specify a separate language for uppercase words (the default is set in your config file):

# Use French transcription
autokara video.mkv lyrics.ass --lang fr

# Use English transcription, but treat all uppercase words as French:
autokara video.mkv lyrics.ass --lang en --uppercase-lang fr

Available language options are:

jp : Japanese Romaji (base default)
en : English (uppercase default)
fr : French
fi : Finnish
da : Danish

Full help for all options is available with:

autokara -h

Useful scripts

Manual preprocessing

Use autokara-preprocess if you want to preprocess videos and lyrics manually, in advance:

# Extract vocals from a video:
autokara-preprocess --vocals video_file output_folder/

# Extract an ASS file from an MKV containing a subtitle track:
autokara-preprocess --lyrics video_file output_file.ass

# Do both at once:
autokara-preprocess --full video_file output_folder/

Then you can use Autokara on the extracted files with the --vocals flag.
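When processing many songs, the two steps can be chained in a small shell helper. The sketch below is only an illustration under assumptions: time_all is a name invented here (not part of Autokara), and it assumes each pre-extracted <name>.wav has a matching <name>.ass lyrics file beside it — adjust the paths to whatever autokara-preprocess actually produced for you.

```shell
# time_all DIR -- run autokara on every pre-extracted vocals file in DIR,
# assuming a matching <name>.ass lyrics file sits next to each <name>.wav.
# (hypothetical helper, not shipped with Autokara)
time_all() {
    for vocals in "$1"/*.wav; do
        base=${vocals%.wav}
        # keep the original lyrics and write the timed result to <name>.timed.ass
        autokara "$vocals" "$base.ass" --vocals -o "$base.timed.ass"
    done
}
```

Calling time_all work would then time every song whose vocals were extracted into work/.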

Sound and onsets plotting

A visualization tool, mainly intended for debugging or for the curious. It does the same work as autokara, but instead of writing to a file it plots a graph with syllable onset times, the spectrogram, probability curves, and so on. It does not work on video files, only on separated vocals audio files:

autokara-plot vocals.wav lyrics.ass

Caveats

  • This is an AI model; inaccuracies are still very much a possibility. Always check the result and edit it if necessary.
  • Overlapping voices, interwoven duets, background choirs, and the like are still a major difficulty. Until we find a way to separate multiple voices in a song, don't expect good results with more than one singer at a time.

Documentation and References

Extracting vocals from music

Syllable segmentation

Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

Using CNNs on spectrogram images (Schlüter, Böck, 2014)