
AutoKara

An experiment in automatic karaoke timing.

Installation

Requirements

Optional:

  • MKVToolNix (at least the CLI utils): required for the preprocessing scripts

All other Python modules can be installed directly through pip; see the next section.
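If you are unsure whether the MKVToolNix command-line tools are installed, a quick check along these lines may help. This is only a sketch: it assumes the tools go by their usual names (mkvmerge, mkvextract), and this README does not say which of them the preprocessing scripts actually call.

```python
import shutil

# Assumption: the MKVToolNix command-line tools are named mkvmerge and
# mkvextract; which ones the preprocessing scripts call is not stated here.
MKVTOOLNIX_TOOLS = ("mkvmerge", "mkvextract")


def missing_tools(tools=MKVTOOLNIX_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("MKVToolNix CLI tools found.")
```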

Install

Using a virtual environment is strongly recommended (but not mandatory if you know what you're doing):

# Create the virtual environment (only needed once)
python -m venv env

# Activate the virtual environment
source env/bin/activate

# To exit the virtual environment
deactivate

The simplest way to install Autokara is through pip.

# Using HTTPS
pip install git+https://git.iiens.net/bakaclub/autokara.git

# Or SSH
pip install git+ssh://git@git.iiens.net/bakaclub/autokara.git

Or you can clone the repo and use pip install <repo_directory> if you prefer.

To use the custom phonetic mappings for Japanese Romaji and other non-English languages, you currently have to update the g2p database manually (from within the venv):

autokara-gen-lang

Configuration

Autokara comes with a default config file in autokara/default.conf.

If you want to tweak some values (enable CUDA, for example), add them to a new config file in your personal config directory: ~/.config/autokara/autokara.conf.

This file takes priority over the default one, which is used only as a fallback.
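The fallback behaviour described above can be illustrated with Python's standard configparser. This is a sketch, not Autokara's actual loading code: it assumes the config files are INI-style, and the function name load_layered_config is hypothetical. Check autokara/default.conf for the real section and key names.

```python
import configparser
from pathlib import Path


def load_layered_config(default_path, user_path):
    """Read the default file first, then let the user file override it.

    Assumption: both files use an INI-style syntax that configparser accepts.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, and values from later
    # files replace values from earlier ones.
    parser.read([str(default_path), str(user_path)])
    return parser


if __name__ == "__main__":
    user_conf = Path.home() / ".config" / "autokara" / "autokara.conf"
    config = load_layered_config("autokara/default.conf", user_conf)
    for section in config.sections():
        print(section, dict(config[section]))
```

With this layering, a personal file only needs to contain the values that differ from the defaults.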

Use

Autokara

To use Autokara, you need:

  • A media file of the song (video, or pre-extracted vocals)
  • An ASS file with the lyrics, split by syllable (you can use the Auto-Split in Aegisub, but doing it manually may yield better results)
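To check how a lyrics line is split before running Autokara, you can list its syllables with a few lines of Python. This sketch assumes the split is stored as Aegisub-style karaoke tags ({\k...}, with durations in centiseconds). The sample line and its durations are placeholders, not output from a real file.

```python
import re

# Assumption: syllables are marked with Aegisub-style karaoke tags such as
# {\k20} or {\kf25}; each tag is followed by the text of one syllable.
KARAOKE_TAG = re.compile(r"\{\\[kK][fo]?(\d+)\}([^{]*)")


def split_syllables(dialogue_text):
    """Return (duration_in_centiseconds, syllable_text) pairs."""
    return [(int(d), s) for d, s in KARAOKE_TAG.findall(dialogue_text)]


# Placeholder text field of a Dialogue line, split by syllable.
line = r"{\k20}ko{\k15}n{\k20}ni{\k25}chi{\k30}wa"
print(split_syllables(line))
# [(20, 'ko'), (15, 'n'), (20, 'ni'), (25, 'chi'), (30, 'wa')]
```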

To run AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):

autokara video.mkv lyrics.ass

To output to a different file (and keep the original):

autokara video.mkv lyrics.ass -o output.ass

To run AutoKara on a pre-extracted vocals file (WAV, OGG, MP3, ...), pass the --vocals flag:

autokara vocals.wav lyrics.ass --vocals

To use a phonetic transcription optimized for a specific language, use --lang (or -l). The default is Japanese Romaji. You can also set a separate language for uppercase words (its default is set in your config file):

# Use French transcription
autokara video.mkv lyrics.ass --lang fr

# Use English transcription, but treat all uppercase words as French
autokara video.mkv lyrics.ass --lang en --uppercase-lang fr

Available language options are:

jp : Japanese Romaji (base default)
en : English (uppercase default)
fr : French
fi : Finnish
da : Danish
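If you drive Autokara from a script, a small helper can assemble the command line and catch a mistyped language code before anything runs. This is a sketch: build_command is a hypothetical helper, it uses only the options shown in this README, and it assumes those options can be freely combined.

```python
import shlex

# Language codes listed in this README.
LANGUAGES = {"jp", "en", "fr", "fi", "da"}


def build_command(media, lyrics, output=None, vocals=False,
                  lang=None, uppercase_lang=None):
    """Assemble an autokara command line from the documented options."""
    for code in (lang, uppercase_lang):
        if code is not None and code not in LANGUAGES:
            raise ValueError(f"unknown language code: {code!r}")
    cmd = ["autokara", str(media), str(lyrics)]
    if output is not None:
        cmd += ["-o", str(output)]
    if vocals:
        cmd.append("--vocals")
    if lang is not None:
        cmd += ["--lang", lang]
    if uppercase_lang is not None:
        cmd += ["--uppercase-lang", uppercase_lang]
    return cmd


cmd = build_command("video.mkv", "lyrics.ass", output="output.ass",
                    lang="en", uppercase_lang="fr")
print(shlex.join(cmd))
# autokara video.mkv lyrics.ass -o output.ass --lang en --uppercase-lang fr
```

The returned list can be passed to subprocess.run(cmd, check=True) to actually run the command.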

Full help for all options is available with:

autokara -h

Useful scripts

Manual preprocessing

Use autokara-preprocess if you want to manually preprocess the video or lyrics in advance:

# Extract vocals from video
autokara-preprocess --vocals video_file output_folder/

# Extract the ASS file from an MKV containing a subtitle track
autokara-preprocess --lyrics video_file output_file.ass

# Do both at once
autokara-preprocess --full video_file output_folder/

Then you can use Autokara on the extracted files with the --vocals flag.

Sound and onsets plotting

A visualization tool, mainly intended for debugging or for the curious. It does the same processing as autokara but, instead of writing to a file, plots the syllable onset times, the spectrogram, the probability curves, and so on. It does not work on video files, only on separated vocals audio files:

autokara-plot vocals.wav lyrics.ass

Documentation and References

This section is mainly intended for people who would like to contribute or are curious about how this works.

Extracting vocals from music

Syllable segmentation

Aligning lyrics to song (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

Using CNNs on spectrogram images (Schlüter, Böck, 2014) :