# AutoKara

An experiment in automatic karaoke timing.

# Installation

## Requirements

- MKVToolNix (at least the CLI utilities)
- FFmpeg
- Python >= 3.8

All other Python modules are installed automatically by pip; see the next section.
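
Before installing, it can help to confirm that the requirements are reachable from your `PATH`. The snippet below is only a minimal sketch: `check_requirements` is a hypothetical helper (not part of AutoKara), and the command names `mkvmerge`, `mkvextract` and `python3` are assumptions about how the tools are named on your system.

```bash
# Hypothetical helper, not part of AutoKara: report missing commands.
check_requirements() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Command names are assumptions; adjust them to your system.
check_requirements mkvmerge mkvextract ffmpeg python3 \
    || echo "install the missing tools first" >&2

# Python must be at least 3.8.
python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)' \
    || echo "Python >= 3.8 is required" >&2
```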

## Install

### Linux

Using a virtual environment is strongly recommended (but not mandatory if you know what you're doing):
```bash
$ python -m venv env        # create the virtual environment (only needed once)
$ source env/bin/activate   # activate the virtual environment

# To exit the virtual environment
$ deactivate
```

The simplest way to install AutoKara is through pip:
```bash
# Using HTTPS
$ pip install git+https://git.iiens.net/bakaclub/autokara.git

# Or SSH
$ pip install git+ssh://git@git.iiens.net/bakaclub/autokara.git
```

Or you can clone the repo and use `pip install <repo_directory>` if you prefer.


To use the custom phonetic mappings for Japanese Romaji and other non-English languages, you currently need to update the g2p database manually (from within the venv):
```bash
$ autokara-gen-lang
```



## Configuration

AutoKara comes with a default config file in `autokara/default.conf`.

If you want to tweak some values (to enable CUDA, for example), add them to a new config file in your personal config directory: `~/.config/autokara/autokara.conf`.

This new file has priority over the default one, which is used only as fallback.
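
One way to get started is to copy the default file into your personal config directory and edit the copy. This is only a sketch: `init_user_config` is a hypothetical helper, not something shipped with AutoKara, and it assumes you have a copy of `default.conf` at hand (for example from a clone of the repository).

```bash
# Hypothetical helper, not shipped with AutoKara.
# Usage: init_user_config <path/to/default.conf> [config_dir]
init_user_config() {
    local default_conf="$1"
    local conf_dir="${2:-$HOME/.config/autokara}"
    local user_conf="$conf_dir/autokara.conf"

    if [ ! -f "$default_conf" ]; then
        echo "default config not found: $default_conf" >&2
        return 1
    fi
    mkdir -p "$conf_dir"
    # Never overwrite an existing personal config.
    if [ -e "$user_conf" ]; then
        echo "already exists: $user_conf" >&2
        return 0
    fi
    cp "$default_conf" "$user_conf"
    echo "created: $user_conf"
}
```

From the root of a clone, `init_user_config autokara/default.conf` would create `~/.config/autokara/autokara.conf`; you can then edit it and keep only the values you want to change.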


# Use

## AutoKara

To use AutoKara, you need:
 - A media file of the song (video, or pre-extracted vocals)
 - An ASS file with the lyrics, split by syllable

To run AutoKara on an MKV video file and an ASS file containing the lyrics (the ASS file will be overwritten):
```bash
$ autokara video.mkv lyrics.ass
```

To write the result to a different file (and keep the original):
```bash
$ autokara video.mkv lyrics.ass -o output.ass
```

To run AutoKara on a pre-extracted vocals file (WAV, OGG, MP3, ...), pass the `--vocals` flag:
```bash
$ autokara vocals.wav output.ass --vocals
```

To use a phonetic transcription optimized for a specific language, use `--lang` (or `-l`):
```bash
$ autokara vocals.wav output.ass --lang jp
```

Available language options are:
```
jp : Japanese Romaji (default)
en : English
fr : French
fi : Finnish
da : Danish
```

Full help for all options is available with:
```bash
$ autokara -h
```

## Useful scripts

To only extract the .wav audio from an MKV file:
```bash
$ ./extractWav.sh source_video output_audio
```

To only extract the .ass subtitle file from an MKV file:
```bash
$ ./extractAss.sh source_video output_subs
```

To only separate vocals from instruments in an audio file:
```bash
$ demucs --two-stems=vocals -o output_folder audio_file.wav
```
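
If you run Demucs by hand, you then need to locate the vocals stem to pass to `autokara --vocals`. The sketch below assumes Demucs writes its output as `<output_folder>/<model_name>/<track_name>/vocals.wav`, where the model directory name depends on your Demucs version; `find_vocals` is a hypothetical helper, not part of AutoKara or Demucs.

```bash
# Hypothetical helper: print the path of the vocals stem written by Demucs.
# Assumes the layout <output_folder>/<model_name>/<track_name>/vocals.wav
find_vocals() {
    local output_folder="$1"
    local track_name="$2"
    find "$output_folder" -type f -path "*/$track_name/vocals.wav" | head -n 1
}

# Example use (track name is the audio file name without its extension):
#   vocals="$(find_vocals output_folder audio_file)"
#   autokara "$vocals" lyrics.ass --vocals -o output.ass
```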

To batch-preprocess (vocals + ASS extraction) all videos in a directory:
```bash
$ ./preprocess_media.sh video_folder output_folder
```
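
Once vocals and lyrics are available, AutoKara can be run over a whole folder with a small loop. This is a sketch under a loud assumption: it expects each song as a pair `<name>.wav` (vocals) and `<name>.ass` (lyrics) in the same folder, which may not match what `preprocess_media.sh` actually produces. `batch_autokara` and the `.timed.ass` suffix are hypothetical names chosen for illustration.

```bash
# Hypothetical helper, not shipped with AutoKara.
# Assumes each song is a pair <name>.wav (vocals) + <name>.ass (lyrics)
# in the same folder; adapt the patterns to your actual layout.
batch_autokara() {
    local folder="$1"
    local vocals lyrics
    for vocals in "$folder"/*.wav; do
        # Skips the unexpanded glob when the folder has no .wav file.
        [ -f "$vocals" ] || continue
        lyrics="${vocals%.wav}.ass"
        if [ ! -f "$lyrics" ]; then
            echo "skipping (no lyrics): $vocals" >&2
            continue
        fi
        # -o keeps the original ASS file untouched.
        autokara "$vocals" "$lyrics" --vocals -o "${vocals%.wav}.timed.ass" \
            || echo "failed: $vocals" >&2
    done
}
```

Calling `batch_autokara output_folder` would then write one `<name>.timed.ass` next to each pair and report the songs it skipped or failed on.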

`autokara-plot` is a visualization tool, mainly intended for debugging.
It does the same processing as `autokara.py`, but instead of writing to a file, it plots the onset times, spectrogram, probability curves, and so on.
It does not work on video files, only on separated vocals audio files:
```bash
$ autokara-plot vocals.wav lyrics.ass
```


# Documentation and References

This section is mainly intended for people who would like to contribute or are curious about how this works.

## Extracting vocals from music

- https://github.com/facebookresearch/demucs

## Syllable segmentation

[Aligning lyrics to song](https://github.com/jhuang448/LyricsAlignment-MTL) (Jiawen Huang, Emmanouil Benetos, Sebastian Ewert, 2022)

[Using CNNs on spectrogram images](https://www.ofai.at/~jan.schlueter/pubs/2014_icassp.pdf) (Schlüter, Böck, 2014):
 - [MADMOM implementation](https://madmom.readthedocs.io/en/v0.16/modules/features/onsets.html)