ucsinfer

Universal Category System LLM toolkit.

Install

This project is still experimental and not intended for production use, so it is not published on PyPI. Clone the repository to your local machine and do an editable install in a virtual environment.

Note: You will also need ffmpeg and ffprobe, which are used to read metadata from audio files.

$ brew install ffmpeg # macOS with Homebrew; use your platform's package manager otherwise
$ git clone https://git.squad51.us/jamie/ucsinfer.git
$ cd ucsinfer
$ git submodule update --init
$ python -m venv .venv
$ source .venv/bin/activate # or whatever command is appropriate for your shell
$ pip install -e .

Alternatively, this module is packaged with the Poetry dependency manager and can be run within a Poetry virtualenv.

$ poetry run python -m ucsinfer
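
The ffprobe requirement noted above exists because text metadata has to be read out of audio files. As a rough illustration only, the sketch below shows one way to ask ffprobe for a file's tags as JSON from Python. The function names are hypothetical and are not taken from ucsinfer, and which tags the toolkit actually reads is not specified here.

```python
import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe invocation that prints container metadata as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path]


def tags_from_ffprobe_json(raw: str) -> dict:
    """Return the tag dictionary from ffprobe's JSON output (empty if absent)."""
    data = json.loads(raw)
    return data.get("format", {}).get("tags", {})


def read_tags(path: str) -> dict:
    """Run ffprobe on a file and return its text tags.

    Assumes ffprobe is on PATH; raises CalledProcessError if ffprobe fails.
    """
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return tags_from_ffprobe_json(result.stdout)
```

Keeping the command builder and the JSON parser separate from the subprocess call makes the parsing testable on machines where ffprobe is not installed.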

Running

python -m ucsinfer [command]

Pass --help to see a summary of subcommands and options.

The subcommands available at this time are gather and evaluate.

Functions

recommend

Infer a UCS category for a text description. Text metadata is extracted from audio files, and the language model recommends a list of matching categories, ranked by how closely the text aligns with each category definition.

gather

Scan files to capture existing text descriptions and UCS categories, and save them as a dataset. These datasets are what evaluate uses to test models and what finetune will use to refine them.

~~finetune~~ (planned)

Fine-tune an existing sentence embedding model with training data.

evaluate

Use datasets to evaluate the performance of a model and the effect of fine-tuning.
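
To make the relationship between ranking and evaluation concrete, here is a minimal, self-contained sketch. It is not ucsinfer's implementation. A word-count vector stands in for a real sentence embedding model, the category names and definitions are invented placeholders rather than real UCS entries, and top-k accuracy is just one metric an evaluation could plausibly report.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_categories(description: str, definitions: dict) -> list:
    """Return category names ordered from best to worst match."""
    query = embed(description)
    return sorted(
        definitions,
        key=lambda cat: cosine(query, embed(definitions[cat])),
        reverse=True,
    )


def top_k_accuracy(dataset: list, definitions: dict, k: int = 1) -> float:
    """Fraction of (description, category) pairs whose category is in the top k."""
    if not dataset:
        return 0.0
    hits = sum(
        1
        for description, expected in dataset
        if expected in rank_categories(description, definitions)[:k]
    )
    return hits / len(dataset)


# Hypothetical category definitions and labelled examples, for illustration only.
DEFINITIONS = {
    "DOOR": "door open close slam creak",
    "RAIN": "rain water drops falling weather",
}
DATASET = [
    ("wooden door slam hard", "DOOR"),
    ("heavy rain falling on roof", "RAIN"),
    ("metal gate creak", "DOOR"),
    ("thunder rumble", "RAIN"),
]

if __name__ == "__main__":
    print(rank_categories("wooden door slam hard", DEFINITIONS))
    print(top_k_accuracy(DATASET, DEFINITIONS, k=1))
```

With a real embedding model, only `embed` and `cosine` would change. The ranking and scoring steps stay the same, so accuracy can be compared across models and before and after fine-tuning.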