# ucsinfer

Tools for applying UCS (Universal Category System) categories to sounds using large language models.

## Install

Since this project is still experimental and not for production, it's not
packaged on PyPI. Clone the project to your local machine and
do an [editable install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs)
in a [virtual environment](https://docs.python.org/3/library/venv.html).

Note: You will also need ffmpeg and ffprobe in order to interrogate audio 
files for their metadata.

```sh
$ brew install ffmpeg # macOS with Homebrew; use your platform's package manager otherwise
$ git clone https://git.squad51.us/jamie/ucsinfer.git
$ cd ucsinfer
$ git submodule update --init
$ python -m venv .venv
$ source .venv/bin/activate # or whatever command is appropriate for your shell
$ pip install -e .
```
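
To confirm that ffprobe is reachable after installing, and to see the kind of metadata it exposes, a minimal Python sketch follows. The function names here (`ffprobe_command`, `parse_tags`, `read_tags`) are hypothetical and are not part of ucsinfer; the sketch only assumes the standard `ffprobe` flags `-print_format json` and `-show_format`.

```python
import json
import shutil
import subprocess


def ffprobe_command(path):
    """Build an ffprobe invocation that prints container metadata as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", path]


def parse_tags(ffprobe_json):
    """Return the container-level tags from ffprobe's JSON output (may be empty)."""
    info = json.loads(ffprobe_json)
    return info.get("format", {}).get("tags", {})


def read_tags(path):
    """Run ffprobe on one file and return its tags (hypothetical helper)."""
    if shutil.which("ffprobe") is None:
        raise RuntimeError("ffprobe not found on PATH; install ffmpeg first")
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_tags(result.stdout)
```

Calling `read_tags("some_sound.wav")` on a file of your own should return a dictionary of whatever text tags the file carries, or an empty dictionary if it has none.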

Alternatively, this module is packaged with the [poetry][py-poetry] dependency
manager and can be run within a poetry virtualenv instead of a pip editable
install.

```sh
$ poetry run python -m ucsinfer 
```

[py-poetry]: https://python-poetry.org/docs/1.8/cli/#run

## Running 

```sh 
python -m ucsinfer [command]
```
Pass `--help` to see a summary of subcommands and options.

## Functions

* recommend

  Infer a UCS category for a text description. Text metadata is extracted from
  audio files, and the language model recommends a list of candidate
  categories, ranked by how closely the description aligns with each category
  definition.

* gather

  Scan files to capture existing text descriptions and UCS categories
  and save them as a dataset. This function is used to construct datasets
  that `evaluate` can use to test models and `finetune` can use to
  refine them.

* ~finetune~ (planned)

  Fine-tune an existing sentence embedding model with training data.

* evaluate

  Use datasets to evaluate the performance of a model and the effect of
  fine-tuning.
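
The ranking idea behind `recommend` can be sketched roughly as follows. This is not the ucsinfer implementation: a toy bag-of-words vector stands in for a real sentence embedding model so the example runs without dependencies, and the category IDs, definitions, and function names are illustrative assumptions rather than entries copied from the UCS specification.

```python
import math
import re
from collections import Counter

# Illustrative category IDs and definitions only; the real UCS list is far larger.
CATEGORIES = {
    "DOORWood": "wooden door open close slam creak",
    "WATRSplsh": "water splash impact plunge",
    "GLASBrk": "glass break shatter smash",
}


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(description, categories, top_n=3):
    """Rank categories by similarity between a description and each definition."""
    query = embed(description)
    scored = [
        (cosine(query, embed(definition)), cat_id)
        for cat_id, definition in categories.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(cat_id, score) for score, cat_id in scored[:top_n]]


if __name__ == "__main__":
    for cat_id, score in rank_categories("Heavy wooden door slam", CATEGORIES):
        print(f"{cat_id}\t{score:.3f}")
```

Swapping `embed` for a real sentence embedding model would leave the ranking step unchanged, which is why the sketch keeps the two concerns in separate functions.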

## Demos and More Reading

* [Category Inference Experiments With UCSINFER](https://squad51.us/notebook/category_inference_experiments_ucsinfer/)
* [UCSINFER for Renaming Sounds](https://squad51.us/notebook/ucsinfer_to_rename_sounds/)