# ucsinfer

Universal Category System LLM toolkit.

## Install

Since this project is still experimental and not intended for production use,
it is not packaged on PyPI. Clone the project to your local machine and do an
[editable install](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs)
in a [virtual environment](https://docs.python.org/3/library/venv.html).

Note: You will also need `ffmpeg` and `ffprobe` in order to interrogate audio
files for their metadata.

```sh
$ brew install ffmpeg # assumes Homebrew; use your own package manager otherwise
$ git clone https://git.squad51.us/jamie/ucsinfer.git
$ cd ucsinfer
$ git submodule update --init
$ python -m venv .venv
$ source .venv/bin/activate # or whatever command is appropriate for your shell
$ pip install -e .
```

Alternatively, this module is packaged with the [poetry][py-poetry] dependency
manager and can be run within a poetry virtualenv.

```sh
$ poetry run python -m ucsinfer
```

[py-poetry]: https://python-poetry.org/docs/1.8/cli/#run

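Since audio metadata is read with `ffmpeg` and `ffprobe`, it can be worth confirming that both are on your `PATH` once the install is done. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is made up for this example and is not part of ucsinfer.

```python
import shutil

# The two external tools the install notes say are required.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both available.")
```

If either tool is reported missing, install it with your platform's package manager before scanning audio files.
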
## Running

```sh
python -m ucsinfer [command]
```

Pass `--help` to see a summary of subcommands and options.

The subcommands available at this time are `gather` and `evaluate`.

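If you want to call the command-line interface from a Python script, one possible approach is to launch it with the interpreter that is already running, so the call stays inside the same virtual environment. This is only a sketch: the helper names `ucsinfer_argv` and `show_help` are invented here, and the only option used is the `--help` flag mentioned above.

```python
import subprocess
import sys


def ucsinfer_argv(*args):
    """Build the argument list for `python -m ucsinfer ...`.

    sys.executable is used so the call runs under the same interpreter
    (and therefore the same virtual environment) as the calling script.
    """
    return [sys.executable, "-m", "ucsinfer", *args]


def show_help():
    """Run `python -m ucsinfer --help` and return the process exit code."""
    return subprocess.run(ucsinfer_argv("--help"), check=False).returncode
```

Calling `show_help()` prints the same summary you would get from the shell, provided ucsinfer is installed in the active environment.
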
## Functions

* ~recommend~ (in-progress)

  Infer a UCS category for a text description.

* gather

  Scan files to capture existing text descriptions and UCS categories
  and save them as a dataset. This function is used to construct datasets
  that `evaluate` can use to test models and `finetune` can use to
  refine them.

* ~finetune~ (planned)

  Fine-tune an existing sentence embedding model with training data.

* evaluate

  Use datasets to evaluate the performance of a model and of fine-tuning.

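The relationship between inferring a category and evaluating a model can be illustrated with a toy sketch. This is not ucsinfer's code: the category labels, the reference descriptions, and the bag-of-words "embedding" are all placeholders chosen for illustration, standing in for a real sentence embedding model and a real gathered dataset.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder category labels and reference descriptions (not real UCS data).
CATEGORY_EXAMPLES = {
    "DOORS": "wooden door open close creak slam",
    "RAIN": "rain falling on roof heavy drizzle",
    "FOOTSTEPS": "footsteps walking on gravel boots",
}


def recommend(description):
    """Return the category whose reference text is most similar to `description`."""
    query = embed(description)
    return max(
        CATEGORY_EXAMPLES,
        key=lambda cat: cosine(query, embed(CATEGORY_EXAMPLES[cat])),
    )


def evaluate(dataset):
    """Fraction of (description, category) pairs that `recommend` labels correctly."""
    if not dataset:
        return 0.0
    correct = sum(1 for text, cat in dataset if recommend(text) == cat)
    return correct / len(dataset)


# A tiny hypothetical dataset of (description, expected category) pairs.
dataset = [
    ("old door creak", "DOORS"),
    ("heavy rain on tin roof", "RAIN"),
    ("boots walking on gravel path", "FOOTSTEPS"),
]
print(evaluate(dataset))
```

In this sketch, a dataset of labeled descriptions plays the role that `gather` output plays for `evaluate`: it gives known answers against which a model's category guesses can be scored.
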