git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
metadata: Detailed Dataset Authorship Metadata (#8875)
author Brian <redacted>
Wed, 13 Nov 2024 10:10:38 +0000 (21:10 +1100)
committer GitHub <redacted>
Wed, 13 Nov 2024 10:10:38 +0000 (21:10 +1100)
commit a0ec17b32ec6077f5ca22fe833ebdc9b86795a4d
tree 4fb77e01472a2eb94b82f0c3fd6f995a2a1e6143
parent 2e82ffa4af29f87e7d3d6dff8060a2a79613b72f

The converter script can now read the following two fields as detailed base model and dataset sources.
This makes it easier for Hugging Face to integrate detailed metadata as needed.

 - base_model_sources (List[dict], optional)
 - dataset_sources (List[dict], optional)
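
To illustrate the shape these two fields might take, here is a minimal sketch. The helper `read_sources` and the example input are hypothetical, not the converter's actual code, and the per-entry keys (name, author, version, ...) are assumed to mirror the metadata key suffixes named in this commit message.

```python
from typing import Any


def read_sources(card: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Return card[field] as a list of dicts, or [] when the field is absent."""
    value = card.get(field)
    if value is None:
        return []  # both fields are optional
    if not isinstance(value, list) or not all(isinstance(v, dict) for v in value):
        raise TypeError(f"{field} must be a list of dicts")
    return value


# Hypothetical input; the entry keys are assumptions for illustration only.
example_card = {
    "base_model_sources": [
        {"name": "Example Base", "organization": "Example Org",
         "description": "A made-up base model"},
    ],
    "dataset_sources": [
        {"name": "Example Corpus", "author": "Jane Doe", "version": "1.0"},
        {"name": "Another Corpus", "doi": "10.0000/example"},
    ],
}

base_models = read_sources(example_card, "base_model_sources")
datasets = read_sources(example_card, "dataset_sources")
```

Since both fields are optional, a missing field yields an empty list rather than an error in this sketch.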

Datasets are now represented as:

 - general.dataset.count
 - general.dataset.{id}.name
 - general.dataset.{id}.author
 - general.dataset.{id}.version
 - general.dataset.{id}.organization
 - general.dataset.{id}.description
 - general.dataset.{id}.url
 - general.dataset.{id}.doi
 - general.dataset.{id}.uuid
 - general.dataset.{id}.repo_url
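
The key layout above can be sketched as a flattening step from a list of dataset entries to key/value pairs. This is a rough illustration only: `flatten_datasets` is a hypothetical helper rather than the gguf-py writer API, and it assumes `{id}` is the zero-based position of the entry in the list and that fields missing from an entry are simply skipped.

```python
from typing import Any

# Per-dataset fields, taken from the key list in this commit message.
DATASET_FIELDS = (
    "name", "author", "version", "organization", "description",
    "url", "doi", "uuid", "repo_url",
)


def flatten_datasets(datasets: list[dict[str, Any]]) -> dict[str, Any]:
    """Map a list of dataset dicts onto general.dataset.* key/value pairs."""
    kv: dict[str, Any] = {"general.dataset.count": len(datasets)}
    for idx, entry in enumerate(datasets):  # assumption: {id} is a 0-based index
        for field in DATASET_FIELDS:
            if field in entry:  # assumption: absent fields are not written
                kv[f"general.dataset.{idx}.{field}"] = entry[field]
    return kv


# Hypothetical input entries for illustration.
kv = flatten_datasets([
    {"name": "Example Corpus", "doi": "10.0000/example"},
    {"author": "Jane Doe"},
])
for key, value in kv.items():
    print(key, "=", value)
```

Storing the count alongside indexed keys lets a reader iterate over the entries without having to probe for which ids exist.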

This also adds the following metadata key to the base model:

 - general.base_model.{id}.description
examples/convert_legacy_llama.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/metadata.py
gguf-py/tests/test_metadata.py