server : match OAI structured output response (#9527)

author Vinesh Janarthanan <redacted>

Wed, 18 Sep 2024 06:50:34 +0000 (01:50 -0500)

committer GitHub <redacted>

Wed, 18 Sep 2024 06:50:34 +0000 (09:50 +0300)
diff --git a/examples/server/README.md b/examples/server/README.md

index 7a5d26ca08ba3ceb8a91abc5da8ab92c85abfbed..326e05e1e3ea1078b8102d0a243fb93cc0c12dda 100644 (file)
--- a/examples/server/README.md
+++ b/examples/server/README.md
@@ -501,7 +501,7 @@ Given a ChatML-formatted json description in `messages`, it returns the predicte
  
      See [OpenAI Chat Completions API documentation](https://platform.openai.com/docs/api-reference/chat). While some OpenAI-specific features such as function calling aren't supported, llama.cpp `/completion`-specific features such as `mirostat` are supported.
  
-    The `response_format` parameter supports both plain JSON output (e.g. `{"type": "json_object"}`) and schema-constrained JSON (e.g. `{"type": "json_object", "schema": {"type": "string", "minLength": 10, "maxLength": 100}}`), similar to other OpenAI-inspired API providers.
+    The `response_format` parameter supports both plain JSON output (e.g. `{"type": "json_object"}`) and schema-constrained JSON (e.g. `{"type": "json_object", "schema": {"type": "string", "minLength": 10, "maxLength": 100}}` or `{"type": "json_schema", "schema": {"properties": { "name": { "title": "Name",  "type": "string" }, "date": { "title": "Date",  "type": "string" }, "participants": { "items": {"type: "string" }, "title": "Participants",  "type": "string" } } } }`), similar to other OpenAI-inspired API providers.
  
      *Examples:*
  
diff --git a/examples/server/utils.hpp b/examples/server/utils.hpp

index 537c8a22324384aa87803a63cbb4c654c8a6f9b2..f093f547ff2c1324c304523187cd662f61353c0d 100644 (file)
--- a/examples/server/utils.hpp
+++ b/examples/server/utils.hpp
@@ -331,6 +331,9 @@ static json oaicompat_completion_params_parse(
          std::string response_type = json_value(response_format, "type", std::string());
          if (response_type == "json_object") {
              llama_params["json_schema"] = json_value(response_format, "schema", json::object());
+        } else if (response_type == "json_schema") {
+            json json_schema = json_value(response_format, "json_schema", json::object());
+            llama_params["json_schema"] = json_value(json_schema, "schema", json::object());
          } else if (!response_type.empty() && response_type != "text") {
              throw std::runtime_error("response_format type must be one of \"text\" or \"json_object\", but got: " + response_type);
          }
diff --git a/grammars/README.md b/grammars/README.md

index 7ec8154715457df59af237637b7efa0b037734b9..4e8b4e2fcfa1d42c5a360ef1308d7fd24e5a4ddf 100644 (file)
--- a/grammars/README.md
+++ b/grammars/README.md
@@ -120,7 +120,7 @@ You can use GBNF grammars:
  
  - In [llama-server](../examples/server):
      - For any completion endpoints, passed as the `json_schema` body field
-    - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}`)
+    - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type", "json_object", "schema": {"items": {}}}` or `{ type: "json_schema", json_schema: {"schema": ...} }`)
  - In [llama-cli](../examples/main), passed as the `--json` / `-j` flag
  - To convert to a grammar ahead of time:
      - in CLI, with [examples/json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
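
Note on the lookup path: a minimal sketch of where the patched branch in `oaicompat_completion_params_parse` reads the schema from, written against the standard library only. The helper name `schema_key_path` is hypothetical and not part of llama.cpp; it mirrors only the branch structure visible in the `utils.hpp` hunk. For `"json_object"` the schema is read from `response_format.schema`, while for `"json_schema"` it is read from `response_format.json_schema.schema`, which matches the `grammars/README.md` example. The `json_schema` example added to `examples/server/README.md` places `schema` directly under `response_format`, so with the parsing shown above it would apparently fall back to an empty schema object.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of llama.cpp): returns the sequence of keys,
// relative to `response_format`, under which the patched parser looks for the
// schema. An empty result means no schema constraint is applied.
std::vector<std::string> schema_key_path(const std::string & response_type) {
    if (response_type == "json_object") {
        // Pre-existing form: {"type": "json_object", "schema": {...}}
        return {"schema"};
    }
    if (response_type == "json_schema") {
        // Form added by this commit:
        // {"type": "json_schema", "json_schema": {"schema": {...}}}
        return {"json_schema", "schema"};
    }
    if (response_type.empty() || response_type == "text") {
        // Plain text output: nothing to look up.
        return {};
    }
    // Any other type is rejected, as in the original function.
    throw std::runtime_error("unsupported response_format type: " + response_type);
}
```

Calling `schema_key_path("json_schema")` yields the two keys `json_schema` and `schema`, so a client following the OpenAI-style layout needs the extra `json_schema` wrapper around its schema.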