Agentic RAG And Query Engine
UMMAYA uses retrieval for actions, not only for prose. The query engine receives a citizen request, retrieves adapter candidates, lets K-EXAONE choose a primitive, validates the call, checks permission, dispatches Live or Mock behavior, or produces Handoff. The final answer is synthesized from that evidence.
This page is for evaluators and maintainers who need to know whether UMMAYA is actually agentic or merely narrating search results. The answer is in the turn structure: retrieval narrows the action surface, reasoning chooses the next bounded action, tool execution produces evidence, and the stop reason decides whether the system can continue.
One Turn In Detail
Section titled “One Turn In Detail”1. User asks for a public-service outcome.2. Context assembly packages session state, prior results, policy mode, and runtime facts.3. Adapter retrieval ranks candidate tools by domain, hints, primitive support, tier, schema, and citation metadata.4. The prompt receives the relevant adapter set and primitive contracts.5. K-EXAONE chooses to answer, ask a question, or call a primitive.6. The query engine validates the tool call envelope.7. Permission classification decides safe, consent-gated, blocked, Mock, or Handoff behavior.8. The adapter runs live, replays a mock, or emits handoff material.9. Tool results are projected back into the model conversation.10. The final answer states evidence, boundary, and next action.The important point is ordering. UMMAYA should not write the answer first and then decorate it with sources. It should gather the minimum context needed to choose a tool, run the tool through validation and permission gates, then answer from the returned result or explain why the workflow stopped.
Why This Is Agentic RAG
Section titled “Why This Is Agentic RAG”Traditional RAG retrieves documents so the model can answer. UMMAYA retrieves tool candidates so the model can choose a safe action. The answer comes after tool results, permission events, and stop reasons are known.
That difference matters for national infrastructure. A document snippet can tell the user that a service exists, but a tool candidate can carry a schema, Live/Mock/Handoff status, credential requirement, citation, fixture, and permission metadata. The model is not only reading context; it is choosing from bounded public-service actions.
Retrieval Inputs
Section titled “Retrieval Inputs”Retrieval can use Korean and English search hints, public-service domain, agency metadata, primitive support, Live/Mock/Handoff state, schema shape, policy citation, and prior tool results.
The retrieval output is not the final answer. It narrows the model’s decision space so the model does not need every national-infrastructure surface in the prompt.
| Retrieval signal | Why it matters |
|---|---|
Korean and English search_hint | Citizens may ask in everyday Korean while adapters need stable metadata |
| Primitive support | The engine must know whether the candidate can locate, find, check, or send |
| Live/Mock/Handoff state | The answer must not overclaim execution authority |
| Schema shape | The model must provide valid arguments, not only a plausible intent |
| Policy citation | Protected actions need an external boundary, not UMMAYA-invented permission |
| Prior results | A later step should reuse resolved location, agency, or receipt context |
Query Engine Responsibilities
Section titled “Query Engine Responsibilities”The query engine is responsible for control, not only orchestration. It owns the loop that decides whether the next event is a model call, a tool call, a permission request, a stop reason, or a final answer. That responsibility keeps public-service work inspectable.
| Responsibility | What the engine checks | Failure mode if skipped |
|---|---|---|
| Context assembly | Session state, prior results, current user request, policy mode | The model repeats work or loses legal ordering |
| Candidate narrowing | Relevant adapters and primitive contracts | The prompt grows without improving decision quality |
| Tool-call validation | Envelope, schema, required fields, type constraints | Invalid public-service requests reach adapters |
| Permission gate | Public lookup vs protected action vs Handoff | The system sounds authorized when it is not |
| Result projection | Compact evidence back into the conversation | Final text becomes detached from tool results |
| Stop decision | Complete, ask user, retry, Mock, Handoff, or error | The loop spins or invents completion |
Tool Calling Is A Contract
Section titled “Tool Calling Is A Contract”Tool calling in UMMAYA is not a generic function-call convenience. Each call is an agreement between the model, the primitive envelope, the selected adapter, and the permission pipeline. The model proposes an action; the engine verifies that the proposal is shaped correctly; the adapter produces evidence or refuses; the UI shows what happened.
This contract is why UMMAYA can support broad domains without giving the model unlimited authority. locate can turn a phrase like “near Dong-A University Seunghak Campus” into usable location context. find can fetch public emergency or weather data through a selected adapter. check can expose a protected eligibility boundary. send can prepare official continuation only when the channel and consent make that legitimate.
Agentic RAG Failure Modes
Section titled “Agentic RAG Failure Modes”The architecture is only useful if failures remain visible. UMMAYA treats these as first-class outcomes:
- retrieval found no relevant adapter;
- retrieval found a candidate but the requested operation is Handoff-only;
- the model proposed invalid arguments;
- the adapter requires a credential or consent that is not present;
- the live channel returned zero usable results;
- the response is Mock evidence and must be labeled as such;
- the context budget requires compression before the next turn;
- an official service must continue the workflow.
Each failure should leave a traceable reason. A user-facing answer can be short, but the system record should still show which layer stopped the workflow.
Stop Reasons
Section titled “Stop Reasons”Clear stop reasons make the system debuggable and safe:
- no adapter found;
- invalid arguments;
- permission denied;
- credential missing;
- protected channel unavailable;
- adapter error;
- max iterations or budget reached;
- official Handoff required.
A stop reason is useful when it tells the user or evaluator which layer blocked the flow.
Traceability
Section titled “Traceability”A strong UMMAYA answer should be traceable from request to adapter selection, primitive call, permission decision, tool result, and final text. If the final answer cannot be traced to the loop, it should be treated as ungrounded.
The trace also supports documentation. Architecture prose should not claim a capability simply because the target state is desirable. It should point to adapter metadata, scenario coverage, schema files, generated LLM-readable outputs, CI checks, or a visible terminal behavior. That is the documentation version of the same rule the query engine follows at runtime: answer from evidence, and stop where evidence stops.