Why this example
Most agent demos use a simple tool call: look up the weather, search a database, summarise a document. These are fine for illustrating a single concept, but they don't show you what happens when the pieces interact.
A trip planner forces every concept from the enterprise agents page to work together at once. Intent is ambiguous, data comes from multiple services, some tasks can run in parallel, and the user expects a coherent result even when parts of the system fail.
Everything on this page — the event topics, the agent roles, the context window design, the fallback chain — is a direct consequence of those requirements. The system shape comes from the problem.
Ambiguous intent
"Plan a trip to Tokyo in spring" is not a structured API call. The agent has to parse intent, fill in gaps, and ask when it genuinely doesn't know.
Multiple external APIs
Flights, hotels, weather, and activities are separate services with different latencies, rate limits, and failure modes. The agent can't treat them as one.
Natural parallelism
Once the destination and dates are known, flight and hotel searches can run at the same time. There is no reason to wait for one before starting the other.
Persistent session state
The user might refine the plan across several turns. The agent needs to remember what was decided, what was rejected, and why.
Graceful degradation
If the flight API is down, the agent shouldn't fail the whole request. It should work with what it has and tell the user clearly what it couldn't do.
The system you build for a trip planner is structurally identical to what you would build for a procurement agent, an onboarding assistant, or a research workflow. The domain changes; the architecture doesn't.
What the agent needs to do
Before designing the system, it helps to walk through what the agent actually needs to do at each step. This is the lifecycle from first user message to final itinerary.
"I want to visit Tokyo for a week in late April, budget around $3,000, just me."
The most important design decision happens at step 3: how much to ask versus how much to assume. Agents that ask too many questions feel unusable; agents that assume too much produce wrong results. The right answer is to assume, surface the assumption, and let the user correct it.
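One way to make that policy concrete is to tag every extracted constraint with whether it was stated or assumed, so assumptions can be surfaced for correction. This is an illustrative sketch: `Constraint`, `TripConstraints`, and the field names are hypothetical, not part of the system described here.

```python
from dataclasses import dataclass

@dataclass
class Constraint:
    value: object
    assumed: bool = False  # True when the agent filled the gap itself

@dataclass
class TripConstraints:
    destination: Constraint
    origin: Constraint
    budget_usd: Constraint

    def assumptions(self):
        """Return the fields the user should be invited to confirm."""
        return {name: c.value for name, c in vars(self).items() if c.assumed}

intent = TripConstraints(
    destination=Constraint("Tokyo"),
    origin=Constraint("JFK", assumed=True),  # inferred, not stated
    budget_usd=Constraint(3000),
)
# intent.assumptions() -> {'origin': 'JFK'}
```

The agent can then render `assumptions()` back to the user ("Assuming you're flying from JFK") instead of either asking or silently guessing.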
System architecture
The system has five layers. A UI layer where the user interacts. An intent layer where free text becomes structured data. An event bus that decouples intent from execution. A pool of specialist agents that each own one concern. And an assembly layer that turns raw results into a final answer.
The event bus is the critical design choice. Without it, the orchestrator would call each agent in sequence — or in parallel but with a tangled web of direct calls. With it, agents are fully decoupled. You can add a new specialist (say, a visa requirements agent) without touching any existing code.
Each specialist agent is a consumer of one event type and a producer of one result event. This makes them independently deployable, independently testable, and independently replaceable.
Event orchestration design
The event bus is where choreography happens. Rather than an orchestrator calling each agent and waiting, the IntentAgent publishes one event and walks away. Each search agent reacts to it independently. The ResultAssembler listens for all four result events and assembles the plan when they arrive.
This is the fan-out pattern. One event, many consumers. No coupling between the producers and consumers. The IntentAgent has no idea who is listening.
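A minimal in-process sketch of the fan-out pattern (the `EventBus` API here is assumed for illustration, not the actual bus used in this system):

```python
from collections import defaultdict

class EventBus:
    """Topic-based pub/sub: producers and consumers never reference each other."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # One event, many consumers; the producer has no idea who is listening.
        for handler in self._subscribers[topic]:
            handler(payload)

bus = EventBus()
received = []
for name in ("FlightAgent", "HotelAgent", "WeatherAgent", "ActivitiesAgent"):
    bus.subscribe("trip.intent.parsed", lambda payload, n=name: received.append(n))

bus.publish("trip.intent.parsed", {"destination": "Tokyo"})
# received -> all four agent names
```

Adding a fifth specialist is one more `subscribe` call; the publisher is untouched, which is exactly the decoupling property described above.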
trip.intent.parsed
Producer: IntentAgent
Consumers: FlightAgent, HotelAgent, WeatherAgent, ActivitiesAgent
Purpose: Published once per request. Contains destination, dates, budget, origin, party size. All search agents subscribe to this topic.
Failure has its own event type. When an agent can't complete a search after retries, it publishes to the dead-letter topic rather than silently dropping the result. This means failures are visible, auditable, and retrievable without digging through logs.
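The dead-letter behaviour could be sketched as below. The `RecordingBus` stub, the `trip.flights.result` topic, and the `trip.deadletter` name are assumptions for illustration; the article does not name the actual topics.

```python
class RecordingBus:
    """Minimal stub; a real system would publish to the event bus."""
    def __init__(self):
        self.published = []

    def publish(self, topic, payload):
        self.published.append((topic, payload))

def search_with_dead_letter(bus, intent, search_fn, max_retries=3):
    """Try a search; after exhausting retries, publish the failure, don't drop it."""
    last_error = None
    for _ in range(max_retries):
        try:
            result = search_fn(intent)
            bus.publish("trip.flights.result", result)
            return result
        except Exception as exc:
            last_error = exc
    # Retries exhausted: make the failure visible, auditable, retrievable.
    bus.publish("trip.deadletter", {
        "source": "FlightAgent",
        "intent": intent,
        "error": repr(last_error),
        "attempts": max_retries,
    })
    return None
```

A monitor subscribed to the dead-letter topic can then retry, alert, or surface the failure to the assembler without anyone grepping logs.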
Agent roles
Six agents, each with a single responsibility. The search agents — FlightAgent, HotelAgent, WeatherAgent, ActivitiesAgent — don't call external APIs directly. Each one submits an intent to the Lattice capability runtime and receives a structured projection back.
IntentAgent
Parses the user's free-text request into a structured trip constraint object. Uses an LLM with a tight system prompt focused on extraction, not generation. Publishes trip.intent.parsed.
Input: raw user message
Output: trip.intent.parsed event
This agent sets the quality ceiling for everything downstream. Bad extraction → bad itinerary.
Separating hard dependencies (flights, hotels) from soft ones (weather, activities) lets the ResultAssembler apply different timeout policies to each. Because failure modes are declared inside the Lattice capability — not improvised in agent code — each agent's timeout behaviour is consistent and auditable.
Capabilities with Lattice
The search agents in this system don't call external APIs directly. Each one submits a typed intent to Lattice — a capability runtime that executes the underlying steps, manages retries and failure policy, keeps intermediate state out of the model's context, and returns a projection shaped for reasoning rather than integration.
The model sees two meta-tools regardless of how many capabilities exist. It never needs to know about the underlying APIs.
search_capabilities(query)
Returns matching capability signatures from the registry. The model calls this to discover what outcomes are available.
← "flight search for trip planner"
→ TripFlightSearch — inputs, projection schema, example
execute_capability(name, inputs)
Submits a typed intent to the runtime. Lattice loads the capability, executes all steps, and returns the projection. The model never sees intermediate state.
← "TripFlightSearch", {origin, dest, dates, budget}
→ FlightProjection {flights, best_option, status, alternatives}
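A rough sketch of how the two meta-tools could be backed by a registry. The `REGISTRY` shape and the validation logic are illustrative assumptions, not the real Lattice implementation:

```python
# Hypothetical registry: one entry per capability signature.
REGISTRY = {
    "TripFlightSearch": {
        "description": "flight search for trip planner",
        "inputs": ["origin", "destination", "departure_date",
                   "return_date", "passengers", "max_budget_usd"],
    },
}

def search_capabilities(query):
    """Return signatures whose description matches the query terms."""
    words = query.lower().split()
    return [
        {"name": name, **sig}
        for name, sig in REGISTRY.items()
        if any(w in sig["description"] for w in words)
    ]

def execute_capability(name, inputs, runtime):
    """Submit a typed intent; the runtime returns only the projection."""
    if name not in REGISTRY:
        raise KeyError(f"unknown capability: {name}")
    missing = [k for k in REGISTRY[name]["inputs"] if k not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    # Intermediate step state stays inside the runtime, never in the context.
    return runtime(name, inputs)
```

The key property is that the model's tool surface is constant: new capabilities grow the registry, not the tool list.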
Capability signature
TripFlightSearch(
    origin: str,          # IATA code
    destination: str,     # IATA code
    departure_date: str,  # ISO 8601
    return_date: str,
    passengers: int,
    max_budget_usd: float
) -> FlightProjection
from lattice import capability, step, projection
from lattice.failure import retry, hard_failure, abort, fallback_to

@capability(name="TripFlightSearch", version="1.0")
async def trip_flight_search(ctx):

    # Primary provider: three retries with exponential backoff,
    # then fall back to the secondary provider.
    @step(scope="travel.search")
    @retry(max=3, backoff="exponential",
           on=[TimeoutError, ServerError])
    @hard_failure(on_exhausted=fallback_to(search_backup))
    async def search_primary():
        client = ctx.client("flights_api_primary")
        return await client.search(
            origin=ctx.intent.origin,
            destination=ctx.intent.destination,
            depart=ctx.intent.departure_date,
            ret=ctx.intent.return_date,
            pax=ctx.intent.passengers,
            max_price=ctx.intent.max_budget_usd)

    # Backup provider: two retries, then abort the capability.
    @step(scope="travel.search")
    @retry(max=2, backoff="exponential",
           on=[TimeoutError, ServerError])
    @hard_failure(on_exhausted=abort)
    async def search_backup():
        client = ctx.client("flights_api_secondary")
        return await client.search(...)

    # state holds each completed step's result, populated by the runtime.
    results = state.search_primary or state.search_backup
    top3 = sorted(results, key=lambda f: f.value_score)[:3]

    return projection(
        flights=top3,
        cheapest_usd=top3[0].total_price if top3 else None,
        best_option=top3[0] if top3 else None,
        status="found" if top3 else "none_in_budget",
        source="primary" if state.search_primary else "backup",
        alternatives=[] if top3 else [
            {"action": "expand_budget",
             "suggestion": "Increase budget by $200 to unlock 4 options"},
            {"action": "adjust_dates",
             "suggestion": "Shift by 3 days for 30% lower fares"},
        ]
    )

Adding a new step inside a capability — say, a price alert check inside TripFlightSearch — does not change the signature, does not add a new tool the model has to learn, and does not require any agent code to change. The capability is the unit of change, not the tool list.
Context engineering
The IntentAgent builds a context window before each LLM call. The contents are assembled dynamically based on what's known: trip constraints extracted from the current message, user preferences pulled from memory, conversation history from the session, and any context retrieved from earlier agent results.
Not all of this fits in one call. Context windows have limits, and even within the limit, attention is not uniform. Items in the middle of a long context receive less weight. So the IntentAgent prioritises: hard constraints first, then preferences, then history.
Context usage: 40 / 128 tokens (system prompt). Light context — room to add retrieved docs or history.
User preferences from memory are not free. Every token you inject from a past session is a token you can't use for the current retrieved content. The IntentAgent makes this tradeoff explicit: preferences get a fixed token budget, and if the session history is long, it gets summarised before injection.
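That budgeting policy might look like this in outline. `count_tokens` is a crude word-count stand-in for a real tokenizer, and the window sizes and budgets are illustrative, not the system's actual numbers:

```python
def count_tokens(text):
    return len(text.split())  # word-count proxy; use a real tokenizer in practice

def clip(text, budget):
    return " ".join(text.split()[:budget])

def build_context(constraints, preferences, history,
                  window=1000, preference_budget=100):
    """Hard constraints first, then a fixed budget for preferences,
    then history summarised only if it would overflow the window."""
    parts = [constraints]                       # hard constraints always go in
    used = count_tokens(constraints)

    prefs = clip(preferences, preference_budget)  # fixed budget for memory
    parts.append(prefs)
    used += count_tokens(prefs)

    remaining = window - used
    if count_tokens(history) > remaining:
        history = clip(history, remaining)      # stand-in for LLM summarisation
    parts.append(history)
    return "\n".join(parts)
```

The fixed preference budget is the explicit tradeoff described above: memory can never crowd out the current request's retrieved content.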
Dynamic orchestration
Once the intent event is published, all four search agents start simultaneously. None of them wait for each other. The ResultAssembler holds until the hard dependencies — flights and hotels — are done. Soft dependencies contribute if they finish within the timeout window.
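Under asyncio, the assembler's wait policy could be sketched as below (the function names and timeout value are assumptions for illustration):

```python
import asyncio

async def assemble(hard_tasks, soft_tasks, soft_timeout=2.0):
    # Hard dependencies (flights, hotels): wait as long as it takes.
    hard = await asyncio.gather(*hard_tasks)
    # Soft dependencies (weather, activities): contribute only if done in time.
    futures = [asyncio.ensure_future(t) for t in soft_tasks]
    done, pending = await asyncio.wait(futures, timeout=soft_timeout)
    for task in pending:
        task.cancel()                 # too slow; the plan ships without them
    soft = [t.result() for t in done]
    return {"hard": hard, "soft": soft}
```

The same shape generalises: the policy lives in the assembler, so a dependency can be promoted from soft to hard without touching the agents themselves.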
The trace below shows how this plays out for the Tokyo request: how the intent was parsed, which agents ran in parallel, and what came back.

Parsed intent:
Destination: Tokyo, Japan
Dates: Apr 22–29, 2026
Budget: $3,000
Party: 1 traveller
Origin: JFK (assumed)

Agents on the timeline: IntentAgent, FlightAgent, HotelAgent, WeatherAgent, ActivitiesAgent, ResultAssembler.

Wall time (critical path): 7.0s
If run sequentially: 13.4s
Saved by parallelism: −6.4s
The timeline makes the tradeoff visible. Sequential execution for the Tokyo trace would take 13.4s. Parallel execution brings it to 7.0s. The time saved scales with the number of independent agents — which is why the fan-out pattern is worth the added complexity.
Memory and session state
The trip planner needs three kinds of memory working at the same time. Working memory holds the current request — the constraints extracted in this session, the results received so far, any corrections the user has made. It lives in the session state attached to the request and disappears when the request is done.
Long-term memory holds user preferences: airports they prefer, hotel amenities they care about, destinations they've visited and want to avoid repeating. This is retrieved from a vector store at the start of each session and injected into the IntentAgent's context. It persists across requests.
The conversation buffer holds the dialogue from the current planning session. If the user rejects the first hotel option and says "something in Shinjuku instead," the buffer gives the IntentAgent the context to understand that correction without asking again.
Long-term memory is retrieved, not injected wholesale. The IntentAgent queries the vector store with the current request as a query, then brings in only the top-k most relevant preferences. Injecting everything a user has ever told the system would waste context tokens on irrelevant history.
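A toy version of that retrieval step, where `cosine` over hand-made vectors stands in for a real embedding model and vector store:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_preferences(store, query_vec, k=3):
    """store: list of (preference_text, embedding) pairs.
    Returns only the k preferences most relevant to the current request."""
    scored = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]
```

Only the returned top-k strings are injected into the IntentAgent's context; everything else the user has ever said stays in the store.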
Error handling
The FlightAgent is a hard dependency. If it fails completely, the itinerary can't include flights. The error handling chain tries to avoid that outcome through three layers: retry, fallback provider, and finally graceful degradation.
The first layer is the standard search call with all constraints and a timeout of 6 seconds; the retry and fallback layers engage only when it fails.
If both primary and secondary providers fail after all retries, the ResultAssembler receives an empty flights array with an error field. The itinerary is generated without flights, and the user is told clearly: "We couldn't find flight options right now. Here are hotel and activity suggestions for your dates."
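The degradation step itself can be small. The sketch below assumes result field names (`flights`, `hotels`, `activities`, `notes`) that the text does not specify:

```python
def build_itinerary(results):
    """Assemble a plan from whatever arrived; never fail the whole request."""
    notes = []
    if not results.get("flights"):
        # Hard dependency came back empty: degrade and tell the user plainly.
        notes.append("We couldn't find flight options right now. "
                     "Here are hotel and activity suggestions for your dates.")
    return {
        "flights": results.get("flights") or [],
        "hotels": results.get("hotels") or [],
        "activities": results.get("activities") or [],
        "notes": notes,
    }
```

The point is that degradation is a branch in the assembler, not an exception path: the happy path and the degraded path produce the same output shape.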
Observability
Each agent emits an OpenTelemetry trace span when it starts and when it finishes. Tool calls are child spans. LLM calls are child spans with token counts attached as span attributes. The result is a complete trace tree that shows exactly where time was spent.
The trace below represents a real request. The root span covers the full request duration. You can see the sequential IntentAgent phase, the parallel search phase where flights and hotels run concurrently, and the assembly phase at the end.
trip.plan (7.2s)
├─ IntentAgent.parse (1.8s)
│   ├─ LLM.extract_constraints (1.4s)
│   └─ VectorStore.get_preferences (0.2s)
├─ FlightAgent.search (3.2s)
│   └─ FlightsAPI.search (primary) (2.8s)
├─ HotelAgent.search (2.8s)
│   └─ HotelsAPI.search (2.4s)
├─ WeatherAgent.fetch (1.4s)
└─ ResultAssembler.build (2.0s)
    └─ LLM.generate_itinerary (1.6s)
The trace immediately shows that the LLM call inside IntentAgent (1.4s) dominates the first phase, and the FlightAgent API call (2.8s) dominates the parallel phase. These are the two places to optimise first — prompt caching for the intent extraction, and request timeout tuning for the flights API.
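The span pattern can be mimicked in a few lines of stdlib Python. A real implementation would use the OpenTelemetry SDK; the token counts here are made-up attribute values:

```python
import time
from contextlib import contextmanager

SPANS = []  # stand-in for an exporter

@contextmanager
def span(name, parent=None, **attributes):
    """Record a span with a parent link, arbitrary attributes, and a duration."""
    record = {"name": name, "parent": parent,
              "attrs": attributes, "start": time.monotonic()}
    try:
        yield record
    finally:
        record["duration_s"] = time.monotonic() - record["start"]
        SPANS.append(record)

with span("trip.plan"):
    with span("IntentAgent.parse", parent="trip.plan"):
        # LLM calls are child spans with token counts as attributes
        # (412 / 96 are illustrative numbers).
        with span("LLM.extract_constraints", parent="IntentAgent.parse",
                  prompt_tokens=412, completion_tokens=96):
            pass
```

Because children exit before their parents, the exporter sees leaf spans first; the trace tree is reconstructed from the parent links.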
Full architecture recap
Every component in one diagram. The user enters through the UI. The IntentAgent translates the request into a structured event. The event bus fans the event out to four search agents. Each agent calls its tool, handles its own failures, and publishes its result. The ResultAssembler waits for hard dependencies, calls the LLM, and sends the itinerary back.
The observability stack watches every hop. The memory store is read at the start and written when new preferences are detected. Dead-letter events from failed agents go to a monitoring service that can trigger retries or alert on-call.
This is the same structure you would use for a procurement agent, a research assistant, or a customer onboarding workflow. The domain changes. The topology — intent layer, event bus, specialist agents, assembler — stays the same.