PRD: Search indexing (CDC / Typesense)

Module: Platform (CORE-16)
PRD ID: PRD-IDX-001
Status: Shipped
Owner: Platform / Search squad
Date: 2026-06-15
Version: v1.0
Packages: @nx/search · @nx/core
URD: IDX · SCH

TL;DR

Gives the whole platform one always-fresh, denormalized search surface without any service writing to the search engine itself. Every committed database change flows out as a change-data event, and a single consumer mirrors it into the matching search collection - nine collections (organizers, merchants, categories, devices, sale-channels, products, product-variants, inventories, users) fed from a catalogue of CDC source tables. Each document is enriched with its related data so one hit carries everything the UI needs (the merchant's name, the variant's price, its images, its scan codes, its stock per location, its option facets), and a change to a shared parent (a merchant rename, a location rename, a shared code) fans out to every dependent document by targeted patch - never a full re-index. Documents are versioned so replayed or out-of-order events never resurrect stale state, the stream degrades safely when the engine is down, and callers query any collection through a unified keyword + semantic search API.

1. Context & Problem

KICKO's data lives across many services and Postgres schemas - commerce, pricing, inventory, identity. A storefront or back-office screen that needs to "find a product by name, barcode, option, or price, scoped to a merchant" cannot fan a query across all of those tables at read time, and asking every producing service to also write into a search engine would scatter indexing logic, double every write path, and drift out of sync the first time one service forgot to update.

What was missing is a single seam that turns the platform's existing change-data stream into a ready-to-query search surface. The hard parts are not the storage: they are keeping each search document denormalized but fresh (a product's hit must still show the right price after a fare edit, the right name after a rename), surviving replays and outages without corrupting state, and exposing one consistent query contract every app can use. This increment delivers that backbone.

2. Goals & Non-Goals

Goals

  • Mirror every committed write to an indexed source table into its search collection, driven only by the change-data stream - no producing service writes to search (IDX).
  • Map source tables to nine collections, each with one document-source table and a set of related/derived inputs (IDX).
  • Enrich every document with its joined related data before indexing, so one hit is self-contained (IDX).
  • Fan out a shared-record change to every dependent document by targeted patch, not a full re-index (IDX).
  • Version each document against replayed / out-of-order events; tombstone deletes and soft-deletes (IDX).
  • Degrade safely - circuit-break on engine/dependency outage, dead-letter poison messages, isolate per-document failures (IDX).
  • A unified keyword + semantic search query API over any registered collection, with a count, scoping, and the platform's standard list contract (SCH).

Non-Goals

  • Owning change-data capture - publishing topic events is the platform's CDC infrastructure (Debezium); search consumes well-formed events (URD-CON-008).
  • Indexing every table - SaleOrder is a defined CDC source but is not yet indexed into a collection (URD-CON-009).
  • An end-user re-index / backfill screen - backfill is operational snapshot replay (URD-CON-010).
  • The producing services' own write logic, pricing, or stock math - those live in Commerce, pricing, and Inventory.

3. Success Metrics

| Metric | Target / signal |
| --- | --- |
| Freshness | A committed write is reflected in its collection within the stream's normal lag, with no manual step |
| Denormalization completeness | A single search hit carries its related fields (name, price, images, codes, stock, facets) without a second lookup |
| Fan-out correctness | A parent rename / shared-code change updates every dependent document; no stale denormalized value lingers |
| Replay safety | A replayed or out-of-order event never overwrites a newer document state |
| Outage resilience | An engine outage pauses and resumes the stream without data loss; poison messages land in the dead-letter topic, not the live path |
| Query consistency | Every app searches any collection through one contract (envelope-or-array + range headers), keyword or semantic |

4. Personas & Use Cases

| Persona | Goal in this feature |
| --- | --- |
| Cashier / Storefront | Find a product, variant, or customer instantly by name, barcode, or option facet |
| Owner / Manager | Search merchants, categories, sale-channels, inventory, and users scoped to what they manage |
| Channel / Back-office integration | Query a collection by filter + count against a stable contract |
| Platform operator | Trust the stream stays fresh, survives outages, and isolates bad messages |

Core scenario: an owner renames a merchant. The change is captured from the change-data stream and indexed onto the merchants collection; the same rename fans out to that merchant's products, categories, and sale-channels so every hit shows the new name - by targeted patch, not a re-index. Moments later a cashier searches products for a drink by name and gets a hit already carrying its price, image, option facets, and per-location stock. If the search engine goes down mid-stream, the consumer pauses, probes for recovery, and resumes where it left off - no events lost.
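The rename fan-out in this scenario can be sketched as a set of filter-scoped patches, one per dependent collection. This is a minimal illustration, not the shipped implementation: the field names `merchantId` and `merchantName`, the `FanOutPatch` shape, and the function name are hypothetical, and the id is assumed to have been validated already. The `field:=value` filter form is Typesense's exact-match syntax.

```typescript
// Hypothetical shapes: the collection names come from this PRD, the field names do not.
type LocalizedName = { en: string; vi: string };

interface FanOutPatch {
  collection: string; // dependent collection to patch
  filterBy: string; // engine filter selecting only the dependent documents
  fields: Record<string, unknown>; // denormalized fields to overwrite
}

// Plan one targeted patch per dependent collection for a merchant rename.
// Assumes merchantId has already passed id validation (see FR-10).
function planMerchantRename(merchantId: string, newName: LocalizedName): FanOutPatch[] {
  const dependents = ["products", "categories", "sale-channels"];
  return dependents.map((collection) => ({
    collection,
    filterBy: `merchantId:=${merchantId}`,
    fields: { merchantName: newName },
  }));
}
```

Each patch touches only the documents matching its filter and only the listed fields, which is what keeps a rename from turning into a re-index.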

5. User Stories

  • As a storefront, I search a collection by name / barcode / option and get back hits that already carry everything I need to render, so I never make a second call per result.
  • As an owner, I rename a merchant once and every product, category, and channel that shows that name updates - I never re-publish a catalogue.
  • As an integration, I query and count a collection with a standard filter, so search behaves like every other list endpoint.
  • As a platform operator, I trust that a replayed event won't resurrect deleted data and that an engine outage pauses the stream instead of dropping writes.
  • As a back-office user, my search is scoped to my tenant automatically, so I never see records I am not permitted to see.

6. Functional Requirements

| # | Requirement | URD ref |
| --- | --- | --- |
| FR-1 | Every committed change to an indexed source table is captured from the change-data stream and reflected in its collection - no producing service writes to search | URD-IDX-001 |
| FR-2 | Source tables map to nine collections; each collection has one document-source table, the rest of its inputs are related / derived | URD-IDX-002 |
| FR-3 | Create / update / snapshot events upsert the document; a delete or soft-delete writes a tombstone so it leaves results | URD-IDX-003 |
| FR-4 | Each document is enriched with its joined related data (owner names, category set, price, images, scan codes, stock-by-location, option facets, user identity / roles / organizers) before indexing | URD-IDX-004 |
| FR-5 | A shared / parent change fans out to every dependent document by targeted patch, not a full re-index | URD-IDX-005 |
| FR-6 | Each document carries a version stamp; replays / out-of-order events never overwrite newer state; child→parent patches touch only their own fields | URD-IDX-006 |
| FR-7 | Events process in per-topic batches; a wholly-unparseable or unsynced batch is reported failed for retry, never silently skipped | URD-IDX-007 |
| FR-8 | Engine / dependency outage trips a circuit breaker that pauses and probes for recovery; poison messages divert to a dead-letter topic | URD-IDX-008 |
| FR-9 | A failed enrichment still indexes the document with the data on it; one failed cascade never blocks the batch's other fan-outs | URD-IDX-009 |
| FR-10 | Record ids embedded in engine filters are validated so a malformed id can never alter a fan-out's target set | URD-IDX-010 |
| FR-11 | Full-text search any registered collection by name with an Ignis-style filter (where / limit / skip / order / include / fields), returned in the platform's list envelope-or-array shape with range headers, plus a count | URD-SCH-001..002 |
| FR-12 | Search supports hybrid keyword + semantic (vector) matching - generic endpoint hybrid by default, resource-mounted search keyword-only with opt-in | URD-SCH-003 · URD-SCH-005 |
| FR-13 | A resource controller can mount a scoped /search + /search/count that merges a caller-scope (tenant) into the query; search and count are authenticated and permission-gated | URD-SCH-004 · URD-SCH-006..007 |

Full requirement text and acceptance criteria live in the Platform URD - IDX and SCH. This PRD references them rather than restating them.
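As an illustration of FR-3, the mapping from a change event to an index action might look like the sketch below. The `op` codes (`c`, `u`, `d`, `r`) are Debezium's standard create / update / delete / snapshot-read values; the `id` and `deletedAt` column names and the `IndexAction` shape are assumptions made for the example, not taken from the URD.

```typescript
// Debezium op codes: c = create, u = update, d = delete, r = snapshot read.
type CdcOp = "c" | "u" | "d" | "r";

interface CdcEvent {
  op: CdcOp;
  before: Record<string, unknown> | null;
  after: Record<string, unknown> | null;
}

type IndexAction =
  | { kind: "upsert"; row: Record<string, unknown> }
  | { kind: "tombstone"; id: string };

// `id` and `deletedAt` are assumed column names, used for illustration only.
function toIndexAction(ev: CdcEvent): IndexAction {
  const row = ev.op === "d" ? ev.before : ev.after;
  const id = row?.["id"];
  if (row == null || id == null) {
    throw new Error("event carries no row id; treat as unparseable");
  }
  // Hard deletes and soft-deletes both leave results via a tombstone.
  if (ev.op === "d" || row["deletedAt"] != null) {
    return { kind: "tombstone", id: String(id) };
  }
  // Create, update, and snapshot events all upsert the document.
  return { kind: "upsert", row };
}
```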

7. Non-Functional Requirements

| Area | Requirement |
| --- | --- |
| Freshness | Indexing follows the change-data stream; no producing service writes to search, so there is one path and no double-write drift |
| Idempotency | A document's version stamp (source log position) makes re-delivery and replay safe - newer state always wins |
| Resilience | Engine / dependency outage pauses the stream via a circuit breaker and probes for recovery; poison messages are dead-lettered; batches retry on failure |
| Fault isolation | Enrichment failure → index un-enriched; one cascade failure → others still apply; one document's failure never fails the batch wholesale |
| Performance | Fan-out uses targeted filtered patches with a concurrency cap; a parent change touching thousands of children never triggers a full re-index |
| Safety | Ids embedded in engine filters are validated; query failures classify cleanly (missing collection → empty, bad query → 400) |
| Tenancy & authz | Resource-mounted search merges a caller-scope into the query; search / count are JWT- or Basic-authenticated and permission-gated |
| i18n | Denormalized names are stored as bilingual ({ en, vi }) objects and searchable in both |
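The Safety requirement on ids can be illustrated with a small allow-list check applied before any id is embedded in a filter string. The accepted pattern here is an assumption (this PRD does not state the real id format), and `assertSafeId` / `idFilter` are hypothetical helper names.

```typescript
// Assumed id format: short alphanumeric keys or UUID-like strings.
// The real rule is not stated in this PRD.
const SAFE_RECORD_ID = /^[A-Za-z0-9_-]{1,64}$/;

// Reject anything that could carry filter syntax (brackets, operators, spaces).
function assertSafeId(id: string): string {
  if (!SAFE_RECORD_ID.test(id)) {
    throw new Error("malformed record id rejected");
  }
  return id;
}

// Build a multi-value exact-match filter from validated ids only.
function idFilter(field: string, ids: string[]): string {
  const safe = ids.map((id) => assertSafeId(id));
  return `${field}:=[${safe.join(",")}]`;
}
```

Because validation happens before interpolation, an id such as `x] || id:!=[y` fails fast instead of widening the fan-out's target set.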

8. UX & Flows

A document-source table (e.g. Product, ProductVariant, InventoryStock, User) produces its own collection document; the rest of a collection's inputs - pricing, options, images, scan codes, profiles, grants, join tables - are cascade-only: their changes fan out into the document that already exists rather than producing one of their own.
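A rough sketch of this routing decision follows. The table-to-collection pairs for `InventoryStock`, `User`, `Fare`, `ProductOption`, and `UserIdentifier` follow this PRD; the `Product` → `products` and `ProductVariant` → `product-variants` pairs are inferred from the collection names, and the map and function names are hypothetical. The authoritative catalogue lives in @nx/core.

```typescript
interface SourceRole {
  role: "document" | "cascade";
  collection: string; // collection that is upserted into, or patched
}

// Illustrative subset only, not the full source-table catalogue.
const SOURCE_ROLES = new Map<string, SourceRole>([
  ["Product", { role: "document", collection: "products" }],
  ["ProductVariant", { role: "document", collection: "product-variants" }],
  ["InventoryStock", { role: "document", collection: "inventories" }],
  ["User", { role: "document", collection: "users" }],
  ["Fare", { role: "cascade", collection: "product-variants" }],
  ["ProductOption", { role: "cascade", collection: "product-variants" }],
  ["UserIdentifier", { role: "cascade", collection: "users" }],
]);

// Describe what a change to the given table should do to the index.
function routeTable(table: string): string {
  const entry = SOURCE_ROLES.get(table);
  if (!entry) return `ignore ${table}`;
  return entry.role === "document"
    ? `upsert into ${entry.collection}`
    : `patch existing documents in ${entry.collection}`;
}
```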

9. Data & Domain

| Concept | Role |
| --- | --- |
| Search collection | One of nine denormalized indexes: organizers, merchants, categories, devices, sale-channels, products, product-variants, inventories, users |
| Document-source table | The single CDC table whose rows become a collection's documents (e.g. InventoryStock → inventories, User → users) |
| Cascade-only source | A related table with no collection of its own; its change fans out into an existing document (e.g. FareSet/Fare → variant price, ProductOption → variant facets, UserIdentifier → user contact) |
| Enrichment | The join step that folds a document's related Postgres data onto it before indexing |
| Cascade trigger | A typed signal that a shared / parent change must patch a set of dependent documents |
| Version stamp | The source log position (and a tombstone marker) carried on every document so the newest write wins |

Conceptual only - the collection schemas, mapper set, and pipeline internals live in the search developer docs. Cross-entity relations are soft references resolved at enrichment time, not database joins.
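The version-stamp rule ("newest write wins") reduces to a single comparison, sketched below. The `VersionStamp` field names are hypothetical, and the log position is modelled as a plain number for brevity; a real 64-bit log position would need a wider type.

```typescript
interface VersionStamp {
  logPosition: number; // source log position; a plain number is a simplification
  tombstone: boolean; // true once the record was deleted or soft-deleted
}

// Apply an incoming write only if it is strictly newer than what is indexed.
// A replay at the same position is skipped, which keeps re-delivery idempotent.
function shouldApply(current: VersionStamp | undefined, incoming: VersionStamp): boolean {
  return current === undefined || incoming.logPosition > current.logPosition;
}
```

Because a tombstone keeps its own stamp, an older upsert replayed after a delete compares lower and is dropped rather than resurrecting the record.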

10. Dependencies & Assumptions

Depends on

  • Change-data infrastructure - Debezium publishes nx.bana.cdc.<schema>.<Table> topics the consumer subscribes to (URD-CON-008).
  • The search engine (Typesense) - the index store and query engine behind every collection.
  • @nx/core - the CDC source-table catalogue, topic registry, and the shared models the enrichment loaders read.
  • Commerce / pricing / inventory / identity data - the source rows and the related data each document is enriched and fanned out from.
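For illustration, a consumer could recover the schema and table from a topic name of the `nx.bana.cdc.<schema>.<Table>` form as follows. The function name and the `commerce` schema used in the usage line are hypothetical.

```typescript
const TOPIC_PREFIX = "nx.bana.cdc.";

// Split a CDC topic into its schema and table, or return null if it does not match.
function parseCdcTopic(topic: string): { schema: string; table: string } | null {
  if (!topic.startsWith(TOPIC_PREFIX)) return null;
  const parts = topic.slice(TOPIC_PREFIX.length).split(".");
  if (parts.length !== 2) return null;
  const [schema, table] = parts;
  if (!schema || !table) return null;
  return { schema, table };
}

// Usage: parseCdcTopic("nx.bana.cdc.commerce.Product")
```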

Assumptions

  • Change-data events are well-formed, and any residual re-delivery or out-of-order arrival is resolved by the version stamp.
  • The collections are provisioned in the engine before the stream runs; a not-yet-created collection returns empty rather than erroring.
  • Producing services emit their domain writes normally; they neither know nor care that search consumes them.

11. Risks & Open Questions

| Risk / question | Mitigation / status |
| --- | --- |
| A replayed or out-of-order event resurrects stale / deleted state | Every document is version-stamped by source log position; newer state always wins; child→parent patches never flip lifecycle |
| A parent rename touching thousands of children causes a re-index storm | Fan-out uses targeted filtered patches with a concurrency cap - never a full re-index |
| Search engine outage stalls or loses the stream | Circuit breaker pauses and probes for recovery; batches are reported failed and retried; no offset advance on failure |
| One bad message or failed enrichment blocks the batch | Poison messages dead-letter; enrichment failure indexes un-enriched; per-cascade try/catch isolates failures |
| A malformed id alters a fan-out's target filter | Ids embedded in engine filters are validated before use |
| SaleOrder is a CDC source but not indexed | Documented as a deliberate non-goal for this increment (URD-CON-009) |
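The per-cascade isolation and concurrency cap named in the mitigations above can be sketched as follows. This is one simple way to do it (fixed-size windows, each cascade wrapped in its own try/catch); the actual scheduling strategy is not described in this PRD, and all names are hypothetical.

```typescript
type Cascade = () => Promise<void>;

// Run a single cascade and convert any failure into a boolean result.
async function runIsolated(run: Cascade): Promise<boolean> {
  try {
    await run();
    return true;
  } catch {
    return false; // a failed cascade is counted, never rethrown
  }
}

// Run cascades in windows of at most `cap` at a time and count outcomes.
async function runCascades(
  cascades: Cascade[],
  cap: number,
): Promise<{ ok: number; failed: number }> {
  const step = Math.max(1, cap);
  let ok = 0;
  let failed = 0;
  for (let i = 0; i < cascades.length; i += step) {
    const results = await Promise.all(cascades.slice(i, i + step).map(runIsolated));
    for (const succeeded of results) {
      if (succeeded) ok += 1;
      else failed += 1;
    }
  }
  return { ok, failed };
}
```

The returned counts correspond to the cascade ok/failed figures a consumer would report for monitoring.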

12. Release Plan & Launch Criteria

| Aspect | Plan |
| --- | --- |
| Phase | P2 - IDX and SCH in the URD feature catalog |
| Rollout | All merchants; the backbone runs platform-wide, no per-merchant flag |
| Migration | None at the data layer - collections are provisioned and backfilled by snapshot replay |
| Operational toggle | The circuit breaker is enabled by environment configuration; the dead-letter topic is configurable |
| Launch criteria | A committed write reaches its collection enriched; a parent rename fans out to dependents; a replay does not overwrite newer state; an engine outage pauses and resumes without loss; any collection is searchable + countable through the unified API |
| Monitoring | Per-batch stats (creates / updates / deletes / snapshots / parse errors / engine ok / fail / throughput), cascade ok/failed counts, circuit-breaker trips and escalations, dead-letter volume |

13. FAQ

Does each service write to the search engine? No - no producing service touches search. Every index write is driven by the change-data stream, so there is one path and no double-write drift.

How does a hit show the right price or name after an edit? The document is enriched with its related data at index time, and a change to a shared parent (a fare, a rename, a shared code) fans out to every dependent document by targeted patch - so the denormalized value is refreshed, not left stale.

What happens to a deleted record? A delete or soft-delete writes a tombstone, so the record leaves results while its version ordering is preserved.

Won't a replayed event corrupt the index? No - every document carries a version stamp (source log position). A replayed or out-of-order event that is older than the current state is ignored.

What happens when the search engine is down? A circuit breaker pauses the stream and probes for recovery; the batch is reported failed and retried, so nothing is lost. Poison messages are diverted to a dead-letter topic instead of blocking the live path.
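The pause-and-probe behaviour can be pictured as a three-state circuit breaker. The sketch below is a generic version of that pattern, not the consumer's actual code: the failure threshold, the probe delay, and every name in it are assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly threshold: number;
  private readonly probeAfterMs: number;

  constructor(threshold: number, probeAfterMs: number) {
    this.threshold = threshold;
    this.probeAfterMs = probeAfterMs;
  }

  // May the consumer attempt an engine call at time `now` (milliseconds)?
  canAttempt(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.probeAfterMs) {
      this.state = "half-open"; // pause elapsed: allow a recovery probe
    }
    return this.state !== "open";
  }

  // A successful call (including a probe) resumes the stream.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed probe, or too many consecutive failures, pauses the stream.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.threshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }
}
```

While the breaker is open the consumer would report the batch as failed rather than advance, so the same events are retried once a probe succeeds.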

How do apps query it? Through one contract: full-text search any registered collection by name with an Ignis-style filter, plus a count. The generic endpoint uses hybrid keyword + semantic (vector) matching by default; resource-mounted search is keyword-only, with semantic matching available as an opt-in. Resource-mounted search automatically scopes results to the caller's tenant.
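As a sketch of the tenant scoping, a resource-mounted search could merge the caller's scope into the incoming filter before querying. The filter keys (where / limit / skip / order / include / fields) come from FR-11; the `and` combinator, the value types, and the `merchantId` scope key are assumptions modelled on LoopBack-style filters.

```typescript
type Where = Record<string, unknown>;

// Key names follow FR-11; the value types are assumed for illustration.
interface SearchFilter {
  where?: Where;
  limit?: number;
  skip?: number;
  order?: string[];
  include?: string[];
  fields?: string[];
}

// AND the caller scope onto the filter so a caller cannot widen its own scope.
function withCallerScope(filter: SearchFilter, scope: Where): SearchFilter {
  const where: Where = filter.where ? { and: [filter.where, scope] } : { ...scope };
  return { ...filter, where };
}

// Usage: withCallerScope({ where: { name: "latte" }, limit: 10 }, { merchantId: "m_1" })
```

Applying the same merged filter to both /search and /search/count keeps the count consistent with the scoped results.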

Proprietary and Confidential. Unauthorized copying, distribution, or use of this software is strictly prohibited.