Import Audit & Idempotency
Problem
Without stable source keys, re-running an import creates duplicates. Without run tracking, you can't answer "what did this import create?" or roll back a bad run.
Solution: Source IDs + Import Run ID
Features
Every IMDF feature (unit or amenity) carries:
properties: {
  // ... featureType, amenityCategory, etc.
  sourceIds?: {
    mappedin?: {
      polygonId: string;    // Mappedin polygon document ID
      locationId?: string;  // Mappedin location ID
      importKey?: string;   // Stable natural key
    };
  };
  importRunId?: string;     // UUID4, one per script invocation
  importSource?: string;    // "mappedin", "abuzz", etc.
}
Stores
Stores carry:
{
  sourceIds?: {
    mappedin?: {
      venueId?: string;
      locationId: string;
      floorMapId?: string;
      externalId?: string;
      importKey: string;    // "mappedin:{venueId}:{locationId}:{floorMapId}"
    };
  };
  importRunId?: string;
  importSource?: string;
}
The POST /stores endpoint already upserts by importKey: if a store with the same key exists, it PATCHes that store instead of creating a new one.
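A minimal sketch of the upsert-by-importKey behavior, assuming an in-memory Map stands in for the stores collection (the real endpoint does this server-side). It also shows the run stamping described under Run Tracking: one UUID4 generated at startup and written with every create or patch. The names `StoreRecord`, `upsertStore`, and `mappedinImportKey` are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical record shape, trimmed to the fields this sketch touches.
interface StoreRecord {
  id: string;
  name: string;
  sourceIds?: {
    mappedin?: { venueId?: string; locationId: string; floorMapId?: string; importKey: string };
  };
  importRunId?: string;
  importSource?: string;
}

// Generated once per script invocation and stamped on every write.
const IMPORT_RUN_ID = randomUUID();

// Builds the stable natural key in the documented format.
function mappedinImportKey(venueId: string, locationId: string, floorMapId: string): string {
  return `mappedin:${venueId}:${locationId}:${floorMapId}`;
}

type UpsertResult = { action: "created" | "patched"; store: StoreRecord };

// Hypothetical stand-in for POST /stores: `db` is keyed by importKey.
function upsertStore(
  db: Map<string, StoreRecord>,
  input: { name: string; venueId: string; locationId: string; floorMapId: string },
): UpsertResult {
  const importKey = mappedinImportKey(input.venueId, input.locationId, input.floorMapId);
  const stamp = { importRunId: IMPORT_RUN_ID, importSource: "mappedin" };
  const existing = db.get(importKey);
  if (existing) {
    // Same key already stored: update in place (the PATCH path), keeping the document ID.
    const patched: StoreRecord = { ...existing, name: input.name, ...stamp };
    db.set(importKey, patched);
    return { action: "patched", store: patched };
  }
  // No store with this key: create one (the POST path).
  const created: StoreRecord = {
    id: randomUUID(),
    name: input.name,
    sourceIds: {
      mappedin: {
        venueId: input.venueId,
        locationId: input.locationId,
        floorMapId: input.floorMapId,
        importKey,
      },
    },
    ...stamp,
  };
  db.set(importKey, created);
  return { action: "created", store: created };
}
```

Calling `upsertStore` twice with the same venue, location, and floor map yields one `created` result followed by one `patched` result, and the collection still holds a single store.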
Backfill Strategy
First Run (Legacy Data)
Malls imported before import audit existed have no source keys. The --claim-legacy mode:
- Stores: Match existing keyless stores by (name, floor) and stamp them with importKey. --delete-legacy-dupes removes within-floor name duplicates (keeps the oldest).
- Units: Match existing keyless units by polygon centroid proximity (greedy nearest-first within a threshold) and stamp them with polygonId. Claims use a properties-only PATCH to avoid triggering the floor geometry quality gate.
- Amenities: Delete all legacy (unkeyed) amenities and recreate them with proper keys (amenities have no occupancy links to preserve).
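The store claim step can be sketched as a pure planning function, assuming each legacy store exposes a floor ID and a creation timestamp (`floorId` and `createdAt` are hypothetical field names, not taken from the schema above). Keyless stores are grouped by exact (name, floor); the oldest in each group receives the importKey, and the rest are listed for deletion only when the equivalent of --delete-legacy-dupes is set.

```typescript
// Hypothetical shapes: only importKey comes from the documented schema.
interface LegacyStore {
  id: string;
  name: string;
  floorId: string;
  createdAt: number; // epoch ms; decides which duplicate is the "oldest"
  importKey?: string;
}

interface SourceStore {
  name: string;
  floorId: string;
  importKey: string; // "mappedin:{venueId}:{locationId}:{floorMapId}"
}

interface ClaimPlan {
  claims: { storeId: string; importKey: string }[]; // PATCH these with importKey
  dupesToDelete: string[]; // populated only when deleteLegacyDupes is true
}

// Unambiguous composite key for the (name, floor) pair.
const nameFloorKey = (name: string, floorId: string): string => JSON.stringify([floorId, name]);

function planStoreClaims(
  existing: LegacyStore[],
  source: SourceStore[],
  deleteLegacyDupes: boolean,
): ClaimPlan {
  // Group keyless (legacy) stores by exact (name, floor).
  const groups = new Map<string, LegacyStore[]>();
  for (const s of existing) {
    if (s.importKey !== undefined) continue; // already keyed on an earlier run
    const k = nameFloorKey(s.name, s.floorId);
    const group = groups.get(k) ?? [];
    group.push(s);
    groups.set(k, group);
  }

  const plan: ClaimPlan = { claims: [], dupesToDelete: [] };
  for (const group of groups.values()) {
    group.sort((a, b) => a.createdAt - b.createdAt); // oldest first
    // Everything after the oldest record is a within-floor name duplicate.
    if (deleteLegacyDupes) plan.dupesToDelete.push(...group.slice(1).map((d) => d.id));
  }

  // Each group is claimed at most once, always via its oldest record.
  const claimed = new Set<string>();
  for (const src of source) {
    const k = nameFloorKey(src.name, src.floorId);
    const group = groups.get(k);
    if (!group || claimed.has(k)) continue; // no legacy match, or already claimed
    claimed.add(k);
    plan.claims.push({ storeId: group[0].id, importKey: src.importKey });
  }
  return plan;
}
```

Source stores with no legacy match are left out of the plan; the regular upsert path creates them.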
Subsequent Runs (Idempotent Upsert)
Once all entities have source keys:
- polygonId in source AND in DB → PATCH
- polygonId in source, NOT in DB → POST
- polygonId in DB, NOT in source → DELETE (gated behind --delete-removed for units and stores, because deleting them cascades)
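The three rules above amount to a set diff keyed on polygonId. A minimal sketch, assuming the caller has already loaded a hypothetical `polygonId → document ID` index for the floor or venue from the database; the `deleteRemoved` parameter mirrors --delete-removed.

```typescript
// One planned write per feature, keyed by the source polygonId.
type Op =
  | { kind: "PATCH"; polygonId: string; docId: string }
  | { kind: "POST"; polygonId: string }
  | { kind: "DELETE"; polygonId: string; docId: string };

function planUpsert(
  sourcePolygonIds: Iterable<string>,
  dbIndex: ReadonlyMap<string, string>, // polygonId -> existing document ID (hypothetical index)
  deleteRemoved: boolean, // mirrors --delete-removed for units/stores
): Op[] {
  const ops: Op[] = [];
  const seen = new Set<string>();
  for (const polygonId of sourcePolygonIds) {
    if (seen.has(polygonId)) continue; // ignore duplicate source entries
    seen.add(polygonId);
    const docId = dbIndex.get(polygonId);
    if (docId !== undefined) {
      ops.push({ kind: "PATCH", polygonId, docId }); // in source AND in DB
    } else {
      ops.push({ kind: "POST", polygonId }); // in source, NOT in DB
    }
  }
  if (deleteRemoved) {
    for (const [polygonId, docId] of dbIndex) {
      // In DB, NOT in source: only removed when the caller opts in.
      if (!seen.has(polygonId)) ops.push({ kind: "DELETE", polygonId, docId });
    }
  }
  return ops;
}
```

Because the plan depends only on keys, running it a second time against the post-import state yields only PATCH operations, which is what makes re-runs idempotent.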
Run Tracking
Every write stamps importRunId (UUID4 generated once per script invocation) and importSource. To audit or roll back:
# Find all features from a specific run
# (query Firestore where properties.importRunId == "<run-id>")
Design Decisions
Why centroid matching for units?
Legacy units have no source identifier. Name matching (like stores) doesn't work because unit labels are often empty or generic ("Unit 42"). Polygon centroid proximity is reliable because the same source data was used for both the original import and the backfill — centroids match within a few meters.
Threshold: 0.00005° for geographic coords (~5m), 0.02 for normalized coords.
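A sketch of greedy nearest-first centroid matching, under two assumptions: the centroid is the area-weighted polygon centroid (shoelace formula), and distance is plain Euclidean distance in the raw coordinate units, so the same threshold values apply. Candidates would be passed in per floor. `centroid` and `greedyCentroidMatch` are hypothetical names; the real script may define the centroid differently.

```typescript
type Pt = [number, number];
type Ring = Pt[]; // outer ring; a repeated closing point is harmless

// Area-weighted centroid via the shoelace formula.
function centroid(ring: Ring): Pt {
  let a = 0; // twice the signed area
  let cx = 0;
  let cy = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x0, y0] = ring[i];
    const [x1, y1] = ring[(i + 1) % ring.length];
    const cross = x0 * y1 - x1 * y0;
    a += cross;
    cx += (x0 + x1) * cross;
    cy += (y0 + y1) * cross;
  }
  if (a === 0) {
    // Degenerate (zero-area) ring: fall back to the vertex average.
    const n = ring.length;
    return [ring.reduce((s, p) => s + p[0], 0) / n, ring.reduce((s, p) => s + p[1], 0) / n];
  }
  return [cx / (3 * a), cy / (3 * a)];
}

interface Candidate {
  id: string;
  ring: Ring;
}

// Returns legacy document ID -> source polygonId for every accepted pair.
function greedyCentroidMatch(
  legacy: Candidate[],
  source: Candidate[],
  threshold: number, // e.g. 0.00005 for geographic coords, 0.02 for normalized
): Map<string, string> {
  const lc = legacy.map((u) => centroid(u.ring));
  const sc = source.map((u) => centroid(u.ring));

  // Collect every pair within the threshold.
  const pairs: { li: number; si: number; d: number }[] = [];
  for (let li = 0; li < legacy.length; li++) {
    for (let si = 0; si < source.length; si++) {
      const d = Math.hypot(lc[li][0] - sc[si][0], lc[li][1] - sc[si][1]);
      if (d <= threshold) pairs.push({ li, si, d });
    }
  }

  // Nearest first; each unit on either side is used at most once.
  pairs.sort((p, q) => p.d - q.d);
  const usedL = new Set<number>();
  const usedS = new Set<number>();
  const match = new Map<string, string>();
  for (const { li, si } of pairs) {
    if (usedL.has(li) || usedS.has(si)) continue;
    usedL.add(li);
    usedS.add(si);
    match.set(legacy[li].id, source[si].id);
  }
  return match;
}
```

Sorting all in-threshold pairs globally, rather than letting each legacy unit grab its own nearest neighbor in list order, keeps one early unit from stealing a source polygon that is an even closer match for another unit.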
Why properties-only PATCH for unit claims?
Sending full geometry in the PATCH triggers the floor geometry quality gate, which evaluates ALL units on the floor for overlaps. Minor coordinate differences between the original import and re-parsed data cause false-positive overlap rejections. Since the existing geometry is already correct (same source), we only need to stamp the source key — no geometry update needed.
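A sketch of a claim PATCH body that carries only `properties` and omits `geometry`, assuming the endpoint replaces the properties object as a whole (so the existing properties are spread in to avoid dropping them). `UnitProperties`, `ClaimPatch`, and `buildClaimPatch` are hypothetical names.

```typescript
// Hypothetical shapes: only the fields named in this doc are modeled explicitly.
interface UnitProperties {
  sourceIds?: { mappedin?: { polygonId: string; locationId?: string; importKey?: string } };
  importRunId?: string;
  importSource?: string;
  [key: string]: unknown; // featureType, name, etc. pass through untouched
}

// Deliberately has no `geometry` key, so the floor geometry quality gate never runs.
interface ClaimPatch {
  properties: UnitProperties;
}

function buildClaimPatch(existing: UnitProperties, polygonId: string, runId: string): ClaimPatch {
  return {
    properties: {
      ...existing,
      sourceIds: {
        ...existing.sourceIds,
        // Keep any other Mappedin IDs already present; set the polygon key.
        mappedin: { ...existing.sourceIds?.mappedin, polygonId },
      },
      importRunId: runId,
      importSource: "mappedin",
    },
  };
}
```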
Why delete-and-recreate for amenities?
Amenities have no occupancy links or store references that would break on delete. Delete-and-recreate is simpler than upsert and ensures the amenity set matches the parser output exactly (correct categories, correct positions).
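A sketch of the amenity rebuild as a plan of deletes and creates. It assumes each existing amenity's polygonId has already been read out of `properties.sourceIds.mappedin` (undefined means legacy and unkeyed) and that amenities use GeoJSON Point geometry, as IMDF amenities do. As an extra guard not described above, parser rows whose polygonId is already keyed in the DB are skipped so a partial re-run does not duplicate them. All type and function names are hypothetical.

```typescript
interface Point {
  type: "Point";
  coordinates: [number, number];
}

// Hypothetical shapes for this sketch.
interface ExistingAmenity {
  id: string;
  polygonId?: string; // from properties.sourceIds.mappedin; undefined = legacy
}

interface ParsedAmenity {
  polygonId: string;
  amenityCategory: string;
  geometry: Point;
}

interface AmenityCreate {
  geometry: Point;
  properties: {
    amenityCategory: string;
    sourceIds: { mappedin: { polygonId: string } };
    importRunId: string;
    importSource: string;
  };
}

function planAmenityRebuild(existing: ExistingAmenity[], parsed: ParsedAmenity[], runId: string) {
  // Every legacy (unkeyed) amenity is deleted; nothing links to amenities.
  const deletes = existing.filter((a) => a.polygonId === undefined).map((a) => a.id);

  // Guard: skip parser rows already keyed in the DB.
  const keyed = new Set(
    existing.map((a) => a.polygonId).filter((id): id is string => id !== undefined),
  );

  // Recreate from parser output with proper keys and run stamps.
  const creates: AmenityCreate[] = parsed
    .filter((p) => !keyed.has(p.polygonId))
    .map((p) => ({
      geometry: p.geometry,
      properties: {
        amenityCategory: p.amenityCategory,
        sourceIds: { mappedin: { polygonId: p.polygonId } },
        importRunId: runId,
        importSource: "mappedin",
      },
    }));

  return { deletes, creates };
}
```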