
Import Audit & Idempotency

Problem

Without stable source keys, re-running an import creates duplicates. Without run tracking, you can't answer "what did this import create?" or roll back a bad run.

Solution: Source IDs + Import Run ID

Features

Every IMDF feature (unit or amenity) carries:

```typescript
// Shape of a feature's `properties` object (the interface name is illustrative).
interface ImportedFeatureProperties {
  // ... featureType, amenityCategory, etc.
  sourceIds?: {
    mappedin?: {
      polygonId: string;     // Mappedin polygon document ID
      locationId?: string;   // Mappedin location ID
      importKey?: string;    // Stable natural key
    };
  };
  importRunId?: string;      // UUID4, one per script invocation
  importSource?: string;     // "mappedin", "abuzz", etc.
}
```

Stores

Stores carry:

```typescript
// Import-related fields on a store document (the interface name is illustrative).
interface StoreImportFields {
  sourceIds?: {
    mappedin?: {
      venueId?: string;
      locationId: string;
      floorMapId?: string;
      externalId?: string;
      importKey: string;     // "mappedin:{venueId}:{locationId}:{floorMapId}"
    };
  };
  importRunId?: string;
  importSource?: string;
}
```

The POST /stores endpoint already upserts by importKey: if a store with the same key exists, it PATCHes that store instead of creating a new one.
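
The key format and the upsert rule can be sketched as follows. This is a minimal illustration, not the endpoint's actual code: `buildImportKey` and `planStoreWrite` are hypothetical names, and encoding a missing `venueId` or `floorMapId` as an empty segment is an assumption.

```typescript
// Hypothetical helper types; only the field names come from the store schema.
interface MappedinStoreIds {
  venueId?: string;
  locationId: string;
  floorMapId?: string;
}

// Builds a key in the documented shape:
// "mappedin:{venueId}:{locationId}:{floorMapId}".
function buildImportKey(ids: MappedinStoreIds): string {
  // Assumption: missing optional parts become empty segments so the key
  // always has the same number of fields.
  return ["mappedin", ids.venueId ?? "", ids.locationId, ids.floorMapId ?? ""].join(":");
}

type StoreWrite = { action: "create" | "patch"; importKey: string };

// Client-side view of the upsert rule: patch when the key already exists,
// otherwise create.
function planStoreWrite(existingKeys: ReadonlySet<string>, ids: MappedinStoreIds): StoreWrite {
  const importKey = buildImportKey(ids);
  return { action: existingKeys.has(importKey) ? "patch" : "create", importKey };
}
```

Because the key is derived only from source identifiers, re-running the import with unchanged source data produces the same key and therefore the same store.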

Backfill Strategy

First Run (Legacy Data)

Malls imported before import audit existed have no source keys. The --claim-legacy mode backfills them in three steps:

  1. Stores: Match existing keyless stores by (name, floor) and stamp each with its importKey. With --delete-legacy-dupes, within-floor name duplicates are removed and the oldest is kept.
  2. Units: Match existing keyless units by polygon centroid proximity (greedy nearest-first within a threshold) and stamp each with its polygonId. The stamp is a properties-only PATCH, which avoids triggering the floor geometry quality gate.
  3. Amenities: Delete all legacy (unkeyed) amenities and recreate them with proper keys. Amenities have no occupancy links to preserve.
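
Step 1 can be sketched as a pure planning function. This is an illustration under assumptions, not the script's code: the `createdAt` field used to pick the oldest store, the case-insensitive trimmed name comparison, and all type and function names are hypothetical.

```typescript
interface LegacyStore {
  id: string;
  name: string;
  floorId: string;
  createdAt: number;   // assumed field (epoch millis) used to find the oldest
  importKey?: string;  // absent on legacy stores
}

interface SourceStore {
  name: string;
  floorId: string;
  importKey: string;
}

interface StoreClaimPlan {
  stamp: { id: string; importKey: string }[];
  deleteDupes: string[];
}

// Assumption: names are compared trimmed and case-insensitively.
const matchKey = (name: string, floorId: string): string =>
  `${floorId}|${name.trim().toLowerCase()}`;

function planStoreClaims(
  legacy: LegacyStore[],
  source: SourceStore[],
  deleteLegacyDupes: boolean,
): StoreClaimPlan {
  // Group keyless stores by (name, floor).
  const groups = new Map<string, LegacyStore[]>();
  for (const store of legacy) {
    if (store.importKey) continue; // already claimed
    const key = matchKey(store.name, store.floorId);
    const group = groups.get(key) ?? [];
    group.push(store);
    groups.set(key, group);
  }

  const plan: StoreClaimPlan = { stamp: [], deleteDupes: [] };
  for (const src of source) {
    const key = matchKey(src.name, src.floorId);
    const group = groups.get(key);
    if (!group || group.length === 0) continue;
    // The oldest store keeps its identity; the rest are duplicates.
    group.sort((a, b) => a.createdAt - b.createdAt);
    const [oldest, ...dupes] = group;
    plan.stamp.push({ id: oldest.id, importKey: src.importKey });
    if (deleteLegacyDupes) plan.deleteDupes.push(...dupes.map((d) => d.id));
    groups.delete(key); // each legacy group is claimed at most once
  }
  return plan;
}
```

Returning a plan instead of writing directly makes it possible to review what a claim run would stamp or delete before anything changes.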

Subsequent Runs (Idempotent Upsert)

Once all entities have source keys:

  • polygonId in source AND in DB → PATCH
  • polygonId in source, NOT in DB → POST
  • polygonId in DB, NOT in source → DELETE (for units and stores, only when --delete-removed is set, because their deletes cascade)
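
The three cases above amount to a set difference over keys. A minimal sketch, assuming keys are unique within each list; `planSync` and the `skippedRemovals` bucket are hypothetical names used for illustration:

```typescript
interface SyncPlan {
  patch: string[];            // key in source AND in DB
  post: string[];             // key in source, NOT in DB
  remove: string[];           // key in DB, NOT in source, deletion allowed
  skippedRemovals: string[];  // key in DB, NOT in source, deletion gated off
}

// `deleteRemoved` mirrors the --delete-removed flag.
function planSync(sourceKeys: string[], dbKeys: string[], deleteRemoved: boolean): SyncPlan {
  const inSource = new Set(sourceKeys);
  const inDb = new Set(dbKeys);
  const plan: SyncPlan = { patch: [], post: [], remove: [], skippedRemovals: [] };

  for (const key of sourceKeys) {
    (inDb.has(key) ? plan.patch : plan.post).push(key);
  }
  for (const key of dbKeys) {
    if (inSource.has(key)) continue;
    (deleteRemoved ? plan.remove : plan.skippedRemovals).push(key);
  }
  return plan;
}
```

Running the same source twice yields only PATCH entries on the second pass, which is what makes the import idempotent.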

Run Tracking

Every write stamps importRunId (UUID4 generated once per script invocation) and importSource. To audit or roll back:

```bash
# Find all features from a specific run
# (query Firestore where properties.importRunId == "<run-id>")
```
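
The stamping and the audit filter can be sketched in TypeScript. This is an in-memory illustration only: `createRunStamp`, `stampProperties`, and `featuresFromRun` are hypothetical names, the run ID is passed in rather than generated (in Node, `randomUUID()` from `node:crypto` produces a version 4 UUID), and the array filter stands in for the Firestore query.

```typescript
interface RunStamp {
  importRunId: string;
  importSource: string;
}

// Create this once per script invocation and reuse it for every write.
function createRunStamp(importRunId: string, importSource: string): RunStamp {
  return { importRunId, importSource };
}

// Returns a copy of the properties with the run fields added.
function stampProperties<T extends object>(properties: T, run: RunStamp): T & RunStamp {
  return { ...properties, importRunId: run.importRunId, importSource: run.importSource };
}

// In-memory stand-in for: where properties.importRunId == "<run-id>".
function featuresFromRun<F extends { properties: { importRunId?: string } }>(
  features: F[],
  runId: string,
): F[] {
  return features.filter((f) => f.properties.importRunId === runId);
}
```

The result of such a query is the full set of documents a run wrote, which is the starting point for auditing or rolling back that run.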

Design Decisions

Why centroid matching for units?

Legacy units have no source identifier. Name matching (as used for stores) doesn't work because unit labels are often empty or generic ("Unit 42"). Polygon centroid proximity is reliable because the same source data was used for both the original import and the backfill, so centroids match within a few meters.

Threshold: 0.00005° for geographic coords (~5m), 0.02 for normalized coords.
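
Greedy nearest-first matching can be sketched as: collect every (unit, polygon) pair within the threshold, sort by distance, and accept pairs in order while skipping any unit or polygon that is already taken. The sketch below assumes centroids are precomputed and that distance is plain Euclidean distance in coordinate units; the type and function names are hypothetical.

```typescript
type Point = [number, number];

interface LegacyUnit { id: string; centroid: Point; }
interface SourcePolygon { polygonId: string; centroid: Point; }
interface UnitClaim { unitId: string; polygonId: string; distance: number; }

// Thresholds quoted in the text above.
const GEO_THRESHOLD_DEG = 0.00005;  // geographic coords, roughly 5 m
const NORMALIZED_THRESHOLD = 0.02;  // normalized coords

function matchByCentroid(
  legacy: LegacyUnit[],
  source: SourcePolygon[],
  threshold: number,
): UnitClaim[] {
  // 1. Every candidate pair within the threshold.
  const candidates: UnitClaim[] = [];
  for (const unit of legacy) {
    for (const poly of source) {
      const distance = Math.hypot(
        unit.centroid[0] - poly.centroid[0],
        unit.centroid[1] - poly.centroid[1],
      );
      if (distance <= threshold) {
        candidates.push({ unitId: unit.id, polygonId: poly.polygonId, distance });
      }
    }
  }

  // 2. Nearest pairs first.
  candidates.sort((a, b) => a.distance - b.distance);

  // 3. Accept greedily; each unit and each polygon is used at most once.
  const usedUnits = new Set<string>();
  const usedPolygons = new Set<string>();
  const claims: UnitClaim[] = [];
  for (const c of candidates) {
    if (usedUnits.has(c.unitId) || usedPolygons.has(c.polygonId)) continue;
    usedUnits.add(c.unitId);
    usedPolygons.add(c.polygonId);
    claims.push(c);
  }
  return claims;
}
```

Greedy matching is not globally optimal in general, but when the centroids come from the same source data the nearest candidate is almost always the right one, and units with no polygon inside the threshold are simply left unclaimed.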

Why properties-only PATCH for unit claims?

Sending full geometry in the PATCH triggers the floor geometry quality gate, which evaluates ALL units on the floor for overlaps. Minor coordinate differences between the original import and the re-parsed data cause false-positive overlap rejections. Since the existing geometry is already correct (same source), the claim only needs to stamp the source key and leaves the geometry untouched.
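
A properties-only claim body might look like the sketch below. `buildUnitClaimPatch` is a hypothetical helper, and it assumes the PATCH endpoint merges the supplied properties into the existing ones rather than replacing them; the point is only that the body carries no `geometry` key.

```typescript
interface RunInfo {
  importRunId: string;
  importSource: string;
}

interface UnitClaimPatch {
  properties: {
    sourceIds: { mappedin: { polygonId: string } };
    importRunId: string;
    importSource: string;
  };
}

// Builds a PATCH body that stamps the source key and run fields only.
// No geometry is included, so the floor geometry quality gate has nothing to evaluate.
function buildUnitClaimPatch(polygonId: string, run: RunInfo): UnitClaimPatch {
  return {
    properties: {
      sourceIds: { mappedin: { polygonId } },
      importRunId: run.importRunId,
      importSource: run.importSource,
    },
  };
}
```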

Why delete-and-recreate for amenities?

Amenities have no occupancy links or store references that would break on delete. Delete-and-recreate is simpler than upsert and ensures the amenity set matches the parser output exactly (correct categories, correct positions).