POI Taxonomy & Classification Pipeline

A robust Point of Interest (POI) taxonomy serves as the semantic backbone for indoor wayfinding, transforming raw spatial exports into routable, queryable assets. For facilities teams and GIS developers, the classification layer dictates routing weights, accessibility filters, and search relevance. Without strict schema enforcement, navigation engines degrade into heuristic guesswork, producing suboptimal paths and broken accessibility routing. This guide details the implementation pipeline for automated POI classification, spatial validation, and graph-ready attribute mapping within the broader Indoor Mapping Architecture & Standards framework.

Hierarchical Schema & Attribute Standardization

Indoor POI classification requires a deterministic, multi-tier hierarchy that maps cleanly to routing graph nodes. The standard implementation uses a three-level taxonomy:

  • L1 (Domain): High-level facility zones (Healthcare, Corporate, Retail, Transit)
  • L2 (Category): Functional groupings (Clinical, Administrative, Amenity, Circulation)
  • L3 (Type): Discrete, routable entities (Exam_Room, Restroom, Elevator, Security_Checkpoint)
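The three tiers can be encoded as a nested lookup so that a full path, including the L3 type, is checked against its parents in one place. The sketch below is illustrative. The `TAXONOMY` contents (which L3 types sit under which L2 category) are assumptions drawn from the examples above, and only two domains are shown for brevity. Unlike the flat L1/L2 sets in the pipeline's validator further down, this also rejects an L3 type that does not belong to its parent category.

```python
from typing import Dict, List, Set

# Hypothetical registry: L1 Domain -> L2 Category -> allowed L3 Types.
TAXONOMY: Dict[str, Dict[str, Set[str]]] = {
    "Healthcare": {
        "Clinical": {"Exam_Room"},
        "Amenity": {"Restroom"},
        "Circulation": {"Elevator", "Stairwell"},
    },
    "Corporate": {
        "Administrative": {"Security_Checkpoint"},
        "Amenity": {"Restroom"},
        "Circulation": {"Elevator", "Stairwell", "Escalator"},
    },
}


def is_valid_path(path: List[str]) -> bool:
    """Return True only if path is exactly [L1, L2, L3] and each tier exists under its parent."""
    if len(path) != 3:
        return False
    l1, l2, l3 = path
    # Unknown L1 or L2 falls through to an empty set, so the membership test fails.
    return l3 in TAXONOMY.get(l1, {}).get(l2, set())


print(is_valid_path(["Healthcare", "Clinical", "Exam_Room"]))  # True
print(is_valid_path(["Corporate", "Clinical", "Exam_Room"]))   # False: Clinical not under Corporate
print(is_valid_path(["Healthcare", "Exam_Room"]))              # False: missing a tier
```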

Following the guidance in Best practices for indoor POI taxonomy ensures consistent attribute propagation across multi-tenant environments and prevents classification drift during CAD-to-GIS conversion. Each L3 type must carry a standardized attribute payload that routing engines consume to calculate traversal costs, enforce ADA/WCAG constraints, and manage fallback priorities.

Mandatory fields include:

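  • poi_id: Immutable UUID that stays stable across sync cycles; derive it deterministically from source asset tags (or persist it on first assignment), since a freshly generated UUID v4 changes on every run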
  • classification_path: Ordered array [L1, L2, L3]
  • is_accessibility_compliant: Boolean flag for wheelchair/stroller routing
  • operational_hours: Schedule string using ISO 8601 time notation (e.g., 06:00-22:00) or the literal 24/7
  • routing_weight: Float (0.1–1.0, where 1.0 represents standard corridor traversal cost)

Optional but highly recommended fields: capacity, requires_badge_access, maintenance_status, floor_level. Facilities engineers should enforce strict schema validation at ingestion to prevent downstream routing anomalies.
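Because operational_hours feeds fallback routing, it helps to have one function that answers "is this POI open at time t?". The sketch below assumes only the two string shapes used in the pipeline's sample data: a single daily `HH:MM-HH:MM` window or the literal `24/7`. Weekday-specific or multi-window schedules would need a richer format and parser.

```python
from datetime import time


def is_open(operational_hours: str, at: time) -> bool:
    """Return True if a POI with this schedule string is open at time `at`.

    Assumes the value is either "24/7" or one daily window "HH:MM-HH:MM".
    A window whose end precedes its start (e.g. "22:00-06:00") wraps past midnight.
    Malformed strings raise ValueError, which an ingestion step can treat as a rejection.
    """
    hours = operational_hours.strip()
    if hours == "24/7":
        return True
    start_text, end_text = hours.split("-")
    start = time.fromisoformat(start_text.strip())
    end = time.fromisoformat(end_text.strip())
    if start <= end:
        return start <= at < end  # same-day window, end exclusive
    return at >= start or at < end  # overnight window


print(is_open("06:00-22:00", time(12, 30)))  # True
print(is_open("06:00-22:00", time(23, 0)))   # False
print(is_open("22:00-06:00", time(2, 0)))    # True
```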

Production Pipeline: Ingestion to Graph-Ready Classification

The following Python pipeline handles raw ingestion, schema validation, classification normalization, and routing weight assignment. It uses pydantic (v2 API) for schema validation and type coercion, and geopandas to carry geometries and the CRS through the pipeline.

import uuid
import logging
from typing import Any, List, Optional

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# --- Schema Definitions (pydantic v2 API) ---
class POIAttributes(BaseModel):
    model_config = ConfigDict(frozen=True)

    # uuid4 is random on every run; pass a deterministic poi_id when records must upsert across syncs.
    poi_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    classification_path: List[str] = Field(min_length=3, max_length=3)
    is_accessibility_compliant: bool
    operational_hours: str
    routing_weight: float = Field(ge=0.1, le=1.0)
    capacity: Optional[int] = None
    requires_badge_access: bool = False
    maintenance_status: str = "active"

    @field_validator("classification_path")
    @classmethod
    def validate_taxonomy(cls, v: List[str]) -> List[str]:
        # "after" validator: the length constraint has already passed, so v[0], v[1], v[2] exist.
        allowed_l1 = {"Healthcare", "Corporate", "Retail", "Transit"}
        allowed_l2 = {"Clinical", "Administrative", "Amenity", "Circulation"}
        if v[0] not in allowed_l1:
            raise ValueError(f"Invalid L1 Domain: {v[0]}")
        if v[1] not in allowed_l2:
            raise ValueError(f"Invalid L2 Category: {v[1]}")
        if not v[2]:
            raise ValueError("Empty L3 Type")
        return v

# --- Helpers for messy CAD/CMDB values ---
def _clean(value: Any, default: Any = None) -> Any:
    """Map missing values (None, or the NaN pandas uses for absent cells) to a default."""
    if value is None or (isinstance(value, float) and pd.isna(value)):
        return default
    return value

def _to_bool(value: Any, default: bool = False) -> bool:
    """Parse a flag strictly: bool("false") and bool(float("nan")) would both be True."""
    value = _clean(value)
    if value is None:
        return default
    if isinstance(value, str):
        token = value.strip().lower()
        if token in {"true", "t", "yes", "y", "1"}:
            return True
        if token in {"false", "f", "no", "n", "0"}:
            return False
        raise ValueError(f"Ambiguous boolean flag: {value!r}")
    return bool(value)

# Legacy CAD tokens and their canonical [L1, L2, L3] paths
LEGACY_TAXONOMY_MAP = {
    "Restroom": ["Corporate", "Amenity", "Restroom"],
    "Elevator_Core": ["Corporate", "Circulation", "Elevator"],
    "Patient_Room": ["Healthcare", "Clinical", "Exam_Room"],
    "Security": ["Corporate", "Administrative", "Security_Checkpoint"]
}

# --- Pipeline Functions ---
def normalize_classification_path(raw_path: Any) -> List[str]:
    """Coerce raw taxonomy values ("A/B/C" strings or lists) into [L1, L2, L3] form."""
    if isinstance(raw_path, str):
        parts = [p.strip() for p in raw_path.split("/") if p.strip()]
    elif isinstance(raw_path, (list, tuple)):
        parts = [str(p).strip() for p in raw_path if _clean(p) is not None and str(p).strip()]
    else:
        parts = []  # NaN, None, or an unexpected type: schema validation will reject it
    # A bare legacy token expands to its full canonical path.
    if len(parts) == 1 and parts[0] in LEGACY_TAXONOMY_MAP:
        return list(LEGACY_TAXONOMY_MAP[parts[0]])
    # A full path keeps its source L1/L2; only a legacy L3 name is renamed.
    if len(parts) == 3 and parts[2] in LEGACY_TAXONOMY_MAP:
        parts[2] = LEGACY_TAXONOMY_MAP[parts[2]][2]
    return parts

def assign_routing_weight(l3_type: str, is_accessible: bool) -> float:
    """Calculate traversal cost based on POI type and accessibility constraints."""
    base_weights = {
        "Elevator": 0.9, "Escalator": 0.8, "Stairwell": 0.6,
        "Corridor": 1.0, "Restroom": 0.4, "Security_Checkpoint": 0.3,
        "Exam_Room": 0.2, "Amenity": 0.5
    }
    weight = base_weights.get(l3_type, 0.7)
    # Accessibility discount: compliant POIs get a lower traversal cost
    return round(weight * 0.85 if is_accessible else weight, 2)

def validate_and_classify_pois(raw_gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Main ingestion pipeline: validate, classify, and attach routing attributes."""
    validated_records = []
    rejected_count = 0

    for idx, row in raw_gdf.iterrows():
        try:
            raw_path = row.get("classification_path", row.get("category", []))
            norm_path = normalize_classification_path(raw_path)
            l3 = norm_path[-1] if norm_path else "Unknown"

            # Missing accessibility data is treated as non-compliant (the conservative default).
            is_acc = _to_bool(row.get("is_accessibility_compliant"), default=False)
            weight = assign_routing_weight(l3, is_acc)

            capacity = _clean(row.get("capacity"))
            poi = POIAttributes(
                classification_path=norm_path,
                is_accessibility_compliant=is_acc,
                operational_hours=_clean(row.get("operational_hours"), "24/7"),
                routing_weight=weight,
                capacity=int(capacity) if capacity is not None else None,
                requires_badge_access=_to_bool(row.get("requires_badge_access"), default=False),
                maintenance_status=_clean(row.get("maintenance_status"), "active")
            )
            validated_records.append({**poi.model_dump(), "geometry": row.geometry})
        except ValidationError as e:
            rejected_count += 1
            logging.warning(f"Schema rejection at index {idx}: {e}")
        except Exception as e:
            rejected_count += 1
            logging.error(f"Unexpected error at index {idx}: {e}")

    if not validated_records:
        raise RuntimeError("Pipeline halted: 0 valid POIs ingested.")

    logging.info(f"Validation complete. Accepted: {len(validated_records)} | Rejected: {rejected_count}")
    if raw_gdf.crs is None:
        # Do not guess a CRS: indoor exports are often in a local CAD grid, not EPSG:3857.
        logging.warning("Input has no CRS; assign the facility CRS before graph generation.")
    return gpd.GeoDataFrame(validated_records, geometry="geometry", crs=raw_gdf.crs)

# --- Execution Example ---
if __name__ == "__main__":
    # Simulate raw CAD/CMDB export (row 2 is deliberately missing its L3 tier;
    # row 4 is a bare legacy token with a string-typed flag)
    mock_data = {
        "classification_path": ["Corporate/Amenity/Restroom", "Healthcare/Clinical",
                                ["Retail", "Circulation", "Elevator"], "Elevator_Core"],
        "is_accessibility_compliant": [True, False, True, "false"],
        "operational_hours": ["06:00-22:00", "24/7", "24/7", "24/7"],
        "capacity": [2, 1, 12, 15],
        "requires_badge_access": [False, True, False, False],
        "geometry": [Point(0, 0), Point(10, 5), Point(20, 15), Point(30, 5)]
    }
    raw_gdf = gpd.GeoDataFrame(mock_data, crs="EPSG:3857")

    classified_gdf = validate_and_classify_pois(raw_gdf)
    print(classified_gdf[["poi_id", "classification_path", "routing_weight", "is_accessibility_compliant"]].head())

Spatial Validation & Z-Axis Alignment

Classification alone does not guarantee routability. POI geometries must align with the facility’s spatial reference system and vertical topology. Misaligned coordinates or missing elevation metadata cause routing engines to project paths through walls or across disconnected floors.

When ingesting BIM exports, always verify that the coordinate reference system matches your indoor mapping baseline. Consult Indoor Coordinate Reference Systems for projection normalization workflows, particularly when merging local CAD grids with global basemaps. Additionally, vertical routing depends on explicit floor-level binding. POIs tied to elevators, stairwells, or atrium spaces must inherit the correct floor_level and z_offset attributes to prevent cross-floor path leakage. Implement strict Z-axis validation using Level Mapping & Z-Axis Logic to ensure vertical transitions only occur at designated circulation nodes.
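One way to enforce the rule that vertical transitions only occur at designated circulation nodes is to audit every graph edge: an edge whose endpoints sit on different floor_level values must connect two circulation nodes. The node and edge structures below are hypothetical stand-ins for whatever your routing graph exposes, and the set of vertical L3 types is an assumption. A fuller check would also compare z_offset against each level's elevation range, which this sketch omits.

```python
from typing import Dict, List, Tuple

# Assumed set of L3 types allowed to carry cross-floor edges.
VERTICAL_TYPES = {"Elevator", "Stairwell", "Escalator"}


def find_illegal_vertical_edges(
    nodes: Dict[str, dict],
    edges: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return edges that change floor_level without both endpoints being vertical circulation."""
    illegal = []
    for a, b in edges:
        node_a, node_b = nodes[a], nodes[b]
        if node_a["floor_level"] == node_b["floor_level"]:
            continue  # same-floor edge: nothing to check
        if node_a["l3_type"] not in VERTICAL_TYPES or node_b["l3_type"] not in VERTICAL_TYPES:
            illegal.append((a, b))
    return illegal


nodes = {
    "elev_L1": {"floor_level": 1, "l3_type": "Elevator"},
    "elev_L2": {"floor_level": 2, "l3_type": "Elevator"},
    "rest_L1": {"floor_level": 1, "l3_type": "Restroom"},
    "exam_L2": {"floor_level": 2, "l3_type": "Exam_Room"},
}
edges = [("elev_L1", "elev_L2"), ("rest_L1", "exam_L2"), ("rest_L1", "elev_L1")]
print(find_illegal_vertical_edges(nodes, edges))  # [('rest_L1', 'exam_L2')]
```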

A practical spatial validation step involves checking for geometry overlap with restricted zones (e.g., mechanical rooms, server closets) and verifying that all POIs fall within the building footprint polygon. Use shapely’s within() and intersects() predicates to flag spatial outliers before graph generation.
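The footprint and restricted-zone checks can be sketched with plain shapely predicates. The coordinates, zone names, and POI ids are hypothetical, and all geometries are assumed to share one projected CRS. Note that within() returns False for a point lying exactly on the footprint boundary, so decide explicitly how boundary cases should be treated. For large datasets, a geopandas spatial join would scale better than this nested loop.

```python
from shapely.geometry import Point, Polygon, box

# Hypothetical inputs: a building footprint, restricted zones, and POI points keyed by poi_id.
footprint = box(0, 0, 100, 50)
restricted_zones = {
    "mech_room_01": box(80, 0, 100, 20),
    "server_closet_02": Polygon([(0, 40), (10, 40), (10, 50), (0, 50)]),
}
pois = {
    "poi_a": Point(20, 10),   # valid
    "poi_b": Point(90, 10),   # inside the mechanical room
    "poi_c": Point(120, 10),  # outside the footprint
}


def flag_spatial_outliers(pois, footprint, restricted_zones):
    """Return {poi_id: [reasons]} for POIs outside the footprint or touching a restricted zone."""
    flags = {}
    for poi_id, geom in pois.items():
        reasons = []
        if not geom.within(footprint):
            reasons.append("outside_footprint")
        for zone_id, zone in restricted_zones.items():
            if geom.intersects(zone):
                reasons.append(f"in_restricted:{zone_id}")
        if reasons:
            flags[poi_id] = reasons
    return flags


print(flag_spatial_outliers(pois, footprint, restricted_zones))
# {'poi_b': ['in_restricted:mech_room_01'], 'poi_c': ['outside_footprint']}
```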

Troubleshooting & Edge Case Resolution

Production pipelines inevitably encounter malformed records, legacy taxonomy drift, or disconnected graph nodes. The following diagnostic steps address the most frequent failure modes:

1. Routing Weight Anomalies

Symptom: Navigation engine routes users through restricted areas or avoids accessible paths entirely. Root Cause: routing_weight values outside the 0.1–1.0 range, or is_accessibility_compliant flags inverted during CSV/JSON parsing (in Python, bool("false") evaluates to True, so naive coercion silently flips string flags). Resolution:

# Quick audit script (assumes classified_gdf from the pipeline above)
import logging

invalid_weights = classified_gdf[~classified_gdf["routing_weight"].between(0.1, 1.0)]
if not invalid_weights.empty:
    logging.warning(f"Found {len(invalid_weights)} POIs with invalid routing weights. Clipping to bounds.")
    # clip() leaves NaN in place, so missing weights still need explicit handling.
    classified_gdf["routing_weight"] = classified_gdf["routing_weight"].clip(0.1, 1.0)

2. Orphaned POI Nodes

Symptom: Wayfinding UI displays POIs that cannot be reached from any entrance or corridor. Root Cause: Spatial disconnect between POI geometry and the underlying navigation mesh, or missing classification_path hierarchy. Resolution: Run a connectivity audit using graph traversal (e.g., networkx). Isolate POIs that have no path to any entrance (a POI with zero adjacent edges is the simplest case, but a POI inside a disconnected island of the graph also qualifies) and verify their spatial coordinates against the routing graph. For systematic remediation, follow the workflow in Detecting and fixing orphaned POI nodes to automatically snap or flag disconnected entities.
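A minimal version of that audit is a multi-source breadth-first search from every entrance node: any POI the search never reaches is orphaned. The sketch uses only the standard library, and the adjacency-list input and node ids are hypothetical. With networkx, the equivalent check is whether a POI belongs to any entrance's `nx.node_connected_component`.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn undirected (a, b) edge pairs into an adjacency list."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def find_orphaned_pois(
    adjacency: Dict[str, List[str]],
    entrances: Iterable[str],
    poi_ids: Iterable[str],
) -> Set[str]:
    """Multi-source BFS from all entrances; return POI ids the search never reaches."""
    seen: Set[str] = set(entrances)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return set(poi_ids) - seen


edges = [
    ("entrance_main", "corridor_1"),
    ("corridor_1", "restroom_1"),
    ("exam_7", "closet_3"),  # island: has edges, but no path to an entrance
]
adjacency = build_adjacency(edges)
orphans = find_orphaned_pois(adjacency, ["entrance_main"], ["restroom_1", "exam_7", "elevator_2"])
print(sorted(orphans))  # ['elevator_2', 'exam_7']
```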

3. Schema Drift During Sync Cycles

Symptom: Classification paths degrade from ["Healthcare", "Clinical", "Exam_Room"] to ["Healthcare", "Exam_Room", ""] after CMDB sync. Root Cause: Upstream data source changed delimiter format or dropped L2/L3 columns. Resolution: Implement a pre-ingestion schema diff check. Compare incoming column headers and value distributions against the last known good snapshot. Reject batches where any classification_path has fewer than three non-empty tiers (the degraded path above still has three elements, so counting list length alone would miss it), and trigger an alert to the data engineering team.
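A pre-ingestion diff along those lines might compare the incoming batch against the last known good snapshot on two axes: required columns and classification path depth. The snapshot format and column names are assumptions, value-distribution checks are left out, and raising the alert is left to the caller. Rejecting the whole batch on any short path follows the rule above; a real deployment might prefer to tolerate a small rejection rate.

```python
from typing import Iterable, List, Set


def diff_batch(
    incoming_columns: Iterable[str],
    incoming_paths: Iterable[List[str]],
    snapshot_columns: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the batch may proceed to ingestion."""
    problems = []
    missing = snapshot_columns - set(incoming_columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for i, path in enumerate(incoming_paths):
        # Count non-empty tiers so ["Healthcare", "Exam_Room", ""] is caught as depth 2.
        depth = sum(1 for tier in path if str(tier).strip())
        if depth < 3:
            problems.append(f"row {i}: classification_path depth {depth} < 3")
    return problems


snapshot = {"poi_id", "classification_path", "is_accessibility_compliant", "operational_hours"}
good = diff_batch(snapshot, [["Healthcare", "Clinical", "Exam_Room"]], snapshot)
bad = diff_batch(
    {"poi_id", "classification_path", "operational_hours"},
    [["Healthcare", "Exam_Room", ""]],
    snapshot,
)
print(good)  # []
print(bad)
```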

Operational Deployment Notes

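  • Idempotency: Always use deterministic poi_id generation tied to source asset tags (e.g., room_number + floor_level), such as a name-based UUID v5, rather than random UUID v4 values. This enables safe upserts during nightly syncs.
  • Performance: For facilities with more than 10,000 POIs, batch validation using pydantic's TypeAdapter or vectorized pandas operations avoids the per-row overhead of an iterrows() loop and can reduce pipeline latency by as much as 60–80%; benchmark against your own data before relying on that figure.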
  • Compliance: Maintain an audit log of all schema rejections and weight overrides. Facilities compliance teams require traceability for ADA routing claims and emergency egress mapping.
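The first two notes can be sketched together. The namespace UUID below is a hypothetical placeholder (generate one once per deployment and never change it), the `floor_level:room_number` key format is an assumption, and `POIRecord` is a trimmed stand-in for the pipeline's POIAttributes model. `TypeAdapter` and `validate_python` are pydantic v2 APIs; whether they beat a row loop on your data is something to measure rather than assume.

```python
import uuid
from typing import List

from pydantic import BaseModel, Field, TypeAdapter, ValidationError

# Hypothetical deployment namespace: generate once and never change it,
# because every derived poi_id depends on it.
POI_NAMESPACE = uuid.UUID("6f1c2a4e-8b3d-4c7a-9e21-5d0f3b7a9c10")


def deterministic_poi_id(room_number: str, floor_level: int) -> str:
    """Derive a stable UUID v5 from source asset tags so re-syncs upsert instead of duplicating."""
    # Normalize the tag so " 101a " and "101A" map to the same identifier.
    key = f"{floor_level}:{room_number.strip().upper()}"
    return str(uuid.uuid5(POI_NAMESPACE, key))


class POIRecord(BaseModel):
    """Trimmed-down stand-in for POIAttributes, for illustration only."""
    poi_id: str
    classification_path: List[str] = Field(min_length=3, max_length=3)
    routing_weight: float = Field(ge=0.1, le=1.0)


# Build the adapter once and reuse it; it validates a whole list in one call.
POI_LIST = TypeAdapter(List[POIRecord])

raw = [
    {"poi_id": deterministic_poi_id("101A", 1),
     "classification_path": ["Corporate", "Amenity", "Restroom"],
     "routing_weight": 0.34},
]
records = POI_LIST.validate_python(raw)

# One bad record fails the whole call; recover per-row rejects from the error locations.
bad_batch = raw + [{"poi_id": "x", "classification_path": ["Corporate"], "routing_weight": 2.0}]
bad_rows: List[int] = []
try:
    POI_LIST.validate_python(bad_batch)
except ValidationError as e:
    bad_rows = sorted({err["loc"][0] for err in e.errors()})
print(bad_rows)  # [1]
```

Batch validation is all-or-nothing per call: one bad record raises a single ValidationError listing every failure, so the per-record accept/reject accounting of a row loop has to be rebuilt from `e.errors()`, as the last few lines do.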

By enforcing strict taxonomy classification, validating spatial alignment, and automating weight assignment, indoor navigation teams can deliver deterministic, accessible wayfinding at scale. The pipeline outlined here can run as a validation step in CI/CD workflows for map updates, ensuring that every POI entering the routing graph is semantically sound and spatially valid.