Attribute Mapping from Blueprints

Attribute mapping serves as the semantic normalization layer between raw geometric extraction and operational indoor navigation systems. It bridges vectorized floor plan outputs with the structured, queryable records required by facilities management platforms, spatial analytics engines, and routing graphs. The process resolves coordinate drift, ambiguous drafting conventions, and fragmented metadata into deterministic feature attributes that downstream systems can consume without manual intervention.

Pipeline Architecture & Semantic Handoff

Within the broader Automated Floor Plan Parsing & Vectorization ecosystem, attribute mapping operates as a stateless, idempotent transformation stage. It ingests polygonized room boundaries, detected architectural elements, and raw text annotations, then emits standardized feature records aligned to a unified spatial reference system (SRS). The architecture enforces a strict three-phase execution model:

  1. Coordinate Normalization & Text Extraction – Unit conversion, affine alignment, and semantic filtering of CAD/SVG text entities.
  2. Spatial Association & Label Resolution – R-tree indexing, buffered spatial joins, and confidence-weighted assignment of labels to polygons.
  3. Schema Validation & Graph Handoff – Type enforcement, mandatory field verification, and topology consistency checks before routing graph ingestion.

The handoff from upstream parsers requires rigorous coordinate space alignment. Blueprint units (millimeters, inches, architectural units) must be converted to meters in a single, consistent CRS before any spatial operations occur. Text entities extracted from CAD blocks or SVG <text> nodes carry baseline coordinates, rotation angles, and layer metadata that must be projected into the same frame as the vectorized boundaries. Misalignment at this stage propagates directly into routing failures, so normalization has to be deterministic.
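
As a minimal sketch of the alignment step, assume the drawing-to-site mapping is a known similarity transform (uniform scale, rotation, translation). The same function can then project both text baselines and boundary vertices into one frame. The name to_site_frame, its parameters, and the sample coordinates are illustrative and do not come from any upstream parser.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]

def to_site_frame(
    coords: List[Coord],
    scale: float,
    rotation_deg: float,
    offset: Coord,
) -> List[Coord]:
    """Apply uniform scale, then counter-clockwise rotation, then translation."""
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in coords:
        xs, ys = x * scale, y * scale
        out.append((
            xs * cos_t - ys * sin_t + offset[0],
            xs * sin_t + ys * cos_t + offset[1],
        ))
    return out

# Hypothetical sheet: millimeter units, rotated 90 degrees, origin placed at (100 m, 200 m).
room_boundary_mm = [(0.0, 0.0), (4000.0, 0.0), (4000.0, 3000.0), (0.0, 3000.0)]
label_baseline_mm = [(2000.0, 1500.0)]

boundary_m = to_site_frame(room_boundary_mm, 0.001, 90.0, (100.0, 200.0))
label_m = to_site_frame(label_baseline_mm, 0.001, 90.0, (100.0, 200.0))
```

Because the label and the boundary pass through the identical transform, their relative position is preserved, which is what the later spatial join depends on.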

Phase 1: Coordinate Normalization & Text Extraction

Blueprint text rarely aligns with room centroids. Drafters place labels near doorways, along circulation paths, or in open zones. The first implementation step normalizes all coordinates, applies unit scaling, and filters out non-semantic annotations using positional heuristics and regex patterns.

import re
import logging
from typing import List, Tuple, Dict, Optional
import numpy as np
import pyproj
from shapely.geometry import Point, Polygon, box
from shapely.affinity import rotate, translate
from shapely.ops import transform
from pydantic import BaseModel, Field, ValidationError

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

class BlueprintText(BaseModel):
    raw_text: str
    x: float
    y: float
    rotation_deg: float = 0.0
    layer: str = ""
    font_size: float = 0.0
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)

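class NormalizedAnnotation(BaseModel):
    # shapely's Point is not a pydantic-native type, so arbitrary types must be allowed.
    model_config = {"arbitrary_types_allowed": True}

    text: str
    geometry: Point
    layer: str
    font_size: float
    confidence: float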

# Non-semantic patterns common in architectural drawings
NON_SEMANTIC_PATTERNS = re.compile(
    r"^(REV|SCALE|DATE|DRAWN BY|CHECKED|SHEET|DWG|NORTH|SCALE\s+\d+:\d+|\d{1,3}[-/]\d{1,3}[-/]\d{2,4}|[A-Z]{2,4}-\d{3,5})$",
    re.IGNORECASE
)

def normalize_units_to_meters(
    raw_coords: List[Tuple[float, float]],
    drawing_units: str = "mm",
    scale_factor: Optional[float] = None
) -> List[Tuple[float, float]]:
    """Convert drawing coordinates to meters using standard architectural scale factors."""
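    unit_multipliers = {"mm": 0.001, "in": 0.0254, "ft": 0.3048, "m": 1.0, "arch": 0.0254}
    if drawing_units not in unit_multipliers:
        # Fail loudly rather than silently treating unknown units as millimeters.
        raise ValueError(f"Unsupported drawing units: {drawing_units!r}")
    multiplier = unit_multipliers[drawing_units]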
    if scale_factor is not None:
        multiplier *= scale_factor
    return [(x * multiplier, y * multiplier) for x, y in raw_coords]

def extract_and_filter_text(
    raw_texts: List[BlueprintText],
    drawing_units: str = "mm"
) -> List[NormalizedAnnotation]:
    """Normalize coordinates, filter non-semantic text, and return structured annotations."""
    normalized = []
    for txt in raw_texts:
        if NON_SEMANTIC_PATTERNS.match(txt.raw_text.strip()):
            continue
            
        # Convert coordinates
        x_m, y_m = normalize_units_to_meters([(txt.x, txt.y)], drawing_units)[0]
        
        # Rotating a point about itself is a no-op, so the insertion point needs no
        # rotation correction; rotation_deg only matters if the label's bounding box
        # is reconstructed later.
        geom = Point(x_m, y_m)
            
        # Initial confidence based on font size consistency and layer classification
        base_conf = min(1.0, txt.font_size / 12.0) if txt.font_size > 0 else 0.3
        if txt.layer.lower() in ("text", "annotations", "labels", "room_names"):
            base_conf = min(1.0, base_conf + 0.3)
            
        normalized.append(NormalizedAnnotation(
            text=txt.raw_text.strip(),
            geometry=geom,
            layer=txt.layer,
            font_size=txt.font_size,
            confidence=base_conf
        ))
    return normalized

Coordinate normalization must be applied before any spatial indexing. The pyproj library should be used when transforming between local CAD origins and real-world CRS coordinates, particularly when integrating with GIS platforms. See the pyproj documentation for authoritative guidance on CRS transformations and datum shifts.
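
To see what the title-block filter in extract_and_filter_text actually removes, the sketch below runs the same pattern against a few sample strings. The samples are invented for illustration. Note that the trailing [A-Z]{2,4}-\d{3,5} alternative also matches room codes such as RM-101, so drawings that label rooms this way may need that alternative narrowed.

```python
import re

# Same pattern as NON_SEMANTIC_PATTERNS above, repeated so the snippet is self-contained.
pattern = re.compile(
    r"^(REV|SCALE|DATE|DRAWN BY|CHECKED|SHEET|DWG|NORTH|SCALE\s+\d+:\d+|\d{1,3}[-/]\d{1,3}[-/]\d{2,4}|[A-Z]{2,4}-\d{3,5})$",
    re.IGNORECASE,
)

# Invented sample annotations: title-block residue mixed with real room labels.
samples = ["SCALE 1:100", "12/03/2021", "DWG-1042", "Conference Room A", "RM-101", "REV A"]

filtered = [s for s in samples if pattern.match(s.strip())]
kept = [s for s in samples if not pattern.match(s.strip())]
```

Here kept contains "Conference Room A" and also "REV A": because the pattern is anchored at both ends, it rejects the bare keyword REV but not a keyword followed by a revision letter.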

Phase 2: Spatial Indexing & Label Association

A naive point-in-polygon test fails when labels sit outside room boundaries, when multiple rooms share open-plan spaces, or when drafting standards place text in corridors. The association engine must use buffered spatial queries, proximity scoring, and fallback heuristics.

from rtree import index
from shapely.geometry import Polygon, box
from typing import Dict, List, Tuple

class SpatialLabelResolver:
    def __init__(self, room_polygons: Dict[str, Polygon]):
        self.polygons = room_polygons
        self.idx = index.Index()
        # rtree requires integer IDs, so keep a position -> room_id map alongside it.
        self._id_lookup: Dict[int, str] = {}
        for i, (room_id, poly) in enumerate(room_polygons.items()):
            self.idx.insert(i, poly.bounds)
            self._id_lookup[i] = room_id

    def associate_labels(
        self, 
        annotations: List[NormalizedAnnotation], 
        buffer_m: float = 0.5
    ) -> Dict[str, List[NormalizedAnnotation]]:
        """Map annotations to rooms using buffered spatial joins and confidence scoring."""
        room_assignments: Dict[str, List[NormalizedAnnotation]] = {rid: [] for rid in self.polygons}
        
        for ann in annotations:
            # Expand search area with buffer
            search_box = box(
                ann.geometry.x - buffer_m, 
                ann.geometry.y - buffer_m,
                ann.geometry.x + buffer_m, 
                ann.geometry.y + buffer_m
            )
            
            candidates = []
            for rtree_id in self.idx.intersection(search_box.bounds):
                room_id = self._id_lookup[rtree_id]
                poly = self.polygons[room_id]
                if not poly.is_valid:
                    continue
                # Compute the distance once and reuse it for filtering and scoring.
                dist = ann.geometry.distance(poly)
                if dist <= buffer_m:
                    # Score: closer + higher confidence = better match
                    score = ann.confidence * (1.0 / (1.0 + dist))
                    candidates.append((room_id, score, ann))
                    
            if candidates:
                candidates.sort(key=lambda x: x[1], reverse=True)
                best_room = candidates[0][0]
                room_assignments[best_room].append(ann)
            else:
                # Fallback: nearest centroid heuristic
                nearest = min(
                    self.polygons.items(),
                    key=lambda item: ann.geometry.distance(item[1].centroid)
                )
                room_assignments[nearest[0]].append(ann)
                
        return room_assignments

This resolver handles edge cases where labels fall in circulation zones by applying a configurable buffer and distance-weighted scoring. For facilities teams, the buffer_m parameter should be tuned based on drafting scale (typically 0.3–0.8m for 1:100 or 1:50 plans). When integrating with upstream SVG/DWG Parsing Workflows, ensure that block attributes and text entities are exported with consistent layer naming conventions to improve initial confidence scoring.
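
The scoring formula can be checked by hand. The sketch below reproduces it for one hypothetical label competing for three rooms; the room IDs and distances are made up. Since every candidate for a given label shares that label's confidence, the ranking for a single label reduces to the nearest polygon, and confidence only scales the resulting score.

```python
def match_score(confidence: float, distance_m: float) -> float:
    """Score used by the resolver: label confidence damped by distance to the polygon."""
    return confidence * (1.0 / (1.0 + distance_m))

# Hypothetical label with confidence 0.8 and its distance to three candidate rooms.
label_confidence = 0.8
distances_m = {"R101": 0.0, "R102": 0.4, "CORRIDOR": 0.25}

scores = {
    room_id: round(match_score(label_confidence, dist), 3)
    for room_id, dist in distances_m.items()
}
best_room = max(scores, key=scores.get)
```

A label that lies inside a polygon has distance 0 and therefore keeps its full confidence as the score.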

Phase 3: Schema Validation & Graph Handoff

Once labels are spatially resolved, the pipeline must enforce a strict output schema. Routing engines and CMDB integrations require deterministic field types, mandatory identifiers, and topology-ready attributes.

from pydantic import BaseModel, field_validator, ValidationError
from typing import Dict, List, Optional

class MappedRoomAttribute(BaseModel):
    room_id: str
    name: str
    area_sqm: float
    occupancy_type: str
    floor_level: int
    door_count: int
    wall_material: Optional[str] = None
    label_confidence: float
    geometry: str  # WKT or GeoJSON string

    @field_validator("area_sqm", "label_confidence")
    @classmethod
    def validate_positive(cls, v: float) -> float:
        if v < 0:
            raise ValueError("Must be non-negative")
        return v

    @field_validator("occupancy_type")
    @classmethod
    def normalize_occupancy(cls, v: str) -> str:
        return v.strip().upper()

def validate_batch(
    assignments: Dict[str, List[NormalizedAnnotation]],
    polygon_areas: Dict[str, float],
    floor_level: int,
    polygon_wkt: Optional[Dict[str, str]] = None
) -> List[MappedRoomAttribute]:
    """Convert spatial assignments into validated schema records."""
    # WKT strings are passed separately so polygon_areas stays a pure float mapping.
    polygon_wkt = polygon_wkt or {}
    records = []
    for room_id, anns in assignments.items():
        # Extract highest-confidence label as room name
        name = max(anns, key=lambda a: a.confidence).text if anns else f"ROOM_{room_id}"
        avg_conf = sum(a.confidence for a in anns) / len(anns) if anns else 0.0
        
        try:
            rec = MappedRoomAttribute(
                room_id=room_id,
                name=name,
                area_sqm=polygon_areas.get(room_id, 0.0),
                occupancy_type="GENERAL",  # Default; override via NLP or lookup table
                floor_level=floor_level,
                door_count=0,  # Populated by the wall and door detection stage
                label_confidence=round(avg_conf, 3),
                geometry=polygon_wkt.get(room_id, "")
            )
            records.append(rec)
        except ValidationError as e:
            logging.warning(f"Schema validation failed for {room_id}: {e}")
            
    return records

Validation should run synchronously within the mapping stage to prevent malformed records from entering the routing graph. The occupancy_type and door_count fields are typically enriched by downstream classifiers or detection modules. For authoritative spatial data modeling standards, refer to the OGC IndoorGML specification, which defines interoperable schemas for indoor navigation and facility management.
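
Before heavier NLP is introduced, the occupancy_type default can be overridden with a simple keyword lookup. The sketch below is one hypothetical way to do that. The keyword table, the category names, and the function classify_occupancy are placeholders, not values taken from IndoorGML or any other standard.

```python
import re
from typing import List, Tuple

# Hypothetical keyword table; extend or replace it to match the site's naming conventions.
OCCUPANCY_RULES: List[Tuple[str, re.Pattern]] = [
    ("MEETING", re.compile(r"\b(conference|meeting|boardroom)\b", re.IGNORECASE)),
    ("CIRCULATION", re.compile(r"\b(corridor|hallway|lobby|stair\w*)\b", re.IGNORECASE)),
    ("SANITARY", re.compile(r"\b(restroom|toilet|wc)\b", re.IGNORECASE)),
    ("OFFICE", re.compile(r"\boffice\b", re.IGNORECASE)),
]

def classify_occupancy(room_name: str, default: str = "GENERAL") -> str:
    """Return the first category whose pattern matches the room name."""
    for category, pattern in OCCUPANCY_RULES:
        if pattern.search(room_name):
            return category
    return default

meeting = classify_occupancy("Conference Room A")
stairs = classify_occupancy("Stairwell 2")
unknown = classify_occupancy("Storage")
```

Rule order matters: the first matching category wins, so more specific patterns should be listed before general ones.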

Production Execution & Async Orchestration

Attribute mapping must scale horizontally across multi-floor campuses and batch-processing queues. The following patterns ensure fault tolerance and deterministic execution:

  • Idempotency Keys: Hash input polygon geometries and raw text payloads to generate deterministic job IDs. Re-running the same blueprint yields identical outputs without duplication.
  • Chunked Processing: Split floor plans by spatial tiles or functional zones to avoid memory spikes during R-tree construction.
  • Stateless Workers: Use Celery, RQ, or AWS Lambda with ephemeral storage. Persist only validated JSON/GeoJSON outputs to object storage or PostGIS.
  • Observability: Emit structured metrics for labels_filtered, spatial_misses, validation_failures, and processing_latency. Alert when validation_failures > 5% of batch size.
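
The idempotency-key bullet can be sketched as follows, assuming polygons are available as WKT strings keyed by room ID and raw texts as plain dicts. The function name idempotency_key and the choice of SHA-256 are illustrative. Any stable hash works, provided the payload is serialized canonically.

```python
import hashlib
import json
from typing import Dict, List

def idempotency_key(polygons_wkt: Dict[str, str], raw_texts: List[dict]) -> str:
    """Derive a deterministic job ID from the mapping-stage inputs."""
    payload = {
        "polygons": polygons_wkt,
        # Sort texts so extraction order does not change the key.
        "texts": sorted(raw_texts, key=lambda t: json.dumps(t, sort_keys=True)),
    }
    # sort_keys plus fixed separators gives one canonical byte string per payload.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical inputs: the same blueprint submitted twice with different ordering.
texts = [
    {"raw_text": "Office", "x": 1.0, "y": 2.0},
    {"raw_text": "Lobby", "x": 5.0, "y": 2.0},
]
key_a = idempotency_key(
    {"R1": "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))", "R2": "POLYGON ((4 0, 8 0, 8 3, 4 3, 4 0))"},
    texts,
)
key_b = idempotency_key(
    {"R2": "POLYGON ((4 0, 8 0, 8 3, 4 3, 4 0))", "R1": "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"},
    list(reversed(texts)),
)
key_c = idempotency_key({"R1": "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"}, texts)
```

Here key_a and key_b are identical while key_c differs. The key is only stable if upstream stages emit WKT with a fixed coordinate precision.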

# Example async worker signature (Celery-compatible)
from celery import Celery
from shapely import wkt

app = Celery("attribute_mapper", broker="redis://localhost:6379/0")

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def process_floor_plan(self, blueprint_id: str, raw_polygons: dict, raw_texts: list, floor_level: int = 1):
    # Task arguments arrive JSON-serialized. This sketch assumes raw_polygons maps
    # room IDs to WKT strings and raw_texts is a list of dicts matching BlueprintText.
    try:
        polygons = {rid: wkt.loads(geom) for rid, geom in raw_polygons.items()}
        texts = [BlueprintText(**t) for t in raw_texts]
        normalized = extract_and_filter_text(texts)
        resolver = SpatialLabelResolver(polygons)
        assignments = resolver.associate_labels(normalized)
        areas = {rid: poly.area for rid, poly in polygons.items()}
        records = validate_batch(assignments, areas, floor_level=floor_level)
        return {"status": "success", "records": [r.model_dump() for r in records]}
    except Exception as exc:
        logging.error(f"Mapping failed for {blueprint_id}: {exc}")
        raise self.retry(exc=exc)
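
The observability counters and the 5% alert rule can be modeled with a small container. This is a sketch only: the class BatchMetrics and its method names are hypothetical, and a real deployment would export these values to whatever metrics backend is already in use.

```python
from dataclasses import dataclass

@dataclass
class BatchMetrics:
    """Per-batch counters matching the metric names listed above."""
    batch_size: int
    labels_filtered: int = 0
    spatial_misses: int = 0
    validation_failures: int = 0
    processing_latency_s: float = 0.0

    def failure_rate(self) -> float:
        # Guard against empty batches to avoid division by zero.
        return self.validation_failures / self.batch_size if self.batch_size else 0.0

    def should_alert(self, threshold: float = 0.05) -> bool:
        """True when validation failures exceed the threshold share of the batch."""
        return self.failure_rate() > threshold

noisy_batch = BatchMetrics(batch_size=200, validation_failures=11)
borderline_batch = BatchMetrics(batch_size=200, validation_failures=10)
```

With these numbers noisy_batch alerts (5.5%) while borderline_batch does not, because the rule is strictly greater than 5%.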

Troubleshooting & Edge Case Resolution

| Symptom | Root Cause | Resolution |
|---|---|---|
| Labels assigned to wrong rooms | Buffer too small or text placed in corridor | Increase buffer_m to 0.6–1.0 m, enable centroid fallback, verify layer filtering |
| High validation failure rate | Missing mandatory fields or malformed WKT | Add pre-validation geometry repair (buffer(0)), enforce schema defaults, log raw payloads |
| Coordinate drift across floors | Inconsistent drawing origins or missing CRS | Apply global affine registration, use control points, enforce pyproj transformations |
| Duplicate room names | Multiple labels per room with equal confidence | Implement NLP deduplication, prioritize largest font size or closest-to-centroid label |
| Slow spatial joins | Unindexed polygons or overlapping boundaries | Pre-merge overlapping rooms, use rtree bulk loading (pass a generator of (id, bounds, obj) tuples to index.Index()), validate is_valid |
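
For the duplicate-name case, one possible tie-breaker ranks labels by confidence, then font size, then distance to the room centroid. The sketch below assumes labels and the centroid share one coordinate frame. The Label tuple and pick_room_name are hypothetical names, and the ordering of the criteria is a design choice rather than a fixed rule.

```python
import math
from typing import List, NamedTuple, Tuple

class Label(NamedTuple):
    text: str
    x: float
    y: float
    font_size: float
    confidence: float

def pick_room_name(labels: List[Label], centroid: Tuple[float, float]) -> str:
    """Rank by confidence, then font size, then proximity to the room centroid."""
    def sort_key(lbl: Label):
        dist = math.hypot(lbl.x - centroid[0], lbl.y - centroid[1])
        # Negate the first two fields so that larger values sort first under min().
        return (-lbl.confidence, -lbl.font_size, dist)
    return min(labels, key=sort_key).text

# Three labels with equal confidence; two of them also share the largest font size.
labels = [
    Label("STORAGE", 1.0, 1.0, 10.0, 0.9),
    Label("SERVER", 3.5, 0.4, 14.0, 0.9),
    Label("Server Room", 2.0, 1.5, 14.0, 0.9),
]
chosen = pick_room_name(labels, centroid=(2.0, 1.5))
```

Here chosen is "Server Room": it ties on confidence and font size with "SERVER" but sits on the centroid.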

Facilities Tech Checklist:

  • Verify CAD export settings preserve text baseline coordinates and rotation.
  • Ensure all layers containing room names are explicitly whitelisted in the extraction config.
  • Run topology validation post-mapping to confirm door-to-room connectivity aligns with physical access control logs.
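
The post-mapping topology check can be approximated with a breadth-first search over door links. This is a sketch under the assumption that doors are available as pairs of room IDs; the function unreachable_rooms and the sample data are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

def unreachable_rooms(
    room_ids: Iterable[str],
    doors: List[Tuple[str, str]],
    entrances: Iterable[str],
) -> Set[str]:
    """Return rooms that cannot be reached from any entrance via door links."""
    adjacency: Dict[str, Set[str]] = {rid: set() for rid in room_ids}
    for a, b in doors:
        # Ignore doors that reference rooms missing from the mapped set.
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[str] = set()
    queue = deque(e for e in entrances if e in adjacency)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(adjacency[current] - seen)
    return set(adjacency) - seen

# Hypothetical floor: R103 has no detected door, so it should be flagged.
rooms = ["LOBBY", "CORRIDOR", "R101", "R102", "R103"]
doors = [("LOBBY", "CORRIDOR"), ("CORRIDOR", "R101"), ("CORRIDOR", "R102")]
orphans = unreachable_rooms(rooms, doors, entrances=["LOBBY"])
```

Any room returned here either lacks a detected door or was mapped with the wrong ID, and should be reviewed before the routing graph is built.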

GIS Dev Checklist:

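  • Project all geometries to a local UTM zone or another locally accurate metric CRS before spatial joins. EPSG:3857 distorts distances increasingly with latitude, so metric buffers such as buffer_m will not hold their nominal size there.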
  • Use shapely’s make_valid() and buffer(0) to repair self-intersecting polygons from legacy DWG exports.
  • Index final outputs in PostGIS with a GiST index on geometry and a B-tree index on room_id, targeting sub-50 ms routing queries.

Attribute mapping transforms static drafting artifacts into operational spatial intelligence. By enforcing strict coordinate normalization, confidence-weighted spatial association, and schema validation, engineering teams can reliably feed routing engines, CMDBs, and real-time wayfinding systems with deterministic, production-grade indoor data.