Attribute Mapping from Blueprints
Attribute mapping serves as the semantic normalization layer between raw geometric extraction and operational indoor navigation systems. It bridges vectorized floor plan outputs with the structured, queryable records required by facilities management platforms, spatial analytics engines, and routing graphs. The process resolves coordinate drift, ambiguous drafting conventions, and fragmented metadata into deterministic feature attributes that downstream systems can consume without manual intervention.
Pipeline Architecture & Semantic Handoff
Within the broader Automated Floor Plan Parsing & Vectorization ecosystem, attribute mapping operates as a stateless, idempotent transformation stage. It ingests polygonized room boundaries, detected architectural elements, and raw text annotations, then emits standardized feature records aligned to a unified spatial reference system (SRS). The architecture enforces a strict three-phase execution model:
- Coordinate Normalization & Text Extraction – Unit conversion, affine alignment, and semantic filtering of CAD/SVG text entities.
- Spatial Association & Label Resolution – R-tree indexing, buffered spatial joins, and confidence-weighted assignment of labels to polygons.
- Schema Validation & Graph Handoff – Type enforcement, mandatory field verification, and topology consistency checks before routing graph ingestion.
The handoff from upstream parsers requires rigorous coordinate space alignment. Blueprint units (millimeters, inches, architectural units) must be converted to meters in a single, consistent reference frame before any spatial operation runs. Text entities extracted from CAD blocks or SVG <text> nodes carry baseline coordinates, rotation angles, and layer metadata that must be projected into the same frame as the vectorized boundaries. Misalignment at this stage propagates directly into routing failures, which is why normalization must be deterministic.
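A minimal sketch of that alignment, assuming the drawing-to-building relationship is a similarity transform (uniform scale, rotation, translation) whose parameters are already known. `FrameTransform` is a hypothetical name, not part of any upstream parser. The point it illustrates is that boundary vertices and text anchors go through the same transform, and a text entity's rotation angle must be rotated by the same amount.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FrameTransform:
    """Hypothetical drawing-to-building transform: scale, then rotate, then translate."""

    scale: float      # meters per drawing unit, e.g. 0.001 for millimeters
    theta_deg: float  # counter-clockwise rotation of the drawing into the building frame
    tx: float         # translation in meters
    ty: float

    def apply_point(self, x: float, y: float) -> Tuple[float, float]:
        """Transform one boundary vertex or text insertion point."""
        t = math.radians(self.theta_deg)
        sx, sy = x * self.scale, y * self.scale
        return (
            sx * math.cos(t) - sy * math.sin(t) + self.tx,
            sx * math.sin(t) + sy * math.cos(t) + self.ty,
        )

    def apply_text(self, x: float, y: float, rotation_deg: float) -> Tuple[float, float, float]:
        """Transform a text anchor and rotate its baseline angle by the same theta."""
        nx, ny = self.apply_point(x, y)
        return nx, ny, (rotation_deg + self.theta_deg) % 360.0
```

For example, `FrameTransform(scale=0.001, theta_deg=90.0, tx=10.0, ty=0.0)` maps a text anchor at (1000, 0) mm with a 350° baseline to roughly (10.0, 1.0) m with an 80° baseline.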
Phase 1: Coordinate Normalization & Text Extraction
Blueprint text rarely aligns with room centroids. Drafters place labels near doorways, along circulation paths, or in open zones. The first implementation step therefore converts every coordinate to meters, filters out title-block and other non-semantic annotations with regex patterns, and assigns each remaining label an initial confidence from its font size and layer name.
```python
import logging
import re
from typing import List, Optional, Tuple

from pydantic import BaseModel, ConfigDict, Field
from shapely.geometry import Point

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


class BlueprintText(BaseModel):
    raw_text: str
    x: float
    y: float
    rotation_deg: float = 0.0
    layer: str = ""
    font_size: float = 0.0
    confidence: float = Field(default=0.0, ge=0.0, le=1.0)


class NormalizedAnnotation(BaseModel):
    # Shapely geometries are not pydantic types, so allow them explicitly.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    text: str
    geometry: Point
    layer: str
    font_size: float
    confidence: float
    rotation_deg: float = 0.0


# Non-semantic patterns common in architectural drawings (title blocks, sheet metadata)
NON_SEMANTIC_PATTERNS = re.compile(
    r"^(REV|SCALE|DATE|DRAWN BY|CHECKED|SHEET|DWG|NORTH|SCALE\s+\d+:\d+|\d{1,3}[-/]\d{1,3}[-/]\d{2,4}|[A-Z]{2,4}-\d{3,5})$",
    re.IGNORECASE,
)


def normalize_units_to_meters(
    raw_coords: List[Tuple[float, float]],
    drawing_units: str = "mm",
    scale_factor: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Convert drawing coordinates to meters using standard architectural unit factors."""
    unit_multipliers = {"mm": 0.001, "in": 0.0254, "ft": 0.3048, "m": 1.0, "arch": 0.0254}
    if drawing_units not in unit_multipliers:
        # Fail loudly: silently assuming millimeters would corrupt every later spatial join.
        raise ValueError(f"Unsupported drawing units: {drawing_units!r}")
    multiplier = unit_multipliers[drawing_units]
    if scale_factor is not None:
        multiplier *= scale_factor
    return [(x * multiplier, y * multiplier) for x, y in raw_coords]


def extract_and_filter_text(
    raw_texts: List[BlueprintText],
    drawing_units: str = "mm",
) -> List[NormalizedAnnotation]:
    """Normalize coordinates, filter non-semantic text, and return structured annotations."""
    normalized: List[NormalizedAnnotation] = []
    for txt in raw_texts:
        text = txt.raw_text.strip()
        if not text or NON_SEMANTIC_PATTERNS.match(text):
            continue
        # Convert the insertion point to meters
        x_m, y_m = normalize_units_to_meters([(txt.x, txt.y)], drawing_units)[0]
        # Rotating a point about itself is a no-op, so the anchor is kept as-is and the
        # rotation is carried as metadata for label-placement logic downstream.
        geom = Point(x_m, y_m)
        # Initial confidence based on font size and layer classification
        base_conf = min(1.0, txt.font_size / 12.0) if txt.font_size > 0 else 0.3
        if txt.layer.lower() in ("text", "annotations", "labels", "room_names"):
            base_conf = min(1.0, base_conf + 0.3)
        normalized.append(NormalizedAnnotation(
            text=text,
            geometry=geom,
            layer=txt.layer,
            font_size=txt.font_size,
            confidence=base_conf,
            rotation_deg=txt.rotation_deg,
        ))
    return normalized
```
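A text entity's insertion point is usually the start of its baseline, not its visual center, so a long or rotated label can sit well away from the room it names. One hedged refinement is to shift the anchor to an estimated visual center before spatial association. It assumes left-justified text anchored at the start of its baseline and an average glyph width of about 0.6 times the text height; both are assumptions, since fonts and justification settings vary. `estimate_label_center` is a hypothetical helper, and `height_m` is the font size after conversion to meters.

```python
import math
from typing import Tuple

# Assumed average glyph width as a fraction of text height; real fonts vary.
ASSUMED_WIDTH_RATIO = 0.6


def estimate_label_center(
    x: float,
    y: float,
    text: str,
    height_m: float,
    rotation_deg: float,
    width_ratio: float = ASSUMED_WIDTH_RATIO,
) -> Tuple[float, float]:
    """Estimate the visual center of a left-justified, baseline-anchored label."""
    half_w = 0.5 * len(text) * height_m * width_ratio
    half_h = 0.5 * height_m
    t = math.radians(rotation_deg)
    # Step half the width along the baseline direction, then half the height
    # perpendicular to it (counter-clockwise from the baseline).
    cx = x + half_w * math.cos(t) - half_h * math.sin(t)
    cy = y + half_w * math.sin(t) + half_h * math.cos(t)
    return cx, cy
```

For an unrotated three-character label 0.5 m tall anchored at the origin, this moves the anchor to about (0.45, 0.25); at 90° the same label's center lands near (-0.25, 0.45).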
Coordinate normalization must be applied before any spatial indexing. Local CAD coordinates have no CRS of their own, so when integrating with GIS platforms they must first be registered to a projected CRS (typically with surveyed control points); pyproj then handles any CRS-to-CRS transformation from there. See the pyproj documentation for authoritative guidance on CRS transformations and datum shifts.
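One common way to register a local CAD frame, assumed here rather than prescribed by the source, is a similarity transform fitted from two control points whose real-world coordinates are known in a projected CRS (in meters). With three or more points, a least-squares affine fit is more robust. `fit_similarity` is a hypothetical helper; it uses complex numbers so that one multiplier `a` carries both scale (`abs(a)`) and rotation (the phase of `a`).

```python
from typing import Callable, Tuple

Point2 = Tuple[float, float]


def fit_similarity(
    local: Tuple[Point2, Point2], world: Tuple[Point2, Point2]
) -> Callable[[float, float], Point2]:
    """Fit q = a*p + b (complex form) from two local/world control-point pairs."""
    p1, p2 = (complex(*p) for p in local)
    q1, q2 = (complex(*q) for q in world)
    if p1 == p2:
        raise ValueError("Control points must be distinct")
    a = (q2 - q1) / (p2 - p1)  # abs(a) is the scale, its phase is the rotation
    b = q1 - a * p1            # translation

    def to_world(x: float, y: float) -> Point2:
        q = a * complex(x, y) + b
        return q.real, q.imag

    return to_world
```

With local points (0, 0) and (10, 0) mapped to (500000, 4649776) and (500000, 4649786), the fit is a 90° rotation with unit scale, so the local point (0, 10) lands at (499990, 4649776). Once coordinates are in a projected CRS, pyproj's `Transformer.from_crs(source, target, always_xy=True)` can take over for CRS-to-CRS conversion.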
Phase 2: Spatial Indexing & Label Association
A naive point-in-polygon test fails when labels sit outside room boundaries, when multiple rooms share open-plan spaces, or when drafting standards place text in corridors. The association engine must use buffered spatial queries, proximity scoring, and fallback heuristics.
```python
from __future__ import annotations

from typing import Dict, List, Tuple

from rtree import index
from shapely.geometry import Polygon

# NormalizedAnnotation comes from the Phase 1 code above.


class SpatialLabelResolver:
    def __init__(self, room_polygons: Dict[str, Polygon]):
        if not room_polygons:
            raise ValueError("SpatialLabelResolver needs at least one room polygon")
        self.polygons = room_polygons
        self.idx = index.Index()
        # rtree requires integer IDs, so keep a position -> room_id map alongside it.
        self._id_lookup: Dict[int, str] = {}
        for i, (room_id, poly) in enumerate(room_polygons.items()):
            self.idx.insert(i, poly.bounds)
            self._id_lookup[i] = room_id

    def associate_labels(
        self,
        annotations: List[NormalizedAnnotation],
        buffer_m: float = 0.5,
    ) -> Dict[str, List[NormalizedAnnotation]]:
        """Map annotations to rooms using buffered spatial joins and confidence scoring."""
        room_assignments: Dict[str, List[NormalizedAnnotation]] = {rid: [] for rid in self.polygons}
        for ann in annotations:
            px, py = ann.geometry.x, ann.geometry.y
            # Expand the search window by the buffer: (minx, miny, maxx, maxy)
            search_bounds = (px - buffer_m, py - buffer_m, px + buffer_m, py + buffer_m)
            candidates: List[Tuple[str, float]] = []
            for rtree_id in self.idx.intersection(search_bounds):
                room_id = self._id_lookup[rtree_id]
                poly = self.polygons[room_id]
                if not poly.is_valid:
                    continue
                dist = ann.geometry.distance(poly)  # 0.0 when the label lies inside
                if dist <= buffer_m:
                    # Score: closer + higher confidence = better match
                    candidates.append((room_id, ann.confidence / (1.0 + dist)))
            if candidates:
                best_room = max(candidates, key=lambda c: c[1])[0]
            else:
                # Fallback: nearest centroid heuristic
                best_room = min(
                    self.polygons,
                    key=lambda rid: ann.geometry.distance(self.polygons[rid].centroid),
                )
            room_assignments[best_room].append(ann)
        return room_assignments
```
This resolver handles edge cases where labels fall in circulation zones by applying a configurable buffer and distance-weighted scoring. For facilities teams, the buffer_m parameter should be tuned based on drafting scale (typically 0.3–0.8m for 1:100 or 1:50 plans). When integrating with upstream SVG/DWG Parsing Workflows, ensure that block attributes and text entities are exported with consistent layer naming conventions to improve initial confidence scoring.
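One way to choose a starting value, offered as an assumption rather than a rule from the source, is to treat the buffer as a distance measured on the printed sheet and scale it by the drawing's scale denominator. `buffer_from_drawing_scale` is a hypothetical helper.

```python
def buffer_from_drawing_scale(paper_tolerance_mm: float, scale_denominator: float) -> float:
    """Convert a tolerance measured on the printed sheet into a model-space buffer in meters.

    At 1:100, 1 mm on paper is 100 mm in the building, i.e. 0.1 m.
    """
    if paper_tolerance_mm <= 0 or scale_denominator <= 0:
        raise ValueError("tolerance and scale denominator must be positive")
    return paper_tolerance_mm * scale_denominator / 1000.0
```

A 5 mm tolerance on the sheet gives 0.5 m at 1:100 and 0.25 m at 1:50; 6 mm at 1:50 gives 0.3 m and 8 mm at 1:100 gives 0.8 m, the two ends of the range quoted above.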
Phase 3: Schema Validation & Graph Handoff
Once labels are spatially resolved, the pipeline must enforce a strict output schema. Routing engines and CMDB integrations require deterministic field types, mandatory identifiers, and topology-ready attributes.
```python
from __future__ import annotations

import logging
from typing import Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError, field_validator

# NormalizedAnnotation comes from the Phase 1 code above.


class MappedRoomAttribute(BaseModel):
    room_id: str
    name: str
    area_sqm: float = Field(ge=0.0)
    occupancy_type: str
    floor_level: int
    door_count: int = Field(ge=0)
    wall_material: Optional[str] = None
    label_confidence: float = Field(ge=0.0, le=1.0)
    geometry: str = Field(min_length=1)  # WKT or GeoJSON string; empty is rejected

    @field_validator("occupancy_type")
    @classmethod
    def normalize_occupancy(cls, v: str) -> str:
        return v.strip().upper()


def validate_batch(
    assignments: Dict[str, List[NormalizedAnnotation]],
    polygon_areas: Dict[str, float],
    polygon_wkt: Dict[str, str],
    floor_level: int,
) -> List[MappedRoomAttribute]:
    """Convert spatial assignments into validated schema records."""
    records: List[MappedRoomAttribute] = []
    for room_id, anns in assignments.items():
        # Highest-confidence label becomes the room name; unlabeled rooms get a placeholder.
        name = max(anns, key=lambda a: a.confidence).text if anns else f"ROOM_{room_id}"
        avg_conf = sum(a.confidence for a in anns) / len(anns) if anns else 0.0
        try:
            rec = MappedRoomAttribute(
                room_id=room_id,
                name=name,
                area_sqm=polygon_areas.get(room_id, 0.0),
                occupancy_type="GENERAL",  # Default; override via NLP or lookup table
                floor_level=floor_level,
                door_count=0,  # Populated by Wall & Door Detection Algorithms (/automated-floor-plan-parsing-vectorization/wall-door-detection-algorithms/)
                label_confidence=round(avg_conf, 3),
                geometry=polygon_wkt.get(room_id, ""),
            )
            records.append(rec)
        except ValidationError as e:
            logging.warning("Schema validation failed for %s: %s", room_id, e)
    return records
```
Validation should run synchronously within the mapping stage to prevent malformed records from entering the routing graph. The occupancy_type and door_count fields are typically enriched by downstream classifiers or detection modules. For authoritative spatial data modeling standards, refer to the OGC IndoorGML specification, which defines interoperable schemas for indoor navigation and facility management.
Production Execution & Async Orchestration
Attribute mapping must scale horizontally across multi-floor campuses and batch-processing queues. The following patterns ensure fault tolerance and deterministic execution:
- Idempotency Keys: Hash input polygon geometries and raw text payloads to generate deterministic job IDs. Re-running the same blueprint yields identical outputs without duplication.
- Chunked Processing: Split floor plans by spatial tiles or functional zones to avoid memory spikes during R-tree construction.
- Stateless Workers: Use Celery, RQ, or AWS Lambda with ephemeral storage. Persist only validated JSON/GeoJSON outputs to object storage or PostGIS.
- Observability: Emit structured metrics for labels_filtered, spatial_misses, validation_failures, and processing_latency. Alert when validation_failures exceeds 5% of batch size.
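A minimal sketch of the idempotency key and the alert rule from the list above, assuming polygons arrive as {room_id: WKT} and text entities as JSON-serializable dicts. `idempotency_key` and `should_alert` are hypothetical helpers. Canonical JSON (sorted keys, fixed separators) makes the hash independent of dict ordering, and text entities are sorted so upstream ordering does not change the key. WKT strings will still hash differently if an upstream tool changes vertex order or float formatting.

```python
import hashlib
import json
from typing import Dict, List


def idempotency_key(polygons_wkt: Dict[str, str], raw_texts: List[dict]) -> str:
    """Hypothetical helper: deterministic job ID from polygon geometries and text payloads."""
    payload = {
        "polygons": polygons_wkt,
        # Sort text entities by their canonical JSON so upstream ordering cannot change the key.
        "texts": sorted(json.dumps(t, sort_keys=True) for t in raw_texts),
    }
    # Sorted keys and fixed separators give byte-identical input for equal payloads.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_alert(validation_failures: int, batch_size: int, threshold: float = 0.05) -> bool:
    """Hypothetical helper: True when failures exceed `threshold` of the batch (5% by default)."""
    return batch_size > 0 and validation_failures / batch_size > threshold
```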
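```python
# Example async worker signature (Celery-compatible)
import logging

from celery import Celery
from shapely import wkt

# Assumes BlueprintText, extract_and_filter_text, SpatialLabelResolver and
# validate_batch from the phases above are importable in this module.
app = Celery("attribute_mapper", broker="redis://localhost:6379/0")


@app.task(bind=True, max_retries=3, default_retry_delay=60)
def process_floor_plan(self, blueprint_id: str, raw_polygons: dict, raw_texts: list,
                       floor_level: int = 1, drawing_units: str = "mm"):
    # Celery's default JSON serializer cannot carry Shapely objects, so rooms arrive as
    # {room_id: WKT} (assumed already in meters) and text entities as plain dicts.
    try:
        polygons = {rid: wkt.loads(w) for rid, w in raw_polygons.items()}
        texts = [BlueprintText.model_validate(t) for t in raw_texts]
        normalized = extract_and_filter_text(texts, drawing_units)
        assignments = SpatialLabelResolver(polygons).associate_labels(normalized)
        areas = {rid: p.area for rid, p in polygons.items()}
        wkts = {rid: p.wkt for rid, p in polygons.items()}
        records = validate_batch(assignments, areas, wkts, floor_level=floor_level)
        return {"status": "success", "records": [r.model_dump() for r in records]}
    except Exception as exc:
        logging.error("Mapping failed for %s: %s", blueprint_id, exc)
        raise self.retry(exc=exc)
```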
Troubleshooting & Edge Case Resolution
| Symptom | Root Cause | Resolution |
|---|---|---|
| Labels assigned to wrong rooms | Buffer too small or text placed in corridor | Increase buffer_m to 0.6–1.0, enable centroid fallback, verify layer filtering |
| High validation failure rate | Missing mandatory fields or malformed WKT | Add pre-validation geometry repair (buffer(0)), enforce schema defaults, log raw payloads |
| Coordinate drift across floors | Inconsistent drawing origins or missing CRS | Apply global affine registration, use control points, enforce pyproj transformations |
| Duplicate room names | Multiple labels per room with equal confidence | Implement NLP deduplication, prioritize largest font size or closest-to-centroid label |
| Slow spatial joins | Unindexed polygons or overlapping boundaries | Pre-merge overlapping rooms, bulk-load the R-tree by passing a generator of (id, bounds, obj) tuples to index.Index(), validate is_valid |
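The duplicate-name resolution in the table can be sketched as a single ordered comparison: confidence first, then font size, then distance to the room centroid. `Candidate` and `pick_room_name` are hypothetical stand-ins. The sketch treats confidences as tied only when they are exactly equal floats; rounding them first (for example to two decimals) is a reasonable adjustment.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical lightweight stand-in for a label assigned to one room."""
    text: str
    x: float
    y: float
    font_size: float
    confidence: float


def pick_room_name(candidates: List[Candidate], centroid: Tuple[float, float]) -> str:
    """Highest confidence wins; ties go to the largest font, then the label closest to the centroid."""
    if not candidates:
        raise ValueError("no candidate labels")
    cx, cy = centroid
    best = max(
        candidates,
        # Negated distance so that "closer" compares as "larger" inside max().
        key=lambda c: (c.confidence, c.font_size, -math.hypot(c.x - cx, c.y - cy)),
    )
    return best.text
```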
Facilities Tech Checklist:
- Verify CAD export settings preserve text baseline coordinates and rotation.
- Ensure all layers containing room names are explicitly whitelisted in the extraction config.
- Run topology validation post-mapping to confirm door-to-room connectivity aligns with physical access control logs.
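The connectivity check can start as a plain set comparison. This sketch assumes access control logs have already been reduced to (door_id, room_id) pairs, which is a hypothetical format; real exports vary by vendor. `compare_connectivity` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (door_id, room_id)


def compare_connectivity(mapped: Iterable[Edge], observed: Iterable[Edge]) -> Dict[str, List[Edge]]:
    """Diff door-to-room edges in the mapped graph against edges seen in access logs."""
    mapped_set = set(mapped)
    observed_set = set(observed)
    return {
        # Doors people actually pass through that the mapped graph does not connect
        "missing_in_graph": sorted(observed_set - mapped_set),
        # Graph edges no log entry confirms (may be real but unused doors)
        "unconfirmed_in_graph": sorted(mapped_set - observed_set),
    }
```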
GIS Dev Checklist:
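- Project all geometries to a local UTM zone (or another local projected CRS in meters) before spatial joins. EPSG:3857 (Web Mercator) inflates distances by roughly 1/cos(latitude), which distorts metric buffers.
- Use shapely’s make_valid() and buffer(0) to repair self-intersecting polygons from legacy DWG exports.
- Index final outputs in PostGIS with a GiST index on geometry and a B-tree index on room_id for sub-50 ms routing queries.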
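EPSG:3857 treats its projected units as meters, but on the ground one unit shrinks to roughly cos(latitude) meters, so at 50°N a 0.5-unit buffer covers only about 0.32 m. A minimal sketch for picking the WGS 84 / UTM zone instead, assuming longitude and latitude in degrees and ignoring the special zones around Norway and Svalbard. `utm_epsg` and `web_mercator_scale` are hypothetical helpers; the resulting code can be passed to pyproj as `f"EPSG:{code}"`.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Ignores the Norway and Svalbard zone exceptions.
    """
    if not -80.0 <= lat <= 84.0:
        raise ValueError("UTM is defined between 80°S and 84°N")
    zone = int((lon + 180.0) // 6.0) + 1
    zone = min(zone, 60)  # lon == 180 belongs to zone 60
    # 326xx for the northern hemisphere, 327xx for the southern
    return (32600 if lat >= 0 else 32700) + zone


def web_mercator_scale(lat: float) -> float:
    """Approximate factor by which EPSG:3857 distances exceed true ground distances."""
    return 1.0 / math.cos(math.radians(lat))
```

For example, a site at longitude -74.0, latitude 40.7 falls in zone 18N (EPSG:32618), where Web Mercator distances are inflated by about 1.32.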
Attribute mapping transforms static drafting artifacts into operational spatial intelligence. By enforcing strict coordinate normalization, confidence-weighted spatial association, and schema validation, engineering teams can reliably feed routing engines, CMDBs, and real-time wayfinding systems with deterministic, production-grade indoor data.