Building a Coverage Taxonomy of Weirdness: Families of Edge Cases for Autonomous Vehicles

Robust autonomy requires more than isolated strange examples: it needs deliberate coverage across coherent families of “weirdness.” This article defines a practical taxonomy you can use to plan synthetic-data generation, test-case selection, and validation so simulations probe the right margins of real-world risk.

Why a taxonomy matters

Edge cases are heterogeneous; two pedestrian near-misses can look nothing alike. Grouping rare events into families makes it possible to (1) ensure breadth when sampling or synthesizing scenes, (2) prioritize effort by risk and frequency, and (3) design validators that check realism and transferability to real sensors and behaviors.

Core families of weirdness

1) Environmental and weather extremes — heavy rain, snow, fog, wet-road glare, low-angle sun, localized wind gusts affecting movable objects. These change sensor noise, object appearance, and road friction.

2) Visibility & occlusion patterns — transient occluders (parked delivery and utility trucks), static occlusion geometry (curbside foliage, tight intersections), sensor-specific blind spots, and partial-observation trajectories (children emerging from between parked cars).

3) Intent ambiguity and social behavior edge cases — hesitating pedestrians, jaywalking with attention diverted, cyclists weaving unpredictably, informal human signaling (hand waves, nods), and cultural driving norms (e.g., gap-taking at informal merges).

4) Road geometry and infrastructure anomalies — unusual lane splits, temporary lane closures, poorly marked intersections, atypical roundabouts, mismatched signage, faded or contradictory road paint, and nonstandard curb cuts.

5) Dynamic actor failures and degraded actors — drivers with impaired control, stalled vehicles in travel lanes, trailers with sway, and actors suffering mechanical failures or signaling misleadingly (tire blowouts, hazard lights flashed in the wrong context).

6) Sensor and perception failure modes — camera saturation, LIDAR multi-path or dropouts, radar ghosting, calibration drift, time-synchronization hiccups, and spoofing-like artifacts that mimic rare but plausible sensor outputs.

7) Rare multi-hazard combinations — interacting stresses such as low visibility + complex geometry + ambiguous actor intent; these are the high-payoff families because failures often emerge from interactions.

8) Operational-design-domain (ODD) boundary events — situations near or outside the declared ODD (unexpected pedestrian-only events on a vehicle-only route, temporary mixed-use streets, emergency vehicle closures) that require graceful fallback.

Prioritization framework

Score each family against three axes and combine into a priority score:

  • Safety impact — potential for harm if handled poorly (high/medium/low).
  • Occurrence likelihood (in-deployment) — estimated frequency in target operating areas.
  • Transfer gap — how different synthetic/simulated representations are from real-world sensor failures or behaviors (large gap = needs better validation).

Example: multi-hazard combinations often rank high on safety impact and transfer gap despite low individual occurrence; elevate them in test plans.
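The scoring above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the 1–3 ordinal scale, the weights, and the family entries are all assumptions chosen to show the mechanics.

```python
# Hypothetical priority scoring for edge-case families.
# The 1-3 ordinal scale and the axis weights are illustrative assumptions.

SCALE = {"low": 1, "medium": 2, "high": 3}

def priority_score(safety_impact, occurrence, transfer_gap,
                   weights=(0.5, 0.2, 0.3)):
    """Combine the three axes into one weighted score in [1, 3]."""
    w_s, w_o, w_t = weights
    return (w_s * SCALE[safety_impact]
            + w_o * SCALE[occurrence]
            + w_t * SCALE[transfer_gap])

# Example families with (safety impact, occurrence, transfer gap) ratings.
families = {
    "multi_hazard_combinations": ("high", "low", "high"),
    "weather_extremes": ("medium", "high", "medium"),
}

ranked = sorted(families, key=lambda f: priority_score(*families[f]),
                reverse=True)
```

Weighting safety impact most heavily reproduces the example in the text: multi-hazard combinations outrank more frequent but lower-impact families.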

How to operationalize the taxonomy

1) Map data and logs to families. Tag recorded incidents and near-misses by family; identify under-sampled cells (e.g., low-light occlusions combined with cyclists).
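A coverage matrix over (family, condition) cells makes under-sampling visible. The sketch below assumes incidents have already been tagged; the family names, condition labels, and the count threshold are illustrative.

```python
# Hypothetical coverage matrix: count tagged incidents per (family, condition)
# cell and surface cells with too few logged examples.
from collections import Counter

def coverage_matrix(tagged_incidents):
    """tagged_incidents: iterable of (family, condition) tags, one per incident."""
    return Counter(tagged_incidents)

def undersampled(matrix, all_families, all_conditions, threshold=5):
    """Return cells with fewer than `threshold` logged examples."""
    return [(f, c) for f in all_families for c in all_conditions
            if matrix[(f, c)] < threshold]

incidents = [("occlusion", "low_light"), ("occlusion", "daylight"),
             ("intent_ambiguity", "daylight")]
matrix = coverage_matrix(incidents)
gaps = undersampled(matrix, ["occlusion", "intent_ambiguity"],
                    ["low_light", "daylight"])
```

Cells that never co-occur in the logs (here, intent ambiguity at low light) show up immediately as candidates for synthetic generation.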

2) Design synthetic generators per family. For weather, model sensor-domain effects (spray on camera lens, LIDAR attenuation). For intent ambiguity, create behavior policies with stochastic hesitation and unexpected crossings.
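For the intent-ambiguity family, a behavior policy with stochastic hesitation can be as simple as a speed profile with random mid-crossing pauses. The hesitation probability, pause length, and walking speed below are illustrative parameters, not calibrated values.

```python
# Sketch of a stochastic hesitation policy for a simulated pedestrian.
# p_hesitate, pause_steps, and walk_speed are illustrative assumptions.
import random

def hesitating_crossing_speeds(n_steps, walk_speed=1.4, p_hesitate=0.15,
                               pause_steps=3, rng=None):
    """Return a per-step speed profile (m/s) with random mid-crossing pauses."""
    rng = rng or random.Random()
    speeds, pause_left = [], 0
    for _ in range(n_steps):
        if pause_left > 0:
            speeds.append(0.0)            # still paused
            pause_left -= 1
        elif rng.random() < p_hesitate:
            speeds.append(0.0)            # start a new hesitation
            pause_left = pause_steps - 1
        else:
            speeds.append(walk_speed)     # normal walking
    return speeds

profile = hesitating_crossing_speeds(50, rng=random.Random(0))
```

Passing an explicit seeded `random.Random` keeps generated scenarios reproducible across test runs.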

3) Build family-specific realism validators. Validate that generated sensor outputs exhibit the same statistical signatures as real failure modes (noise spectra, dropout rates, detection-confidence distributions) and that actor kinematics obey physical constraints.
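One concrete validator shape is a two-sample distribution test on a failure-mode statistic, e.g. per-frame LIDAR dropout rates from simulation versus real logs. The sketch below implements a Kolmogorov-Smirnov statistic with the standard library; the 0.1 acceptance threshold and the sample data are illustrative assumptions.

```python
# Sketch of a realism validator: compare simulated vs. real per-frame LIDAR
# dropout rates with a two-sample KS statistic (stdlib only).
import bisect

def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    def ecdf(sorted_sample, x):
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def dropout_rates_match(sim_rates, real_rates, threshold=0.1):
    """Accept the generator if the distributions are close (assumed cutoff)."""
    return ks_statistic(sim_rates, real_rates) <= threshold

real = [0.02, 0.03, 0.02, 0.04, 0.03]       # dropout fraction per real frame
sim_bad = [0.20, 0.25, 0.30, 0.22, 0.28]    # a generator that drops far too much
```

The same pattern applies to noise spectra or detection-confidence distributions; only the statistic being compared changes.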

4) Create prioritized test suites. Combine single-family tests with cross-family permutations (sampled combinatorially or via importance sampling) and run both closed-loop simulation and batch perception-only evaluations.
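Cross-family permutations can be enumerated pairwise and then importance-sampled by risk. The per-family risk weights below are hypothetical placeholders for scores produced by the prioritization framework.

```python
# Sketch of cross-family test sampling: enumerate all pairwise family
# combinations, then importance-sample them by hypothetical risk weights.
import itertools
import random

FAMILY_RISK = {"weather": 2.0, "occlusion": 3.0,
               "intent_ambiguity": 3.0, "geometry": 1.5}

def pairwise_suites():
    """All unordered family pairs, the cheapest cross-family coverage level."""
    return list(itertools.combinations(sorted(FAMILY_RISK), 2))

def importance_sample(pairs, n, rng=None):
    """Draw n pairs, weighting each by the sum of its families' risk weights."""
    rng = rng or random.Random()
    weights = [FAMILY_RISK[a] + FAMILY_RISK[b] for a, b in pairs]
    return rng.choices(pairs, weights=weights, k=n)

pairs = pairwise_suites()
batch = importance_sample(pairs, 10, rng=random.Random(0))
```

Pairwise coverage is a pragmatic floor; triples and higher-order combinations can be added for the families that score highest on the priority axes.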

5) Measure transfer to real-world performance. Track metrics such as false-positive/negative shifts, planner intervention rate, and disengagements per family; prefer generators whose simulated improvements reduce real-world incident rates.
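Per-family transfer can be tracked with simple normalized rates. The sketch below compares disengagements per 1,000 miles before and after a generator-driven retrain; the event counts and mileage are made-up numbers for illustration.

```python
# Sketch of per-family transfer tracking: disengagement rate per 1,000 miles
# before/after a retrain driven by a synthetic-data generator.

def disengagement_rate(events, miles):
    """Disengagements normalized per 1,000 miles driven."""
    return 1000.0 * events / miles

def transfer_improvement(before, after):
    """Relative reduction in rate; positive means real-world improvement."""
    return (before - after) / before

before = disengagement_rate(12, 8000)        # illustrative counts
after = disengagement_rate(9, 8000)
gain = transfer_improvement(before, after)
```

Generators whose simulated gains do not show up in this per-family real-world metric are candidates for better realism validation, per the transfer-gap axis above.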

Practical tips

– Start with families that are high impact but underrepresented in logs (e.g., occlusion+low light).

– Model sensor realism, not just visual appearance: corrupt depth, add sensor-specific artifacts, and preserve timing jitter.
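A minimal sketch of that tip: corrupt a depth scan with range-dependent dropout and Gaussian noise, and jitter its timestamp. All parameter values here are assumptions, not measured sensor characteristics.

```python
# Sketch of sensor-domain corruption: range-dependent dropout, Gaussian range
# noise, and timestamp jitter. All parameter values are illustrative.
import random

def corrupt_depth_scan(ranges, timestamp, dropout_base=0.01,
                       dropout_per_m=0.002, noise_std=0.03,
                       jitter_std=0.002, rng=None):
    """Return (corrupted_ranges, jittered_timestamp); None marks a lost return."""
    rng = rng or random.Random()
    corrupted = []
    for r in ranges:
        # Dropout probability grows with range, mimicking attenuation.
        if rng.random() < dropout_base + dropout_per_m * r:
            corrupted.append(None)
        else:
            corrupted.append(max(0.0, r + rng.gauss(0.0, noise_std)))
    return corrupted, timestamp + rng.gauss(0.0, jitter_std)

scan, t = corrupt_depth_scan([5.0, 12.0, 30.0, 60.0], 100.0,
                             rng=random.Random(1))
```

Preserving the jittered timestamp, rather than a clean one, is what lets downstream fusion and tracking code be exercised against realistic synchronization error.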

– Use staged validation: perception-only, closed-loop simulated driving, and limited real-world shadow-mode trials before deployment changes.

– Keep the taxonomy living: regularly update families and subcategories using newly collected incident data.

A coverage taxonomy of weirdness turns unstructured rarity into actionable testing plans. By grouping edge cases into families, teams can prioritize synthetic-data efforts, design validators tuned to real failure modes, and—most importantly—ensure that improvements in simulation yield measurable safety gains on real roads.
