Multi-Scale Imagery & Self-Supervision
We introduce a novel self-supervised learning strategy called Multi-Scale Cross-Matching (MSCM). The model learns powerful feature representations by matching different-resolution views of the same geographic location, without requiring manual labels. This approach allows the network to learn both fine-grained details (e.g., road markings) from high-resolution imagery and broader environmental context (e.g., complex intersections, surrounding land use) from low-resolution imagery, leading to a more robust understanding of roadway hazards.
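The cross-matching objective can be illustrated with a contrastive loss: embeddings of the high- and low-resolution views of the same location form positive pairs, and all other pairings in the batch act as negatives. This is a minimal NumPy sketch assuming an InfoNCE-style objective; the exact MSCM loss, batch construction, and embedding dimensions are assumptions, not specifics from the text.

```python
import numpy as np

def cross_matching_loss(z_hi, z_lo, temperature=0.1):
    """Contrastive matching sketch: row i of z_hi and row i of z_lo are
    embeddings of the same location at two scales (positives); every
    other pair in the batch is a negative. Illustrative only."""
    # L2-normalise so the dot product is cosine similarity.
    z_hi = z_hi / np.linalg.norm(z_hi, axis=1, keepdims=True)
    z_lo = z_lo / np.linalg.norm(z_lo, axis=1, keepdims=True)
    logits = z_hi @ z_lo.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Softmax cross-entropy with the diagonal as the target class.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32))                    # 8 locations, 32-d features
z_hi = base + 0.05 * rng.normal(size=base.shape)   # high-res view embedding
z_lo = base + 0.05 * rng.normal(size=base.shape)   # low-res view embedding

matched = cross_matching_loss(z_hi, z_lo)
shuffled = cross_matching_loss(z_hi, z_lo[rng.permutation(8)])
print(matched < shuffled)  # correctly matched views should incur the lower loss
```

Minimising this loss pulls the two scale views of the same location together in feature space while pushing apart views of different locations, which is what lets the encoder absorb both fine detail and broader context without labels.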
Multi-Head Fatal Crash Risk Learning
For the final risk estimation, we use a Multi-Scale Multi-Head CNN. After the pre-trained encoder extracts feature maps from each image scale, these maps are fed into a series of prediction heads. A main head uses the concatenated features from all scales for a holistic prediction, while auxiliary heads specialize in features from individual scales. This joint optimization allows the model to leverage both shared and scale-specific information, leading to more accurate and robust risk predictions.
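The head structure and joint objective described above can be sketched as follows. The scale names, feature dimensions, linear heads, and the 0.3 auxiliary-loss weight are all illustrative assumptions; the paper's actual heads would be learned CNN/MLP layers trained by gradient descent rather than these fixed random weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pre-extracted per-scale feature vectors from the pre-trained encoder
# (names and dimensions are illustrative): batch of 4 locations, 16-d each.
scales = ("high", "mid", "low")
feats = {s: rng.normal(size=(4, 16)) for s in scales}

# Main head sees the concatenation of all scales; one auxiliary head per scale.
w_main, b_main = rng.normal(size=(48, 1)), 0.0
aux_heads = {s: (rng.normal(size=(16, 1)), 0.0) for s in scales}

def forward(feats):
    """Return the main (all-scale) logit and per-scale auxiliary logits."""
    concat = np.concatenate([feats[s] for s in scales], axis=1)  # (4, 48)
    main_logit = concat @ w_main + b_main
    aux_logits = {s: feats[s] @ w + b for s, (w, b) in aux_heads.items()}
    return main_logit, aux_logits

def bce(logit, y):
    """Binary cross-entropy on sigmoid(logit) against labels y."""
    p = 1.0 / (1.0 + np.exp(-logit))
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([[1.0], [0.0], [1.0], [0.0]])  # fatal-crash risk labels (dummy)
main_logit, aux_logits = forward(feats)
# Joint objective: main loss plus down-weighted scale-specific losses
# (the 0.3 weight is an assumption, not taken from the text).
loss = bce(main_logit, y) + 0.3 * sum(bce(l, y) for l in aux_logits.values())
print(float(loss) > 0.0)
```

Optimizing the summed loss forces each scale's features to remain individually predictive (via the auxiliary heads) while the main head learns how to fuse them, which is the shared-plus-scale-specific trade-off the section describes.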