Multi-Scale Imagery & Self-Supervision
We introduce a novel self-supervised learning strategy called Multi-Scale Cross-Matching (MSCM). The model learns powerful feature representations by matching different-resolution views of the same geographic location, without requiring manual labels. This approach allows the network to learn both fine-grained details (e.g., road markings) from high-resolution imagery and broader environmental context (e.g., complex intersections, surrounding land use) from low-resolution imagery, leading to a more robust understanding of roadway hazards.
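The cross-matching objective can be illustrated with a contrastive loss: embeddings of the high- and low-resolution views of the same location form positive pairs, and all other pairings in the batch act as negatives. This is a minimal NumPy sketch assuming an InfoNCE-style objective; the exact MSCM loss, batch construction, and embedding dimensions are assumptions, not specifics from the text.

```python
import numpy as np

def cross_matching_loss(z_hi, z_lo, temperature=0.1):
    """Contrastive matching sketch: row i of z_hi and row i of z_lo are
    embeddings of the same location at two scales (positives); every
    other pair in the batch is a negative. Illustrative only."""
    # L2-normalise so the dot product is cosine similarity.
    z_hi = z_hi / np.linalg.norm(z_hi, axis=1, keepdims=True)
    z_lo = z_lo / np.linalg.norm(z_lo, axis=1, keepdims=True)
    logits = z_hi @ z_lo.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Softmax cross-entropy with the diagonal as the target class.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32))                    # 8 locations, 32-d features
z_hi = base + 0.05 * rng.normal(size=base.shape)   # high-res view embedding
z_lo = base + 0.05 * rng.normal(size=base.shape)   # low-res view embedding

matched = cross_matching_loss(z_hi, z_lo)
shuffled = cross_matching_loss(z_hi, z_lo[rng.permutation(8)])
print(matched < shuffled)  # correctly matched views should incur the lower loss
```

Minimising this loss pulls the two scale views of the same location together in feature space while pushing apart views of different locations, which is what lets the encoder absorb both fine detail and broader context without labels.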
Multi-Head Fatal Crash Risk Learning
For the final risk estimation, we use a Multi-Scale Multi-Head CNN. After the pre-trained encoder extracts feature maps from each image scale, these maps are fed into a series of prediction heads. A main head uses the concatenated features from all scales for a holistic prediction, while auxiliary heads specialize in features from individual scales. This joint optimization allows the model to leverage both shared and scale-specific information, leading to more accurate and robust risk predictions.
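The head structure and joint objective described above can be sketched as follows. The scale names, feature dimensions, linear heads, and the 0.3 auxiliary-loss weight are all illustrative assumptions; the paper's actual heads would be learned CNN/MLP layers trained by gradient descent rather than these fixed random weights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pre-extracted per-scale feature vectors from the pre-trained encoder
# (names and dimensions are illustrative): batch of 4 locations, 16-d each.
scales = ("high", "mid", "low")
feats = {s: rng.normal(size=(4, 16)) for s in scales}

# Main head sees the concatenation of all scales; one auxiliary head per scale.
w_main, b_main = rng.normal(size=(48, 1)), 0.0
aux_heads = {s: (rng.normal(size=(16, 1)), 0.0) for s in scales}

def forward(feats):
    """Return the main (all-scale) logit and per-scale auxiliary logits."""
    concat = np.concatenate([feats[s] for s in scales], axis=1)  # (4, 48)
    main_logit = concat @ w_main + b_main
    aux_logits = {s: feats[s] @ w + b for s, (w, b) in aux_heads.items()}
    return main_logit, aux_logits

def bce(logit, y):
    """Binary cross-entropy on sigmoid(logit) against labels y."""
    p = 1.0 / (1.0 + np.exp(-logit))
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = np.array([[1.0], [0.0], [1.0], [0.0]])  # fatal-crash risk labels (dummy)
main_logit, aux_logits = forward(feats)
# Joint objective: main loss plus down-weighted scale-specific losses
# (the 0.3 weight is an assumption, not taken from the text).
loss = bce(main_logit, y) + 0.3 * sum(bce(l, y) for l in aux_logits.values())
print(float(loss) > 0.0)
```

Optimizing the summed loss forces each scale's features to remain individually predictive (via the auxiliary heads) while the main head learns how to fuse them, which is the shared-plus-scale-specific trade-off the section describes.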