Publication:
Generating a high-definition map (HD map) via YOLO (You Only Look Once) deep learning-based object detection model

Publisher

ITU Graduate School

Abstract

This thesis presents a practical approach to generating high-definition (HD) maps with image-based object detection, applying the YOLOv11 deep learning model to 360-degree panoramic imagery. The study area is the Ayazağa Campus of Istanbul Technical University (ITU), where high-resolution imagery was collected with a Ladybug5 panoramic camera system equipped with GNSS. The goal was a reliable, cost-efficient, and scalable HD mapping pipeline suitable for urban mobility applications, autonomous vehicle navigation, and geographic information system (GIS) integration.

The study integrates computer vision with geospatial analysis. Objects such as cars, pedestrians, traffic signs, and other environmental features were detected in the panoramic images with YOLOv11, an object detection architecture known for its speed and precision. The detections, initially expressed in image coordinates, were transformed into georeferenced spatial data using the camera's intrinsic and extrinsic parameters together with synchronized GNSS metadata, which allowed the detected objects to be projected onto real-world maps.

The HD map generation process comprised several sequential stages: image acquisition and preprocessing, annotation with the AnyLabeling tool, conversion to YOLO format, model training and validation, coordinate transformation, and map creation in OpenStreetMap (OSM) format. The spatial datasets were then visualized and analyzed in QGIS and PyQGIS, confirming the accuracy and usability of the resulting maps for practical deployment.

The performance of the YOLOv11 model was evaluated with both quantitative metrics and qualitative visual analysis.
With a mean Average Precision (mAP) of 76% and a frame processing speed of 32 FPS, the model showed strong detection capability, particularly in well-lit outdoor conditions. Accuracy was high for prominent object classes such as vehicles and trees, but declined slightly for pedestrians under challenging lighting, an issue attributed to dataset variability and illumination sensitivity.

A key contribution of the study is its demonstration that a low-cost HD mapping system can be built from camera and GNSS data alone, without expensive LiDAR equipment. The modular structure of the proposed system also makes it adaptable for integration into existing GIS infrastructures and simulation platforms such as SUMO and CARLA. Layered map outputs, categorized by object class, add flexibility to spatial analyses and support smart mobility planning.

The study also identified limitations: depth estimation inaccuracies caused by fixed-distance assumptions, reduced detection reliability in dim environments, and minor errors arising from GNSS drift. To address them, the thesis proposes multi-sensor fusion (e.g., LiDAR, IMU), real-time map updating, and model generalization across diverse geographical settings.

In summary, this research connects computer vision and spatial data engineering through an end-to-end workflow for HD map generation that is both technically robust and economically feasible. It provides a foundation for further development of real-time, vision-based mapping systems for safer autonomous transportation.
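
The coordinate transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes an equirectangular panorama whose centre column points along the vehicle heading, a heading taken from the GNSS track, a spherical Earth, and a fixed object distance (the fixed-distance assumption that the abstract lists as a limitation). All function names and the 10 m default distance are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def pixel_to_bearing(x_px, image_width, heading_deg):
    """Map a pixel column of an equirectangular panorama to a compass bearing.

    Assumes the image centre column points along the vehicle heading.
    """
    offset_deg = (x_px / image_width - 0.5) * 360.0
    return (heading_deg + offset_deg) % 360.0


def project_point(lat_deg, lon_deg, bearing_deg, distance_m):
    """Great-circle destination point from a start position, bearing and distance."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    ang = distance_m / EARTH_RADIUS_M  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(ang)
        + math.cos(lat1) * math.sin(ang) * math.cos(brg)
    )
    lon2 = lon1 + math.atan2(
        math.sin(brg) * math.sin(ang) * math.cos(lat1),
        math.cos(ang) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


def detection_to_feature(label, x_center_px, image_width,
                         cam_lat, cam_lon, heading_deg,
                         assumed_distance_m=10.0):
    """Build a GeoJSON point feature for one detection.

    The object distance is a fixed assumption, not a measured depth.
    """
    bearing = pixel_to_bearing(x_center_px, image_width, heading_deg)
    lat, lon = project_point(cam_lat, cam_lon, bearing, assumed_distance_m)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},  # GeoJSON order: lon, lat
        "properties": {"class": label, "bearing_deg": bearing},
    }
```

Because the distance is fixed rather than measured, the projected position is only as good as that assumption, which is why depth sensing such as LiDAR appears among the proposed enhancements.
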
The class-based F1 score (macro/micro) was 0.82/0.84 at the IoU ≥ 0.50 threshold, with mAP@0.50 ≈ 82%; adding the geographical consistency condition (planimetric error ε ≤ 0.5 m) gave a Geo-F1 (macro/micro) of 0.74/0.76. The pedestrian class lost sensitivity in shadow and low light, whereas the vehicle and traffic sign classes showed high accuracy, especially in daylight. The resulting layers were visualized in QGIS/PyQGIS and exported in OSM/GeoJSON format. The results show that a camera and GNSS alone can produce an object inventory that is quickly deployable and mappable; LiDAR integration and greater data diversity could further reduce the limitations caused by calibration errors, GNSS deviations, and depth assumptions.
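
The macro/micro F1 figures and the added geographical consistency condition can be made concrete with a short Python sketch. The abstract does not define Geo-F1 in detail, so this is one plausible reading rather than the thesis procedure: a matched detection counts as a true positive only if its IoU is at least 0.50 and its planimetric error is at most 0.5 m, and a match that fails either test counts as both a false positive and a false negative. All function names and the input layout are hypothetical.

```python
from collections import defaultdict


def f1_score(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def is_geo_true_positive(iou, planimetric_error_m, iou_min=0.50, eps_m=0.5):
    """A match must pass both the IoU and the planimetric-error thresholds."""
    return iou >= iou_min and planimetric_error_m <= eps_m


def tally(matches, unmatched_predictions, unmatched_truths,
          iou_min=0.50, eps_m=0.5):
    """Count (tp, fp, fn) per class.

    matches: iterable of (class_name, iou, planimetric_error_m)
    unmatched_predictions / unmatched_truths: iterables of class names
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for cls, iou, err in matches:
        if is_geo_true_positive(iou, err, iou_min, eps_m):
            counts[cls][0] += 1
        else:
            # Failed match: the prediction is wrong and the object is missed.
            counts[cls][1] += 1
            counts[cls][2] += 1
    for cls in unmatched_predictions:
        counts[cls][1] += 1
    for cls in unmatched_truths:
        counts[cls][2] += 1
    return {cls: tuple(c) for cls, c in counts.items()}


def macro_micro_f1(counts):
    """Macro F1 averages per-class scores; micro F1 pools the counts first."""
    per_class = [f1_score(*c) for c in counts.values()]
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1_score(tp, fp, fn)
```

Setting `eps_m` to infinity reduces the tally to the ordinary IoU-only F1, so the same functions can report both kinds of score from one set of matches.
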

Description

Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025

Subject

deep learning (machine learning)
