
JASE ›› 2018, Vol. 9 ›› Issue (4): 433-440. DOI: 10.3969/j.issn.1674-8484.2018.04.010

• Automotive Energy Saving and Environmental Protection •

  • Author bios: First author: BAI Jie (b. 1968), male (Han), from Jiangxi; professor. E-mail: baijie@tongji.edu.cn. Second author: HAO Peihan (b. 1993), male (Han), from Henan; master's student. E-mail: hopehana@tongji.edu.cn.
  • Funding: National Key R&D Program of China (2016YFB0101101).

Traffic scene understanding using image semantic segmentation with an improved lightweight convolutional neural network

BAI Jie, HAO Peihan, CHEN Sihan   

  1. (School of Automotive Studies, Tongji University, Shanghai 201804, China)
  • Received:2018-05-19 Online:2018-12-31 Published:2019-01-02


Abstract:

A traffic scene understanding method based on image semantic segmentation was proposed to improve the robustness of the visual perception module in an automotive autonomous driving system. A lightweight convolutional neural network for deep-learning-based semantic segmentation was designed to strike a balance between running speed and accuracy. In the feature-extraction part, the lightweight MobileNetV2 model was adopted, and its stride-2 convolution layers were replaced with deformable convolution layers. In the feature-decoding part, the number of convolution kernels was reduced, a multi-scale atrous deformable convolution module was introduced, and low-level features were used to supplement detail information. The augmented PASCAL VOC 2012 dataset was used to pre-train and evaluate the network, and the traffic scene dataset Cityscapes was used to fine-tune and test it. The results show that the network achieves a mean IoU (intersection over union) of 69.2%, outperforming the DeepLab semantic segmentation network built on MobileNetV2; it takes only 127 ms per frame and 1.073 GB of memory, making it more efficient than networks based on VGG-16 and ResNet-101.
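The accuracy figure above is reported as mean IoU. As a minimal illustrative sketch (not code from the paper; the toy label maps and class IDs are invented for the example), the metric averages the per-class ratio of intersection to union between predicted and ground-truth label maps:

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class IoU over classes present in pred or truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy flattened label maps with 2 hypothetical classes (0 = road, 1 = car).
pred  = [0, 0, 1, 1, 0, 1]
truth = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, truth, 2))  # → 0.5 (both classes score IoU 2/4)
```

In semantic-segmentation benchmarks such as Cityscapes, the same averaging is applied over full-resolution label images and all evaluated classes.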

Key words: automotive autonomous driving, scene understanding, visual perception, image semantic segmentation, lightweight convolutional neural network, deep learning
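The multi-scale atrous (dilated) convolutions mentioned in the abstract enlarge the receptive field without adding parameters. A brief sketch of the underlying arithmetic (the dilation rates below are chosen for illustration and are not taken from the paper):

```python
def effective_kernel(k, d):
    """Effective receptive field of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel at increasing (hypothetical) dilation rates covers an
# ever-larger window while keeping the same 9 weights.
for d in (1, 2, 4, 8):
    print(f"dilation {d}: {effective_kernel(3, d)} x {effective_kernel(3, d)}")
# dilation 1: 3 x 3
# dilation 2: 5 x 5
# dilation 4: 9 x 9
# dilation 8: 17 x 17
```

Stacking several such rates in parallel, as in the decoder described above, lets a lightweight network aggregate context at multiple scales cheaply.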