
Journal of Automotive Safety and Energy (汽车安全与节能学报), 2025, Vol. 16, Issue 4: 529-538. DOI: 10.3969/j.issn.1674-8484.2025.04.003

• Automotive Safety •

Emergency vehicle detection in noisy environments based on acoustic spectral-temporal information fusion

LI Hao1, ZHOU Hao2,*

  1 Z-one Technology Co., Ltd., Shanghai 201804, China
  2 School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2024-10-30 Revised: 2025-03-13 Online: 2025-08-30 Published: 2025-08-27
  • Corresponding author: *ZHOU Hao, PhD candidate. E-mail: choushuhao@outlook.com
  • About the author: LI Hao (born 1982), male, Han ethnicity, Shanghai, senior engineer. E-mail: lihao13@saicmotor.com
  • Funding:
    Special Project of the National Key Research and Development Program of China (2024QY2630)

Abstract:

An in-vehicle detection method based on the fusion of spectral and temporal features was proposed to detect the sirens of emergency vehicles outside a car travelling at high speed. The input audio signal was transformed using the fast Fourier transform, and its log-Mel spectrogram was computed to extract spectral features. A convolutional neural network was used to model the raw waveform in the time domain, yielding temporal features. A coordinate attention network was used to fuse and enhance the spectral and temporal features, and the fused features were fed into a classifier for detection. Experiments were conducted on both public and real-recorded datasets. The results show that, on the LSAD-EVSRN dataset, the proposed method achieves an area under the receiver operating characteristic curve (AUC) of 98.92%, an improvement of 14.88% over using temporal features alone and of 2.52% over using spectral features alone. These results confirm the effectiveness of the fusion strategy in improving detection performance, with high robustness particularly under noisy conditions.

Key words: automotive safety, siren detection, emergency vehicles, sound event detection, feature fusion
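
The spectral branch described in the abstract (fast Fourier transform followed by a log-Mel spectrogram) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names and the parameter values (16 kHz sampling rate, 512-point FFT, hop of 160 samples, 40 Mel bands, Hann window) are assumptions chosen only for illustration, since the abstract does not state them.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # bin frequencies in Hz
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    )
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40,
                        eps=1e-10):
    """Return a log-Mel spectrogram of shape (n_mels, n_frames).

    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    # Build overlapping frames by index arithmetic, then apply a Hann window.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum of each frame via the real-input FFT.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the Mel filterbank and compress with a logarithm.
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel_power + eps).T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2.0 * np.pi * 1000.0 * t)  # 1 s synthetic 1 kHz tone
    print(log_mel_spectrogram(tone, sr=sr).shape)
```

In a pipeline of the kind the abstract outlines, a matrix like this would serve as the spectral input, while the raw waveform would go to the time-domain network; the fusion and classification stages are not sketched here.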

CLC number: