Vision-Based Autonomous Driving for Smart City: A Case for End-to-End Learning Utilizing Temporal Information

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


Volume

12608 LNCS




End-to-end learning models trained with conditional imitation learning (CIL) have demonstrated their capabilities in autonomous driving. In this work, we explore the use of temporal information with a recurrent network to improve driving performance, especially in dynamic environments. We propose the TCIL (Temporal CIL) model, which combines an efficient, pre-trained, deep convolutional neural network to better capture image features with a long short-term memory network to better explore temporal information. Experimental results on the CARLA benchmark indicate that the proposed model achieves performance gains in most tasks. Compared with other CIL-based models on the most challenging task, navigation in dynamic environments, it achieves a 96% success rate in training conditions, where other CIL-based models reach 82–92%; it is also competitive in the new town and new weather conditions, achieving 88% where other CIL-based models reach 42–90%. We believe that this work contributes significantly towards safe, efficient, clean autonomous driving for future smart cities.
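The data flow described in the abstract (per-frame image features, a recurrent layer over time, and command-conditioned outputs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the image features have already been extracted by some CNN, uses randomly initialised weights in place of trained ones, and borrows the four high-level commands and the steer/throttle/brake outputs that are common in CIL setups. The names `LSTMCell`, `TemporalCILSketch`, and `COMMANDS` are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        # One weight matrix and bias per gate, applied to the concatenation [x, h].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        xh = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], xh), self.b[k])] for k in "ifog"}
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # New cell state: keep part of the old state, add part of the candidate.
        c_new = [f * cv + i * g for f, cv, i, g in zip(f_gate, c, i_gate, cand)]
        h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
        return h_new, c_new


# Assumed command set (hypothetical names); the abstract does not list the commands.
COMMANDS = ("follow_lane", "turn_left", "turn_right", "go_straight")


class TemporalCILSketch:
    """Sequence of frame features -> LSTM -> one output head per command."""

    def __init__(self, feature_size=8, hidden_size=4, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.lstm = LSTMCell(feature_size, hidden_size, rng)
        # Each head maps the final hidden state to (steer, throttle, brake).
        self.heads = {cmd: rand_matrix(3, hidden_size, rng) for cmd in COMMANDS}

    def predict(self, feature_sequence, command):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for features in feature_sequence:  # oldest frame first
            h, c = self.lstm.step(features, h, c)
        # The command selects which head is used (conditional branching).
        steer, throttle, brake = matvec(self.heads[command], h)
        return {
            "steer": math.tanh(steer),       # squashed to [-1, 1]
            "throttle": sigmoid(throttle),   # squashed to [0, 1]
            "brake": sigmoid(brake),         # squashed to [0, 1]
        }


if __name__ == "__main__":
    model = TemporalCILSketch()
    # Five frames of placeholder 8-dimensional features standing in for CNN outputs.
    frames = [[0.1 * (t + k) for k in range(8)] for t in range(5)]
    print(model.predict(frames, "turn_left"))
```

The sketch only shows the shape of the computation. In a trained model, the per-frame features would come from the pre-trained convolutional network and all weights would be learned from expert driving demonstrations.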


Autonomous driving, Conditional Imitation Learning (CIL), Convolutional Neural Network (CNN), End-to-End learning, Long Short-Term Memory (LSTM)


Computer Science