Real-time traffic congestion prediction using big data and machine learning techniques

Publication Date


Document Type


Publication Title

World Journal of Engineering




Purpose: The study aims to propose an intelligent real-time traffic model to address the traffic congestion problem. The proposed model assists the urban population in their everyday lives by assessing the probability of road accidents and accurate traffic information prediction. It also helps in reducing overall carbon dioxide emissions in the environment and assists the urban population in their everyday lives by increasing overall transportation quality. Design/methodology/approach: This study offered a real-time traffic model based on the analysis of numerous sensor data. Real-time traffic prediction systems can identify and visualize current traffic conditions on a particular lane. The proposed model incorporated data from road sensors as well as a variety of other sources. It is difficult to capture and process large amounts of sensor data in real time. Sensor data is consumed by streaming analytics platforms that use big data technologies, which is then processed using a range of deep learning and machine learning techniques. Findings: The study provided in this paper would fill a gap in the data analytics sector by delivering a more accurate and trustworthy model that uses internet of things sensor data and other data sources. This method can also assist organizations such as transit agencies and public safety departments in making strategic decisions by incorporating it into their platforms. Research limitations/implications: The model has a big flaw in that it makes predictions for the period following January 2020 that are not particularly accurate. This, however, is not a flaw in the model; rather, it is a flaw in Covid-19, the global epidemic. The global pandemic has impacted the traffic scenario, resulting in erratic data for the period after February 2020. However, once the circumstance returns to normal, the authors are confident in their model’s ability to produce accurate forecasts. Practical implications: To help users choose when to go, this study intended to pinpoint the causes of traffic congestion on the highways in the Bay Area as well as forecast real-time traffic speeds. To determine the best attributes that influence traffic speed in this study, the authors obtained data from the Caltrans performance measurement system (PeMS), reviewed it and used multiple models. The authors developed a model that can forecast traffic speed while accounting for outside variables like weather and incident data, with decent accuracy and generalizability. To assist users in determining traffic congestion at a certain location on a specific day, the forecast method uses a graphical user interface. This user interface has been designed to be readily expanded in the future as the project’s scope and usefulness increase. The authors’ Web-based traffic speed prediction platform is useful for both municipal planners and individual travellers. The authors were able to get excellent results by using five years of data (2015–2019) to train the models and forecast outcomes for 2020 data. The authors’ algorithm produced highly accurate predictions when tested using data from January 2020. The benefits of this model include accurate traffic speed forecasts for California’s four main freeways (Freeway 101, I-680, 880 and 280) for a specific place on a certain date. The scalable model performs better than the vast majority of earlier models created by other scholars in the field. The government would benefit from better planning and execution of new transportation projects if this programme were to be extended across the entire state of California. This initiative could be expanded to include the full state of California, assisting the government in better planning and implementing new transportation projects. Social implications: To estimate traffic congestion, the proposed model takes into account a variety of data sources, including weather and incident data. According to traffic congestion statistics, “bottlenecks” account for 40% of traffic congestion, “traffic incidents” account for 25% and “work zones” account for 10% (Traffic Congestion Statistics). As a result, incident data must be considered for analysis. The study uses traffic, weather and event data from the previous five years to estimate traffic congestion in any given area. As a result, the results predicted by the proposed model would be more accurate, and commuters who need to schedule ahead of time for work would benefit greatly. Originality/value: The proposed work allows the user to choose the optimum time and mode of transportation for them. The underlying idea behind this model is that if a car spends more time on the road, it will cause traffic congestion. The proposed system encourages users to arrive at their location in a short period of time. Congestion is an indicator that public transportation needs to be expanded. The optimum route is compared to other kinds of public transit using this methodology (Greenfield, 2014). If the commute time is comparable to that of private car transportation during peak hours, consumers should take public transportation.


Big data, Machine learning, Real-time traffic congestion, Smart city, Smart traffic


Computer Engineering