Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Reich, T., Budka, M. and Hulbert, D.

Journal: 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020

Pages: 2843-2852

ISBN: 9781728125473

DOI: 10.1109/SSCI47803.2020.9308166

Abstract:

Passengers of urban bus networks often rely on forecasts of Estimated Times of Arrival (ETA) and live-vehicle movements to plan their journeys. ETA predictions are unreliable due to the lack of good-quality historical data, while 'live' positions in mobile apps suffer from delays in data transmission. This study uses deep neural networks to predict the next position of a bus under various vehicle-location data-quality regimes. Additionally, we assess the effect of the target representation in the prediction problem by encoding it either as unconstrained geographical coordinates, as progress along a known trajectory, or as the ETA at the next two stops. We demonstrate that without data cleaning, model predictions give false confidence if mean errors are used for evaluation, highlighting the importance of a holistic assessment of the results. We show that target representation affects prediction accuracy by constraining the prediction space. The literature is vague about quality issues in public transport data. Here we show that noisy data is a problem and discuss simple but effective approaches to address these issues. Research generally focuses on only a single method of target representation; therefore, comparing several methods is a useful addition to the literature. This work gives insight into the value of addressing data quality issues in urban transport data to enable better predictions and improve the passenger experience. We show that 'rephrasing' the prediction problem by changing the target representation can yield massively improved predictions. Our findings enable researchers using deep learning approaches in public transport to make more informed decisions about essential data cleaning steps and problem representation for improved results.

https://eprints.bournemouth.ac.uk/36200/

Source: Scopus

Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Reich, T., Budka, M. and Hulbert, D.

Journal: 2020 IEEE Symposium Series on Computational Intelligence (SSCI)

Pages: 2843-2852

DOI: 10.1109/SSCI47803.2020.9308166

https://eprints.bournemouth.ac.uk/36200/

Source: Web of Science (Lite)

Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Reich, T., Budka, M. and Hulbert, D.

Conference: 2020 IEEE Symposium Series on Computational Intelligence (SSCI)

Dates: 1-4 December 2020

Journal: 2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020

Pages: 2843-2852

ISBN: 9781728125473

DOI: 10.1109/SSCI47803.2020.9308166

https://eprints.bournemouth.ac.uk/36200/

Source: Manual

Preferred by: Marcin Budka

Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Reich, T., Budka, M. and Hulbert, D.

Journal: SSCI

Pages: 2843-2852

Publisher: IEEE

ISBN: 978-1-7281-2547-3

https://eprints.bournemouth.ac.uk/36200/

https://doi.org/10.1109/SSCI47803.2020

Source: DBLP

Impact of Data Quality and Target Representation on Predictions for Urban Bus Networks

Authors: Reich, T., Budka, M. and Hulbert, D.

Conference: IEEE Symposium Series on Computational Intelligence

Pages: 2843-2852

Publisher: IEEE

ISBN: 978-1-7281-2547-3

https://eprints.bournemouth.ac.uk/36200/

https://doi.org/10.1109/SSCI47803.2020

Source: BURO EPrints