2019 CSCE Annual Conference - Laval (Greater Montreal) Conference
Dr. Usman Khan, York University
Mrs. Rahma Khalid, York University
Flood risk and vulnerability in cities are expected to increase in coming years due to climate change and rapid urbanization. Flood damage can be mitigated using early warning systems, which give emergency services advance notice of flood likelihood. Machine learning has been shown to be a suitable approach for modelling complex hydrological processes, including flood forecasting. Although machine learning models have been widely investigated in recent years, relatively little attention has been given to Input Variable Selection (IVS) for these models. IVS relies overwhelmingly on expert knowledge and ad hoc approaches rather than on objective methods for obtaining the best-performing model.
This research uses Artificial Neural Networks (ANNs), a common machine learning model in hydrology, to generate flow forecasts 1, 2, and 3 days ahead for the Bow River in Calgary, Alberta. The model considers a set of candidate input variables comprising the mean, maximum, and minimum of upstream flow, downstream flow, and temperature, together with daily precipitation. Three lag times are included for each unique candidate input.
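The lagged candidate set described above could be assembled along the following lines. This is a minimal sketch, not the authors' implementation: the function name `build_candidates`, the series names, the lag values (0, 1, and 2 days before the forecast is issued), and the synthetic data are all assumptions made for illustration. Ten unique daily series with three lags each give 30 candidate columns.

```python
import numpy as np

def build_candidates(series, target, lags=(0, 1, 2), horizon=1):
    """Build a lagged candidate-input matrix and the matching forecast target.

    series  : dict mapping a (hypothetical) variable name to a 1-D daily array
    target  : 1-D daily array of the flow to be forecast
    lags    : assumed lag times in days, relative to the forecast issue day t
    horizon : forecast lead time in days (1, 2, or 3 in the abstract)
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    # Issue days t for which every lagged input and the future target exist.
    t = np.arange(max(lags), n - horizon)
    names, cols = [], []
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        for lag in lags:
            names.append(f"{name}_lag{lag}")
            cols.append(values[t - lag])
    X = np.column_stack(cols)
    y = target[t + horizon]
    return X, y, names

# Synthetic stand-in data: ten unique daily series over 20 days (assumed names).
rng = np.random.default_rng(0)
n_days = 20
variable_names = [
    f"{stat}_{var}"
    for var in ("upstream_flow", "downstream_flow", "temperature")
    for stat in ("mean", "max", "min")
] + ["precipitation"]
series = {name: rng.normal(size=n_days) for name in variable_names}
target = series["mean_downstream_flow"]

X, y, names = build_candidates(series, target, lags=(0, 1, 2), horizon=1)
print(X.shape, len(names))
```

Calling the function once per horizon (1, 2, and 3) would give one dataset per forecast lead time, each sharing the same 30 candidate columns.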
IVS methods are used to reduce the candidate input set to an optimal subset. This research compares two well-established IVS methods, Partial Correlation (PC) and Partial Mutual Information (PMI), with two relatively uncommon methods, Combined Neural Pathway Strength (CNPS) and Input Saliency (IS). The methods are compared based on the performance of the ANNs they inform.
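To make the idea of filter-based IVS concrete, the sketch below shows one common way to apply partial correlation: a greedy forward selection that, at each step, adds the candidate with the largest absolute partial correlation to the target given the inputs already chosen. It is an illustration under stated assumptions, not the procedure used in this research: the function names, the cap of six inputs, and the stopping threshold `min_abs_pc` are hypothetical, and the data are synthetic.

```python
import numpy as np

def _residual(v, Z):
    """Residual of v after least-squares regression on Z plus an intercept."""
    ones = np.ones((len(v), 1))
    A = np.hstack([ones, Z]) if Z.shape[1] else ones
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def pc_select(X, y, max_inputs=6, min_abs_pc=0.1):
    """Greedy forward selection by partial correlation.

    max_inputs and min_abs_pc are assumed stopping rules for this sketch.
    Returns the selected column indices in order of selection.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_inputs:
        Z = X[:, selected]            # inputs already chosen (may be empty)
        ry = _residual(y, Z)          # part of y not explained by them
        best_j, best_pc = None, 0.0
        for j in remaining:
            rx = _residual(X[:, j], Z)
            denom = np.sqrt((rx @ rx) * (ry @ ry))
            pc = 0.0 if denom == 0 else float(rx @ ry / denom)
            if abs(pc) > abs(best_pc):
                best_j, best_pc = j, pc
        if best_j is None or abs(best_pc) < min_abs_pc:
            break                     # no remaining candidate adds enough
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic check: only columns 1 and 4 actually drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 8))
y_demo = 2.0 * X_demo[:, 1] - 1.5 * X_demo[:, 4] + 0.1 * rng.normal(size=2000)
selected = pc_select(X_demo, y_demo)
print(selected)
```

Partial correlation captures only linear dependence, which is the usual motivation for the PMI alternative. The stopping rule here also plays the role of a selection criterion, so changing `min_abs_pc` changes how many inputs are retained.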
Preliminary results indicate that all four IVS methods can reduce the number of inputs from 30 to 6 without significant loss in model performance. The PC-informed models perform worst and the PMI-informed models perform best. In addition, the number of inputs retained varies notably with the selection criterion used. Collectively, these results indicate that while distinct IVS methods may all be capable of ranking input variable importance, the choice of selection criterion matters more than the choice of IVS method.
This research demonstrates the differences among IVS methods and underlines the need for reliable and systematic input reduction in the development of ANN models. Additionally, validating model-based IVS methods (CNPS and IS) may help to attribute physical meaning directly to ANNs, as the absence of a physical basis is a common criticism of machine learning. Future work includes assessing the sensitivity of each termination criterion and cross-validating these findings with other hydrological systems or synthetic datasets.