Application of machine learning to the analysis of pipeline incidents in Canada
Marcelo Guarido, Daniel O. Trad, Kristopher A. Innanen
Analyzing pipeline incidents in Canada is important for understanding their impact on the environment and on worker safety. The data provided by the Government of Canada are confusing and incomplete, but they contain useful information that can be analyzed and modeled to help mitigate future incidents. Most of the reports come from the province of Alberta, which contains most of the pipelines in Canada, and are mainly related to four companies. We observe a correlation between the number of incidents per year and the price of WTI crude oil, as well as weekend effects. No seasonality is observed in the data, but there are some outliers (months with more reports than average), and they are related to a single company. Dimensionality reduction and cluster analysis, applied to pipeline and maintenance information, revealed four main clusters, each associated with different characteristics, such as the average volume of substance released, the time taken to discover the occurrence, and the emergency level.
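As a rough illustration of the two exploratory checks mentioned in the abstract (weekend effects and the relation between yearly incident counts and the WTI price), the following is a minimal sketch and not the authors' implementation. The incident dates and the yearly prices are made-up placeholders, not values from the Government of Canada dataset or real WTI prices, and the function names (`weekday_counts`, `weekend_share`, `yearly_counts`, `pearson`) are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from datetime import date
from math import sqrt

# ASSUMPTION: made-up incident dates for illustration only (not real reports).
incident_dates = [
    date(2015, 3, 2), date(2015, 3, 3), date(2015, 3, 7),
    date(2015, 6, 10), date(2016, 1, 4), date(2016, 1, 9),
    date(2016, 8, 15), date(2017, 5, 22),
]

# ASSUMPTION: placeholder yearly prices (USD per barrel), not real WTI values.
wti_price_by_year = {2015: 50.0, 2016: 40.0, 2017: 30.0}


def weekday_counts(dates):
    """Number of reports per weekday, index 0 = Monday ... 6 = Sunday."""
    counts = Counter(d.weekday() for d in dates)
    return [counts.get(i, 0) for i in range(7)]


def weekend_share(dates):
    """Fraction of reports dated on a Saturday or Sunday."""
    counts = weekday_counts(dates)
    return (counts[5] + counts[6]) / sum(counts)


def yearly_counts(dates):
    """Number of reports per calendar year, sorted by year."""
    return dict(sorted(Counter(d.year for d in dates).items()))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    per_year = yearly_counts(incident_dates)
    years = [y for y in per_year if y in wti_price_by_year]
    r = pearson([per_year[y] for y in years],
                [wti_price_by_year[y] for y in years])
    print("reports per weekday (Mon..Sun):", weekday_counts(incident_dates))
    print("weekend share:", weekend_share(incident_dates))
    print("correlation of yearly counts with price:", round(r, 3))
```

If reports were spread evenly over the week, the weekend share would be close to 2/7, so a share well below that value is one simple indication of a weekend effect. With real data, the yearly counts would be paired with an actual annual price series in place of the placeholder dictionary.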