Data Science

Data Science is a growing field with a wide range of tasks and applications. Every day, more and more people are changing their career course and moving to this relatively new and exciting area. Here at the CREWES Data Science Initiative, we are engaged in research and in the dissemination of what is new in the data science world.

With the CREWES Data Science Learning Labs, we focus on the steps to becoming a data scientist and on how you can bring business value to your organization. The labs cover how a data science project is conducted, from data reading, through data cleaning and pre-processing, visualization, data transformation, and machine learning modeling, to app development and deployment. Join us for bi-weekly webinars beginning July 2, 2020 to get access to code and "cookbooks."

Lab 0: July 2, 2020, Noon (MST): Introduction to R and Shiny

Marcelo Guarido

In our first lab we will set out our goals, define a learning path, and introduce both the R programming language and the building of apps with the Shiny library.

Data Science Lab 0 (video)

Lab 1: July 16, 2020, Noon (MST): WTI crude oil price forecasting with the Facebook Prophet algorithm

Marcelo Guarido

In this lab, we will present a workflow in R to predict the WTI crude oil price that includes an automated API request from the Quandl database, as well as the univariate forecast algorithm Facebook Prophet. We will end the session with a demonstration of an app built in Shiny.

Data Science Lab 1 (video)
Data Science Lab 1 (zip)

Lab 2: July 30, 2020, Noon (MST): Fundamentals of R, Flexdashboard, and Shiny

Marcelo Guarido

The next lab in the CREWES Data Science Initiative online series will expose you to the fundamentals of the Flexdashboard and Shiny libraries. We will start a new RMarkdown document from scratch and show you how to create a functional application with HTML features.

Data Science Lab 2 (video)
Data Science Lab 2 (zip)

Lab 3: August 13, 2020, Noon (MST): Introduction to HTML, CSS, and Chrome DevTools for Shiny App Layouts

Marcelo Guarido

For this lab, we will continue from where we stopped in Lab 2: Fundamentals of R, Flexdashboard, and Shiny, where we built a Shiny app from scratch but without modifying its layout. The next step in creating a product that increases the business value of your organization is to edit the app's layout so that it has the "face" of your research group, company, or organization. This requires basic skills in HTML and CSS, plus a little help from Chrome DevTools (not mandatory, but quite powerful). We are going to show you how to change the app's fonts, colours, and behaviour by combining these tools. By the end of the session, you will be able to read Flexdashboard code, interpret its CSS and HTML layouts, and create your own app!

Data Science Lab 3 (video)
Data Science Lab 3 (zip)

Lab 4: August 27, 2020, Noon (MST): Natural Language Processing and Machine Learning to Classify Severe Injuries in the Oil and Gas Industry

Marcelo Guarido

For Learning Lab 4, we are going to use Natural Language Processing (NLP) methods, combined with machine learning algorithms, to classify severe injuries in the Oil and Gas industry from accident reports in the US. We are going to introduce you to some neat packages in R to process and prepare text data and, as a bonus, we are going to show how to use Python inside R with the library Reticulate!

Data Science Lab 4 (video)
Data Science Lab 4 (zip)

Lab 5: September 10, 2020, Noon (MST): Using Machine Learning for Lithology Classification from Wireline Logs

Marcelo Guarido

Facies classification is a common practice in the Oil and Gas industry, where rock types are interpreted from correlations between the wireline logs and core analysis logs. However, it can be a long process, and each interpreter has a different approach to the classification. The goal of an automated machine learning facies classification is to help the interpreters reach their conclusions, not to replace them. We will go through the whole data science process for the facies classification: data cleaning, data analysis, data imputation, feature engineering, modeling, and interpretation. All in R.

Data Science Lab 5 (video)
Data Science Lab 5 (zip)

Lab 6: September 24, 2020, Noon (MST): Salt Identification in Seismic Sections using TensorFlow for Deep Learning Solutions

Marcelo Guarido

For this lab, we will present a deep learning solution for the TGS Salt Identification Challenge from the Kaggle website. We are going to demonstrate how to build an image segmentation model in TensorFlow 2 with the goal of classifying each pixel in the seismic section as salt or no salt. For this session, we will use the Google Colab system to run our notebook.
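To make the per-pixel "salt or no salt" idea concrete, the sketch below scores a predicted mask against a reference mask with intersection over union (IoU), a common metric for image segmentation. It is only an illustration, not taken from the lab materials: the function name `iou` and the tiny 3x3 masks are hypothetical, and plain Python lists stand in for image arrays.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (lists of rows of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            # A pixel counts toward the intersection if both masks mark it as salt,
            # and toward the union if at least one of them does.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so IoU is defined as 1.0 in that case.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 3x3 masks: 1 = salt, 0 = no salt.
predicted = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
actual = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

score = iou(predicted, actual)
print(score)
```

Here three pixels are marked as salt in both masks and four in at least one, so the score is 3/4.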

Data Science Lab 6 (video)
Data Science Lab 6 (zip)

Lab 7: October 8, 2020, Noon (MST): Unsupervised seismic facies classification using Python

Brian Russell

For this lab, we will be presenting clustering solutions for seismic facies classification.

Data Science Lab 7 (video)
Data Science Lab 7 (zip)

Lab 8: November 5, 2020, Noon (MST): Time Series Forecasting with SARIMA - Application to COVID-19 Pandemic Data

Marcelo Guarido

This lab will be a technical presentation and demonstration of the R library Modeltime for time series forecasting, along with the theory behind the seasonal ARIMA (SARIMA) model. We are living through the historic moment of the COVID-19 pandemic, so it makes sense to use our analytical skills to better understand the evolution of the pandemic and how to forecast it. For that, we will use the data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
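One building block of a SARIMA model is differencing: a seasonal difference at the seasonal period removes a repeating pattern, and a regular difference removes a trend. The sketch below illustrates only that step, in plain Python rather than in R with Modeltime as used in the lab. The helper `difference` and the synthetic series are hypothetical and are not COVID-19 data.

```python
def difference(series, lag=1):
    """Return series[i] - series[i - lag] for every index where both values exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Synthetic daily series: a linear trend plus a pattern that repeats every 7 days.
weekly_pattern = [0, 5, 3, 4, 6, 2, 1]
series = [2 * t + weekly_pattern[t % 7] for t in range(21)]

# Seasonal difference with period 7 cancels the weekly pattern,
# leaving only the contribution of the trend over one week.
seasonal_diff = difference(series, lag=7)

# A regular (lag-1) difference then removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff)
print(stationary)
```

In SARIMA notation these two steps correspond to the seasonal and non-seasonal differencing orders; the autoregressive and moving-average terms are then fitted to the differenced series.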

Data Science Lab 8 (video)
Data Science Lab 8 (zip)

Lab 9: November 18, 2020, Noon (MST): Impact analysis in R - the effects of the COVID-19 pandemic on the oil industry

Marcelo Guarido

Building on our earlier forecasting work, we will analyze the impact of the COVID-19 pandemic on the Oil & Gas industry in the US. We will work with oil production and price data from before and during the pandemic, and we will perform an impact analysis. In this lab, we will continue using R and the library Modeltime for time series forecasting and will see applications of different algorithms, such as ARIMA, Facebook Prophet, and XGBoost.

Data Science Lab 9 (video)
Data Science Lab 9 (zip)

Lab 10: February 4, 2021, 4pm (MST): An Overview of Machine Learning Applications in the Energy Sector

Marcelo Guarido and Daniel Trad

We present a compilation of papers with examples of machine learning applications in the renewable and non-renewable energy industries.

Data Science Lab 10 (video)
Data Science Lab 10 (pdf)

Lab 11: February 18, 2021, 4pm (MST): Using Hybrid Machine Learning Models

Marcelo Guarido, Daniel Trad, and David Emery

For this lab, we will go through the work of Khan et al. (2020), who used a hybrid model to forecast the energy consumption of renewable and non-renewable power sources, and we will "reproduce" part of their methodology (the modeling part). We will show different ways to combine trained machine learning models to create a more powerful and more robust model using Python libraries such as Scikit-Learn and Mlxtend.
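One of the simplest ways to combine trained models is to average their predictions. The sketch below is a dependency-free illustration of that idea only; it is not the hybrid model of Khan et al. (2020) and does not use Scikit-Learn or Mlxtend. The helpers `fit_mean`, `fit_line`, and `average_ensemble`, as well as the toy data, are hypothetical.

```python
def fit_mean(xs, ys):
    """Baseline model: always predicts the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def average_ensemble(models, weights):
    """Combine trained models by a weighted average of their predictions."""
    total = sum(weights)
    return lambda x: sum(w * m(x) for m, w in zip(models, weights)) / total


# Toy training data following y = 1 + 2x exactly.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

mean_model = fit_mean(xs, ys)
line_model = fit_line(xs, ys)
ensemble = average_ensemble([mean_model, line_model], weights=[0.5, 0.5])

print(mean_model(4), line_model(4), ensemble(4))
```

Averaging is only one option. Stacking, where a second model learns how to weight the base models, follows the same pattern of fitting the base models first and then combining their outputs.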

Data Science Lab 11 (video)
Data Science Lab 11 (zip)

Lab 12: March 4, 2021, 4pm (MST): Clustering Models Applied to the Energy Sector - Part 1

Marcelo Guarido, Daniel Trad, and David Emery

This lab is the first part of the "Clustering Models" series, and was inspired by the work of Smith K. J. (2017), which shows a clustering application to the energy industry: automatic seismic velocity picking on a semblance panel. Clustering models are widely used in different applications, and they aim to group your data by similarity. There is a large selection of models, each with its own particularities, and in this lab we will learn how to implement clustering workflows. During the lab, we will present the definitions of different clustering models, and show how to select and implement them in Python using the package Scikit-Learn.
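As a concrete picture of what "grouping data by similarity" means, here is a minimal k-means sketch in plain Python. It is an illustration under stated assumptions, not the lab's code: the lab uses Scikit-Learn, while the function `kmeans`, the toy 2-D points, and the starting centroids below are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of the closest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        labels = []
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            label = distances.index(min(distances))
            labels.append(label)
            clusters[label].append(point)

        # Update step: each centroid becomes the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)

        if new_centroids == centroids:
            break  # converged: labels and centroids are consistent
        centroids = new_centroids
    return labels, centroids


# Toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0)]

labels, centroids = kmeans(points, start)
print(labels)
print(centroids)
```

The starting centroids are fixed here so the result is reproducible; library implementations usually choose them randomly or with a seeding heuristic.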

Data Science Lab 12 (video)
Data Science Lab 12 (zip)

Lab 13: March 18, 2021, 4pm (MST): Clustering Models Applied to the Energy Sector - Part 2

Marcelo Guarido, Daniel Trad, and David Emery

In this lab, we will continue the demonstration using different and more complex models. We will keep using models and toy datasets from the Scikit-Learn package. During the lab, we will go through a high-level explanation of each of the models, followed by a hands-on application in Python.

Data Science Lab 13 (video)
Data Science Lab 13 (zip)

Lab 14: April 1, 2021, 4pm (MST): Clustering Models Applied to the Energy Sector - Part 3

Ninoska Amundaray, Marcelo Guarido, Daniel Trad, and David Emery

Ninoska will demonstrate how to use clustering algorithms to automate semblance velocity analysis. The project covers understanding the data, data pre-processing, and clustering modeling. The demonstration is a combination of a PowerPoint presentation and Python coding in the Jupyter Notebook environment.

Data Science Lab 14 (video)
Data Science Lab 14 (zip)