Business Science’s Time Series Course is Incredible

I’m a time series fan. Big fan. My first job out of grad school was for a utility company building econometric time series analysis and forecasting models. Lots of ARIMAs and neural nets. However, that was now over 10 years ago (don’t know how the hell that happened).

This post contains affiliate links that help to offset the cost of running the blog, plus the link gives you a special 15% discount.  If you use the link, thank you!

In almost every position I've held in data, a question has come up that involved a time series (not a surprise that business cares about what has happened over time).  Often, I was the only one who had any knowledge of time series on my team.  I'm not sure why it isn't taught as a standard part of most university programs that are training data scientists, but it's just unfortunately not.  I believe that understanding time series analysis is currently a great way to differentiate yourself, since many in the field are just not well versed in it.

I wanted to understand what was current in the world of applying time series analysis to business.  It had been a really long time since I had given the subject some love and attention, and I thought taking this Business Science course would be the perfect way to do that.

My History With Business Science Courses:

I’ve previously written about Business Science’s first course; you can check it out here.  I've also taken the first Shiny app course (there’s a more advanced one as well) and went from zero to Shiny app in 2 days using survey data I collected with Kate Strachnyi.  It was a real win.

The app is still on my site here, just scroll down.  For this little flexdashboard app I went from basically zero Shiny to having something that was useful in 2 days leveraging only the first 25% of the course. The course cannot actually be completed in 2 days. It's also worth noting that the course builds an app with much more functionality than mine. It’s a long course.

Back to the Time Series Review:

This review is broken into three sections:

  • Things I freakin’ love

  • The sexy

  • Everything else

Things I freakin’ love:

You’re learning about packages from the package creator.  Who is going to understand a library better than the person who wrote it?  Matt built both modeltime and timetk, which are used in this course. I find that super impressive.  These packages are also a step up from what was previously out there from a "not needing a million packages to do what I want" perspective.

He uses his own (anonymized) data from Business Science to demonstrate some of the models.  I haven’t seen others do this, and I think it’s cool.  It’s a real, practical dataset of his Google Analytics and Mailchimp email data with an explanation of the fields.  If you don’t have analytics experience in e-commerce and are thinking about taking a role in e-commerce, definitely give some thought to this course.

I love how in-depth he gets with the subject.  If you follow all that is covered in the course, you should be able to apply time series to your own data. 

The Sexy:

Ok, so I’m sure some are interested in seeing just how “cutting edge” the course gets. 

Once you're combining deep learning Gluon models and machine learning models using ensembling methods, you might be the coolest kid at work (but I’m not making any promises). Gluon is a Python package created by Amazon, so you’ll leverage both Python and R to use it.

Some of the deep learning algorithms you’ll learn how to leverage are:

  • DeepAR

  • DeepVAR

  • N-Beats

  • Deep Factor Estimator

Module 18 of the course is where you'll get into deep learning.  A couple of years ago I might have said "deep learning, bah humbug, requires too much computing power and isn't necessary, simpler is better."  As things change and progress (and computers get even more beefy), I'm definitely changing my tune, especially since an ensemble N-Beats algorithm beat the ES-RNN's score in the M4 competition.  M competitions are prestigious forecasting challenges, and they've historically been won by statistical algorithms.  (I wouldn't have known this without this course.)  The techniques taught in this course are very current; they're the sexy new methods that are winning the big competitions.

Here's a look at the syllabus for preparing the data and learning about the DeepAR model.  You're doing log transformations, Fourier Series, and when you get to modeling the course even covers how to handle errors. I just love it.  I know I'll be referring back to the course when a time series use case pops up in the future.
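To make the data prep steps a little more concrete, here is a minimal sketch of a log transformation and a set of Fourier features. It is written in plain Python rather than with the R packages the course uses, so none of this is the course's code; the function names and the page view numbers are made up for illustration.

```python
import math


def log_transform(values):
    """Apply log(1 + x) so zeros are handled and large spikes are compressed."""
    return [math.log1p(v) for v in values]


def fourier_features(n_obs, period, order):
    """Return one row per time step holding sin/cos pairs for harmonics 1..order."""
    rows = []
    for t in range(n_obs):
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Hypothetical daily page views with a weekly rhythm (made-up numbers).
page_views = [120, 135, 150, 160, 155, 90, 80, 125, 140, 152, 165, 158, 95, 85]

logged = log_transform(page_views)
weekly = fourier_features(n_obs=len(page_views), period=7, order=2)
```

The sin/cos columns repeat every 7 steps, which is what lets a model pick up a weekly pattern without needing a separate dummy variable for each day of the week.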

The course covers 17 different algorithms. I'm trying to think if I could name 17 algorithms off the top of my head…  it’d take me a minute.   ARIMA is obviously included, because it’s like the linear regression of time series.  You’ll go through ARIMA and TBATS (a fave, because you don’t need to worry about stationarity the way you do with ARIMA; I’ve used this one in industry as well).

Along with these other algos:

  • ARIMA Boost

  • Prophet Boost

  • Cubist

  • KNN

  • MARS

  • Seasonal decomposition models

Then you’ve got your ensemble algos being leveraged for time series:

  • GLMNET

  • Random Forest

  • Neural Net

  • Cubist

  • SVM

Strap in for 8 solid hours of modeling, hyperparameter tuning, visualizing output, cross-validation and stacking!
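For anyone who hasn't done cross-validation on a time series before, the key idea is that each test window has to come after its training window in time. Below is a rough Python sketch of that rolling-origin idea, with two toy forecasters combined by a simple average. This is only an illustration under my own assumptions: the function names and the series are hypothetical, and a plain average is a much simpler combination than stacking, which fits a second model on top of the base models' predictions.

```python
def rolling_origin_splits(n_obs, initial, horizon, step):
    """Yield (train_indices, test_indices); every test window follows its training window."""
    start = initial
    while start + horizon <= n_obs:
        yield list(range(0, start)), list(range(start, start + horizon))
        start += step


def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Repeat the average of everything seen so far."""
    avg = sum(history) / len(history)
    return [avg] * horizon


def average_ensemble(forecasts):
    """Combine several forecasts by averaging them step by step."""
    return [sum(vals) / len(vals) for vals in zip(*forecasts)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series, purely for illustration.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21, 22, 24]

fold_errors = []
for train_idx, test_idx in rolling_origin_splits(len(series), initial=6, horizon=2, step=2):
    history = [series[i] for i in train_idx]
    actual = [series[i] for i in test_idx]
    combined = average_ensemble([
        naive_forecast(history, len(actual)),
        mean_forecast(history, len(actual)),
    ])
    fold_errors.append(mae(actual, combined))

print(fold_errors)
```

Each fold trains on everything up to a cutoff and scores the next two points, so the error you report never depends on data from the future.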

Everything else:

  • Matt (the owner of Business Science) speaks clearly and is easy to understand.  Occasionally I'll put him on 1.25x speed.

  • His courses in general spend a good amount of time setting the stage for the course.  Once you start coding, you’ll have a great understanding of where you’re going, goals, and context (and your file management will be top notch), but if you’re itching to put your fingers on the keyboard immediately, you’ll need to calm the ants in your pants. It is a thorough start.

  • You have to already feel comfy in R AND the tidyverse. Otherwise you’ll need to get up to speed first and Business Science has a group of courses to help you do that.  You can see what's included here.

Before we finish off this article, one super unique part of the course I enjoyed was where Matt compared the top 4 time series Kaggle competitions and dissected what went into each of the winning models. I found the whole breakdown fascinating, and thought it added wonderful beginning context for the course.

In the 2014 Walmart Challenge, taking into account the “special event” of a shift in holiday sales was what landed 1st place. So you're actually seeing practical use cases for many of the topics taught in the course and this certainly helps with retention of the material.  

Likewise, special events got me good in 2011.  I was modeling and forecasting gas, and the actual consumption of gas and number of customers were going through the roof!  Eventually we realized that the price of oil had gotten so high that people were converting to gas, but that one tripped me up for a couple months. Thinking about current events is so important in time series analysis and we'll see it time and again.  I've said it before, but Business Science courses are just so practical.

Summary:

If you do take this course, you’ll be prepared to apply time series analysis to the time series you encounter in the real world.  I've always found time series analysis useful at different points in my career, even when the job description did not explicitly call for knowledge of time series.

As you saw from the prerequisites, you need to already know R for this course.  Luckily, Business Science has created a bundle at a discounted price so that you can learn R, pick up a whole lot of machine learning, and then dive into time series.  Plus you’ll get an additional 15% off the already discounted price with this link.  If you're already comfortable in R and you're just looking to take the time series course, you can get 15% off of the single course here.

Edit:  People have asked for a coupon to buy all 5 courses at once.  That's something I'm able to do!  Learn R, machine learning, beginner and advanced Shiny app development and time series here.

My Favorite R Programming Course

Note: This article includes affiliate links. Meaning at no cost to you (actually, you get a discount, score!) I will receive a small commission if you purchase the course.

I've been using R since 2004, long before the Tidyverse was introduced. I knew I'd benefit from fully getting up to speed on the newest packages and functionality, and I finally decided to take the plunge and fully update my skills. I wanted a course that was going to cover every nook and cranny in R. My personal experience learning R had been pasting together tutorials and reading documentation as I needed something. I wanted to be introduced to functions I may not need now, but may have a use case for in the future. I wanted everything.

I've known for a while that the Tidyverse was a massive improvement over using base R functionality for manipulating data. However, I also knew my old school skills got the job done. I have now seen the light. There is a better way. It wasn't super painful to make the move (especially since I'm familiar with programming) and Business Science's "Business Analysis with R" course will take you from 0 to pretty dangerous in 4 weeks.

For the person with no R experience who doesn't want to bang their head on the wall for years trying to get to a "serious R user" level, I highly recommend Business Science's "Business Analysis with R" course. Don't let the name fool you: the course spends 5 hours using the parsnip package for machine learning techniques and, more importantly, on how to communicate those results to stakeholders.

The course was thorough, clear, and concise.

Course Coverage

General:

The course takes you from the very beginning:

  • Installing R
  • Setting up your work environment
    • full disclosure, I even learned new tips and tricks in that section
  • and then straight into a relevant business analysis using transactional data

This course "holds your hand" on your journey to becoming self-sufficient in R. I couldn't possibly list everything that is covered in the course in this article; that would make no sense. However, the most life changing parts for me were:

  • regex using stringr
    • Working with strings is a different world in the Tidyverse compared to base R. I can't believe how much more difficult I had been making my life
  • working with date times using lubridate
    • The beginning of my career was solely in econometric time series analysis. I could have used this much earlier.
  • formatting your visualizations
    • This is another area where I have lost significant hours of my life that I'll never get back through the process of learning R. Matt can save you the pain I suffered.

All of the material that I would have wanted was here. All of it.

Modeling & Creating Deliverables:

Again, do not let the title of the course fool you. This course gets HEAVY into machine learning, spending 5 HOURS in the parsnip library (it's the scikit-learn of R).

The course goes through:

  • K-means
  • Regression & GLM
  • tree methods
  • XGBoost
  • Support Vector Machines

And then teaches you how to create deliverables in R-markdown and interactive plots in Shiny. All in business context and always emphasizing how you'll "communicate it to the business". I can't stress enough how meticulous the layout of the course is and how much material is covered. This course is truly special.

How many tutorials or trainings have you had over the years where everything looked all "hunky dory" when you were in class? Then you try to adopt those skills and apply them to personal projects and there are huge gaping holes in what you needed to be successful. I have plenty of experience banging my head on the wall trying to get things to work with R.

Things You'll Love:

  • Repetition of keyboard short-cuts so that you'll actually remember them.
  • Immediately using transactional data to walk through an analysis. You're not only learning R, you're learning the applications and why the functions are relevant, immediately.
  • Reference to the popular R cheatsheets and documentation. You'll leave here understanding how to read the documentation and R cheatsheets - and let's be honest, a good portion of being a strong programmer is effective googling skills. Learning to read the documentation is equivalent to teaching a man to fish.
  • Matt has a nice voice. There, I said it. If you're going to listen to something for HOURS, I feel like this is a relevant point to make.

For the beginner:

  • Instruction starts right at the beginning and the instruction is clear.
  • Code to follow along with the lecture is crazy well organized. Business Science obviously prides itself on structure.
  • There is no need to take another R basics course, where you'd repeat learning a bunch of stuff that you've learned before. This one course covers everything you'll need. It. Is. Comprehensive.
  • e-commerce/transactional data is an incredibly common use case. If you're not familiar with how transactional data works or you've never had to join multiple tables, this is fantastic exposure/a great use case for the aspiring data scientist.
  • A slack channel with direct access to Matt (course creator) if you have any questions. I didn't personally use this feature, but as a newbie it's a tremendous value to have direct access to Matt.

I'm honestly jealous that I wasn't able to learn this way the first time, but the Tidyverse didn't even exist back then. Such is life.

The course ends with a k-means example with a deliverable that has been built in R-markdown that is stakeholder ready. This course is literally data science demystified.

In Summary:

Maybe I'm too much of a nerd. But seeing a course this well executed that provides this much value is so worth the investment to me. The speed of the transformation you'll make through taking this course is incredible. If this was available when I first started learning R I would have saved myself so much frustration. Matt Dancho (owner of Business Science) was kind enough to give me a link so that you can receive 15% off of the course. Link

The 15% off is an even better deal if you buy the bundle, but to be honest I haven't taken the 2nd course yet. I certainly will! And I'll definitely write a review afterwards to let you know my thoughts. Here is the link to the bundle: Link

If you're feeling like becoming a data science rockstar, Matt launched a brand new course and you're able to buy the 3 course bundle. The new course is "Predictive Web Applications For Business With R Shiny": Link

If you take the course, please let me know if you thought it was as amazing as I did. You can leave a testimonial in the comments or reach out to me on LinkedIn. I'd love to hear about your experience!

What I Enjoyed Most at ODSC East 2018

Last week I had the opportunity to attend Open Data Science Conference (ODSC) in Boston.  It was awesome to see people just walking around whom I had previously read about or follow on Twitter.  It was even nicer to meet some of these people, and I was amazed at how friendly everyone was.

Of course you can't attend everything at a conference like this; at one point there were 11 different sessions going on at once.  It was really difficult to determine which sessions to attend given the number of great options, but I tried to align the information I'd be consuming closely with what I'd be able to bring back to my day job and implement.

In this article I'll cover some learnings/favorite moments from:

  • one of the trainings
  • a couple different workshops
  • the sweet conference swag
  • one of the keynotes

Trainings:

My original plan was to take an R training in the morning on Tuesday and take a Python training that afternoon.  However, what really happened was I went to the R training in the morning, this training left me feeling super jazzed about R, and so I ended up going to another R training that afternoon (instead of the Python training I had originally planned on).  The morning R training I took was "Getting to grips with the tidyverse (R)" given by Dr. Colin Gillespie.  This was perfect, because I had been struggling with dplyr (an R package) the night before, and this training went through parts of dplyr with great explanations along the way.  Colin also showed us how to create plots using the package "Plotly".  This was my first time creating an interactive graph in R. Easy to use, and super cool. He was also nice enough to take a look at the code I was currently working on, which I definitely appreciated.

The afternoon R training I attended was given by Jared Lander, entitled "Intermediate RMarkdown in Shiny".  It was my first introduction to Shiny.  I had heard about it, but had never ventured to use it, and now I don't know what I was waiting for. If you ever have the opportunity to hear Jared speak, take it. I found him incredibly entertaining, and he explained the material clearly, making it super accessible.  I like to think Jared also enjoyed my overly animated crowd participation.

Workshops:

On Thursday I attended "Uplift Modeling and Uplift Prescriptive Analytics: Introduction and Advanced Topics" by Victor Lo, PhD. This information really resonated with me.  Dr. Lo spoke about the common scenario in Data Science where you'll build a model to try and predict something like customer attrition.  You'd maybe take the bottom three deciles (the people with the highest probability of cancelling their subscription) and do an A/B test with some treatment to try and encourage those customers to stay.

In the end, during analysis, you'd find that you did not have a statistically significant lift in test over control with the usual methods.  You end up in a situation where the marketers would be saying "hey, this model doesn't work" and the data scientist would be saying "what? It's a highly predictive model".  It's just that this is not the way that you should be going about trying to determine the uplift.  Dr. Lo spoke about 3 different methods and showed their results.  

These included:

  • Two Model Approach
  • Treatment Dummy Approach
  • Four Quadrant Method

Here is the link to his ODSC slides from 2015 where he also covered these 3 models (with similar slides): here 
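To give a feel for the first of these, the general idea of a two model approach is to fit one model on the customers who got the treatment, fit a second model on the control group, and treat the difference between their predictions as the uplift. The Python sketch below is my own toy illustration, not something taken from Dr. Lo's slides: each "model" is just the observed retention rate per customer segment, and the segments and outcomes are made up.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Toy 'model': the observed retention rate for each customer segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [retained, total]
    for segment, retained in rows:
        counts[segment][0] += retained
        counts[segment][1] += 1
    return {seg: kept / total for seg, (kept, total) in counts.items()}


def two_model_uplift(treated_rows, control_rows):
    """Uplift per segment = treated model's prediction minus control model's prediction."""
    treated_model = fit_rate_model(treated_rows)
    control_model = fit_rate_model(control_rows)
    shared = treated_model.keys() & control_model.keys()
    return {seg: treated_model[seg] - control_model[seg] for seg in shared}


# Made-up data: (segment, retained) where retained is 1 if the customer stayed.
treated = [("new", 1), ("new", 1), ("new", 0), ("new", 1),
           ("loyal", 1), ("loyal", 1), ("loyal", 1), ("loyal", 0)]
control = [("new", 0), ("new", 1), ("new", 0), ("new", 0),
           ("loyal", 1), ("loyal", 1), ("loyal", 1), ("loyal", 0)]

uplift = two_model_uplift(treated, control)
print(uplift)
```

In this made-up data the "loyal" segment stays at the same rate with or without the treatment, so the treatment only moves the "new" segment. That is the distinction a model that only predicts attrition can't make on its own.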

I've experienced this scenario before myself, where the marketing team will ask for a model and want to approach testing this way.  I'm super excited to use these methods to determine uplift in the near future.

Another workshop I attended was "R Packages as Collaboration Tools" by Stephanie Kirmer (slides).  Stephanie spoke about creating R packages as a way to automate repeated tasks.  She also showed us how incredibly easy it is to take your code and make it an R package for internal use.  Here is another case that is applicable currently at my work.  I don't have reports or anything that is due on a regular cadence, but we could certainly automate part of the test analysis process, and there are currently ongoing requests asked of Analytics in our organization that could be automated.  Test analysis is done in a different department, but if automated, this would save time on analysis, reduce potential for human error in test analysis, and free up bandwidth for more high value work.

SWAG:

Although conference swag probably doesn't really need a place in this article, Figure Eight gave out a really cool little vacuum that said "CLEAN YOUR DATA".  I thought I'd share a picture with you.  Also, my daughter loved the DataRobot stickers and little wooden robots they gave out.  She fashioned the sticker around her wrist and wore it as a bracelet.  3 year olds love conference swag:

ODSC vacuum  ODSC stickers

Keynote:

The keynote was Thursday morning.  I LOVED the talk given by Cathy O'Neil; a link to her TED talk is here.  She spoke about the importance of ethics in data science, and how algorithms have to use historical data and are therefore going to perpetuate our current social biases. I love a woman who is direct, cares about ethics, and has some hustle.  Go get 'em, girl. I made sure to get a chance to tell her how awesome her keynote was afterwards.  And of course I went home and bought her book "Weapons of Math Destruction".  I fully support awesome.

Summary:

I had an incredible time at the ODSC conference.  Everyone was so friendly, my questions were met with patience, and it was clear that many attendees and speakers had a true desire to help others learn. I could feel the sense of community.  I highly suggest that if you ever get the opportunity to attend, go!  I am returning to work with a ton of new information that I can begin using immediately at my current job. It was a valuable experience.  I hope to see you there next year.

For more on data science, visit www.datamovesme.com