Business Science’s Time Series Course is Incredible

I’m a time series fan. Big fan. My first job out of grad school was for a utility company building econometric time series analysis and forecasting models. Lots of ARIMAs and neural nets. However, that was now over 10 years ago (don’t know how the hell that happened).

This post contains affiliate links that help to offset the cost of running the blog, plus the link gives you a special 15% discount.  If you use the link, thank you!

In almost every position I've held in data, a question has come up that involved a time series (not a surprise that business cares about what has happened over time).  Often, I was the only one who had any knowledge of time series on my team.  I'm not sure why it isn't taught as a standard part of most university programs that are training data scientists, but it's just unfortunately not.  I believe that understanding time series analysis is currently a great way to differentiate yourself, since many in the field are just not well versed in it.

I wanted to understand what was current in the world of applying time series analysis to business.  It had been a really long time since I had given the subject some love and attention, and I thought taking this Business Science course would be the perfect way to do that.

My History With Business Science Courses:

I’ve previously written about Business Science’s first course; you can check it out here.  I've also taken his first Shiny app course (there’s a more advanced one as well) and went from zero to Shiny app in 2 days using survey data I collected with Kate Strachnyi.  It was a real win.

The app is still on my site here; just scroll down.  For this little flexdashboard app, I went from basically zero Shiny to having something useful in 2 days, leveraging only the first 25% of the course. The course cannot actually be completed in 2 days. It's also worth noting that the course builds an app with much more functionality than mine. It’s a long course.

Back to the Time Series Review:

It’s broken into three different sections:

  • Things I freakin’ love

  • The sexy

  • Everything else

Things I freakin’ love:

You’re learning about packages from the package creator.  Who is going to understand a library better than the person who wrote it?  Matt built both modeltime and timetk, which are used in this course. I find that super impressive.  These packages are also a step up from what was previously out there from a "not needing a million packages to do what I want" perspective.

He uses his own (anonymized) data from Business Science to demonstrate some of the models.  I haven’t seen others do this, and I think it’s cool.  It’s a real, practical dataset of his Google Analytics and Mailchimp email data with an explanation of the fields.  If you don’t have analytics experience in e-commerce and are thinking about taking a role in e-commerce, definitely give some thought to this course.

I love how in-depth he gets with the subject.  If you follow all that is covered in the course, you should be able to apply time series to your own data. 

The Sexy:

Ok, so I’m sure some are interested in seeing just how “cutting edge” the course gets. 

Once you're combining deep learning Gluon models and machine learning models using ensembling methods, you might be the coolest kid at work (but I’m not making any promises). Gluon is a Python package created by Amazon, so you’ll leverage both Python and R to use it.

Some of the deep learning algorithms you’ll learn how to leverage are:

  • DeepAR

  • DeepVAR

  • N-Beats

  • Deep Factor Estimator

Module 18 of the course is where you'll get into deep learning.  A couple years ago I might have said "deep learning, bah humbug, requires too much computing power and isn't necessary, simpler is better."  As things change and progress (and computers get even more beefy) I'm definitely changing my tune, especially as an ensemble N-Beats algorithm beat the ES-RNN's score in the M4 competition.  M competitions are prestigious forecasting challenges, and they've historically been won by statistical algorithms (I wouldn't have known this without this course).  The material taught in this course is very current: these are the sexy new techniques that are winning the big competitions.
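To make "ensembling" a little more concrete, here is a minimal sketch in plain Python. The course itself works in R, so this is not the course's code, and it is not how the M4 entries were built. It only shows the simplest idea: combine each model's forecast step by step with a mean or a median. The model names and numbers are made up for illustration.

```python
from statistics import mean, median


def ensemble_forecast(forecasts, combine=mean):
    """Combine per-model forecasts one horizon step at a time.

    forecasts: dict mapping a model name to its list of predictions.
    combine:   function that reduces one step's predictions to one number.
    """
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same number of steps")
    # zip(*...) groups the predictions for step 1, step 2, ... across models
    return [combine(step) for step in zip(*forecasts.values())]


# Hypothetical three-step-ahead forecasts from three models (made-up values)
model_forecasts = {
    "arima": [100.0, 102.0, 105.0],
    "prophet_boost": [98.0, 101.0, 107.0],
    "deepar": [103.0, 106.0, 104.0],
}

avg = ensemble_forecast(model_forecasts)                  # simple average
med = ensemble_forecast(model_forecasts, combine=median)  # median is less sensitive to one wild model
```

A weighted average or a stacked model (where another model learns the weights) follows the same shape; only the `combine` step changes.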

Here's a look at the syllabus for preparing the data and learning about the DeepAR model.  You're doing log transformations, Fourier Series, and when you get to modeling, the course even covers how to handle errors. I just love it.  I know I'll be referring back to the course when a time series use case pops up in the future.
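If log transformations and Fourier series are new to you, here is a rough sketch of what those two preprocessing steps do. It is written in plain Python rather than the R packages the course uses, so treat it as an illustration of the idea and not the course's workflow. The visit counts, the weekly period of 7, and the function names are all assumptions made up for this example.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to dampen spikes; assumes all values are >= 0."""
    return [math.log1p(v) for v in values]


def fourier_features(n_obs, period, order):
    """Build sin/cos pairs that repeat every `period` observations.

    Returns one row per time step t = 0..n_obs-1, with 2 * order columns:
    [sin(1st harmonic), cos(1st harmonic), sin(2nd harmonic), ...].
    """
    rows = []
    for t in range(n_obs):
        row = []
        for k in range(1, order + 1):
            angle = 2 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Made-up daily page visits with a weekend bump
daily_visits = [120, 135, 90, 80, 150, 210, 190, 125, 140, 95]

logged = log_transform(daily_visits)
features = fourier_features(len(daily_visits), period=7, order=2)

# math.expm1 undoes log1p when you need forecasts back on the original scale
restored = [math.expm1(v) for v in logged]
```

The Fourier columns give a model a smooth way to "see" weekly seasonality without one dummy variable per day of the week.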

The course covers 17 different algorithms. I'm trying to think if I could name 17 algorithms off the top of my head…  it’d take me a minute.   ARIMA is obviously included, because it’s like the linear regression of time series.  You’ll go through ARIMA and TBATS (a fave because you don’t need to worry about stationarity the way you do with ARIMA; I’ve used this one in industry as well).

Along with these other algos:

  • ARIMA Boost

  • Prophet Boost

  • Cubist

  • KNN

  • MARS

  • Seasonal decomposition models

Then you’ve got your ensemble algos being leveraged for time series:

  • GLMNET

  • Random Forest

  • Neural Net

  • Cubist

  • SVM

Strap in for 8 solid hours of modeling, hyperparameter tuning, visualizing output, cross-validation and stacking!
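One thing worth knowing before you strap in: cross-validation for time series is not the random shuffle you may be used to, because the training data always has to come before the data you test on. Below is a minimal sketch of rolling-origin splits with an expanding training window, in plain Python. The function and parameter names (`initial`, `assess`, `skip`) are my own for illustration and are not claimed to match any package's API.

```python
def rolling_origin_splits(n_obs, initial, assess, skip):
    """Generate (train_indices, test_indices) pairs in time order.

    n_obs:   total number of observations
    initial: size of the first training window
    assess:  number of observations held out after each training window
    skip:    how far the training window grows between splits
    """
    splits = []
    train_end = initial
    while train_end + assess <= n_obs:
        train_idx = list(range(0, train_end))                   # everything up to the cutoff
        test_idx = list(range(train_end, train_end + assess))   # the next `assess` points
        splits.append((train_idx, test_idx))
        train_end += skip
    return splits


# 12 months of data: start with 6 months, test on the next 2, then roll forward
splits = rolling_origin_splits(n_obs=12, initial=6, assess=2, skip=2)
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
```

Every test window sits strictly after its training window, so the model never gets to peek at the future.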

Everything else:

  • Matt (the owner of Business Science) speaks clearly and is easy to understand.  Occasionally I'll put him on 1.25x speed.

  • His courses in general spend a good amount of time setting the stage for the course.  Once you start coding, you’ll have a great understanding of where you’re going, goals, and context (and your file management will be top notch), but if you’re itching to put your fingers on the keyboard immediately, you’ll need to calm the ants in your pants. It is a thorough start.

  • You have to already feel comfy in R AND the tidyverse. Otherwise you’ll need to get up to speed first and Business Science has a group of courses to help you do that.  You can see what's included here.

Before we finish off this article, one super unique part of the course I enjoyed was where Matt compared the top 4 time series Kaggle competitions and dissected what went into each of the winning models. I found the whole breakdown fascinating, and thought it added wonderful beginning context for the course.

In the 2014 Walmart Challenge, taking into account the “special event” of a shift in holiday sales was what landed 1st place. So you're actually seeing practical use cases for many of the topics taught in the course and this certainly helps with retention of the material.  

Likewise, special events got me good in 2011.  I was modeling and forecasting gas, and the actual consumption of gas and number of customers were going through the roof!  Eventually we realized that the price of oil had gotten so high that people were converting to gas, but that one tripped me up for a couple months. Thinking about current events is so important in time series analysis and we'll see it time and again.  I've said it before, but Business Science courses are just so practical.

Summary:

If you do take this course, you’ll be prepared to apply time series analysis to the time series that you encounter in the real world.  I've always found time series analysis useful at different points in my career, even when the job description did not explicitly call for knowledge of time series.

As you saw from the prerequisites, you need to already know R for this course.  Luckily, Business Science has created a bundle at a discounted price so that you can learn R and a whole lot of machine learning, and then dive into time series.  Plus you’ll get an additional 15% off the already discounted price with this link.  If you're already comfortable in R and you're just looking to take the time series course, you can get 15% off of the single course here.

Edit:  People have asked for a coupon to buy all 5 courses at once.  That's something I'm able to do!  Learn R, machine learning, beginner and advanced Shiny app development and time series here.

Asking Great Questions as a Data Scientist

Asking questions can sometimes seem scary. No one wants to appear "silly." But I assure you:

  1. You're not silly.
  2. It's way more scary if you're not asking questions.

Data Science is a constant collaboration with the business and a series of questions and answers that allow you to deliver the analysis/model/data product that the business has in their head.

Questions are required to fully understand what the business wants and not find yourself making assumptions about what others are thinking.

Asking the right questions, like those you identified here is what separate Data Scientists that know 'why' from folks that only know what (tools and technologies).

-Kayode Ayankoya

We're going to answer the following questions:

  1. Where do we ask questions?
  2. What are great questions?

I had posted on LinkedIn recently about asking great questions in data science and received a ton of thought provoking comments. I will add a couple of my favorite comments/quotes throughout this article.

Where do we ask questions?

Basically every piece of the pipeline can be expressed as a question.

And each of these questions could involve a plethora of follow up questions.

To touch the tip of the iceberg, Kate Strachnyi posted a great assortment of questions that we typically ask (or want to consider) when scoping an analysis:

Few questions to ask yourself:  

How will the results be used? (make business decision, invest in product category, work with a vendor, identify risks, etc)

What questions will the audience have about our analysis? (ability to filter on key segments, look at data across time to identify trends, drill-down into details, etc)

How should the questions be prioritized to derive the most value?

Who should be able to access the information? think about confidentiality/ security concerns

Do I have the required permissions or credentials to access the data necessary for analysis?

What are the different data sources, which variables do I need, and how much data will I need to get from each one?

Do I need all the data for more granular analysis, or do I need a subset to ensure faster performance?

-Kate Strachnyi

Kate's questions spanned both:

  • Questions you'd ask stakeholders/different departments
  • Questions you'd ask internally on the data science/analytics team.

Any of the questions above could yield a variety of answers, so it is imperative that you're asking questions. Just because you have an awesome idea in mind for approaching the problem does not mean that other people don't similarly have awesome ideas that need to be heard and discussed. At the end of the day, data science typically serves as a support function to other areas of the business, meaning we can't just go rogue.

In addition to getting clarification and asking questions of stakeholders of the project, you'll also want to collaborate and ask questions of those on your data science team.

Even the most seasoned data scientist will still find themselves creating a methodology or solution that isn't in their area of expertise or is a unique use case of an algorithm that would benefit from the thoughts of other data subject matter experts. Oftentimes the person listening to your proposed methodology will just give you the thumbs up, but when you've been staring at your computer for hours, there is also a chance that you haven't considered one of the underlying assumptions of your model or that you're introducing bias somewhere. Someone with fresh eyes can give a new perspective and save you from realizing your error AFTER you've presented your results.

Keeping your methodology a secret until you deliver the results will not do you any favors. If anything, sharing your thoughts upfront and asking for feedback will help to ensure a successful outcome.

What are great questions?

Great questions are the ones that get asked. However, there is an art and science to asking good questions and also a learning process involved. Especially when you're starting at a new job, ask everything. Even if it's something that you believe you should already know, it's better to ask and course-correct, than to not ask. You could potentially lose hours working on an analysis and then have your boss tell you that you misunderstood the request.

It is helpful to also pose questions in a way that requires more than a "yes/no" response, so you can open up a dialogue and receive more context and information.

How we formulate the questions is also very important. I've often found that people feel judged by my questions. I have to reassure them that all I want is to understand how they work and what are their needs and that my intention is not to judge them or criticize them.

 

-Karlo Jimenez

I've experienced what Karlo mentioned myself. Being direct can sometimes come off as judgement.  We definitely need to put our "business acumen" hats on, to the best of our ability, to come across as someone who is genuinely trying to understand and deliver on their needs. I've found that if I can pose the question as "looking for their valuable feedback", it's a win-win for everyone involved.

As you build relationships with your team and stakeholders, this scenario is much less likely to occur. Once everyone realizes your personality and you've built a rapport, people will expect your line of questioning.

Follow up questions, in its various forms, are absolutely critical. Probing gives you an opportunity to paraphrase the ask and gain consensus before moving forward.

-Toby Baker

Follow-up questions feel good. When a question prompts another question you feel like you're really getting somewhere. Peeling back another layer of the onion if you will. You're collaborating, you're listening, you're in the zone.

In Summary

The main takeaway here is that there are a TON of questions you need to ask to effectively produce something that the business wants. Once you start asking questions, it'll become second nature and you'll immediately see the value and find yourself asking even more questions as you gain more experience.

Questioning has been instrumental to my career. An additional benefit is that I've found my 'voice' over the years. I feel heard in meetings and my opinion is valued. A lot of this growth has come from getting comfortable asking questions and I've also learned a ton about a given business/industry through asking these questions.

I've learned a lot about diversity of viewpoints and that people express information in different ways. This falls under the "business acumen" piece of data science that we're not often taught in school. But I hope you can go forward and fearlessly ask a whole bunch of questions.

Also published on KDNuggets: link
