5/25/21

Business Science’s Time Series Course is Incredible

I’m a time series fan. Big fan. My first job out of grad school was for a utility company, building econometric time series analysis and forecasting models. Lots of ARIMAs and neural nets. However, that was over 10 years ago now (don’t know how the hell that happened).

This post contains affiliate links that help to offset the cost of running the blog, plus the link gives you a special 15% discount. If you use the link, thank you!

In almost every position I've held in data, a question has come up that involved a time series (not a surprise, since businesses care about what has happened over time). Often, I was the only one on my team who had any knowledge of time series. I'm not sure why it isn't taught as a standard part of most university programs that train data scientists, but unfortunately it isn't. I believe that understanding time series analysis is currently a great way to differentiate yourself, since many in the field are just not well versed in it.

I wanted to understand what was current in the world of applying time series analysis to business. It had been a really long time since I had given the subject some love and attention, and I thought taking this Business Science course would be the perfect way to do that.

My History With Business Science Courses:

I’ve previously written about Business Science’s first course; you can check it out here. I've also taken his first Shiny app course (there’s a more advanced one as well) and went from zero to Shiny app in 2 days using survey data I collected with Kate Strachnyi. It was a real win.


The app is still on my site here; just scroll down. For this little flexdashboard app, I went from basically zero Shiny to having something useful in 2 days, leveraging only the first 25% of the course. The course itself can't be completed in 2 days, and it builds an app with much more functionality than mine. It’s a long course.

Back to the Time Series Review:

This review is broken into three sections:

Things I freakin’ love
The sexy
Everything else

Things I freakin’ love:

You’re learning about packages from the package creator. Who is going to understand a library better than the person who wrote it? Matt built both modeltime and timetk, the packages used in this course. I find that super impressive. These packages are also a step up from what was previously out there, from a "not needing a million packages to do what I want" perspective.

He uses his own (anonymized) data from Business Science to demonstrate some of the models. I haven’t seen others do this, and I think it’s cool. It’s a real, practical dataset of his Google Analytics and Mailchimp email data, with an explanation of the fields. If you don’t have analytics experience in e-commerce and are thinking about taking a role in e-commerce, definitely give some thought to this course.

I love how in-depth he gets with the subject. If you follow all that is covered in the course, you should be able to apply time series to your own data.

The Sexy:


Ok, so I’m sure some are interested in seeing just how “cutting edge” the course gets.

Once you're combining deep learning Gluon models and machine learning models using ensembling methods, you might be the coolest kid at work (but I’m not making any promises). GluonTS is a Python deep learning library for time series created by Amazon, so you’ll leverage both Python and R to use it.

Some of the deep learning algorithms you’ll learn how to leverage are:

DeepAR
DeepVAR
N-Beats
Deep Factor Estimator

Module 18 of the course is where you'll get into deep learning. A couple years ago I might have said "deep learning, bah humbug, requires too much computing power and isn't necessary, simpler is better." As things change and progress (and computers get even more beefy), I'm definitely changing my tune, especially since an ensemble of N-Beats models beat the score of ES-RNN, the winner of the M4 competition. M competitions are prestigious forecasting challenges, and they've historically been won by statistical algorithms. (I wouldn't have known this without this course.) The techniques taught in this course are very current: they're the sexy new methods winning the big competitions.

Here's a look at the syllabus for preparing the data and learning about the DeepAR model. You're doing log transformations, Fourier Series, and when you get to modeling the course even covers how to handle errors. I just love it. I know I'll be referring back to the course when a time series use case pops up in the future.
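For readers who want a concrete picture of that data prep, here is a minimal Python sketch (the course itself works in R with timetk, and this is not its code) of two of the transformations mentioned: a log transform and Fourier series features. The function names, the use of log1p so zero values are safe, and the weekly period of 7 with 2 harmonics are all assumptions for illustration.

```python
import math


def log1p_transform(values):
    """Log-transform a series with log(1 + x), which stays defined at zero."""
    return [math.log1p(v) for v in values]


def fourier_terms(n_obs, period, k):
    """Build sin/cos Fourier features for time index t = 0..n_obs-1.

    For each harmonic j = 1..k, adds sin(2*pi*j*t/period) and
    cos(2*pi*j*t/period), so a regression model can capture a smooth
    seasonal pattern with only 2*k feature columns.
    """
    rows = []
    for t in range(n_obs):
        row = {}
        for j in range(1, k + 1):
            angle = 2 * math.pi * j * t / period
            row[f"sin_{period}_{j}"] = math.sin(angle)
            row[f"cos_{period}_{j}"] = math.cos(angle)
        rows.append(row)
    return rows


# Hypothetical daily website sessions (one week), for illustration only
daily_sessions = [120, 135, 128, 90, 60, 140, 150]
logged = log1p_transform(daily_sessions)
weekly_fourier = fourier_terms(n_obs=len(daily_sessions), period=7, k=2)
print(weekly_fourier[0])
```

Each harmonic adds one sine and one cosine column, so a smooth weekly pattern costs only 2*k features. Forecasts made on the log scale can be converted back with math.expm1.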

The course covers 17 different algorithms. I'm trying to think if I could name 17 algorithms off the top of my head… it’d take me a minute. ARIMA is obviously included, because it’s like the linear regression of time series. You’ll go through ARIMA and TBATS (a fave because you don’t need to worry about stationarity the way you do with ARIMA; I’ve used this one in industry as well).

Along with these other algos:

ARIMA Boost
Prophet Boost
Cubist
KNN
MARS
Seasonal decomposition models

Then you’ve got your ensemble algos being leveraged for time series:

GLMNET
Random Forest
Neural Net
Cubist
SVM
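The simplest way to combine models is to average their forecasts. Here is a minimal Python sketch of a weighted-average ensemble; stacking, which the course also covers, goes further by fitting a model (for example, GLMNET) on the individual forecasts and is not shown here. The function name, model names, and forecast values are hypothetical.

```python
def weighted_average_ensemble(forecasts, weights=None):
    """Combine equal-length forecast lists into one ensemble forecast.

    forecasts: dict mapping model name -> list of predicted values.
    weights:   optional dict mapping model name -> non-negative weight;
               defaults to equal weights (a plain average).
    """
    names = list(forecasts)
    horizon = len(forecasts[names[0]])
    if any(len(forecasts[n]) != horizon for n in names):
        raise ValueError("all forecasts must have the same horizon")
    if weights is None:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted mean of the models' predictions at each forecast step
    return [
        sum(weights[n] * forecasts[n][h] for n in names) / total
        for h in range(horizon)
    ]


# Hypothetical 3-step-ahead forecasts from three models
forecasts = {
    "arima": [100.0, 102.0, 104.0],
    "prophet_boost": [98.0, 101.0, 107.0],
    "random_forest": [102.0, 103.0, 101.0],
}
print(weighted_average_ensemble(forecasts))
print(weighted_average_ensemble(
    forecasts, {"arima": 1, "prophet_boost": 0, "random_forest": 1}
))
```

Equal weights are a sensible default; in practice weights (or a stacking model) would be chosen using out-of-sample accuracy from cross-validation.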

Strap in for 8 solid hours of modeling, hyperparameter tuning, visualizing output, cross-validation and stacking!
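Cross-validation for time series can't shuffle the data, because a model must never be trained on the future. One common approach is rolling-origin (expanding-window) splits, sketched below in Python. In R, timetk provides time_series_cv() for this purpose; the Python function and parameter names below are hypothetical.

```python
def expanding_window_splits(n_obs, initial, assess, skip=None):
    """Return (train_indices, test_indices) pairs for rolling-origin CV.

    initial: size of the first training window.
    assess:  number of observations in each test (assessment) window.
    skip:    how far the forecast origin moves between splits
             (defaults to assess, so test windows don't overlap).
    Training windows grow over time, and every test window lies strictly
    after its training window, so no split trains on the future.
    """
    if skip is None:
        skip = assess
    if skip <= 0:
        raise ValueError("skip must be positive")
    splits = []
    train_end = initial
    while train_end + assess <= n_obs:
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + assess))
        splits.append((train, test))
        train_end += skip
    return splits


for train, test in expanding_window_splits(n_obs=12, initial=6, assess=2):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Averaging each model's error across these splits gives a more honest accuracy estimate than a single train/test cut, and those out-of-sample forecasts are also what a stacking model would be trained on.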

Everything else:

Matt (the owner of Business Science) speaks clearly and is easy to understand. Occasionally I'll put him on 1.25x speed.
His courses in general spend a good amount of time setting the stage for the course. Once you start coding, you’ll have a great understanding of where you’re going, goals, and context (and your file management will be top notch), but if you’re itching to put your fingers on the keyboard immediately, you’ll need to calm the ants in your pants. It is a thorough start.
You have to already feel comfy in R AND the tidyverse. Otherwise you’ll need to get up to speed first and Business Science has a group of courses to help you do that. You can see what's included here.

Before we finish off this article, one unique part of the course I enjoyed was where Matt compared the top 4 time series Kaggle competitions and dissected what went into each of the winning models. I found the whole breakdown fascinating and thought it added wonderful context at the start of the course.

In the 2014 Walmart Challenge, taking into account the “special event” of a shift in holiday sales was what landed 1st place. So you're actually seeing practical use cases for many of the topics taught in the course and this certainly helps with retention of the material.
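As a rough illustration of why moving holidays matter (not a reconstruction of the winning Walmart solution), the Python sketch below flags the week containing U.S. Thanksgiving, which falls on the fourth Thursday of November and therefore lands on a different date each year. The function names and the Monday-start weekly calendar are assumptions.

```python
import datetime as dt


def thanksgiving(year):
    """U.S. Thanksgiving: the fourth Thursday of November."""
    nov1 = dt.date(year, 11, 1)
    # date.weekday(): Monday=0 ... Thursday=3
    offset = (3 - nov1.weekday()) % 7
    first_thursday = nov1 + dt.timedelta(days=offset)
    return first_thursday + dt.timedelta(weeks=3)


def holiday_week_flag(week_start):
    """1 if the 7-day week beginning week_start contains Thanksgiving."""
    day = thanksgiving(week_start.year)
    return int(week_start <= day < week_start + dt.timedelta(days=7))


# Three hypothetical Monday-start weeks around Thanksgiving 2011
weeks = [dt.date(2011, 11, 14) + dt.timedelta(weeks=i) for i in range(3)]
print([(w.isoformat(), holiday_week_flag(w)) for w in weeks])
```

A flag like this can be fed to a model as an extra regressor, so the holiday effect follows the holiday instead of being pinned to a fixed calendar week.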

Likewise, special events got me good in 2011. I was modeling and forecasting gas demand, and both gas consumption and the number of customers were going through the roof! Eventually we realized that the price of oil had gotten so high that people were converting to gas, but that one tripped me up for a couple of months. Thinking about current events is so important in time series analysis, and we'll see it time and again. I've said it before, but Business Science courses are just so practical.

Summary:

If you do take this course, you’ll be prepared to apply time series analysis to the time series you encounter in the real world. I've always found time series analysis useful at different points in my career, even when the job description did not explicitly call for it.

As you saw from the prerequisites, you need to already know R for this course. Luckily, Business Science has created a bundle at a discounted price so that you can learn R and a whole lot of machine learning, and then dive into time series. Plus you’ll get an additional 15% off the already discounted price with this link. If you're already comfortable in R and just want the time series course, you can get 15% off the single course here.

Edit: People have asked for a coupon to buy all 5 courses at once. That's something I'm able to do! Learn R, machine learning, beginner and advanced Shiny app development and time series here.

12/3/18

Getting into Data Science FAQs

I often see similar questions in my inbox or asked in webinars, and I'd like to set the record straight with data. However, I didn't need to start from scratch: there was an excellent article on KD Nuggets by Jeff Hale. Here is the link to his article: "The Most in Demand Skills for Data Scientists". He had already scoured multiple job search sites and aggregated data on the demand for different data scientist skills. I recreated some of his analysis myself, both to come up with some points for this article and to make sure my numbers matched his before posting. The charts below are based only on searches on indeed.com. A search for "Data Scientist" was the denominator, and the numerator was a search for "Data Scientist" plus the term I wanted results for. I'm not sure how many job descriptions listed on indeed.com are duplicates, so this is not gospel, but it's still interesting. This article covers a few "Frequently Asked Questions" using the methodology above (adopted from Jeff).
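The percentages in this article come from a simple ratio: share = 100 × count("Data Scientist" + term) / count("Data Scientist"). A minimal Python sketch of that calculation is below; the function name and the counts are made up for illustration and are not the Indeed numbers behind the charts.

```python
def mention_share(count_with_term, count_data_scientist):
    """Percent of "Data Scientist" listings that also mention a term."""
    if count_data_scientist <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * count_with_term / count_data_scientist


# Hypothetical search counts, for illustration only (not the article's data)
total_listings = 2000
term_counts = {"Python": 1400, "SQL": 1000, "communication": 900}
for term, count in term_counts.items():
    print(f"{term}: {mention_share(count, total_listings):.1f}%")
```

Because every term shares the same denominator, the shares can be ranked directly, but they don't sum to 100% since one listing can mention many terms.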

Questions I'm frequently asked:

Should I learn R or Python?
As a Computer Science major, can I get into data science?
How important is SQL?

Should I learn R or Python?

This is probably the question I'm asked most often (although I've never analyzed the questions I receive). Jeff's article shows that Python has the edge in terms of coming up in job listings, so I recreated the methodology to look at this a little further.

55% of the job listings actually list both tools, as in the company would like to see that you have experience with "Python and/or R". That should make those who have a preference for one tool feel better. If you're just getting your hands dirty and looking to pick up either R or Python, I'd suggest Python: for listings that only specify one tool, Python is almost 5x more likely to be listed as the tool of choice compared to R.

I was happy to see this, as I've mentioned in a number of webinars and comments on social media that it "feels like" Python is slightly more popular. It would have been a bummer if I had been giving misinformation this whole time.
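Figures like "both tools" and "only one tool" can be derived from three search counts with inclusion-exclusion. The sketch below assumes you have counts for listings mentioning Python, R, and both; that setup, the function name, and the numbers are hypothetical, not the exact method or data behind this article.

```python
def tool_breakdown(n_python, n_r, n_both):
    """Split listings mentioning Python or R into only-Python, only-R, and both.

    n_python and n_r count listings mentioning each tool (overlap included);
    n_both counts listings mentioning both. By inclusion-exclusion,
    listings mentioning either tool = n_python + n_r - n_both.
    """
    only_python = n_python - n_both
    only_r = n_r - n_both
    either = n_python + n_r - n_both
    return {
        "either": either,
        "both_share": n_both / either,
        "only_python": only_python,
        "only_r": only_r,
        "python_to_r_ratio": only_python / only_r if only_r else float("inf"),
    }


# Hypothetical counts, for illustration only
print(tool_breakdown(n_python=1500, n_r=1000, n_both=900))
```

Subtracting the overlap matters: comparing raw Python and R counts would double-count every listing that asks for both tools.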

% of Data Science Positions Mentioning a Particular Skill on Indeed.com

Pulled this data by doing a search on indeed.com 11/2018

As a Computer Science major, can I get into data science?

I'm always surprised when I see this question, because to someone who's been in the field for a long time, computer science seems like a clearly fantastic foundation for moving into data science. Data science requires a number of different skills to be successful, and being able to program is definitely one of the core pillars. Analytics and Statistics come in first, but they could easily be mentioned somewhere in the job description other than where preferred degrees are listed. If a job description says "computer science", it's most likely speaking to the degrees the company would prefer from candidates. More than 50% of job descriptions mention "computer science". There you have it: a degree in computer science is "in demand" for getting into data science.

% of Data Science Positions Mentioning a Particular Skill on Indeed.com

Pulled this data by doing a search on indeed.com 11/2018

How important is SQL?

I'm frequently asked this question, and I was honestly surprised that SQL came in third, behind Python and R, in terms of skills. However, 51% of jobs do mention SQL. So it is certainly desired for the majority of positions, but I expected it to rank higher. Is it possible this skill is becoming assumed as a prerequisite? Or are companies figuring that SQL is easily learned and therefore not necessary to list on the job description? I wouldn't mind a job where all the datasets were aggregated for me before data cleaning and applying machine learning; I'm just not sure how many of those jobs exist. If you're a data scientist and you haven't had to understand relational databases at any point, let me know. I'd love to hear about it.

Conclusion:

We saw that Python is preferred over R, but that either tool will allow you to apply to the majority of data science jobs in the US. Computer science degrees are a great stepping stone to getting into data science, and the majority of listings will expect you to know SQL.

I also want to point out that "communication" was very much in the top list of skills: 46% of job descriptions mentioned communication. This means I'll keep writing about how I use softer skills to be effective in my job. I think we sometimes don't talk about communication enough in data science; it's imperative for delivering models and analysis that are aligned with what the business is looking for. If you'd like to see how Jeff used the data from the job search websites to discuss the most in-demand skills, here is the link to his article one more time.

8/22/18

Trying to Change Careers or Get Your Start in Data Science?

If you’re someone who is looking to make a move to data science, there are some ways that you can polish your approach to get noticed during your job search.

Assuming you've built up the skills required for the job, see if you can leverage some of these tips:

Optimize your resume (as best you can) for the job you WANT not the jobs you’ve HAD.
Try to gain experience at your current job (if you’re a career changer), or work on your own data science projects at home (continuous learning is a big plus).
Develop a killer elevator pitch.

Optimizing your resume for the job you want:

Describe your projects in a way that shows you’re results-focused.

The points on your resume need to do both of the following:

Demonstrate that you understand general corporate culture, and showcase your collaborative, result achieving, problem solving and self-managing competencies.
Show that you have the technical chops as a data scientist.

The first point takes a lot of thought. It is really easy to list job duties; it’s another thing to reword them effectively to highlight your true strengths and demonstrate how what you've done has improved the business. Your bullet points should be full of action verbs and results, even if you need to stretch yourself mentally to identify them.

Did you automate a process that saved hours of manual work? That time saved is business value.
Showing that you've worked cross-functionally or presented results to the business is also valuable, since both are desirable for the new job you want (data scientist).

It is helpful to read job descriptions and see what companies are looking for; you'll find consistent themes. If you look closely, you'll see there are a lot of skills listed that aren't necessarily technical. Make sure you shine when speaking to those softer skills. But of course, these softer skills still need to be presented as an action and a result. Do not just put a "soft skills" section on your resume and list a bunch of words with no context.

"Show that you have the technical chops as a data scientist." This is pretty straightforward. Try to use the verbiage from the description of the job you're applying to. You might want to sound fancy, but “empirical bayesian 3-stage hierarchical model” probably isn’t on the job description. Listing it specifically on your resume isn’t going to help you pass ATS (the applicant tracking system), and the person in human resources who doesn’t have a data science background is not going to know whether it's relevant. Again, looking at multiple job descriptions and gauging what type of language to use on your resume is helpful.

Gain experience at your current job or work on a project:

If you currently have a job, do you have access to SQL? Does your company have a data warehouse or database? Can you file a ticket with the service desk to get SQL? Can you then play with data to make your own project?

You could even go a step further and bring data from the database into R or Python. Maybe you build a nice decision tree that answers a business question, then concisely place the results of your project on your resume.
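To show what bringing database data into Python can look like, here is a sketch using Python's built-in sqlite3 module with SQLite, a free database engine. The orders table, its columns, and the weekly rollup are hypothetical; at work you would connect to your company's database with its own driver instead of an in-memory demo database.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a few made-up orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2018-08-06", 20.0), ("2018-08-07", 35.5), ("2018-08-14", 12.0)],
    )
    return conn


def weekly_order_totals(conn):
    """Pull weekly order counts and revenue from the orders table."""
    query = """
        SELECT strftime('%Y-%W', order_date) AS year_week,
               COUNT(*) AS n_orders,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY year_week
        ORDER BY year_week
    """
    return conn.execute(query).fetchall()


conn = demo_connection()
for year_week, n_orders, revenue in weekly_order_totals(conn):
    print(year_week, n_orders, revenue)
conn.close()
```

Doing the aggregation in SQL and only pulling the summarized rows into Python keeps the transfer small; from there the rows can feed a model or a chart.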

Try to automate a repeatable task that you do on a regular cadence. That’s next-level resume content, because you’re increasing efficiency.

If you’ve done data science projects on your own to round out your resume, make sure those bullets are full of action verbs and results, action verbs and results. I almost want to say it a third time.

SQLite is open source, R is open source, Python is open source, and there is tons of free data out there. The world can really be your oyster, but you’ll need to market these go-getter skills effectively.

Develop a killer elevator pitch:

A strong, well-targeted resume might open the door, but once it's open, you need to keep the conversation going. The resume does nothing more than open the door; that’s it.

Getting your resume into the right hands can sometimes be difficult. Leveraging LinkedIn effectively can help bridge that gap. How do we begin the conversation if you’re reaching out to someone on LinkedIn to ask about opportunities?

Important note: When reaching out cold to people on LinkedIn, first visit the company website and find a job that you’re interested in and (pretty much) qualified for; then reach out to a relevant person with a well-targeted message.

It is impossible to be well-targeted if you are reaching out to someone at a company that doesn’t have any positions available: without a job description to read, you can’t infer the needs of the business. Data science is a large field with many specializations, so a blanket approach will not work.

Back to the pitch. You’re results-focused, you’re innovative, and you view things from the business’ perspective.

I'd suggest starting with something conversational; this will help if the person you're messaging is already being inundated with requests. A comment about a post they made recently makes your connection come across as more authentic.
Why you’re messaging: you’re interested in the open position, and you’re trying to get your resume to the correct person.
Then mention a number of things concisely that are specifically mentioned on the job description. Basically saying “hi, look at me, I’m a fit.”
Let them know that you’d really appreciate it if they’d simply forward you to the correct person (hopefully the person you’re messaging is the correct person, but don’t assume it).
Close strong. You’re here to add value for the company, not to talk about your needs, so make it clear you know the conversation is about how you can meet the needs of the business.

Hi [name],

I enjoyed your recent post on [topic] and I look forward to reading more of your posts.

I noticed [company] is hiring for [position title], and I’m hoping I can get my resume in the right hands. I have an MS in Statistics, plus 7 years of real-world experience building models. I’m a whiz at SQL and modeling in R, and I have exposure to Python.

I’d appreciate the opportunity to speak with the appropriate person about the open position, and share how I’ve delivered insights and added value for companies through the use of statistical methods.

Thanks, Kristen

Now you may have a very different background from me. However, you can talk about the education that you do have (concisely), the exposure that you do have to building models, about your technical chops, and that you want to deliver value.

I hope that you’ll be able to use some of these suggestions, and I wish you a successful and rewarding career in data science. If you have additional suggestions for making a change to data science, I’d love to hear your thoughts! My next article covers how to write crisp resume content that makes an impact; that article is here.