Business Science’s Time Series Course is Incredible

This post contains affiliate links that help to offset the cost of running the blog; plus, the links give you a special 15% discount.  If you use them, thank you!

I’m a time series fan.  Big fan.  My first job out of grad school was for a utility company building econometric time series analysis and forecasting models.  Lots of ARIMAs and neural nets. However, that was now over 10 years ago (don’t know how the hell that happened).

In almost every position I've held in data, a question has come up that involved a time series (not a surprise that business cares about what has happened over time).  Often, I was the only one who had any knowledge of time series on my team.  I'm not sure why it isn't taught as a standard part of most university programs that are training data scientists, but it's just unfortunately not.  I believe that understanding time series analysis is currently a great way to differentiate yourself, since many in the field are just not well versed in it.

I wanted to understand what was current in the world of applying time series analysis to business.  It had been a real long time since I'd given the subject some love and attention, and I thought taking this Business Science course would be the perfect way to do that.

My History With Business Science Courses:

I’ve previously written about Business Science’s first course; you can check it out here.  I've also taken his first Shiny app course (there’s a more advanced one as well) and went from zero to Shiny app in 2 days using survey data I collected with Kate Strachnyi.  It was a real win.


The app is still on my site here; just scroll down.  For this little flexdashboard app, I went from basically zero Shiny to having something useful in 2 days, leveraging only the first 25% of the course.  To be clear, the course itself cannot be completed in 2 days (it's a long course), and it builds an app with much more functionality than mine.

Back to the Time Series Review:

This review is broken into three sections:

  • Things I freakin’ love

  • The sexy

  • Everything else

Things I freakin’ love:

You’re learning about packages from the package creator.  Who is going to understand a library better than the person who wrote it?  Matt built both modeltime and timetk, which are used throughout this course. I find that super impressive.  These packages are also a step up from what was previously out there, from a "not needing a million packages to do what I want" perspective.

He uses his own (anonymized) data from Business Science to demonstrate some of the models.  I haven’t seen others do this, and I think it’s cool.  It’s a real, practical dataset of his Google Analytics and Mailchimp email data, with an explanation of the fields.  If you’re thinking about taking a role in e-commerce but don’t yet have analytics experience there, definitely give some thought to this course.

I love how in-depth he gets with the subject.  If you follow all that is covered in the course, you should be able to apply time series to your own data. 

The Sexy:


Ok, so I’m sure some are interested in seeing just how “cutting edge” the course gets. 

Once you're combining deep learning Gluon models and machine learning models using ensembling methods, you might be the coolest kid at work (but I’m not making any promises). Gluon is a Python package created by Amazon, so you’ll leverage both Python and R to use it.
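To give you a flavor of what that looks like from the R side, here's a minimal sketch of my own (not the course's exact code), assuming the modeltime.gluonts package and its "gluonts_deepar" engine; the tibble `train_tbl` and its columns are hypothetical:

```r
library(modeltime.gluonts)   # R wrapper around Amazon's GluonTS (Python)
library(tidymodels)

# Hypothetical tibble with one row per series per day: id, date, value
model_deepar <- deep_ar(
  id                = "id",    # column identifying each series
  freq              = "D",     # daily data
  prediction_length = 30,      # forecast horizon
  epochs            = 5
) %>%
  set_engine("gluonts_deepar") %>%
  fit(value ~ date + id, data = train_tbl)
```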

Some of the deep learning algorithms you’ll learn how to leverage are:

  • DeepAR

  • DeepVAR

  • N-Beats

  • Deep Factor Estimator

Module 18 of the course is where you'll get into deep learning.  A couple of years ago I might have said "deep learning, bah humbug, requires too much computing power and isn't necessary; simpler is better."  As things change and progress (and computers get even more beefy), I'm definitely changing my tune, especially since an ensemble N-Beats algorithm beat the ES-RNN's score in the M4 competition.  M competitions are prestigious forecasting challenges, and they've historically been won by statistical algorithms (I wouldn't have known this without the course).  The stuff being taught in this course is very current; these are the sexy new techniques that are winning the big competitions.

Here's a look at the syllabus for preparing the data and learning about the DeepAR model.  You're doing log transformations and adding Fourier series features, and when you get to modeling, the course even covers how to handle errors. I just love it.  I know I'll be referring back to the course when a time series use case pops up in the future.
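Just to illustrate the flavor of that feature engineering, here's a minimal sketch of my own (the course's code will differ; `subscribers_tbl` and its columns are made up):

```r
library(tidyverse)
library(timetk)

prepared_tbl <- subscribers_tbl %>%
  mutate(value_log = log1p(value)) %>%                    # log transform (plays nice with zeros)
  tk_augment_fourier(date, .periods = c(7, 30), .K = 2)   # add sin/cos Fourier features
```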

The course covers 17 different algorithms. I'm trying to think if I could name 17 algorithms off the top of my head…  it’d take me a minute.   ARIMA is obviously included, because it’s like the linear regression of time series.  You’ll go through ARIMA and TBATS (a fave, because you don’t need to worry about stationarity the way you do with ARIMA; I’ve used this one in industry as well).
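Here's roughly what fitting and comparing those two looks like with modeltime — a sketch of my own, assuming a single-series tibble `data_tbl` with `date` and `value` columns:

```r
library(tidymodels)
library(modeltime)
library(timetk)

# Hold out the last 3 months for testing
splits <- time_series_split(data_tbl, assess = "3 months", cumulative = TRUE)

model_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

model_tbats <- seasonal_reg() %>%
  set_engine("tbats") %>%
  fit(value ~ date, data = training(splits))

# Out-of-sample accuracy, side by side
modeltime_table(model_arima, model_tbats) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_accuracy()
```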

You’ll also go through these other algos:

  • ARIMA Boost

  • Prophet Boost

  • Cubist

  • KNN

  • MARS

  • Seasonal decomposition models

Then you’ve got your ensemble algos being leveraged for time series:

  • GLMNET

  • Random Forest

  • Neural Net

  • Cubist

  • SVM

Strap in for 8 solid hours of modeling, hyperparameter tuning, visualizing output, cross-validation and stacking!
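As a small taste of the ensembling piece, here's a minimal sketch with modeltime.ensemble (`model_prophet` is a hypothetical third fitted model; stacking with a meta-learner uses `ensemble_model_spec()` instead of the simple average shown here):

```r
library(modeltime.ensemble)

# Simple average ensemble of several fitted models
ensemble_fit <- modeltime_table(model_arima, model_tbats, model_prophet) %>%
  ensemble_average(type = "mean")
```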

Everything else:

  • Matt (the owner of Business Science) speaks clearly and is easy to understand.  Occasionally I'll put him on 1.25x speed.

  • His courses in general spend a good amount of time setting the stage for the course.  Once you start coding, you’ll have a great understanding of where you’re going, goals, and context (and your file management will be top notch), but if you’re itching to put your fingers on the keyboard immediately, you’ll need to calm the ants in your pants. It is a thorough start.

  • You have to already feel comfy in R AND the tidyverse; otherwise, you’ll need to get up to speed first, and Business Science has a group of courses to help you do that.  You can see what's included here.

Before we finish off this article: one super unique part of the course I enjoyed was where Matt compared the top 4 time series Kaggle competitions and dissected what went into each of the winning models. I found the whole breakdown fascinating, and I thought it added wonderful context at the start of the course.

In the 2014 Walmart Challenge, taking into account the “special event” of a shift in holiday sales was what landed 1st place. So you're seeing practical use cases for many of the topics taught in the course, and this certainly helps with retention of the material.

Likewise, special events got me good in 2011.  I was modeling and forecasting gas, and both actual consumption and the number of customers were going through the roof!  Eventually we realized the price of oil had gotten so high that people were converting to gas, but that one tripped me up for a couple of months. Thinking about current events is so important in time series analysis, and you'll see it come up time and again.  I've said it before, but Business Science courses are just so practical.

Summary:

If you do take this course, you’ll be prepared to apply time series analysis to the real-world data you encounter.  I've always found time series analysis useful at different points in my career, even when the job description did not explicitly call for knowledge of time series.

As you saw from the prerequisites, you need to already know R for this course.  Luckily, Business Science has created a bundle at a discounted price so that you can learn R, pick up a whole lot of machine learning, and then dive into time series.  Plus, you’ll get an additional 15% off the already discounted price with this link.  If you're already comfortable in R and you're just looking to take the time series course, you can get 15% off of the single course here.

Edit:  People have asked for a coupon to buy all 5 courses at once.  That's something I'm able to do!  Learn R, machine learning, beginner and advanced Shiny app development and time series here.


Setting Your Hypothesis Test Up For Success

Setting up your hypothesis test for success as a data scientist is critical. I want to go deep with you on exactly how I work with stakeholders ahead of launching a test.  This step is crucial to make sure that once a test is done running, we'll actually be able to analyze it.  This includes:

  • A well defined hypothesis

  • A solid test design

  • Knowing your sample size

  • Understanding potential conflicts

  • Population criteria (who are we testing)

  • Test duration (it's like the cousin of sample size)

  • Success metrics

  • Decisions that will be made based on results

This is obviously a lot of information.  Before we jump in, here is how I keep it all organized: I recently created a Google doc at work so that stakeholders and analytics could align on all the information needed to fully scope a test upfront.  This also gives you (the analyst/data scientist) a bit of an insurance policy.  It's possible the business decides to go with a design or a sample size that wasn't your recommendation.  If things end up working out less than stellar (not enough data, a design that is basically impossible to analyze), you have your original suggestions documented.

In my previous article I wrote:

"Sit down with marketing and other stakeholders before the launch of the A/B test to understand the business implications, what they’re hoping to learn, who they’re testing, and how they’re testing.  In my experience, everyone is set up for success when you’re viewed as a thought partner in helping to construct the test design, and have agreed upon the scope of the analysis ahead of launch."

Well, this is literally what I'm talking about.  This document was born of things that we often see in industry:

Hypothesis: I've seen scenarios that look like "we're going to make this change, and then we'd like you to read out on the results."  So, your hypothesis is what?  You're going to make this change, and what do you expect to happen?  Why are we doing this?  A hypothesis clearly states the change that is being made, the impact you expect it to have, and why you think it will have that impact.  It's not an open-ended statement.  You are testing a measurable response to a change.  It's ok to be a stickler; this is your foundation.

Test Design: The test design needs to be solid, so you'll want to have an understanding of exactly what change is being made between test and control.  If you're approached by a stakeholder with a design that won't allow you to accurately measure criteria, you'll want to coach them on how they could design the test more effectively to read out on the results.  I cover test design a bit in my article here.

Sample Size: You need to understand the sample size ahead of launch, and your expected effect size.  If you run with a small sample and need an unreasonable effect size for it to be significant, it's most likely not worth running.  Time to rethink your sample and your design.  Sarah Nooravi recently wrote a great article on determining sample size for a test.  You can find Sarah's article here.

  • An example might be that you want to test the effect of offering a service credit to select customers.  You have a certain budget worth of credits you're allowed to give out.  So you're hoping you can have 1,500 in test and 1,500 in control (this is small).  The test experience sees the service along with a coupon, and the control experience sees content advertising the service but does not see any mention of the credit.  If the average purchase rate is 13.3%, you would need a 2.6 point increase (to 15.9%) in the test group to see significance at 0.95 confidence.  This is a large effect size that we probably won't achieve (unless the credit is AMAZING).  It's good to know these things upfront so that you can make changes (for instance, reduce the amount of the credit to allow for additional sample size, ask for extra budget, etc.).  A quick sanity check of this kind of math is sketched below.
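You can run that sanity check with base R's power.prop.test. Note that the detectable lift depends on the power you assume — the 80% below is my assumption, and the 2.6-point figure above implies a different power level:

```r
# Smallest detectable test-group rate, given 1,500 per group,
# a 13.3% baseline, alpha = 0.05, and an assumed 80% power
power.prop.test(n = 1500, p1 = 0.133, sig.level = 0.05, power = 0.80)
```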

Potential Conflicts: It's possible that 2 different groups in your organization could be running tests at the same time that conflict with each other, resulting in data that is junk for potentially both cases.  (I actually used to run a "testing governance" meeting at my previous job to proactively identify these cases; this might be something you want to consider.)

  • An example of a conflict might be that the acquisition team is running an ad in Google advertising 500 business cards for $10.  If, at the same time, another team is running a pricing test on the business card product page that doesn't respect the ad driving the traffic, the acquisition team's test is not getting the experience they thought it was!  Customers will see a different price than what is advertised, and this has negative implications all around.

  • It is so important in a large analytics organization to be collaborating across teams and have an understanding of the tests in flight and how they could impact your test.

Population criteria: Obviously you want to target the correct people. But often I've seen criteria so specific that the results of the test need to be caveated with "These results are not representative of our customer base; this effect is for people who [[lists criteria here]]."  If your test targeted super performers, you know that it doesn't apply to everyone in the base, but you want to make sure that is spelled out and doesn't get miscommunicated to a broader audience.

Test duration: This is often directly related to sample size. (see Sarah's article) You'll want to estimate how long you'll need to run the test to achieve the required sample size.  Maybe you're randomly sampling from the base and already have sufficient population to choose from.  But often we're testing an experience for new customers, or we're testing a change on the website and we need to wait for traffic to visit the site and view the change.  If it's going to take 6 months of running to get the required sample size, you probably want to rethink your population criteria or what you're testing.  And better to know that upfront.
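The back-of-the-envelope math is simple once you have the required sample size (all numbers below are hypothetical):

```r
n_per_group    <- 1500   # required sample size per cell
daily_eligible <- 120    # eligible customers entering the test per day
ceiling(2 * n_per_group / daily_eligible)   # ~25 days to fill both cells
```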

Success Metrics: This is an important one to talk through.  If you've been running tests previously, I'm sure you've had stakeholders ask you for the kitchen sink in terms of analysis.  If your hypothesis is that a message about a new feature on the website will drive people to go see that feature, it is reasonable to check how many people visited that page and whether or not people downloaded/used that feature.  This change would probably be too benign to cause cancellations or affect upsell/cross-sell metrics, so make sure you're clear about what the analysis will and will not include.  And try not to make a mountain out of a molehill, unless you're testing something that is a dramatic change and has large implications for the business.

Decisions! Getting agreement ahead of time on what decisions will be made based on the results of the test is imperative.  Have you ever been in a situation where the business tests something, it's not significant, and then they roll it out anyways?  Well, that really didn't need to be a test; they could have just rolled it out.  There are endless opportunities for tests that will guide the direction of the business; don't get caught up in a test that isn't actually a test.

Conclusion: Of course, each of these areas could have been explained in much more depth.  But the main point is that there are a number of items you want to discuss before a test launches.  Especially if you're on the hook for doing the analysis, you want to have the complete picture and context so that you can analyze the test appropriately.  I hope this helps you to be more collaborative with your business partners and potentially be more "proactive" rather than "reactive".

No one has any fun when you run a test and then later find out it should have been scoped differently.  Adding a little extra work and clarification upfront can save you some heartache later on.  Consider creating a document like the one I have pictured above for scoping your future tests, and you'll have a full understanding of the goals and implications of your next test ahead of launch. :)


Effective Data Science Presentations

If you're new to the field of Data Science, I wanted to offer some tips on how to transition from presentations you gave in academia to creating effective presentations for industry.  Unfortunately, if your background is of the math, stats, or computer science variety, no one probably prepared you for creating awesome data science presentations in industry.  And the truth is, it takes practice.  In academia, we share tables of t-stats and p-values and talk heavily about mathematical formulas.  That is basically the opposite of what you'd want to do when presenting to a non-technical audience.  If your audience is full of STEM PhDs, then have at it, but in many instances we need to adjust the way we think about presenting our technical material.  I could go on and on forever about this topic, but here we'll cover:

  1. Talking about model output without talking about the model

  2. Painting the picture using actual customers or inputs

  3. Putting in the Time to Tell the Story

Talking about model output without talking about the model: Certain models really lend themselves well to this.  Logistic regression and decision trees are just screaming to be brought to life.  You don't want to be copy/pasting model output into your data science presentations.  You also don't want to be formatting the output into a nice table and pasting it into your presentation.  You want to tell the story, and log odds certainly are not going to tell the story for your stakeholders.  A good first step for a logistic regression model would be to exponentiate the log odds so that you're at least dealing in terms of odds.  Since this output is multiplicative, you can say: "For each unit increase of [variable], we expect to see a lift of x% on average, with everything else held constant."  So instead of talking about technical aspects of the model, we're just talking about how the different drivers affect the output.
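In R, that translation is a one-liner.  A minimal sketch (the `customers` data frame and its columns are made up):

```r
# Hypothetical logistic regression
fit <- glm(converted ~ tenure_years + recent_visits,
           data = customers, family = binomial)

exp(coef(fit))               # odds ratios: multiplicative effect per unit increase
(exp(coef(fit)) - 1) * 100   # the "% lift in odds" you can say out loud
```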

We could, however, take this one step further. 

Using Actual Customers to Paint the Picture: I love using real-life use cases to demonstrate how the model is working.  Above we see something similar to what I presented when talking about my seasonality model.  Of course I changed his name for this post, but in the presentation I would talk about this person's business, why it's seasonal, show the obvious seasonal pattern, and let them know that the model classified this person as seasonal.  I'm not talking about Fourier transforms; I'm describing how real people are being categorized and how we might want to think about marketing to them.  Digging in deep like this also helps me to better understand the big picture of what is going on.  We all know that when we dig deeper, we see some crazy behavioral patterns.

Pulling specific customers/use cases works for other types of models as well.  You built a retention model?  Choose a couple of people with a high probability of churning, and a couple with a low probability of churning, and talk about those people.  "Mary here has been a customer for a long time, but she has been less engaged recently and hasn't done x, y, or z (model drivers), so the probability of her cancelling her subscription is high, even though customers with longer tenure are usually less likely to leave."

Putting in the Time to Tell the Story: As stated before, it takes some extra work to put these things together.  Another great example is in cluster analysis.  You could create a slide for each attribute, but then people would need to comb through multiple slides to figure out WHO cluster 1 really is vs. cluster 2, etc.  You want to aggregate all of this information for your consumer.  And I'm not above coming up with cheesy names for my segments; it just comes with the territory :).  It's worth noting that if I didn't aggregate all this information by cluster, I also wouldn't be able to speak at a high level about who was actually getting into these different clusters.  That would be a large miss on my behalf, because at the end of the day, your stakeholders want to understand the big picture of these clusters.
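The aggregation itself is cheap; it's the storytelling that takes time.  A minimal sketch with dplyr (hypothetical `customers` data frame with a `cluster` assignment):

```r
library(dplyr)

# One profile row per cluster, instead of one slide per attribute
customers %>%
  group_by(cluster) %>%
  summarise(
    n           = n(),
    avg_tenure  = mean(tenure_years),
    avg_orders  = mean(orders_per_year),
    avg_revenue = mean(annual_revenue)
  )
```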

Every analysis I present, I spend time thinking about what the appropriate flow should be for the story the data can tell.  I might need additional information like market penetration by geography (or anything; the possibilities are endless).  The number of small businesses by geography may not have been something I had in my model, but with a little Google search I can find it.  Put in the little extra work to do the calculation for market penetration, then create a map and use this information to further support the story.  Or maybe I learn that market penetration doesn't support my story, and I need to do more analysis to get to the real heart of what is going on.  We're detectives, and we're not just dealing with the data that is actually in the model.  We're trying to explore anything that might give interesting insight and help to tell the story.  Also, if you're doing the extra work and find your story is invalidated, you just saved yourself some heartache.  It's way worse when you present first and then later realize your conclusions were off.  Womp womp.

Closing comments: Before you start building a model, you were making sure that the output would be actionable, right?  At the end of your presentation, you certainly want to speak to next steps for how your model can be used and add value, whether that's coming up with ideas on how you can communicate with customers in a new way that you think they'll respond to, improving retention, increasing acquisition, etc.  But spell it out.  Spend the time to come up with specific examples of how someone could use this output.  I'd also like to mention that learning best practices for creating great visualizations will help you immensely.

There are two articles by Kate Strachnyi that cover pieces of this topic.  You can find those articles here and here.  If you create a slide and have trouble finding the "so what?" of the slide, it probably belongs in the appendix.  When you're creating the first couple of decks of your career, it might crush you to not include a slide that you spent a lot of time on, but if it doesn't add something interesting, unfortunately that slide belongs in the appendix.  I hope you found at least one tip in this article that you'll be able to apply to your next data science presentation.  If I can help just one person create a kick-ass presentation, it'll be worth it.


Designing and Learning With A/B Testing

I've spent the last 6 years of my life heavily involved in A/B testing and other testing methodologies, whether it was the performance of an email campaign to drive health outcomes, product changes, or website changes; the example list goes on.  A few of these tests have been full-factorial MVT tests (my fave).  I wanted to share some testing best practices and examples in marketing, so that you can feel confident about how you're designing and thinking about A/B testing.

As a Data Scientist, you may be expected to be the subject matter expert on how to test correctly.  Or it may be that you've just built a product recommendation engine (or some other model) and you want to see how much better you're performing compared to the previously used model or business logic, so you'll test the new model vs. whatever is currently in production.

There is SO MUCH more to the world of testing than is contained here, but what I'm looking to cover is:

  • Determining test and control populations

  • Scoping the test ahead of launch

  • Test design that will allow us to read the results we’re hoping to measure

  • Test Analysis

  • Thoughts on automating test analysis

Choosing Test and Control Populations

This is where the magic starts.  The only way to determine a causal relationship is by having randomized populations (and a correct test design), so it's imperative that our populations are drawn correctly if we want to learn anything from our A/B test.  In general, the population you want to target will be specific to what you're testing.  If this is a site test for an e-commerce company, you hope that visitors are randomized to test and control upon visiting the website.  If you're running an email campaign or some other type of test, then you'll pull all of the relevant customers/people from a database or big data environment who meet the criteria for being involved in your A/B test.  If this is a large list, you'll probably want to take a random sample of customers over some time period.  This is called a simple random sample: a subset of your population where every member had an equal probability of being chosen to be in the sample.

Here is a great example of how to pull a random sample in Hive: here

Also, just to be clear, writing a "select top 1000 * from table" in SQL is NOT A RANDOM SAMPLE. There are a couple different ways to get a random sample in SQL, but how to do it will depend on the "flavor" of SQL you're using.

Here is an example of pulling a random sample in SQL Server: here

Now that you have your sample, you'll randomly assign these people to test and control groups.

There are times when we’ll need to be a little more sophisticated…  Let’s say that the marketing team wants to learn about the ability to drive engagement by industry (and that you have industry data).  Some of the industries are probably going to contain fewer members than others, meaning that if you just split a portion of your population into two groups, you might not have a high enough sample size in certain industries you care about to determine statistical significance.  Rather than putting in all the effort of running the A/B test only to find out that you can’t learn about an industry you care about, use stratified sampling (this involves doing a simple random sample within each group of interest).
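If your list is already in R, both sampling schemes are a few lines with dplyr (a sketch; `eligible_tbl` and the sample sizes are hypothetical):

```r
library(dplyr)

# Simple random sample: every eligible customer has an equal chance
srs_tbl <- eligible_tbl %>% slice_sample(n = 3000)

# Stratified sample: a simple random sample within each industry
stratified_tbl <- eligible_tbl %>%
  group_by(industry) %>%
  slice_sample(n = 500) %>%
  ungroup()
```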

Scoping Ahead of Launch

I've seen in practice that when the marketing team doesn't see the results they want, they'll say, "We're going to let this A/B test run for two more weeks to see what happens."  Especially for site tests, if you run anything long enough, tiny effect sizes can become statistically significant.  You should have an idea of how much traffic you're getting to the particular webpage and how long the A/B test should run before you launch.  Otherwise, what is to stop us from just running the A/B test until we get the result that we want?

Sit down with marketing and other stakeholders before the launch of the A/B test to understand the business implications, what they're hoping to learn, who they're testing, and how they're testing.  In my experience, everyone is set up for success when you're viewed as a thought partner in helping to construct the test design, and you've agreed upon the scope of the analysis ahead of launch.

Test Design

For each cell in an A/B test, you can only make ONE change. For instance, if we have:

  • Cell A: $15 price point

  • Cell B: $25 price point

  • Cell C: UI change and a $30 price point

You just lost valuable information. Adding a UI change AND a different price option makes it impossible to parse out what effect was due to the UI change or the $30 price point. We’ll only know how that cell performed in aggregate.

Iterative A/B testing is when you take the winner from one test and make it the control for a subsequent A/B test. This method is going to result in a loss of information. What if the combination of the loser from test 1 and the winner from test 2 is actually the winner? We’d never know!  Sometimes iterating like this makes sense (maybe you don't have enough traffic for more test cells), but we’d want to talk about all potential concessions ahead of time.

Another type of test design is MVT (multivariate).  Here we'll look at a full-factorial MVT.  There are more types of multivariate tests, but full-factorial is the easiest to analyze.

  • MVT is better for more subtle optimizations (A/B testing should be used if you think the test will have a huge impact)

  • Rule of thumb is at least 100,000 unique visitors per month.

  • You'll need to know how to use ANOVA to analyze (I will provide a follow-up article with code and explanation for how to do this analysis and link it here later)

[Images: the control monitor (left) and the three test treatment monitors (right)]

One illustrative example of an MVT test is below.  On the left (below) are the control experiences, and on the right are the 3 test treatments.  This results in 2^3 = 8 treatments, because we'll look at each possible combination of test and control.

On the left: The controls would be the current experience

On the right: Cell A could be new photography (ex: friendly waving stick figure), Cell B could reference a sale, and Cell C could show new content.

We can learn about all the interactions! Understanding the interactions and finding the optimal treatment when changing multiple items is the big benefit of MVT testing.  The chart below shows you how each person would be assigned to one of the 8 treatments in this example.

[Chart: how each visitor is assigned to one of the 8 treatment combinations]
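The follow-up article will go deeper, but the core of a full-factorial analysis is an ANOVA, which in R is one call.  A minimal sketch of my own (`results_df` and the factor names are hypothetical):

```r
# Three binary factors and all of their interactions
fit <- aov(converted ~ photo * sale * content, data = results_df)
summary(fit)   # main effects plus two- and three-way interactions
```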

In a future article I'll write up one of my previous MVT tests that I've analyzed, with R code.

A/B Test Analysis

One of the most important parts of test analysis is to have consistency across the business in how we analyze tests.  You don't want to say something had a causal effect when, if another person had analyzed the same test, they might have reached a different conclusion.  In addition to having consistent ways of determining conclusions, you'll also want a consistent way of communicating these results to the rest of the business.  For example: do we share results we find with a p-value greater than .05?  Maybe we do, maybe we don't, but make sure the whole team is being consistent in their communication with marketing and other teams.  Confidence intervals should always be given!  You don’t want to say “Wow! This is worth $700k a year” when really it’s worth somewhere between $100k and $1.3m.  That's a big difference, and it could have an impact on the decision of whether to roll out the change or not.

Let's Automate Our A/B Test Analysis!

Why spend multiple hours analyzing each A/B test, when we can:

  • Automate removal of outliers

  • Build in not calculating statistical significance if the sample is not quite large enough yet

  • Determine statistical significance of metrics with confidence intervals and engaging graphs

  • See how A/B tests are performing soon after launch to make sure there aren’t any bugs messing with our results or large drops in revenue.

  • This also reduces opportunity for error in analysis

With a couple data entries and button pushes!  This would take a while to build, and it will not be one-size-fits-all for all of your tests.  Automating even a portion could greatly reduce the amount of time spent analyzing tests!
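As one small building block for that kind of automated readout, base R's prop.test returns both the significance test and the confidence interval for the difference in conversion rates in one shot (the counts here are hypothetical):

```r
# 240/1500 conversions in test vs. 210/1500 in control
prop.test(x = c(240, 210), n = c(1500, 1500))   # difference in rates + 95% CI
```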

I hope this article gave you some things to be on the lookout for when testing.  If you're still in school to become a Data Scientist, taking a general statistics class that covers which statistics to use and how to calculate confidence intervals is something that will benefit you throughout your career in Data Science.  Otherwise, there is certainly tons of information on the internet to give you an overview of how to calculate these statistics.  I personally prefer Coursera, because it's nice to sit back and watch videos on the content, knowing that the content is from well-known universities.  You can learn a ton through properly executed testing.  Happy learning!


The Successful Data Science Job Hunt

The point of this article is to show you what a successful Data Science job hunt looks like, from beginning to end.  Strap in, friends: I’m about to bring you from day 1 of being laid off to the day that I accepted an offer.  Seriously, it was an intense two months.  I have an MS in Statistics and have been working in Advanced Analytics since 2010.  If you’re new to the field, your experience may be different, but hopefully you’ll be able to leverage a good amount of this content.  We’re going to cover how I leveraged LinkedIn, keeping track of all the applications, continuing to advance your skills while searching, what to do when you receive an offer, and how to negotiate.

Day 1: Being Laid Off


Vistaprint decided to decrease its employee headcount by $20 million in employee salary, and I was part of that cut.  I was aware that the market was hot at the moment, so I was optimistic from day 1.  I received severance, and this was an opportunity to give some real thought to what I would like my next move to be.  I happened to get laid off 4 days after I had dyed my hair bright pink for the first time; that was a bummer.  I actually went to one job interview with my pink hair, and they loved it.  However, I did decide to bring my hair back to a natural color for the rest of my search.

Very First Thing I Did:

I am approached by recruiters pretty frequently on LinkedIn.  I always reply.  If you’re just getting into the field, you may not have past messages from recruiters in your LinkedIn mail, but I mention this so that you can start doing this throughout the rest of your career.  Now that I was looking, my first action was to go through that list, message everyone, and say: “Hi (recruiter person), I’m currently looking for a new opportunity. If there are any roles you’re looking to fill that would be a good fit, I’d be open to a chat.”

A number of people replied saying they had a role, but after speaking with them, it didn’t seem like the perfect fit for me at the moment.  In addition to reaching out to the recruiters who had contacted me, I also did a Google search (and a LinkedIn hunt) to find recruiters in the analytics space.  I reached out to them as well to let them know I was looking.  You never know who might know of something that isn’t on the job boards yet but is coming soon.

First Meeting With the Career Coach

As part of the layoff, Vistaprint set me up with a career coach.  The information she taught me was incredibly valuable; I’ll be using her tips throughout my career.  I met with Joan Blake from Transition Solutions.  At our first meeting, I brought my resume and we talked about what I was looking for in my next role.  Because my resume and LinkedIn had success in the past, she did not change much of the content on my resume, but we did bring my skills and experience up to the top and put my education at the bottom.

They also formatted it to fit on one page.  It’s starting to get longer, but I’m a believer in the one-page resume.  I also made sure to include a cover letter with my application.  This gave me the opportunity to explicitly call out that my qualifications are a great match with the job description; it’s much clearer than having to read through my resume for buzzwords.  I kept a spreadsheet with all of the companies I applied to.  In this spreadsheet I’d put information like the company name, the date I completed the application, whether I had heard back, the last update, whether I had sent a thank you, the name of the hiring manager, etc.  This helped me keep track of all the different things I had in flight, and whether there was anything I could be doing on my side to keep the process moving.

Each Application:

For each job I applied to, I would then start a little hunt on LinkedIn.  I’d look to see if anyone in my network currently worked for the company.  If so, they’d probably like to know that I’m applying, because a lot of companies offer referral bonuses.  I’d message the person and say something like: “Hey Michelle, I’m applying for the Data Scientist position at ______________. Any chance you’d be willing to refer me?”

If there is no one in my network who works for the company, I then try to find the hiring manager for the position.  Odds are it’s going to be a title like “Director (or VP) of Data Science and Analytics”, or some variation; you’re trying to find someone who is a decision maker.  This requires LinkedIn Premium, because I’m about to send an InMail.  My message to a hiring manager/decision maker would look something like:

“Hi Sean, I’m interested in the remote Data Science position, and I’m hoping I can get my resume in the right hands.  I have an MS in Statistics, plus 7 years of real-world experience building models.  I’m a wiz at SQL, modeling in R, and I have some exposure to Python.  I’d appreciate the opportunity to speak with the appropriate person about the open position, and share how I’ve delivered insights and added value for companies through the use of statistical methods.  Thanks, Kristen”

Most people actually responded; Joan (the career coach) was surprised when I told her about my cold-calling LinkedIn success.

I Started Applying to Jobs, and Started Having “Phone Screens”

Phone screens are basically all the same.  Some were a little more intense and longer than others, but they were all around a half hour, and they’re typically with someone in HR.  Since it’s HR, you don’t want to go too deep into the technical stuff; you just want to pass this stage, follow up with a note thanking them for their time, and try to firm up when you’ll be able to speak with the hiring manager :)

Tell me about yourself: People just want to hear that you can speak to who you are and what you’re doing.

Mine was some variation of:

I am a Data Scientist with 7 years of experience using statistical methods and analysis to solve business problems across various industries. I’m skilled in SQL, model building in R, and I’m currently learning Python.

What are you looking to do?  I’d make sure that what I’m looking to do ties directly to the job description.  At the end of the day, it was some variation of:

“I’m looking to continuously learn new tools, technologies and techniques. I want to work on interesting problems that add business value”.

Then I’d talk about how interesting one of the projects on the job description sounded.

What are you looking for in terms of salary?  Avoid this question if you can.  You’ll be asked, but try to steer in a different direction.  You can always reply with, “I’ve always been paid fairly in the past, and I trust that I’ll be paid fairly working for [insert company name]. Do you have an idea of the salary range for the position?”  They’ll know the range for the position, but they’ll probably tell you that they don’t.  Most of the time I’d finally concede and give them my salary; this doesn’t mean that you won’t be able to negotiate when you receive an offer.

All The While, I’m Still Learning, And Can Speak to This in Interviews:

If I was going to tell everyone that I was very into learning technologies, I’d better be “walking the walk”, so to speak.  I am constantly learning anyway, because it’s in my nature, but make sure that if you say you’re learning something new, you’re actually studying it.

The course I took was: Python for everybody

Disclaimer: This is an affiliate link, meaning that at no cost to you, I will earn a commission if you end up signing up for this course.

This course goes over your basic lists, arrays, tuples, and defining a function… but it also goes over how to access and parse web data.  I had always wanted to know how to access Twitter data for future analysis, so this was super cool.  The specialization (that’s the name Coursera gives a series of courses) also gives a brief overview of how to construct a database.  This was a super bonus for me, because if I want to operationalize a model, I’m going to want to know how to write from Python to a database table.  All in all, I found this course to be a great use of my time, and I finished it being able to speak intelligently to things I could not speak to before taking the course.

In Person Interviews:

I've written a whole article on in-person interviews: here

At some point, you might receive a call saying they plan on putting an offer together for you, if you're still interested.  Great! You’ve got an offer coming.  At this point, you want to call all the other companies that you would consider an offer from and say, “I’ve been informed that I am expecting an offer. Is there anything you can do to accelerate your process?”  I mentioned this to 2 companies.  One of them did speed up their process, and it resulted in an additional offer.  The other company said that they would not speed up their process; I thanked them for their time and said I hoped we'd cross paths in the future.

Negotiating:

The phone rings, and you answer.  This is it: you’re getting your first offer.  It’s time to negotiate.  Only a relatively small percentage of people ever negotiate their salary, and the percentage is even smaller when we’re talking about women.  Ladies! Negotiate! I’m here rooting for you; you got this.  Joan from Transition Solutions had coached me on this.  She said, “Don’t try and solve the problem for them.”  When they call, let them know how excited you are that they called, and that you’re interested in hearing their offer.

Once you’ve heard the salary, vacation time, and that they’re going to send over the benefits information, you can say something along the lines of:

"Thank you so much for the offer, I really appreciate it. You know, I was hoping that you could do more on the salary."

Then wait for a response, and again be positive.  They’ll most likely say that they need to bring this information back to the hiring manager.

"Great! I look forward to hearing back from you.  I’ll take some time to look over the benefits package.  Want to speak again on ____?  I’m feeling confident that we can close this."

Then you’d be walking away from the conversation with a concrete time that you’ll speak to them next, and you’ve let them know that you were happy to hear from them.  All of this is positive!  I successfully negotiated my offer and started a week later.  I couldn’t be happier with where I am now and the work I’m doing.  It took a lot of applying and a lot of speaking with companies who weren’t “the one”, but it was worth it.  To sum up my job search: I learned that a targeted cover letter and directly applying on a company website greatly increase the response rate on your applications.

I learned that you can effectively leverage LinkedIn to find the decision maker for a position, and they’ll help keep the process moving if you’re a good fit.  I also gained a ton of confidence in my ability to articulate my skills, and that came with practice.  I wish you lots of success on your hunt, and I hope there were a couple of tips in this article that you’re able to use :)

