Hands-on SQL Practice For A Data Science Interview
I bet you're searching the internet for a fantastic reference to help you get ready for your SQL interview. You've landed in the right place.
Let's jump right in and get started. I have a SQL browser available for you to use here: Show me the data
The data is fictitious, but extensive and useful for writing queries.
You can use this browser to answer all of the questions below. When you're done, if you weren't able to answer a couple of the questions, you can give me your email address in the email form near the bottom of this article and I'll send you the query solutions.
This article is designed to help you with "white boarding" SQL questions. We're not going to cover any theory here. Just a bunch of questions and how to answer them leveraging SQL.
I wish you a ton of luck on your interview, I hope it results in an offer! If you're looking to further your SQL skills for data science, I have also created the ultimate course in SQL for data science. We cover tons of material that you won't see here, because feature engineering, handling NULLs, working with datetimes, etc., is not typically part of the interview. But if you want to really hit the ground running at your new job, I'd highly suggest this course. It's free and you can find it here
SQL Questions using just the select statement:
Write a query to determine the number of rows in the customer table. Answer: 351,962
What was the maximum commission paid in the customer table? The median? Answer: Max -$10,295, Average -$66.30
Write a query to that returns the customer_id, business_type and Country from the customer table.
SQL Questions using a where statement:
How many customers do we have "has_instagram" information for in the customer table? i.e. - How many rows are not NULL? Answer: 128,449
How many customers have a "First_conversion_date" greater than 1/1/2016 in the customer table? Answer: 54,397
SQL Question using a group by statement:
How many customers have "has_facebook" =1 in the customer table? Use a group by statement. Answer: 60,894
SQL Question using a group by and order statement:
Which state has the most customers? How many customers live in that state in the customer table? Answer: California, 43,736 customers
SQL Question using a having statement and subquery:
Using the billedservices table, how many customers had more than 1 billed service? Use a subquery to answer this question. Answer: 44
SQL Question requiring a join:
How many customers from OUTSIDE the United States have an entry in the billed services table? Answer: 89
In an interview, they'll typically place two or three pieces of paper up on the whiteboard. This will have your data. Obviously, this means that the data you'll be working with is much smaller.
Take home tests are typically much more difficult than what was covered here. During an in-person interview, they'll typically only have 30 minutes to an hour to assess your SQL knowledge. Most often, they just want to know that if you have SQL listed on your resume, that you can write some simple queries like up above.
I've never personally been asked to whiteboard the solution for creating a table, updating a column, etc., but obviously any SQL questions are fair game.
If you had no difficulty answering these questions, you're likely to do fine on your SQL interview.
Want to further your SQL skills for data science? Check out the Ultimate SQL for Data Science course.
Looking for the solutions to the questions above? I'll send them directly to your inbox :)
How to Ace the In-Person Data Science Interview
I’ve written previously about my recent data science job hunt, but this article is solely devoted to the in-person interview. That full-day, try to razzle-dazzle em’, cross your fingers and hope you’re well prepared for what gets thrown at you. After attending a ton of these interviews, I’ve found that they tend to follow some pretty standard schedules.
But first, if your sending out job applications and aren't hearing back, you'll want to take a second look at your resume. I've written a couple articles on how to create a strong resume. One helpful article is
You may meet with 3–7 different people, and throughout the span of meeting with these different people, you’ll probably cover:
Tell me about yourself
Behavioral interview questions
“White boarding” SQL
“White boarding” code (technical interview)
Talking about items on your resume
Simple analysis interview questions
Asking questions of your own
Tell me about yourselfI’ve mentioned this before when talking about phone screens. The way I approach this never changes. People just want to hear that you can speak to who you are and what you’re doing. Mine was some variation of:I am a Data Scientist with 8 years of experience using statistical methods and analysis to solve business problems across various industries. I’m skilled in SQL, model building in R, and I’m currently learning Python.
Behavioral Questions
Almost every company I spoke with asked interview questions that should be answered in the STAR format. The most prevalent STAR questions I’ve seen in Data Science interviews are:
Tell me about a time you explained technical results to a non-technical person
Tell me about a time you improved a process
Tell me about a time with a difficult stakeholder, and how was it resolved
The goal here is to concisely and clearly explain the Situation, Task, Action and Result. My response to the “technical results” questions would go something like this:Vistaprint is a company that sells marketing materials for small businesses online (always give context, the interviewer may not be familiar with the company). I had the opportunity to do a customer behavioral segmentation using k-means. This involved creating 54 variables, standardizing the data, plenty of analysis, etc. When it was time to share my results with stakeholders, I had really taken this information up a level and built out the story. Instead of talking about the methodology, I spoke to who the customer segments were and how their behaviors were different. I also stressed that this segmentation was actionable! We could identify these customers in our database, develop campaigns to target them, and I gave examples of specific campaigns we might try. This is an example of when I explained technical results to non-technical stakeholders. (always restate the question afterwards).For me, these questions required some preparation time. I gave some real thought to my best examples from my experience, and practiced saying the answer. This time paid-off. I was asked these same questions over and over throughout my interviewing.
White Boarding:
White Boarding SQL
This is when the interviewer has you stand at the whiteboard an answer some SQL questions. In most scenarios, they’ll tape a couple pieces of paper up on the whiteboard. I have a free video course on refreshing SQL for the data science interview
White Boarding Code
As mentioned in my previous article. I was asked FizzBuzz two days in a row by two different companies. A possible way to write the solution (just took a screenshot of my computer) is below:
The coding problem will most likely involve some loops, logic statements and may have you define a function. The hiring manager just wants to be sure that when you say you can code, you at least have some basic programming knowledge.
Items on Your Resume
I’ve been asked about all the methods I mention on my resume at one point or another (regression, classification, time-series analysis, MVT testing, etc). I don’t mention my thesis from my Master’s Degree on my resume, but casually referenced it when asked if I had previously had experience with Bayesian methods.
The interviewer followed up with a question on the prior distributions used in my thesis.
I had finished my thesis 9 years ago, couldn’t remember the priors and told him I’d need to follow up.
I did follow up and send him the answer to his question, they did offer me a job, but it’s not a scenario you want to find yourself in. If you are going to reference something, be able to speak to it. Even if it means refreshing your memory by looking at wikipedia ahead of the interview. Things on your resume and projects you mention should be a home run.
Simple Analysis Questions
Some basic questions will be asked to make sure that you have an understanding of how numbers work. The question may require you to draw a graph or use some algebra to get at an answer, and it’ll show that you have some business context and can explain what is going on. Questions around changes in conversion, average sale price, why is revenue down in this scenario? What model would you choose in this scenario? Typically I’m asked two or three questions of this type.
I was asked a probability question at one interview. They asked what the expected value was of rolling a fair die. I was then asked if the die was weighted in a certain way, what would the expected value of that die be. I wasn’t allowed to use a calculator.
Questions I asked:
Tell me about the behaviors of a person that you would consider a high-performing/high-potential employee.
Honestly, I used the question above to try and get at whether you needed to work 60 hours a week and work on the weekends to be someone who stood out. I pretty frequently work on the weekends because I enjoy what I do, I wouldn’t enjoy it if it was expected.
What software are you using?
Really, I like to get this question out of the way during the phone screen. I’m not personally interested in working for a SAS shop, so I’d want to know that upfront. My favorite response to this question is “you can use whatever open source tools you’d like as long as it’s appropriate for the problem.”
Is there anything else I can tell you about my skills and qualifications to let you know that I am a good fit for this job?
This is your opportunity to let them tell you if there is anything that you haven’t covered yet, or that they might be concerned about. You don’t want to leave an interview with them feeling like they didn’t get EVERYTHING they needed to make a decision on whether or not to hire you.
When can I expect to hear from you?
I also ask about the reporting structure, and I certainly ask about what type of projects I’d be working on soon after starting (if that is not already clear).
Summary
I wish you so much success in your data science interviews. Hopefully you meet a lot of great people, and have a positive experience. After each interview, remember to send your thank you notes! If you do not receive an offer, or do not accept an offer from a given company, still go on LinkedIn and send them connection requests. You never know when timing might be better in the future and your paths might cross.
To read about my job hunt from the first application until I accepted an offer,
.