Beginning the Data Science Pipeline - Meetings

I spoke in a webinar recently about how to get into Data Science. One of the questions asked was, "What does a typical day look like?" I think there is a big opportunity to explain what really happens on a large project before any machine learning takes place. I've previously written about thinking creatively for feature engineering, but there is even more to getting ready for a data science project: you need to get buy-in from other areas of the business to ensure you're delivering insights that the business wants and needs. Sometimes the business has a high-priority problem for you to solve, but often you'll identify a high-ROI project yourself and want to show others the value you could provide if you were given the opportunity to work on it. The road to the machine learning algorithm looks something like:

  • Plenty of meetings

  • Data gathering (often from multiple sources)

  • Exploratory data analysis

  • Feature engineering

  • Researching the best methodology (if it's not standard)

  • Machine learning

In this article, we're going to cover just the first bullet. There are a ton of meetings that take place before I ever write a line of SQL for a big project. If you read enough comments and blogs about Data Science, you'll see people say it's 90% data aggregation and 10% modeling (or some similar split), but that's not quite the whole picture either. I'd love for you to fully understand what you're signing up for when you become a data scientist.

Meetings: As I mentioned, the first step is really getting buy-in on your project. It's important that, as an Analytics department, we're working to solve the needs of the business. We want to help the rest of the business understand the value a project could deliver by pitching the idea in meetings with these stakeholders. Just to be clear, I'm not a one-woman show. My boss takes every chance he gets to talk about what we could potentially learn and act on with this project (in additional meetings). After meetings at all different levels with all sorts of stakeholders, we might finally have agreement that the project should move forward.

More Meetings: At this point, I'm still not diving right into SQL. There may be members of my team who know of relevant data that I'm not aware of. Other areas of the business can also help identify variables that might be relevant (they don't know the database, but they have the business context, and this project is supposed to SUPPORT their work). There is potentially a ton of data living somewhere that has yet to be analyzed. The databases of a typical organization are quite large, and unless you've been at a company for years, there is most likely useful data that you are not aware of.

The first step was meeting with my team to discuss every piece of data we could think of that might be relevant. We thought about things like:

  • Whether something might be a proxy for customers who are more "tech savvy". Maybe this is having a business email address as opposed to a Gmail address (or any non-business email address), or maybe customers who use more advanced features of our product are the ones we'd consider tech savvy. It all depends on context and could be answered in multiple ways. It's an art.

  • Census data could tell us whether a customer's zip code is in a rural or urban area. Urban and rural customers might have different needs and behave differently, or maybe the extra work to aggregate by rural/urban isn't necessary for this particular project. Bouncing ideas off others and including your teammates and stakeholders will directly impact your effectiveness.

  • What is available in the BigData environment? In the Data Warehouse? In other data sources within the company? When you really try to list everything, you find that this can be a large undertaking, and you'll want feedback from others.
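To make the email idea from the first bullet concrete: a proxy like "has a business email address" is often only a few lines of code. The sketch below is an illustration under assumptions, not our production logic; the `FREE_EMAIL_DOMAINS` set is a hypothetical, deliberately short list you would need to extend, and `has_business_email` is a made-up helper name.

```python
# Hypothetical, non-exhaustive list of consumer (non-business) email domains.
FREE_EMAIL_DOMAINS = {
    "gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com", "icloud.com",
}


def has_business_email(email: str) -> bool:
    """Return True if the address's domain is not a known consumer provider."""
    # Lowercase and take everything after the last "@" as the domain.
    domain = email.strip().lower().rsplit("@", 1)[-1]
    return domain not in FREE_EMAIL_DOMAINS


print(has_business_email("owner@icecreamshop.com"))  # True
print(has_business_email("Someone@Gmail.com"))       # False
```

Whether this is actually a good proxy for "tech savvy" is exactly the kind of question to bounce off teammates and stakeholders before anyone writes the SQL.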

After we have a list of potential data to find, the meetings start to help track all that data down. You certainly don't want to reinvent the wheel here. No one gets brownie points for writing all of the SQL themselves when it would have taken half the time to leverage previously written queries from teammates. If I know of a project where someone has already created a few cool features, I email them and ask for their code; we're a team. For a previous project I worked on, there were 6 different people outside of my team whom I needed to connect with because they knew these tables or data sources better than members of my team did. So it's time to ask those other people about those tables, and that means scheduling more meetings.

Summary: I honestly enjoy this process. It's an opportunity to learn about the data we have, work with others, and think of cool opportunities for feature engineering. The mental picture is often of data scientists sitting in a corner by themselves for months and then coming back with a model. But by getting buy-in and collaborating with other teams and your own team members, you can keep stakeholders informed throughout the process and feel confident that you'll deliver what they're hoping for. You can be a thought partner who proactively delivers solutions.

For more on data science, visit www.datamovesme.com.


A Different Use of Time Series to Identify Seasonal Customers

I had previously written about creatively leveraging your data with segmentation to learn about a customer base. The article is here. In that article, I mentioned using any data that might be relevant, and one of the variables I suggested was whether a customer shows a seasonal usage pattern. Since I'm getting ready to do another cluster analysis, I decided to tackle this question.

These are my favorite types of data science problems because they require you to think a little outside the box to design a solution. As a first step, I wanted to tag each customer according to whether or not they exhibited a seasonal pattern. Later, I may build this out further to determine the beginning of each customer's "off-season." This will allow us to nurture these customer relationships better and provide a more personalized experience.

I'm a data scientist at Constant Contact, which provides email marketing solutions to small businesses. Since it is a subscription product, customers have different usage patterns that I'm able to use for this analysis.

At first, my assumption was that a good portion of these customers might be living in an area that has four seasons. You know, the ice cream shop in New England that shuts down for the winter. After thinking about it some more, I realized that looking for seasonal usage patterns would also pick up people whose business is seasonal for reasons that have nothing to do with the weather. Accounts in the education field that take summers off are going to be picked up as seasonal. Retail businesses that have pretty consistent usage all year but pick up their engagement at Christmas are also exhibiting a seasonal pattern. So whether the model flags someone as seasonal could depend on the type of business, not just the weather. (Or maybe there are people fortunate enough to take random long vacations in the middle of the year for no reason; I want to make sure I find those people too, if they exist.)

To do this analysis, I aggregated the email sends of each customer with at least 2 years of history by month, so each customer is its own time series. However, there were a couple of complexities. One detail in particular is worth noting: customers might take a month or two (or more) off from usage, so first I had to write some code to fill in zeros for those months. I couldn't tell the model I was looking for a yearly pattern while only giving it 8 months' worth of data per year; I needed those zeros. I filled in the missing zeros using Python, and then decided I wanted to use R for the time series portion (determining whether a seasonal pattern was present). I got to use the rpy2 package in Python for the first time. Check that off the list of new packages I've wanted to try.
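The zero-filling step can be sketched in pandas. This is a minimal sketch, not the code from the project: it assumes a hypothetical table already aggregated to one row per customer per month, with columns `customer_id`, `month` (a pandas monthly `Period`), and `sends`. Each customer gets a row for every month from their first active month through `end` (or their own last active month if `end` isn't given), and months with no sends become 0.

```python
from typing import Optional

import pandas as pd


def fill_missing_months(df: pd.DataFrame, end: Optional[pd.Period] = None) -> pd.DataFrame:
    """Return one row per customer per month, with 0 sends for inactive months.

    Assumes `df` has one row per (customer_id, month) with columns
    customer_id, month (monthly pd.Period), and sends.
    """
    filled = []
    for customer_id, group in df.groupby("customer_id"):
        # Index this customer's send counts by month.
        monthly = group.set_index("month")["sends"]
        last = end if end is not None else monthly.index.max()
        full_range = pd.period_range(monthly.index.min(), last, freq="M")
        # Months missing from the data come back from reindex as 0, not NaN.
        monthly = monthly.reindex(full_range, fill_value=0)
        filled.append(pd.DataFrame({
            "customer_id": customer_id,
            "month": full_range,
            "sends": monthly.to_numpy(),
        }))
    return pd.concat(filled, ignore_index=True)


sends = pd.DataFrame({
    "customer_id": [1, 1, 1],
    "month": pd.PeriodIndex(["2020-01", "2020-02", "2020-05"], freq="M"),
    "sends": [3, 4, 5],
})
print(fill_missing_months(sends)["sends"].tolist())  # [3, 4, 0, 0, 5]
```

Passing a common `end` month matters: a customer who is in their off-season right now should end their series with zeros rather than being cut off at their last send.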

I fit a TBATS model for each customer in R. This is probably overkill, because TBATS was designed to handle very complex (and potentially multiple) seasonal patterns. However, it made it really simple to ask the model whether it had a yearly seasonal component. As a bonus, TBATS is more robust to non-stationarity than other methods.
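The post does this step with TBATS in R (called through rpy2). If you want a quick pure-Python sanity check before fitting anything that heavy, one much cruder heuristic, which is a swapped-in technique and not the TBATS approach described above, is to look at the lag-12 autocorrelation of a customer's month-over-month changes in sends. The `looks_yearly_seasonal` name and the 0.5 threshold below are assumptions chosen for illustration.

```python
import pandas as pd


def looks_yearly_seasonal(monthly_sends, threshold=0.5):
    """Crude yearly-seasonality check (NOT TBATS): lag-12 autocorrelation
    of month-over-month changes in a zero-filled monthly series."""
    series = pd.Series(monthly_sends, dtype=float)
    if len(series) < 24:
        raise ValueError("need at least 24 months of zero-filled data")
    # Differencing removes a steady trend, which would otherwise inflate
    # the lag-12 autocorrelation even without any yearly pattern.
    changes = series.diff().dropna()
    if changes.std() == 0:
        return False  # flat or perfectly linear usage: nothing repeats yearly
    # Series.autocorr is the Pearson correlation with the series shifted 12 months.
    return bool(changes.autocorr(lag=12) > threshold)


summer_peak = ([10] * 5 + [50] * 3 + [10] * 4) * 3  # 3 years, same 3-month busy season each year
print(looks_yearly_seasonal(summer_peak))  # True
print(looks_yearly_seasonal([20] * 36))    # False
```

Caveats: any cycle whose period divides 12 (a twice-yearly pattern, for example) will also score high, and noisy series need more years of data before the correlation means much. That's part of why an explicit model like TBATS, which you can simply ask whether it kept a yearly seasonal component, is the more defensible choice.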

Here is a picture: on the left is a customer whom the model determined to be seasonal, and on the right is a customer who is obviously not seasonal (and the model agrees).

[Figure: seasonal vs. non-seasonal graph]

After I had the output of my model, I went back and did a full analysis of what these customers looked like. Seasonal customers over-indexed in the Northeast and were less likely to be in the West and South. Seasonal users were also more likely to self-report being in an industry like:

  • Retail
  • Sports and Recreation
  • Non Profits

Non-seasonal users were more likely to self-report being in an industry like:

  • Auto Services
  • Financial Advisor
  • Medical Services
  • Insurance
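"Over-indexed" here is used in the usual marketing sense: a group's seasonal rate compared with the overall seasonal rate, scaled so that 100 means average. A minimal pandas sketch, assuming a hypothetical customer-level table with a boolean `is_seasonal` flag (the model's output) and a grouping column such as region or self-reported industry; the column names and the toy data are made up.

```python
import pandas as pd


def over_index(df: pd.DataFrame, group_col: str, flag_col: str = "is_seasonal") -> pd.Series:
    """Seasonal rate of each group relative to the overall rate (100 = average)."""
    overall_rate = df[flag_col].mean()                   # share of all customers flagged
    group_rate = df.groupby(group_col)[flag_col].mean()  # share flagged within each group
    return (group_rate / overall_rate * 100).round(1).sort_values(ascending=False)


customers = pd.DataFrame({
    "region": ["Northeast", "Northeast", "West", "West"],
    "is_seasonal": [True, True, False, True],
})
print(over_index(customers, "region"))  # Northeast 133.3, West 66.7 (toy data)
```

Comparing against the overall rate, rather than looking at raw counts, keeps large regions or industries from looking "seasonal" just because they have more customers.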

Customers with only 2-3 years of tenure were less likely to be seasonal than more tenured customers. This could be due to a couple of different factors. Maybe there just wasn't enough data to detect them yet, maybe they go through a period of getting acquainted with the tool (with a different usage pattern) before they really hit their stride, or maybe they're just really not seasonal. There were more insights, but this is company data ;)

Here is a map of seasonal customers over-indexing in the Northeast. Stakeholders typically enjoy seeing a nice map. Note: the split was not 50/50 seasonal vs. non-seasonal.

[Figure: seasonal percentage map]

At the moment, we're thinking through what data we might be able to leverage in the upcoming segmentation (where this seasonal variable will be one candidate variable). This might include information from the BigData environment or anything that lives in the relational database. We're also weighing how difficult a specific variable is to get against the added value we might get from gathering that data. I feel super fortunate to be able to work on projects that help us learn about our customers, so that when we message them, we can be more relevant. Nothing is worse than receiving a communication from a company that totally misses the mark on what you're about. I find this type of work exciting, and it allows me to be creative, which is important to me. I hope you found this article enjoyable, and maybe there are a couple of people out there who will actually find this applicable to their own work. I wish you lots of fun projects that leave you feeling inspired :)

The code I used for this project can be found in my article here.


Target Customers By Learning with Customer Segmentation

It’s easy to get people to buy into the idea of “Don’t test to win, test to learn.” However, when it comes to segmentation, it’s sometimes more difficult to get people over to the camp of “Don’t segment to align with your previously held ideas, segment to learn.” This may be because that saying is certainly not as short, sweet, and catchy, but I’m a strong believer in “segment to learn.”

I’ve been approached before with: “We did a segmentation of the market; can you tie this back to our customer base?” This means they did a segmentation based on survey data from the general population. These people could be using your product, but they’re not necessarily using it. The survey probably asked about the functionality they need, what they’re trying to do, and who they are.

Although we sometimes do things with data that mystify and dazzle, we are not wizards. I cannot take your segmentation of the market and find those same segments in our customer base. Our customers are not necessarily representative of the market, and I don’t have your market survey data for all customers. But what I have is even more precious: the actual behaviors your customers have taken.

These actions include things like:

· How often have they visited our website?

· Are they buying certain products and not others?

· What are they doing on our site? Reading informative articles? Visiting a lot but not converting?

· How often are they purchasing? Are these less expensive or more expensive products?

· If they’re calling customer service, did they just need help finding a product? Or were they unhappy and looking for a refund?

· How long have they been with us?

· How were they acquired (channel)?

· Are they being upsold or cross-sold products?

This type of information is hopefully in your database, somewhere. If not, you may be able to find a way to get at it.
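As a rough illustration of how some of these behaviors become columns for a segmentation, here is a minimal pandas sketch. It assumes a hypothetical event log with one row per event and columns `customer_id`, `event_type` (values like "visit", "purchase", "service_call"), `amount`, and `timestamp`; your database will almost certainly look different, and `behavior_features` is a made-up name.

```python
import pandas as pd


def behavior_features(events: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Roll a raw event log up to one row of behavioral features per customer."""
    # One column per event type, holding how many times each customer did it.
    counts = pd.crosstab(events["customer_id"], events["event_type"])
    # Average order value, computed over purchase events only.
    purchases = events[events["event_type"] == "purchase"]
    avg_order = purchases.groupby("customer_id")["amount"].mean().rename("avg_order_value")
    # Tenure: days from each customer's first recorded event to `as_of`.
    first_seen = events.groupby("customer_id")["timestamp"].min()
    tenure = (as_of - first_seen).dt.days.rename("tenure_days")
    # Align on customer_id; customers with no purchases get NaN avg_order_value.
    return pd.concat([counts, avg_order, tenure], axis=1)


events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "event_type": ["visit", "purchase", "purchase", "visit"],
    "amount": [0.0, 20.0, 40.0, 0.0],
    "timestamp": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-02-01", "2023-03-01"]),
})
print(behavior_features(events, pd.Timestamp("2023-03-31")))
```

Each row of the result is one customer, which is the shape a clustering algorithm expects.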

We could also append data from a data vendor if we have the budget. There are companies like Epsilon, FullContact, or Acxiom (to name a few). If you have the budget for this and send them customers’ names, addresses, and some other information, they can add columns for things like:

· Income

· Race

· Education

· Employment

· Spend behavior

· Lifestyle and interests

· And more!

This would give you lots of great data to play with. The other option might be appending Census data at the zip code level. All this data could be analyzed and potentially be meaningful in creating a segmentation.
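Appending Census data at the zip code level is mostly a join, but zip codes are a classic source of silent mismatches (leading zeros dropped when a column is read as a number, ZIP+4 suffixes). A minimal pandas sketch, assuming hypothetical tables where both `customers` and `census` have a `zip` column and `census` has one row per zip code; the income figures are toy values. Note that the Census Bureau publishes many tables by ZIP Code Tabulation Area (ZCTA), which only approximates USPS zip codes.

```python
import pandas as pd


def to_zip5(zips: pd.Series) -> pd.Series:
    """Normalize zip codes (ints or strings, not floats) to 5-character strings:
    2134 -> "02134", "94105-1234" -> "94105"."""
    return zips.astype(str).str.strip().str.zfill(5).str[:5]


def append_zip_data(customers: pd.DataFrame, census: pd.DataFrame) -> pd.DataFrame:
    """Left-join zip-level columns onto customers; customers with no match get NaN."""
    customers = customers.assign(zip5=to_zip5(customers["zip"]))
    census = census.assign(zip5=to_zip5(census["zip"])).drop(columns="zip")
    # validate= raises if the census table has duplicate zip codes.
    return customers.merge(census, on="zip5", how="left", validate="many_to_one")


customers = pd.DataFrame({"customer_id": [1, 2], "zip": [2134, "94105-1234"]})
census = pd.DataFrame({"zip": ["02134", "94105"], "median_income": [70000, 150000]})
print(append_zip_data(customers, census))
```

A left join keeps every customer even when their zip has no Census match, so it's worth checking how many rows come back with missing values before trusting the new columns.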

There is another instance of this problem manifesting itself more directly. When we do find ourselves in the position to do a segmentation of our customer base, I sometimes hear: “We’d like to do a segmentation, I’m thinking segments like….” Here people are explicitly using a segmentation to reinforce their previously held beliefs.

Think of how you’d be shortchanging yourself. Think of all the variables you could come up with! Be innovative! Create variables that the business has never looked at before.

Try to identify a way to determine:

· Who are your seasonal customers?

· Who in your base responds to your marketing campaigns? And how?

· How much time passes between different forms of engagement, such as visiting the website and purchasing your product?
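The last idea, time between different forms of engagement, might look like this in pandas. This is a sketch under assumptions: it uses a hypothetical event log (`customer_id`, `event_type`, `timestamp`), and "days from first visit to first purchase" and "average days between purchases" are just two example definitions you might choose; `engagement_timing` is a made-up name.

```python
import pandas as pd


def engagement_timing(events: pd.DataFrame) -> pd.DataFrame:
    """Per-customer timing features from an event log."""
    visits = events[events["event_type"] == "visit"]
    purchases = events[events["event_type"] == "purchase"].sort_values("timestamp")
    first_visit = visits.groupby("customer_id")["timestamp"].min()
    first_purchase = purchases.groupby("customer_id")["timestamp"].min()
    # Days from first site visit to first purchase (NaN if either never happened).
    visit_to_purchase = (first_purchase - first_visit).dt.days
    # Gap between each purchase and the same customer's previous purchase.
    gaps = purchases.groupby("customer_id")["timestamp"].diff().dt.days
    # Average gap per customer (NaN for one-time buyers).
    avg_gap = gaps.groupby(purchases["customer_id"]).mean()
    return pd.DataFrame({
        "days_visit_to_purchase": visit_to_purchase,
        "avg_days_between_purchases": avg_gap,
    })


events = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2],
    "event_type": ["visit", "purchase", "purchase", "purchase", "visit"],
    "timestamp": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-15", "2023-01-25", "2023-02-01"]
    ),
})
print(engagement_timing(events))
```

Customers who never purchased come back as NaN rather than 0, which keeps "hasn't bought yet" distinct from "bought immediately".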

You can create a segmentation around acquisition or retention, where you know what you’re trying to optimize, and these types of segmentation certainly fall into “segment to learn.” But my favorite is when we use an unsupervised algorithm.

There was a great article by John Sukup on DataScience.com that explains some drawbacks of k-means and offers some alternative solutions. Check it out! I also used his code to make the cluster visualization at the top of my article.
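For readers who haven't run one, here is what the core of k-means does, as a minimal NumPy sketch of standardization plus Lloyd's algorithm. It's for illustration only: in practice you'd more likely reach for a library implementation such as scikit-learn's `KMeans`, and the number of clusters `k`, the seed, and the iteration cap below are arbitrary assumptions.

```python
import numpy as np


def standardize(X):
    """Scale each column to mean 0 / std 1 so no single variable dominates distances."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / std


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest centroid, recompute centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = mean of assigned points (keep the old one if a cluster empties).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(standardize(X), k=2)
print(labels)
```

Because k-means uses Euclidean distance, unscaled variables with large ranges (like revenue) would otherwise dominate the clusters, so standardizing first is a common precaution. k-means is also sensitive to its starting centroids and to the choice of k, so it's usually run several times with different seeds and values of k.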

Once you have output that makes sense, start learning! Learning is a manual process that typically involves doing a lot of crosstabs (at least in my experience), but then you can take it back to the business with a big smile and recommendations on how to target the customers in each segment. Show them what you’ve learned and let them know it’s ACTIONABLE. These are segments that you can easily add to your database and use to build campaigns.
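The crosstab step is easy to sketch with `pandas.crosstab`. A minimal example, assuming a hypothetical table with one row per customer, the cluster label in a `segment` column, and any attribute you want to profile (region here); `normalize="index"` turns counts into the share of each segment falling into each category. The data and the `profile_segments` name are made up.

```python
import pandas as pd


def profile_segments(df: pd.DataFrame, segment_col: str, attribute_col: str) -> pd.DataFrame:
    """Percent of each segment in each attribute category (each row sums to 100)."""
    shares = pd.crosstab(df[segment_col], df[attribute_col], normalize="index")
    return (shares * 100).round(1)


customers = pd.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "region": ["Northeast", "Northeast", "Northeast", "West"],
})
print(profile_segments(customers, "segment", "region"))
```

Reading across a row shows what a segment looks like; repeating this for each candidate attribute is the "lot of crosstabs" part of the learning.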

Summary:

Segmentation is an enjoyable experience where you learn a ton about who your customers are. This will help you determine what type of content might be most appropriate and nurture these customers accordingly. If you're interested, I have another article on how to get buy-in from the business to support your big projects. That article is here. I’m amped up just thinking about data and cluster analysis. Hope you are too.
