Data Moved Me in 2018

Dear diary,

I'm writing this article so that a year from now, when I've completely forgotten how cool 2018 was, I can look back on this post. I'm literally floored by everything that happened this year. Here is a small snapshot, in chronological-ish order:

  • I started a new position in January 2018 as a Senior Data Scientist at Constant Contact.  I've been fortunate to work on interesting projects throughout the year that have often served as inspiration for blog posts. 

  • I published my first blog article (ever) in March of 2018. It was originally on the domain kristenkehrer.com, which is no longer live. That first article was rejected by Towards Data Science on Medium. My second article was accepted, and now I cross-post most of my articles on TDS. (I've said this before, but if you're blogging and you get rejected, just keep coming back.) ;)

  • I spoke on a panel at Hult International Business School about how to get into data science.

  • I launched datamovesme.com in July after banging my head against the wall trying to figure out WordPress. I made this move because I knew I'd eventually like to launch a course on my own hosted site, and the website builder I was using for kristenkehrer.com would not allow me to do that. In addition, my previous website was never going to rank well in search results.

  • I spoke with Mike Delgado at Experian on the DataTalk Podcast. This episode has so many laughs, so much fun, and plenty of data science, so give it a listen :)

  • At the end of August I launched my first ever online course, "Up-Level Your Data Science Resume." It has helped so many people effectively market themselves and land data science jobs. When people email me to tell me they've found a job, it literally brightens my week.

  • I was invited to join the YouTube channel Data Science Office Hours with Sarah Nooravi, Eric Weber, Tarry Singh, Kate Strachnyi, Favio Vazquez, Andreas Kretz, and newly added Matt Dancho. It's given me the opportunity to build friendships with these wonderful, intelligent people who are all giving back to the community. I want to give a special shout-out to Mohamed Mokhtar for creating wonderful posters for Office Hours. You can check out previous episodes on the Data Science Office Hours YouTube channel.

  • On August 22nd, Favio Vazquez and I launched Data Science Live. We've had incredible guests, taken questions from the community, and talked about important topics in industry data science. We already have some amazing guests lined up for 2019, and I can't wait to hear their perspectives and learn from them.

  • I spoke at Data Science Go in October and had the time of my life. It was basically the king of data parties. I'm grateful to Kirill Eremenko and his team for giving me the opportunity. My talk was about how to effectively communicate complex model output to stakeholders. I went through 4 case studies and demonstrated how I've evolved over time to position myself as a thought partner with stakeholders. I also had the opportunity to speak on a panel discussing women in data and diversity. I love sharing my experience as a woman in data, as well as how I'm able to be an ally and advocate for those who aren't always heard at work.

  • I was also a guest on the SuperDataScience Podcast in November. Getting to chat one-on-one with Kirill was fantastic; he has great energy and was a joy to speak with.

  • In November I was named #8 on LinkedIn's Top Voices 2018 list for Data Science and Analytics. That still seems a little surreal. Then in December, LinkedIn sent me a gift after I wrote an article about the wonderful data science community on LinkedIn. That's also pretty nuts.

  • I picked up a part-time job as a Teaching Assistant for an Applied Data Science online course through Emeritus.  Being at DSGO made me think of how I'm contributing to the community, and having the opportunity to help students learn data science has given me extra purpose while helping to keep my skills sharp.  It's really a win all around.

It's been a jam-packed year, and at times a little hectic between the 9-to-5, my two young children, and all the fun data science activities I've participated in. Luckily, I have an incredibly supportive husband; none of these extracurricular activities would be possible without him.

Looking to 2019:

I've set some big goals for myself and already have a number of conferences I'll be speaking at in the calendar.  I can't wait to share some of these exciting new ventures in the New Year. I wish you a wonderful holiday and can't wait to see and engage with you in 2019.

How Blogging Helps You Build a Community in Data Science

Holy Moly. I started blogging in March, and it has opened my eyes.

I want to start off by saying that I didn't magically come up with the idea of blogging on my own. I noticed my friend Jonathan Nolis becoming active on LinkedIn, so I texted them to get the scoop. They told me to start a blog and jokingly said, "I'm working on my #brand." I'm the type of person to try anything once. Plus, I already owned a domain name, had a website builder (from working at Vistaprint), and had an email marketing account (because I work for Constant Contact). So sure, why not?

If you're thinking about starting a blog, know that you don't need a bunch of tools at your disposal. If needed, you can publish articles on LinkedIn or Medium. There are many options to try before investing a penny . . . but of course, you can also go ahead and create your own site.

I have since moved to self-hosted WordPress. I've fallen in love with blogging, and WordPress lets me take advantage of lots of extra functionality.

With my first post, my eyes started to open to all the things other members of the data science community were doing. Honestly, if you had asked me before I started my blog who I most looked up to in data science, I'd probably have rattled off people who created R packages that made my life easier, or people who post lots of answers on Stack Overflow. But now I was paying attention on LinkedIn and Twitter. I was seeing the information that big data science influencers like Kirk Borne, Carla Gentry, Bernard Marr, and many others (seriously, so many others) were adding to the community.

I also started to see firsthand the number of people studying to become data scientists (yay!). Even people who are still in school or very early in their careers are active in the data science community. (You don't need to be a pro; just hop in.) If you're looking for great courses to take in data science, the community has highly recommended several.

I've paid attention to my blog stats (of course, I'm a data nerd). I've found that the articles that get the biggest response are either:

  1. Articles on how to get into data science

  2. Coding demos showing how to perform specific data science tasks

But you may find that something different works for you and your style of writing. I don't just post my articles on LinkedIn. I also post them on Twitter and Medium, send them to my email list, and put them on Pinterest. I balked when someone first mentioned the idea of Pinterest for data science articles. It's crazy, but Pinterest is the largest referrer of traffic to my site. Google Analytics isn't lying to me.

I've chatted with so many people through LinkedIn messaging. I've had the opportunity to speak with and (virtually) meet some awesome people who love data and create content around data science. I'm honestly building relationships and contributing to a community, and it feels great. If you're new to getting active in the data science community on LinkedIn, follow Tarry Singh, Randy Lao, Kate Strachnyi, Favio Vazquez, Beau Walker, Eric Weber, and Sarah Nooravi, just to name a few. You'll quickly find your tribe if you put yourself out there. I find that when I participate, I get back so much more than I put in.

Hitting "post" for the very first time on content you've created is intimidating; I'm not saying this will be the easiest thing you ever do. But you will build relationships, and even friendships, of real value with people who share your passion. If you start a blog, I look forward to reading your articles and watching your journey.

A Different Use of Time Series to Identify Seasonal Customers

I previously wrote an article about creatively leveraging your data with segmentation to learn about a customer base. In that article, I mentioned using any data that might be relevant. One of the variables I mentioned as sounding interesting was whether a customer has a seasonal usage pattern. Since I'm getting ready to do another cluster analysis, I decided to tackle this question.

These are my favorite types of data science problems because they require you to think a little outside the box to design a solution. Basically, I wanted to tag each customer according to whether or not they exhibited a seasonal pattern; this would be a first step. Later I may build this out further to determine when each customer's "off-season" begins. This will allow us to nurture these customer relationships better and provide a more personalized experience.

I'm a data scientist at Constant Contact, which provides email marketing solutions to small businesses. Since it is a subscription product, customers have varied usage patterns that I'm able to use for this analysis.

At first, my assumption was that a good portion of these customers might be located in an area that has four seasons. You know, the ice cream shop in New England that shuts down for the winter. After thinking about it some more, I realized that looking for seasonal usage patterns would also pick up customers whose business cycles aren't driven by the weather. Accounts in the education field that take summers off are going to be picked up as seasonal. Retail businesses that have fairly consistent usage all year but ramp up their engagement at Christmas are also exhibiting a seasonal pattern. So the customers the model flags as seasonal wouldn't be determined solely by the weather; the type of business could matter too. (Or maybe some people are fortunate enough to take random long vacations in the middle of the year for no reason. I want to make sure I find those people too, if they exist.)

To do this analysis, I aggregated email sends by customer, by month, for every customer with at least 2 years of tenure. Each customer is its own time series. However, there were a couple of complexities. One in particular is worth noting: customers might take a month or two (or more) off from using the product, and those months simply don't show up in the aggregated data. So first I had to write some code to fill in zeros for those months. I couldn't tell the model to look for a yearly pattern while only giving it 8 months' worth of data per year; I needed those zeros. I filled in the missing zeros using Python, and then decided I wanted to use R for the time series portion (determining whether a seasonal pattern was present). I got to use the rpy2 package in Python for the first time. Check that off the list of new packages I've wanted to try.
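For anyone who wants to try the zero-filling step, here is a minimal pandas sketch. It is not the code from this project. It assumes a hypothetical table of individual sends with columns `customer_id`, `send_date`, and `emails_sent`, and the function name is made up for illustration. By default, it fills every month between a customer's first and last observed month. The optional `end_month` pads trailing months too.

```python
from typing import Optional

import pandas as pd


def monthly_sends_with_zeros(sends: pd.DataFrame, end_month: Optional[str] = None) -> pd.DataFrame:
    """Aggregate sends by customer and month, adding explicit zero rows for skipped months.

    Hypothetical input columns: customer_id, send_date (datetime64), emails_sent.
    """
    # Roll individual sends up to one row per customer per calendar month
    monthly = (
        sends.assign(month=sends["send_date"].dt.to_period("M"))
        .groupby(["customer_id", "month"], as_index=False)["emails_sent"]
        .sum()
    )

    filled = []
    for customer_id, group in monthly.groupby("customer_id"):
        start = group["month"].min()
        # Default to the customer's last observed month; optionally pad to a fixed end month
        end = pd.Period(end_month, freq="M") if end_month else group["month"].max()
        all_months = pd.period_range(start=start, end=end, freq="M")
        series = (
            group.set_index("month")["emails_sent"]
            .reindex(all_months, fill_value=0)  # months with no sends become 0
            .rename_axis("month")
            .reset_index()
        )
        series.insert(0, "customer_id", customer_id)
        filled.append(series)

    return pd.concat(filled, ignore_index=True)
```

The reindex against a complete `period_range` is what guarantees 12 rows per year for each customer. This matters once you ask a model about a 12-month pattern.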

I fit a TBATS model for each customer in R. This is probably overkill, because TBATS was designed to handle very complex (and potentially multiple) seasonal patterns. However, it made it really simple to ask the model whether it had a yearly seasonal component. Bonus: TBATS is more robust to non-stationarity than some other methods.
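Calling TBATS through rpy2 requires an R installation with the forecast package, so that isn't reproduced here. To make the "is there a yearly pattern?" question concrete in a self-contained way, below is a much simpler stand-in heuristic. It flags a monthly series as seasonal when its autocorrelation at lag 12 is high. To be clear, this is not TBATS and not the method used in this project. The function names and the 0.3 threshold are assumptions for illustration that a real analysis would need to tune or replace.

```python
import numpy as np


def lag_autocorrelation(values, lag: int) -> float:
    """Sample autocorrelation of a 1-D series at the given lag."""
    x = np.asarray(values, dtype=float)
    if not 0 < lag < len(x):
        raise ValueError("lag must be between 1 and len(values) - 1")
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0:
        return 0.0  # a perfectly flat series carries no seasonal signal
    return float(np.dot(x[:-lag], x[lag:]) / denom)


def looks_seasonal(monthly_sends, threshold: float = 0.3) -> bool:
    """Heuristic stand-in (NOT a TBATS fit): high lag-12 autocorrelation => 'seasonal'.

    The 0.3 threshold is an arbitrary, hypothetical cutoff.
    """
    x = np.asarray(monthly_sends, dtype=float)
    if len(x) < 24:
        raise ValueError("need at least 24 months (two full years) of data")
    return lag_autocorrelation(x, 12) > threshold
```

You would feed it one customer's zero-filled monthly series. TBATS is far more flexible: it models trend, Box-Cox transforms, ARMA errors, and trigonometric seasonality, and it can handle multiple seasonal periods. This check only illustrates the core idea of testing whether usage repeats every 12 months.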

Here is a picture: on the left is a customer the model determined to be seasonal, and on the right is a customer who is obviously not seasonal, and the model agrees.

[Figure: seasonal vs. non-seasonal customer graph]

After I had the output of my model, I went back and did a full analysis of what these customers looked like. They over-indexed in the Northeast and were less likely to be in the West and South. Seasonal users were also more likely to self-report being in an industry like:

  • Retail
  • Sports and Recreation
  • Non Profits

Non-seasonal users, on the other hand, were more likely to self-report being in an industry like:

  • Auto Services
  • Financial Advisor
  • Medical Services
  • Insurance
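"Over-indexing" is commonly measured with an index. A group's share of the seasonal customers is divided by its share of all customers, then multiplied by 100. Anything above 100 means the group is over-represented among seasonal customers. The sketch below assumes that standard definition; I'm not claiming it's exactly how these particular numbers were produced. The column names (`region`, `is_seasonal`) are hypothetical.

```python
import pandas as pd


def over_index(customers: pd.DataFrame, group_col: str, flag_col: str = "is_seasonal") -> pd.Series:
    """Return 100 * (group's share of flagged customers) / (group's share of all customers).

    Values above 100 mean the group is over-represented among flagged customers.
    Column names are hypothetical placeholders.
    """
    share_all = customers[group_col].value_counts(normalize=True)
    flagged = customers.loc[customers[flag_col].astype(bool), group_col]
    share_flagged = flagged.value_counts(normalize=True)
    # Groups with no flagged customers at all get an index of 0
    share_flagged = share_flagged.reindex(share_all.index, fill_value=0.0)
    return (share_flagged / share_all * 100).sort_values(ascending=False)
```

Calling `over_index(customers, "region")` gives one index per region; the same call with an industry column gives the industry comparison. Because it compares shares rather than raw counts, the index still works when the seasonal vs. non-seasonal split is far from 50/50.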

Customers with only 2-3 years of tenure were less likely to be seasonal than more tenured customers. This could be due to a few different factors. Maybe there just wasn't enough data to detect them yet. Maybe they go through a period of getting acquainted with the tool (with a different usage pattern) before they really hit their stride. Or maybe they're just really not seasonal. There were more insights, but this is company data ;)

Here is a map of seasonal customers over-indexing in the Northeast. Stakeholders typically enjoy seeing a nice map. Note: the split was not 50/50 seasonal vs. non-seasonal.

[Figure: seasonal percentage map]

At the moment, we're thinking through what data we might be able to leverage in the upcoming segmentation (where this seasonal variable will be one candidate variable). This might include information from the BigData environment or anything that lives in the relational database. We're also weighing the difficulty of getting a specific variable against the added value we might get from gathering that data.

I feel super fortunate to be able to work on projects that help us learn about our customers, so that when we message them, we can be more relevant. Nothing is worse than receiving a communication from a company that totally misses the mark on what you're about. I find this type of work exciting, and it allows me to be creative, which is important to me. I hope you found this article enjoyable, and maybe a couple of people out there will actually find this applicable to their own work. I wish you lots of fun projects that leave you feeling inspired :)

The code I used for this project can be found in a separate article on my blog.
