Data Moved Me in 2018

Dear diary,

I'm writing this article so that a year from now, when I've completely forgotten how cool 2018 was, I can look back on this post.  I'm literally floored by all that transpired this year.  Here is a small snapshot in chronological-ish order:

  • I started a new position in January 2018 as a Senior Data Scientist at Constant Contact.  I've been fortunate to work on interesting projects throughout the year that have often served as inspiration for blog posts. 


  • I launched my first blog article (ever) in March of 2018. This was originally on the domain kristenkehrer.com which is no longer live. This first blog article was rejected by Towards Data Science on Medium.  My 2nd blog article was accepted, and now I cross-post most of my articles on TDS.  (I've said this before, but if you're blogging and you get rejected, just keep coming back ;)

 

  • I spoke on a panel at Hult International Business School on how to get into data science. 

 

  • I launched datamovesme.com in July after banging my head against the wall trying to figure out Wordpress.  I made this move because I knew I'd like to eventually launch a course on my own hosted site and the website builder I was using for kristenkehrer.com would not allow me to do that.  In addition, my previous website was never going to rank for SEO.


  • I spoke with Mike Delgado at Experian on the DataTalk Podcast. So many laughs, fun, and data science in this episode, give it a listen :)


  • At the end of August I launched my first ever online course, "Up-Level Your Data Science Resume."  It has helped so many people effectively market themselves and land jobs in data science positions.  When people email me to tell me that they have found a job, it literally brightens my week.

 

  • I was invited to join the YouTube channel Data Science Office Hours with Sarah Nooravi, Eric Weber, Tarry Singh, Kate Strachnyi, Favio Vazquez, Andreas Kretz and newly added Matt Dancho.  It's given me the opportunity to create friendships with these wonderful and intelligent people who are all giving back to the community.  I want to give a special shout out to Mohamed Mokhtar for creating wonderful posters for office hours.  You can check out previous episodes on the Data Science Office Hours YouTube channel (link above).


  • On August 22nd, Favio Vazquez and I launched Data Science Live.  We've had incredible guests, taken questions from the community, and generally just talked about important topics in data science in industry.  We already have some amazing guests planned for 2019, and I cannot wait to hear their perspectives and learn from them.


  • I spoke at Data Science Go in October and had the time of my life.  It was basically the king of data parties.  I'm grateful to Kirill Eremenko and his team for giving me the opportunity.  My talk was about how to effectively communicate complex model output to stakeholders.  I went through 4 case studies and demonstrated how I've evolved over time to position myself as a thought partner with stakeholders.  I also had the opportunity to speak on a panel discussing women in data and diversity.  I love sharing my experience as a woman in data and also how I'm able to be an ally and advocate for those who aren't always heard at work.


  • I was also on the SuperDataScience Podcast in November. Getting to chat 1-on-1 with Kirill was fantastic. He has great energy and was a joy to speak with.

 

  • In November I was named #8 on the LinkedIn Top Voices 2018 list in Data Science and Analytics.  That still seems a little surreal.  Then in December LinkedIn sent me a gift after I wrote an article about the wonderful data science community on LinkedIn.  That's also pretty nuts.

  • I picked up a part-time job as a Teaching Assistant for an Applied Data Science online course through Emeritus.  Being at DSGO made me think of how I'm contributing to the community, and having the opportunity to help students learn data science has given me extra purpose while helping to keep my skills sharp.  It's really a win all around.

It's been a jam-packed year and at times a little hectic between the 9-5, my two young children, and all the fun data science related activities I've participated in.  Luckily I have a husband who is so supportive; all of these extracurricular activities wouldn't be possible without him.

Looking to 2019:

I've set some big goals for myself and already have a number of conferences I'll be speaking at in the calendar.  I can't wait to share some of these exciting new ventures in the New Year. I wish you a wonderful holiday and can't wait to see and engage with you in 2019.


Life Changing Moments of DataScienceGO 2018

DataScienceGO is truly a unique conference.  Justin Fortier summed up part of the ambiance when replying to Sarah Nooravi's LinkedIn post.

And although I enjoy a good dance party (more than most), there were a number of reasons why this conference (in particular) was so memorable.

  1. Community
  2. Yoga + Dancing + Music + Fantastic Energy
  3. Thought provoking keynotes (saving the most life changing for last)

Community: In Kirill's keynotes he mentioned that "community is king".  I've always truly subscribed to this thought, but DataScienceGO brought it to life.  I met amazing people: some I had been building relationships with online for months but hadn't yet had the opportunity to meet in person, and some I connected with whom I had never heard of.  EVERYONE was friendly.  I mean it, I didn't encounter a single person who was not friendly.  I don't want to speak for others, but I got the sense that people had an easier time meeting new people than what I have seen at previous conferences.  It really was a community feeling.  Lots of pictures, tons of laughs, and plenty of nerdy conversation to be had.

If you're new to data science but have been self-conscious about being active in the community, I urge you to put yourself out there.  You'll be pleasantly surprised.

Yoga + Dancing + Music + Fantastic Energy: Both Saturday and Sunday morning I attended yoga at 7am.  To be fully transparent, I have a 4 year old and a 1 year old at home.  I thought I was going to use this weekend as an opportunity to sleep a bit.  I went home more tired than I had arrived.  Positive, energized, and full of gratitude, but exhausted.

Have you ever participated in morning yoga with 20-30 data scientists?  If you haven't, I highly recommend it.

It was an incredible way to start the day.  Jacqueline Jai brought the perfect mix of yoga and humor for a group of data scientists.  After yoga each morning you'd go to the opening keynote of the day.  This would start off with dance music, lights, sometimes the fog machine, and a bunch of dancing data scientists.  My kind of party.

The energized start mixed with the message of community really set the pace for a memorable experience.

Thought provoking keynotes: Ben Taylor spoke about "Leaving an AI Legacy", Pablos Holman spoke about actual inventions that are saving human lives, and Tarry Singh showed the overwhelming (and exciting) breadth of models and applications in deep learning.  Since the conference I have taken a step back and have been thinking about where my career will go from here.  In addition, Kirill encouraged us to think of a goal and to start taking small actions towards that goal starting today.

I haven't nailed down yet how I will have a greater impact, but I have some ideas (and I've started taking action).  It may be in the form of becoming an adjunct professor to educate the next wave of future mathematicians and data scientists.  Or I hope to have the opportunity to participate in research that will aid in helping to solve some of the world's problems and make someone's life better.

I started thinking about my impact (or using modeling for the forces of good) a couple weeks ago when I was talking with Cathy O'Neil for the book I'm writing with Kate Strachnyi, "Mothers of Data Science".  Cathy is pretty great at making you think about what you're doing with your life, and this could be its own blog article.  But attending DSGO was the icing on the cake in terms of forcing me to consider the impact I'm making.

Basically, the takeaway that I'm trying to express is that this conference pushed me to think about what I'm currently doing, and to think about what I can do in the future to help others.  Community is king in more ways than one.

Closing: I honestly left the conference with a couple tears.  Happy tears, probably provoked a bit by being so overtired.  There were so many amazing speakers in addition to the keynotes.  I particularly enjoyed being on the Women's panel with Gabriela de Queiroz, Sarah Nooravi, Page Piccinini, and Paige Bailey talking about our real life experiences as data scientists in a male dominated field and about the need for diversity in business in general.  I love being able to connect with other women who share a similar bond and passion.

I was incredibly humbled to have the opportunity to speak at this conference and also cheer for the talks of some of my friends: Rico Meinl, Randy Lao, Tarry Singh, Matt Dancho and other fantastic speakers.  I spoke about how to effectively present your model output to stakeholders, similar to the information that I covered in this blog article:  Effective Data Science Presentations

This article is obviously an oversimplification of all of the awesomeness that happened during the weekend.  But if you missed the conference, I hope this motivates you to attend next year so that we can meet.  And I urge you to watch the recordings and reflect on the AI legacy you want to leave behind.

I haven't seen the link to the recordings from DataScienceGO yet, but when I find them I'll be sure to link here.


Setting Your Hypothesis Test Up For Success

Setting up your hypothesis test for success as a data scientist is critical. I want to go deep with you on exactly how I work with stakeholders ahead of launching a test.  This step is crucial to make sure that once a test is done running, we'll actually be able to analyze it.  This includes:

  • A well defined hypothesis

  • A solid test design

  • Knowing your sample size

  • Understanding potential conflicts

  • Population criteria (who are we testing)

  • Test duration (it's like the cousin of sample size)

  • Success metrics

  • Decisions that will be made based on results

This is obviously a lot of information.  Before we jump in, here is how I keep it all organized:

I recently created a Google Doc at work so that stakeholders and analytics could align on all the information needed to fully scope a test upfront.  This also gives you (the analyst/data scientist) a bit of an insurance policy.  It's possible the business decides to go with a design or a sample size that wasn't your recommendation.  If things end up working out less than stellar (not enough data, a design that is basically impossible to analyze), you have your original suggestions documented.

In my previous article I wrote:

"Sit down with marketing and other stakeholders before the launch of the A/B test to understand the business implications, what they’re hoping to learn, who they’re testing, and how they’re testing.  In my experience, everyone is set up for success when you’re viewed as a thought partner in helping to construct the test design, and have agreed upon the scope of the analysis ahead of launch."

Well, this is literally what I'm talking about.

This document was born of things that we often see in industry:

Hypothesis: I've seen scenarios that look like "we're going to make this change, and then we'd like you to read out on the results".  So, your hypothesis is what?  You're going to make this change, and what do you expect to happen?  Why are we doing this?  A hypothesis clearly states the change that is being made, the impact you expect it to have, and why you think it will have that impact.  It's not an open-ended statement.  You are testing a measurable response to a change.  It's ok to be a stickler; this is your foundation.

Test Design: The test design needs to be solid, so you'll want to have an understanding of exactly what change is being made between test and control.  If you're approached by a stakeholder with a design that won't allow you to accurately measure the effect of the change, you'll want to coach them on how they could design the test more effectively to read out on the results.  I cover test design a bit in my article here.

Sample Size: You need to understand the sample size ahead of launch, and your expected effect size.  If you run with a small sample and need an unreasonable effect size for it to be significant, it's most likely not worth running.  Time to rethink your sample and your design.  Sarah Nooravi recently wrote a great article on determining sample size for a test.  You can find Sarah's article here.

  • An example might be that you want to test the effect of offering a service credit to select customers.  You have a certain budget worth of credits you're allowed to give out.  So you're hoping you can have 1,500 in test and 1,500 in control (this is small).  The test experience sees the service along with a coupon, and the control experience sees content advertising the service but does not see any mention of the credit.  If the average purchase rate is 13.3%, you would need a 2.6 point increase (to 15.9%) in the test group to see significance at the 95% confidence level.  This is a large effect size that we probably won't achieve (unless the credit is AMAZING).  It's good to know these things upfront so that you can make changes (for instance, reduce the amount of the credit to allow for additional sample size, ask for extra budget, etc).
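If you'd like to sanity-check numbers like these before launch, here is a minimal sketch in Python.  It assumes a pooled two-proportion z-test with a two-sided 5% threshold, which is one common way to read out a conversion test and may not be the exact method behind the figures in the example.  The function names are made up for illustration.

```python
import math


def two_proportion_z_test(rate_control, rate_test, n_control, n_test):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    # Pooled rate under the null hypothesis that both groups convert equally
    pooled = (rate_control * n_control + rate_test * n_test) / (n_control + n_test)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (rate_test - rate_control) / std_err
    # Standard normal CDF computed from the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


def smallest_significant_lift(rate_control, n_per_group, alpha=0.05):
    """Smallest observed lift, searched in steps of 0.1 points, with p < alpha."""
    for tenths in range(1, 1000):
        lift = tenths / 1000
        if rate_control + lift >= 1:
            break
        _, p_value = two_proportion_z_test(
            rate_control, rate_control + lift, n_per_group, n_per_group
        )
        if p_value < alpha:
            return lift
    return None


# Figures from the service credit example: 1,500 per group, 13.3% baseline
z, p_value = two_proportion_z_test(0.133, 0.159, 1500, 1500)
min_lift = smallest_significant_lift(0.133, 1500)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print(f"smallest significant lift: {min_lift * 100:.1f} points")
```

Keep in mind this only tells you how big the observed lift has to be to clear the significance bar.  It is not a power calculation, so treat it as a quick gut check rather than a replacement for a proper sample size estimate.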

Potential Conflicts: It's possible that 2 different groups in your organization could be running tests at the same time that conflict with each other, resulting in junk data for potentially both tests.  (I actually used to run a "testing governance" meeting at my previous job to proactively identify these cases.  This might be something you want to consider.)

  • An example of a conflict might be that the acquisition team is running an ad in Google advertising 500 business cards for $10.  If, while that test is running, another team runs a pricing test on the business card product page that doesn't respect the ad driving the traffic, the customers in the acquisition team's test are not getting the experience the team thought they were!  Customers will see a different price than what is advertised, and this has negative implications all around.

  • It is so important in a large analytics organization to be collaborating across teams and have an understanding of the tests in flight and how they could impact your test.

Population criteria: Obviously you want to target the correct people.  But often I've seen criteria so specific that the results of the test need to be caveated with "These results are not representative of our customer base, this effect is for people who [list criteria here]."  If your test targeted super performers, you know that it doesn't apply to everyone in the base, but you want to make sure that is spelled out so it doesn't get miscommunicated to a broader audience.

Test duration: This is often directly related to sample size (see Sarah's article).  You'll want to estimate how long you'll need to run the test to achieve the required sample size.  Maybe you're randomly sampling from the base and already have sufficient population to choose from.  But often we're testing an experience for new customers, or we're testing a change on the website and we need to wait for traffic to visit the site and view the change.  If it's going to take 6 months of running to get the required sample size, you probably want to rethink your population criteria or what you're testing.  And better to know that upfront.
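The duration estimate itself is simple arithmetic.  Here is a rough sketch in Python, assuming traffic is steady from day to day, every eligible visitor enters the test, and the groups are split evenly.  The function name and the daily traffic figures are hypothetical, purely for illustration.

```python
import math


def days_to_reach_sample(required_per_group, groups, eligible_per_day):
    """Days of traffic needed to fill every group, rounded up."""
    total_needed = required_per_group * groups
    return math.ceil(total_needed / eligible_per_day)


# Hypothetical traffic figures, for illustration only
quick = days_to_reach_sample(required_per_group=1500, groups=2, eligible_per_day=100)
slow = days_to_reach_sample(required_per_group=1500, groups=2, eligible_per_day=16)
print(f"{quick} days at 100 eligible visitors per day")
print(f"{slow} days at 16 eligible visitors per day")
```

If the second scenario looks like your situation (roughly six months of waiting), that's your cue to revisit the population criteria or the test itself before launch.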

Success Metrics: This is an important one to talk through.  If you've been running tests previously, I'm sure you've had stakeholders ask you for the kitchen sink in terms of analysis.

If your hypothesis is that a message about a new feature on the website will drive people to go see that feature, it is reasonable to check how many people visited that page and whether or not people downloaded/used that feature.  This would probably be too benign to cause cancellations or affect upsell/cross-sell metrics, so make sure you're clear about what the analysis will and will not include.  And try not to make a mountain out of a molehill unless you're testing something that is a dramatic change and has large implications for the business.

Decisions! Getting agreement ahead of time on what decisions will be made based on the results of the test is imperative.

Have you ever been in a situation where the business tests something, it's not significant, and then they roll it out anyway?  Well then that really didn't need to be a test; they could have just rolled it out.  There are endless opportunities for tests that will guide the direction of the business, so don't get caught up in a test that isn't actually a test.

Conclusion: Of course, each of these areas could have been explained in much more depth.  But the main point is that there are a number of items that you want to have a discussion about before a test launches.  Especially if you're on the hook for doing the analysis, you want to have the complete picture and context so that you can analyze the test appropriately.

I hope this helps you to be more collaborative with your business partners and potentially be more "proactive" rather than "reactive".

No one has any fun when you run a test and then later find out it should have been scoped differently.  Adding a little extra work and clarification upfront can save you some heartache later on.  Consider creating a document like the one I have pictured above for scoping your future tests, and you'll have a full understanding of the goals and implications of your next test ahead of launch. :)


How Blogging Helps You Build a Community in Data Science

Holy Moly. I started blogging in March and it has opened my eyes.

I want to start off by saying that I didn't magically come up with this idea of blogging on my own. I noticed my friend Jonathan Nolis becoming active on LinkedIn, so I texted them to get the scoop. They told me to start a blog and jokingly said "I'm working on my #brand". I'm the type of person to try anything once, plus I already owned a domain name, had a website builder (from working at Vistaprint), and had an email marketing account (because I work for Constant Contact). So sure, why not? If you're thinking about starting a blog, know that you do not need to have a bunch of tools already at your disposal. If needed, you can create articles on LinkedIn or Medium. There are many options to try before investing a penny . . . but of course, you can go ahead and create your own site.

I have since moved to self-hosted Wordpress. I've fallen in love with blogging, and Wordpress lets me take advantage of lots of extra functionality.

With my first post, my eyes started to open up to all the things that other members of the Data Science community were doing. And honestly, if you had asked me about who I most looked up to in Data Science prior to starting my blog, I'd probably just rattle off people who have created R packages that have made my life easier, or people who post a lot of answers to questions on Stack Overflow. But now I was paying attention on LinkedIn and Twitter, and seeing the information that big data science influencers like Kirk Borne, Carla Gentry, Bernard Marr, and many others (seriously, so many others) were adding to the community.

I also started to see first hand the number of people who were studying to become data scientists (yay!). Even people who are still in school or very early in their careers are participating by being active in the data science community. (You don't need to be a pro, just hop in.)  If you're looking for great courses to take in data science, these ones have been highly recommended by the community here.

I've paid attention to my blog stats (of course, I'm a data nerd), and have found that the articles I write that get the biggest response are either:

  1. Articles on how to get into data science

  2. Coding demos on how to perform areas of data science

But you may find that something different works for you and your style of writing. I don't just post my articles on LinkedIn. I also post on Twitter, Medium, I send them to my email list, and I put them on Pinterest. I balked when someone first mentioned the idea of Pinterest for data science articles. It's crazy, but Pinterest is the largest referrer of traffic to my site. Google Analytics isn't lying to me.

I've chatted with so many people in LinkedIn messaging, and I've had the opportunity to speak with and (virtually) meet some awesome people who are loving data and creating content around data science. I'm honestly building relationships and contributing to a community, and it feels great. If you're new to getting active in the data science community on LinkedIn, follow Tarry Singh, Randy Lao, Kate Strachnyi, Favio Vazquez, Beau Walker, Eric Weber, and Sarah Nooravi, just to name a few. You'll quickly find your tribe if you put yourself out there. I find that when I participate, I get back so much more than I've put in.

Hitting "post" for the very first time on content you've created is intimidating, and I'm not saying that this will be the easiest thing you ever do. But you will build relationships and even friendships of real value with people who have the same passion. If you start a blog, I look forward to reading your articles and watching your journey.

