Storytelling and Algorithms: The Life of a Data Scientist
Are you thinking about pursuing a career in data science? With the right academic and technical skills, you too could be diving into a world of algorithms, data visualization, and storytelling — and having a significant influence on an organization's decision-making. Balancing a highly technical skillset with the ability to channel your inner storyteller is the name of the game when it comes to being an effective data scientist.
I virtually sat down with our very own Kyle Garnick, a data scientist at Skillsoft. Kyle and I talked about his journey to his current role, as well as how he approaches data’s story and how he helps more business-minded folks interpret the data he presents them.
Ryan Tidwell: Hey, Kyle! Thanks for talking with me today.
Kyle Garnick: No problem, Ryan! Happy to chat with you.
RT: So, first things first—your story. How did you become a data scientist?
KG: It was a bit of a domino effect; I definitely didn't know I wanted to go into data science originally. I think when I left high school, I was kind of reasonably good at a lot of things, but not that good at one thing. I didn't really know what I wanted to study and ended up studying genetics at the University of New Hampshire, and I really liked it.
From there, I did my master's in psychology. And after that, I started working on a Ph.D. When I was getting my Ph.D., I ended up in a lab that was doing computational biology, where I was building statistical and machine learning models to try to predict gene interactions.
And at some point, I realized that I pretty much had the skillset of a data scientist. When I had this realization, I obviously considered how lucrative it was and how hot it was. It sounded good to me because I was burnt out from doing lab science.
So, I got an offer from someone I just happened to know at Southern New Hampshire University (SNHU). SNHU is this enormous online university and has hundreds of thousands of online students. It sounded like a fun and exciting opportunity, so I took it. I was the first data scientist on their team. During my time at SNHU, my team and I built models to try and predict the likelihood of students' graduation and to recommend the courses that they should take next to enhance their likelihood of graduation.
RT: Very cool. When you started working as a data scientist, what surprised you?
KG: This actually brings me back to school and my computational bio program. The program was self-driven and very technical, and was essentially all about, "can we improve this model?" More often than not, the best possible outcome is that you improve the model as far as it can go. This idea is very different when you apply it to business, however. In business, the main goal is not to have an algorithm — or model — that is perfect. It's to have an algorithm that is good enough to meet the business's needs and that stakeholders can trust. For example, you may have one algorithm that is very, very good at predicting, but less good at interpretability.
Over time, I've realized that interpretability is extremely important to the business and is often more important than raw accuracy, because if you can't use your insights, they're not relevant. So, a major part of my job is translating what is going on at an algorithmic level to an executive.
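The tradeoff Kyle describes can be made concrete with a small sketch: with a linear model such as logistic regression, every prediction decomposes into per-feature contributions that a non-technical stakeholder can read directly. The feature names and coefficient values below are purely illustrative assumptions, not Skillsoft's actual model.

```python
import math

# Hypothetical coefficients from a fitted logistic regression
# (names and values are illustrative only).
COEFFICIENTS = {
    "logins_last_30_days": 0.8,
    "courses_started": 0.5,
    "searches_per_session": -0.3,
}
INTERCEPT = -1.0


def predict_and_explain(features):
    """Return a probability plus a per-feature breakdown showing
    which inputs pushed the score up or down -- the kind of
    explanation an executive can act on."""
    contributions = {
        name: COEFFICIENTS[name] * value for name, value in features.items()
    }
    score = INTERCEPT + sum(contributions.values())
    probability = 1 / (1 + math.exp(-score))
    return probability, contributions


prob, why = predict_and_explain(
    {"logins_last_30_days": 2, "courses_started": 1, "searches_per_session": 4}
)
```

A black-box model might predict slightly better, but it cannot produce the `why` dictionary above — which is exactly the part of the output a business audience needs.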
RT: Interesting. So, would you say you try to identify the “right” algorithms before you even approach data?
KG: Sort of. It's partially the algorithm identification, and it's partially the ability to translate something from a technical level to a business level. It's not that I was unprepared for it; it just wasn't an element of my data science work at the academic level. At the business level, though, it has undoubtedly become the largest element of my data science career.
RT: I can see how the two contexts would differ. So, follow-up question, and I hope this makes sense. Let's say you've gathered all of this data based on algorithms you've tested and so forth. When you approach the data, do you have somewhat of a story in mind, or does the data's story reveal itself to you as you collect it?
KG: That’s an interesting question, and it takes me back to my Ph.D. because you’re essentially getting at the two major types of science:
The first is hypothesis-driven science, where you set out to prove or disprove an idea. Then there's discovery-based science, which is, "here's a bunch of data, figure something out."
Most of the time, I try to go with the hypothesis-driven route. Take Percipio, our learner experience platform, for example. We come forward and say, “if we introduce this new feature, we hypothesize that usage will go up by 10%” or something like that.
Sometimes our hypotheses are correct, but sometimes they give us totally unexpected results. You then have to tweak or come up with completely new hypotheses and set down new roads.
So, in short, I think it's really a mixture of both. But in general, I think it's good to define what you're looking for beforehand. For me, it's almost like a checklist. It's, "Did we check if this makes usage go up by 10%? Okay, yes, it does." I find it to be more structured that way.
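The checklist approach Kyle mentions is easy to sketch: pre-register the expected lift, then check the observed numbers against it. The function and figures below are hypothetical, just to illustrate the pattern.

```python
def check_hypothesis(baseline_usage, observed_usage, expected_lift=0.10):
    """Pre-registered check: did usage go up by at least expected_lift?

    Returns the observed lift and whether the hypothesis held.
    (Figures here are illustrative; a real check would also test
    statistical significance, not just the point estimate.)
    """
    lift = (observed_usage - baseline_usage) / baseline_usage
    return lift, lift >= expected_lift


# Example: hypothesized a 10% lift after a feature launch.
lift, passed = check_hypothesis(baseline_usage=1000, observed_usage=1150)
```

If the hypothesis fails, that result itself is informative — it is the "unexpected results" case Kyle describes, which sends you down a new road with a revised hypothesis.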
RT: Gotcha — it's a bit of a balancing act between the two types of science.
KG: Essentially, yes.
RT: Interesting. So, once you have your story, I can imagine it’s still in pretty technical language, right?
KG: For the most part, yeah. There’s a lot we have to do to “translate” data into ordinary language.
RT: I see. How do you prepare for that? How do you put what you’ve discovered into terms more business-minded folk can understand?
KG: I try to make a point to involve the business stakeholders along the way. That way, it's not a data dump at the end where they see all the results and it's all new information they're supposed to process quickly. I think it's important, not just from the perspective of making them feel involved, but to get their input, because I'm sort of a generalist in the sense that I do work for many wings of the business. I don't have as specific and deep an understanding as some of these business people do. It's a good idea to get their perspective on their day-to-day business practices, the business trends they see, and so on.
The other thing I think is important is taking a step back as a data scientist and remembering what life was like when you didn't know how an algorithm functions. It's easy to get lost in data concepts and jargon, because once you know what you're doing, the work is fairly intuitive. It's easy to think, "oh, I know how this works, and therefore, everybody knows how this works."
RT: I see, so, you’d say it’s easy for data scientists to get stuck in a silo?
KG: Exactly. It’s like teaching anything else. Whether it’s math, a new language, or data, it’s crucial to put yourself in somebody else's shoes and recognize that they don't have all the technical background.
RT: That's interesting you brought that up because through my research, I've seen the sentiment of “teaching others to ask the right questions about data.” How do you do that with someone who just doesn't get data, someone who just doesn't understand it?
KG: I don't know that I've ever encountered somebody who just doesn't get data in a broad sense. There are definitely people who, just by the nature of their jobs, tend to work with data on a much smaller scale than I do, and data at my scale can be overwhelming to them.
If anything, I’ve more so dealt with people who just don’t get technical jargon.
RT: Interesting. So, it's more about weeding out the technical-speak?
KG: Exactly. Data isn’t that hard to grasp; it’s often just the terms surrounding it that trip people up.
RT: That makes sense. Well, Kyle, I only have one more question for you. I want to give you a moment to brag about something you've done as a data scientist. What's the coolest thing you've done with data at Skillsoft? What's something that you're really proud of?
KG: I don't know if it would be the “coolest” thing, but what I'm most proud of at Skillsoft is probably the predictive repeat user model. I think it's cool because it has a tangible effect on users at a human level.
We have Percipio, where people come to enhance their business skills or further their professional development. We want them to come to the platform more than once, to cement their learning as much as possible and keep these skills over time — rather than log in once, click a box they're required to click, and never return.
We analyzed the drivers and features that make people come back to the platform more than once and retain these skills over time. We built a model to predict which users were likely to return to the platform. And then we were able to focus on these drivers and run campaigns to bring people back to the platform.
For example, we learned that if you go to a Percipio page and immediately find what you're looking for, you are much, much more likely to actually consume that content and therefore return to the platform in the future. Whereas if you have to do five or six searches before you find what you're looking for, that's a significant turn-off, and you're far less likely to come back.
So, we were obviously able to use that insight to work with the search team to improve search in certain directions. That model was gratifying because not only does it boost usage and help on the business end, but we're actually encouraging users to come back more than once and build skills that are persistent and last.
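The search-success driver Kyle describes can be sketched as a simple session metric: the share of sessions where a user found content on their first search. The session structure and numbers below are hypothetical, purely to illustrate how such a driver might be measured.

```python
def first_search_success_rate(sessions):
    """Share of sessions where the user found content on the first
    search -- the kind of driver Kyle describes as predicting
    whether a user returns. Session fields are illustrative."""
    hits = sum(1 for s in sessions if s["searches_before_click"] <= 1)
    return hits / len(sessions)


# Hypothetical session log: two quick finds, one frustrating hunt.
sessions = [
    {"searches_before_click": 1, "returned": True},
    {"searches_before_click": 6, "returned": False},
    {"searches_before_click": 1, "returned": True},
]
rate = first_search_success_rate(sessions)
```

In practice a metric like this would be one input feature to the repeat-user model, and tracking it over time gives the search team a concrete target to move.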
RT: Great stuff, Kyle—thanks so much for chatting with me!
KG: You’re welcome! Anytime.
Ryan Tidwell is a Content Marketing Specialist at Skillsoft.