Today’s Ask A Data Mentor podcast is with Andreas Martinson, who is currently a lead data scientist at Validate Health.
We actually met a while back on LinkedIn and got a lot closer when we built out the Learning SQL publication. He leads that publication on LinkedIn and on Medium.
Check them out!
Medium's Learning SQL publication (https://medium.com/learning-sql)
LinkedIn's Learning SQL (https://www.linkedin.com/company/learning-sql)
Transcript
[00:00:00] Sarah Floris: Welcome to the Ask A Data Mentor podcast. Today's episode is with Andreas Martinson.
[00:00:05] Sarah Floris: Andreas, do you wanna introduce yourself?
[00:00:07] Andreas Martinson: Sure. So in my current role, I am a data science lead at Validate. I've been working at Validate for a little over a year, and before that I worked as a data analyst for three years for Intermountain Healthcare.
[00:00:26] Andreas Martinson: For part of that time, I was actually working for a subsidiary of Intermountain Healthcare called Castell. And for the past four years, I've been working in value-based healthcare. I can go more into that if you want, but it's all been in the healthcare space.
[00:00:44] Sarah Floris: Anything else you wanna share with us before we get started?
[00:00:47] Andreas Martinson: Yeah, in general, I'm going on five years in my career now. I graduated in 2018 with my bachelor's, actually in accounting. After that, I went and got my master's and started working at Intermountain.
[00:01:09] Andreas Martinson: I worked full-time while doing my master's in biomedical informatics at the University of Utah, with an emphasis on data science. So I pivoted to healthcare at that point. But actually, throughout my bachelor's, I was interested in the healthcare industry.
[00:01:29] Andreas Martinson: When I started my bachelor's, the plan was to graduate as an accountant and then eventually work my way up to CFO of a hospital or something like that. Obviously, my plans have changed a lot. I started learning how to code and really getting into coding. After my junior year of undergrad, I had a class that taught VBA, and I feel like that's the gateway drug into programming for business majors. So I learned how to code VBA in Excel around 2016, and after that I just really started looking into the space more. Data science was new when I graduated. Brigham Young University-Idaho actually created a data science degree, and it was brand new right as I was graduating. Then I did my master's at the University of Utah. So that's a little more of my past, so you know where I'm coming from. I feel like the data space is so diverse, with so many people from so many different backgrounds, because the field is so new.
[00:02:36] Andreas Martinson: I think that's really cool. I think that's really unique. And it shows how versatile people are too.
[00:02:42] Sarah Floris: That's funny, because when my dad showed me VBA, I was like, “What is this crap?” I don't wanna do this at all. I went the opposite direction from you. I was like, nope, coding is not for me.
[00:02:54] Sarah Floris: I guess I'll do
[00:02:55] Andreas Martinson: Oh, it had the opposite effect?
[00:02:56] Sarah Floris: Yeah, well, not exactly the opposite effect. I went to the University of Utah as well, for a summer program, and someone there showed me, oh yeah, you can actually do all of these other cool things with computer programming.
[00:03:11] Sarah Floris: That's when I thought, oh, this is actually interesting. It might have just been the application. My dad is very much, oh, I'm gonna set up Sudoku and things like that with macros in Excel, and that just was not interesting to me at all. It just goes to show how different everyone's stories are.
[00:03:29] Sarah Floris: So right now, I want to talk a little more about your current role and what you're doing at Validate. What does your day-to-day look like as a data scientist? And then a little bit about your team and how you all work together.
[00:03:45] Andreas Martinson: Yeah. As a data science lead, I switch between SQL and Python throughout the day. Even when I'm coding in Python, a lot of the time it's really just using Jinja variables to manipulate SQL. So I'd say the majority of my work is SQL, and in a data professional role, you're not gonna get away from coding in SQL.
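Editor's note: "using Jinja variables to manipulate SQL" means writing one SQL template with placeholders and filling them from Python. The sketch below illustrates the idea under stated assumptions: to stay dependency-free it uses Python's standard-library `string.Template` as a stand-in for Jinja (Jinja uses `{{ name }}` placeholders rather than `$name`), and the table, column, and function names are hypothetical, not Validate's actual code.

```python
from string import Template

# Hypothetical query template; table and column names are illustrative only.
QUERY_TEMPLATE = Template(
    "SELECT member_id, SUM(paid_amount) AS total_paid\n"
    "FROM $claims_table\n"
    "WHERE service_year = $year\n"
    "GROUP BY member_id"
)


def render_query(claims_table: str, year: int) -> str:
    """Fill the placeholders to produce one concrete SQL string."""
    # substitute() raises KeyError if any placeholder is left unfilled.
    return QUERY_TEMPLATE.substitute(claims_table=claims_table, year=int(year))


# One template, many queries: loop over parameters instead of copy-pasting SQL.
queries = [render_query("claims", y) for y in (2021, 2022)]
for q in queries:
    print(q, end="\n\n")
```

Plain text substitution like this offers no protection against SQL injection, so values that come from users are better passed as bound query parameters; templating is most useful for structural pieces such as table names or optional clauses.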
[00:04:21] Andreas Martinson: Besides those two mainstays, I also use Databricks, and I also code in SAS. Today was one of those days where I was in all four of those. I really like the variety. I love being in Spark and being able to be in Python, and there's just so much to learn. I think that's part of being in this industry and this type of role: you really need a desire to keep learning, because it's never gonna stop.
[00:04:51] Andreas Martinson: There's so much to learn, and there are so many new things coming out. A little more about my role specifically as a data science lead: at Validate, we're a team of 13 people right now, and we're hiring to add some more. We have data engineers and data scientists.
[00:05:07] Andreas Martinson: We don't have any data analysts. We're a small company, so that might happen later, but we really have three main roles at Validate: data scientists, data engineers, and strategy consultants. The strategy consultants are expected to know SQL. It's a hybrid role where they're the subject matter experts and also have the analytics skills, so they're like our pseudo-analysts, but they also meet with all the clients, so they're extremely busy. We have three strategy consultants right now, and one other person on our team is transitioning to a strategy consultant, so four if you count her. So that's the dynamics of the team.
[00:05:56] Andreas Martinson: There are three data scientists, including me as the lead, and the three of us work together to provide the insights the company needs. I think that's a good overview. What follow-up questions do you have?
[00:06:11] Sarah Floris: Do you work on any ML? Is that something you all work on often, or does it not come up very much?
[00:06:19] Sarah Floris: It's a question a lot of folks have when we're talking about data science, because when I was a data scientist, I didn't do machine learning work. Is that the same at your organization, and how does it compare to previous organizations you've been at?
[00:06:32] Andreas Martinson: Yeah, so in my current role, I don't do a lot of ML at all.
[00:06:39] Andreas Martinson: Being in such a small company, I do some data engineering work and some data science work, but a lot of what I'm doing is simulations and forecasting. The simulations try to replicate the risk models that the Centers for Medicare & Medicaid Services (CMS) uses.
[00:06:57] Andreas Martinson: Typical machine learning with scikit-learn, not really. We might do that in the future as the company grows, and I think you might find more of that at a bigger company or one that's really AI-focused. But for a lot of the people who aspire to be data scientists, there are a lot of different roles out there.
[00:07:14] Andreas Martinson: I made a post about this a little while back, which I was telling you about beforehand, where I shared a meme basically saying that a lot of data science is just analytics. You're gonna be doing a lot of SQL and things like that. But data science is so broad that it really depends on the role.
[00:07:33] Andreas Martinson: You could be coding and deploying and making sure your ML models are taken care of, and you might be doing that the majority of the time. You might be doing statistics, like simple regression. Or you might be on the lower end as far as your exposure to ML.
[00:07:51] Andreas Martinson: As far as the data science spectrum goes, I'm on the lower end of how much ML I get to use in my job. Of course, I want to do more; it's always exciting. But there are so many fundamental data engineering things, like setting up DAGs that are going to run without failing. I've learned to be better about unit testing and regression testing, and especially about running an integration test every time before I deploy code.
[00:08:18] Andreas Martinson: So I'd say I'm in a hybrid data engineering and data science kind of role. To answer your question, the most data-science-type thing we have is a Bayesian model that we coded up in PyMC3, because we were predicting net shared savings for ACOs, or accountable care organizations.
[00:08:44] Andreas Martinson: That's the main thing we have, and I'm not the main one working on it.
[00:08:49] Sarah Floris: Earlier you said you're in a hybrid role between data science and data engineering, but you also mentioned that you have data engineers in your organization.
[00:09:00] Sarah Floris: So what is their role and how does it differ from yours?
[00:09:03] Andreas Martinson: Yeah. Good question. One of the data engineers' official role is actually data scientist/engineer, and we have some job postings out there for data scientist/engineer.
[00:09:18] Andreas Martinson: I think you probably won't find that in a ton of places. It's probably just a really small startup kind of thing. But we all do data engineering. One of our employees has been with the company since the beginning; he was the very first employee hired after the founders.
[00:09:35] Andreas Martinson: He was originally a software engineer, and he moved into the data engineer role as time went on. He takes care of the things you'd more typically think of data engineers working on: he maintains our cloud instance and handles billing and any maintenance of our AWS environment.
[00:09:58] Andreas Martinson: He also helps maintain our Postgres database, especially memory, making sure we have enough space, and he manages our AWS S3 buckets and our EC2 instance, so I don't have to mess with that as much. He also helps maintain our Airflow DAGs. There have been a few times when he's been really busy, since we only have two data engineers and three data scientists, which is why we're hiring more data engineers. So a few times I've gone in and made a couple of tweaks to a DAG, or just opened up Airflow to see if my DAG had failed.
[00:10:38] Andreas Martinson: But that's more their purview. If any DAGs are failing, they're the ones working on it. Then there's an in-between ground where you're not in AWS or helping maintain the database as much, but you need to load some of your output into the database.
[00:10:58] Andreas Martinson: We have some automation for that, but sometimes you have to do it manually. That's the main thing. And then there's writing DAGs and helping maintain them when there are too many errors for just the data engineers to focus on. Sometimes we wrote the code and put it into production ourselves.
[00:11:15] Andreas Martinson: In that case, you're expected to maintain and update it instead of just throwing it over the fence, so to speak.
[00:11:21] Sarah Floris: So what would you recommend for folks who are trying to go from data engineering to data science, or from data science into data engineering?
[00:11:30] Andreas Martinson: Yeah, there are a lot of different ways, but hopefully I can capture what I think. My personal opinion is that as a data engineer, you should know some sort of orchestrator. You should know how to use Airflow or one of the alternatives to Airflow.
[00:11:50] Andreas Martinson: Prefect, for example, or a couple of other DAG orchestrators; you should know at least one of those. And as a data engineer, you're really expected to be familiar with at least one cloud platform. I don't think any data engineer I've talked to doesn't know their way around at least one cloud platform.
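Editor's note: orchestrators such as Airflow and Prefect add scheduling, retries, and monitoring, but underneath, a DAG is a set of tasks plus dependencies that must run in a valid order. The sketch below shows only that core idea using Python's standard-library `graphlib` (Python 3.9+); it is not Airflow's or Prefect's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline: dict[str, set[str]] = {
    "extract_claims": set(),
    "extract_members": set(),
    "transform": {"extract_claims", "extract_members"},
    "load_postgres": {"transform"},
    "refresh_report": {"load_postgres"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its dependencies."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle,
    # which is exactly what the "acyclic" in DAG rules out.
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

The two extract tasks have no dependency on each other, so a real orchestrator would be free to run them in parallel; only the edges in the graph constrain the order.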
[00:12:13] Andreas Martinson: So if I were ever to switch to data engineering... I actually made a LinkedIn post a while back showing a flow where you start out as a data analyst, then become a data scientist, then a data engineer, and then an analytics engineer.
[00:12:31] Andreas Martinson: And I put the reasons why you'd switch at each step, because there's so much overlap between all of them. I saw another cool chart recently where somebody showed that overlap in a graph. Anyway, those are the two main things I think I'd need to improve on if I were to switch.
[00:12:49] Andreas Martinson: With just those two things in addition to my data science skills, I think that would be enough. Of course, you need really strong SQL skills, but I think that's a given. If you're gonna switch from data engineering to data science, you really need to focus more on your math.
[00:13:07] Andreas Martinson: Data engineers might have really strong software engineering skills, but you need to at least know some math and understand the prediction and forecasting side of ML, and also the inference side, where you're using more traditional statistics to help understand your data.
[00:13:27] Andreas Martinson: So I think that's the distinction.
[00:13:29] Sarah Floris: What do you mean by traditional statistics, for people who might not be as familiar with it?
[00:13:35] Andreas Martinson: Yeah, I guess it's still technically ML, but linear regression, which is probably the simplest ML, is what I think of as more traditional. And also understanding, okay, what is R squared?
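Editor's note: for readers less familiar with R squared, it is the share of the variance in the outcome that the fitted line explains, R² = 1 − SS_res / SS_tot. A minimal pure-Python sketch for a single predictor follows; the numbers are made up for illustration, and in practice a statistics library would do this for you.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Fraction of the variance in ys explained by the fitted line."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that lies close to y = 2x, so R squared should be near 1.
print(r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```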
[00:13:49] Andreas Martinson: When you look at linear regression, or more generally, how would you take two different groups, compare them, and determine whether there's a significant difference? Do you know the very basics, like what a p-value is and how to tell whether something is significant? Can you use slightly more advanced concepts, like statistical power?
[00:14:11] Andreas Martinson: What's the minimum sample size you'd need to be comfortable that the significant result you got isn't just due to chance? I think that's where you'd start: some of those really fundamental statistical concepts. And then, in general, knowing the normal distribution and the different kinds of distributions there are.
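Editor's note: to make "compare two groups and get a p-value" concrete, here is a sketch of one common test, the two-proportion z-test (a normal approximation), using only Python's standard library. The counts are invented for illustration, and the test assumes reasonably large samples; it is one example of the kind of tool Andreas describes, not a method from the episode.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for H0: both groups share one rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    # Pooled proportion, estimated under the null hypothesis of no difference.
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: chance of a |Z| at least this large if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented example: 120 of 1,000 patients readmitted in group A vs 90 of 1,000 in group B.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value says the observed gap would be unusual if both groups truly had the same rate; it does not say how large or important the difference is.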
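Editor's note: statistical power is the probability of detecting a difference of a given size if it really exists, and it drives the minimum sample size. One standard normal-approximation formula for the per-group size needed to compare two proportions is n = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)². The sketch below implements that formula with illustrative rates; other textbook variants (for example, ones using the pooled variance under the null) give slightly different answers.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size_per_group(
    p1: float, p2: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate n per group for a two-sided test of p1 vs p2 at the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)  # round up, since you can't enroll a fraction of a patient


# Illustrative rates: how many per group to detect 12% vs 9% with 80% power?
print(min_sample_size_per_group(0.12, 0.09))
```

Asking for more power, or trying to detect a smaller difference, both push the required sample size up.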
[00:14:40] Andreas Martinson: There are a lot of different things that come up. Are you comparing more than two groups, where you might use something like an ANOVA, or doing something more basic where you're only looking at two? You need to understand what kind of problem you're looking at and which tool is best for making the inference.
[00:15:02] Andreas Martinson: The way I think about it, when I think of more traditional statistics, I'm thinking about inferring something from the data. When I think of ML, I'm thinking more of prediction and forecasting. Hopefully that helps. I still have my old statistics book from my intermediate statistics class in undergrad, because whenever I need to look things up, I'm never gonna remember it all.
[00:15:29] Andreas Martinson: So I just use that as my reference.
[00:15:33] Sarah Floris: Thank you, Andreas, for coming on. How can my viewers contact you? What are some of the ways we can reach you outside of this podcast?
[00:15:44] Andreas Martinson: Yeah. If you wanna reach out to me, I think LinkedIn is the best. I try to post on there every week.
[00:15:52] Sarah Floris: What about the Learning SQL publication?
[00:15:54] Andreas Martinson: That's right. I also maintain the Learning SQL publication. Sarah helped out when it was first starting out. It's a Medium publication, so you can go on Medium and look up Learning SQL.
[00:16:06] Andreas Martinson: There are lots of great SQL articles on there. There's also a Learning SQL LinkedIn page if you wanna visit that as well. I try to post on there every week.
[00:16:16] Sarah Floris: Those links will be posted below. Thank you for joining us, and hope to see you next week.