How To Organize Data Science Teams and Data Science Projects For Startups With Ivy Lu, Chief Data Scientist, Oxygen

DUE TO SOME HEADACHES IN THE PAST, PLEASE NOTE LEGAL CONDITIONS:

WHAT YOU’RE WELCOME TO DO: You are welcome to share the below transcript (up to 500 words but not more) in media articles (e.g., The New York Times, LA Times, The Guardian), on your personal website, in a non-commercial article or blog post (e.g., Medium), and/or on a personal social media account for non-commercial purposes, provided that you include attribution to “The HumAIn Podcast” and link back to the humainpodcast.com URL. For the sake of clarity, media outlets with advertising models are permitted to use excerpts from the transcript per the above.

WHAT IS NOT ALLOWED: No one is authorized to copy any portion of the podcast content or use David Yakobovitch’s name, image or likeness for any commercial purpose or use, including without limitation inclusion in any books, e-books, book summaries or synopses, or on a commercial website or social media site (e.g., Facebook, Twitter, Instagram, etc.) that offers or promotes your or another’s products or services. For the sake of clarity, media outlets are permitted to use photos of David Yakobovitch from the media room on humainpodcast.com or (obviously) license photos of David Yakobovitch from Getty Images, etc.

Welcome to our newest season of the HumAIn podcast in 2021. HumAIn is your first look at the startups and industry titans that are leading and disrupting ML and AI, data science, developer tools, and technical education. I am your host, David Yakobovitch, and this is HumAIn. If you like this episode, remember to subscribe and leave a review. Now on to our show.

David Yakobovitch

Welcome back, listeners, to the HumAIn podcast. Today, we’re honored to bring you our guest, Ivy Lu, the chief data scientist at Oxygen. Ivy has a career rooted deep in the data science industry, having previously worked at multiple FinTech and Big Tech companies, including Capital One and Apple. Ivy, thanks so much for joining us on the show.

Ivy Lu

Thanks for having me, David.

David Yakobovitch

Well, I’m really excited about today’s show, because Oxygen is, at heart, not only a FinTech company but also a data science and AI company. As the host of both HumAIn and Voice of FinTech, I previously featured one of your leads from Oxygen on Voice of FinTech, where we talked a lot more about the business and how Oxygen has been growing across the industry. Today we’re going to dive more into the technology. Before we dive into Oxygen and what the product is, Ivy, can you tee up for our audience a little bit about your background in the data science industry?

Ivy Lu

I joined Capital One as a data scientist after graduating from George Mason University with a PhD in Geographic Information Science. While I was at Capital One, I worked extensively on transaction-based, customer-facing products. For example, I worked on matching transaction data with merchant information and on alerting customers to erroneous transactions on their accounts.

Then after I moved to the West Coast, I joined Apple. At Apple, I worked on an anti-fraud team where we fought against all kinds of fraud and abuse within the whole Apple ecosystem to bring trust and safety to Apple customers. We dealt with different kinds of fraud, such as fraudulent reviews and rankings in the App Store and content.

I was responsible for fraud in one specific domain, where I built machine learning models to identify fraudsters and engaged in the policy-making process to mitigate fraud. So I have a lot of experience working with banking and transactional data from my time at Capital One, and I learned a lot about fraud and abuse from my role on the anti-fraud team at Apple. Both experiences helped prepare me for my new challenge at Oxygen as a FinTech company. So that’s my career path: from the traditional banking industry to a large technology company, and now to the FinTech company Oxygen.

David Yakobovitch

It’s super exciting. Recently on Voice of FinTech, the other show that I co-host, I brought on Ryan Conway, who’s the senior vice president and head of business development and strategic partnerships at Oxygen. We spoke about banking for small businesses and FinTech trends, more on the business side.

We also covered the evolution from FinTech 1.0 to 2.0 to 3.0, and it sounds like you have been on the technology side of those FinTech changes. How have you seen the FinTech industry evolve with data science, and what challenges and lessons have you encountered as you’ve been scaling data science for different FinTech products?

Ivy Lu

For sure. I do see multiple challenges that I never experienced when I was at a large company like Capital One, or Apple, which is even larger. At a FinTech startup, especially if you’re joining, like me, as the first data scientist, you face a lot of challenges that are new to you. One example is the collaboration challenge. Since you are the one and only data scientist on the team, you are collaborating with many different teams and departments, from operations to marketing, customer support, and product features.

So, you need to collaborate with every single person in the different departments and understand their needs and their pain points. That leads to the next challenge, which is related to the first: collaboration comes with prioritization. You are the only one, or you are a small team, but on the contrary, every single department in the business will need you one way or another. They all need some kind of data and analytical support for their business.

So, how can you prioritize things to make sure you are not the blocker of the whole company? How can you help them move things forward? How can you help the person who is making the business decision? That’s a huge challenge for prioritization. And for a FinTech startup, because you are just starting, there’s always the cold start challenge. You have no labels, you have no data, you have no infrastructure, and you have no one to turn to for help. So, how can you solve this challenge from the start? That’s what I see as a huge challenge when you are trying to start a data science team within a FinTech startup.

David Yakobovitch

I’ve seen this firsthand working at different startups as well: Galvanize, General Assembly, and SingleStore. One thing we’ve done on the data science side, which I think is quite similar to software engineering, is solving collaboration with collaboration tools. Some of those tools are Slack, Microsoft Teams, Jira, or Confluence, which handle a lot of the project management issues.

That’s critical, because I can only imagine that, as data scientist number one, you must have had a fire hose of requests. Now, fast forward two years, you are leading and scaling data science across the entire organization. Of these challenges, you talked about collaboration and prioritization. How have those evolved for you and your team over time?

Ivy Lu

So, for collaboration, my approach is that the data science team should be positioned as a foundational, cross-functional team within the whole organization. For each line of business, data scientists should have domain knowledge about the problem they are trying to solve. For example, for product, we need to deeply understand the product features and have the product sense to drive decisions.

Then for fraud and risk, we need to have anti-fraud experience. We need to know what is trending in fraud right now, and also have the financial knowledge about the added risk. For marketing and growth, we need to understand customer acquisition and the marketing strategy, how the ad campaigns work, and how we do A/B testing.

And then, last but not least, for operations we need to understand the user experience, conduct user interviews, and understand and work with our compliance and regulatory requirements.

Also, the data science team needs to collaborate with the engineering teams and work with the data infrastructure, which is the foundation for all the analytical and machine learning projects. Before, when I was the only data scientist, I prioritized for myself and kept the prioritization procedure in my head. I knew what to work on and how to collaborate. But now, with the team growing, I’m trying to set up sprint planning so the team can work in an agile way. That makes it pretty clear what they want to do in the next sprint and how much they want to do, so that we don’t burn everybody out and they still have a very good work-life balance here at Oxygen.

David Yakobovitch

That’s right. As you mentioned, there are multiple data science projects at Oxygen. One of the misconceptions that data scientists new to the industry have, until they get into data science, is that projects stop when you ship the product.

In fact, it’s an iterative cycle. Work on things like fraud, product features, and operations continues in a continuous loop. Which of these data science projects at Oxygen have you found most fascinating and would like to share with our audience?

Ivy Lu

I can give you a general introduction to each of the lines of business, and then a detailed example of something I worked on and am pretty excited about. So, first, fraud. At the beginning I was involved very extensively in fraud. I spent a lot of time on fraud, because we are new to the market. We had just launched and we were kind of a target for fraudsters. They specifically target newly founded FinTechs a lot, because they assume that new FinTechs don’t have good fraud controls. So I collaborated with our fraud team to set up a lot of protections in the core sets. We worked with different fraud vendors on how to set up all the parameters and controls in place for our sign-up flow. Once the sign-up flow was pretty much under control, I built a preliminary machine learning model to detect fraudsters after sign-up, based on the behaviors they show with our card.

So that’s the kind of fraud project that we have at Oxygen. Basically, we try to solve all kinds of fraud from a machine learning perspective. Then for operations: as we are a customer-facing product, every day we have many customers contacting us for one reason or another.

So, generally there are a lot of customer chat transcripts and phone calls. I use machine learning to analyze those transcripts and call recordings, and to catch the most critical issues that we have. Why do people contact us? And can we solve the problem for them before they contact us again? That way, we can reduce our contact rate and reduce the cost of agents.
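To make the idea of surfacing the top contact reasons concrete, here is a minimal sketch in Python. It is only a keyword-rule baseline under stated assumptions, not the machine learning approach described in the interview and not Oxygen’s implementation: the reason names, the keywords, and the functions `tag_reason` and `top_contact_reasons` are all hypothetical.

```python
from collections import Counter

# Hypothetical contact reasons and trigger phrases (assumptions for illustration).
# A real system would more likely learn these from labeled transcripts.
REASON_KEYWORDS = {
    "card_not_received": ["haven't received my card", "card never arrived"],
    "dispute": ["don't recognize", "dispute"],
    "login_issue": ["can't log in", "password"],
}


def tag_reason(transcript: str) -> str:
    """Return the first reason whose keyword appears in the transcript."""
    text = transcript.lower()
    for reason, keywords in REASON_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return reason
    return "other"


def top_contact_reasons(transcripts, n=3):
    """Count tagged reasons and return the n most frequent as (reason, count)."""
    return Counter(tag_reason(t) for t in transcripts).most_common(n)
```

Ranking reasons by frequency like this is one simple way to decide which issue to fix first in order to reduce the contact rate.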

Then for marketing. Marketing is critical for FinTech startups, because we need to understand how to spend our money wisely. What’s our CAC? What’s the most efficient channel to bring us customers? So, I collaborate with our marketing team a lot to understand which channel is best and what’s the best way to use it. That’s the marketing project that we have at Oxygen for machine learning. Then, last but not least, we create some of the product features in the Oxygen app. One specific example I want to give is the project where we cleaned up the transaction descriptions for our customers.

So, as you know, even though the neobanks and FinTech apps all look so fancy, so slim, so good-looking, behind the scenes the information still sits in very legacy systems, like a decade-old system. For example, the transactions that we receive from our transaction processor look very messy. All of the letters are capitalized and there’s no information about the merchant at all. So, in the first iteration of the app, we just showed whatever the transaction processor gave us. If you looked at our app at that moment, it looked very ugly, because of all the capital letters and meaningless numbers in the transaction description, and it felt as if your bank was yelling at you.

I saw this as low-hanging fruit: we could quickly find a solution that would not only beautify the UI, but also enhance the transaction with merchant information. That helps customers a lot, because they can understand where they transacted and they don’t call us for disputes.

So what I did is I built logic to identify chain merchants and brand names, for example Starbucks, Walmart, and Trader Joe’s. Those are pretty famous chains. Now we can enhance the transaction with the merchant name and the merchant logo, clean up the name, and even show a small map in the app. With this map, you can remember where you shopped. That not only helps the customer in the UI sense, but also helps us understand all our transactions and all our customer behavior.
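As an illustration of the kind of clean-up logic described above, here is a minimal sketch in Python. It is not Oxygen’s actual implementation: the `KNOWN_CHAINS` lookup table, the `EnrichedTransaction` type, the `enrich` function, and the sample descriptors are all assumptions made for the example, and the logo and map lookups are left out.

```python
import re
import string
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup: keyword found in the raw descriptor -> clean brand name.
KNOWN_CHAINS = {
    "STARBUCKS": "Starbucks",
    "WAL-MART": "Walmart",
    "WALMART": "Walmart",
    "TRADER JOE": "Trader Joe's",
}


@dataclass
class EnrichedTransaction:
    raw: str                 # descriptor exactly as received from the processor
    display_name: str        # what the app would show
    merchant: Optional[str]  # brand name, or None when no known chain matched


def enrich(raw_description: str) -> EnrichedTransaction:
    """Match known chains first; otherwise just tidy up the raw descriptor."""
    upper = raw_description.upper()
    for keyword, brand in KNOWN_CHAINS.items():
        if keyword in upper:
            return EnrichedTransaction(raw_description, brand, brand)
    # Fallback: drop store/reference numbers, collapse spaces, fix the casing.
    without_numbers = re.sub(r"[#*]?\d+", " ", raw_description)
    cleaned = string.capwords(without_numbers)
    return EnrichedTransaction(raw_description, cleaned, None)
```

For instance, `enrich("STARBUCKS STORE #12345 SEATTLE WA")` would display as "Starbucks", while an unknown merchant such as "JOES CORNER DELI 00423" falls back to a tidied "Joes Corner Deli" with no merchant attached.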

David Yakobovitch

I’m reminded, as you go through all these projects, Ivy, that data science has truly become like software engineering. As we’re moving into the 2020s, the data science stack continues to mature. There are a lot of open-source developer tools that have now matured to become an augmentation of the software engineering stack. That means when you’re organizing and scaling a data science team, it might be quite similar to building and scaling a software engineering team. So I’d like to hear from your experience: what is needed for a good data science team?

Ivy Lu

That’s a great question. I did help our team hire some engineers on the side, and I did see the similarity while I was trying to find good candidates. But also, these days there are so many great enterprise-level AI platforms that help a great deal on the infrastructure side.

So you can also argue that a data scientist doesn’t need to be very technical or have strong knowledge on the software engineering side. I recently saw a demo from a vendor where they streamline the whole machine learning process, from data cleaning to data wrangling, then to feature generation and picking the models. They can run many models in parallel and pick the ones best suited to your problem and your data.

Then, they can do monitoring as well, to see whether there’s any bias in your data or any drift in your data trends. So I do see that, these days, being a data scientist may require different skills than before. For example, when I was working as a machine learning engineer at Apple, I was using all of my coding skills and my statistics.

Nowadays, maybe coding skills are not required anymore, with such good tools for data scientists and machine learning engineers. But, ultimately, I still think the important thing is the statistics background and a deep understanding of the machine learning algorithms.

Also important is a deep understanding of the problem you’re solving. You may have really great technical skills, but if you don’t have the context of the problem, maybe you are solving the wrong problem. So I would encourage all data scientists and machine learning engineers, or whoever wants to get into the data science domain, to really understand which industry they want to go into, and then how they can get more domain knowledge. That domain knowledge is really valuable in data science.

David Yakobovitch

This context is very powerful for the different products that you’re building in the data science organization. For data scientists who are listening to the show today, there’s always been a lot of talk about teams being hub-and-spoke, centralized, and many other models as well, with different roles for the team. What are some of the best practices you’ve seen in building and scaling your team?

Ivy Lu

Great question. There is a pretty natural transition from centralized to embedded. There are two types of team structure, like you mentioned. In one, the data scientists belong to a single centralized team and people may wear multiple hats. One day you may work on project A, and another day on project B. The other structure is more embedded.

In that structure, within the data science team there are some small teams, and one team may be fully responsible for, let’s say, operations, and another for marketing. For our data science team, my design is the one for the earliest stage, because we’re pretty small right now.

We only have two people, and tomorrow more people are going to join. With this team size, it makes sense to have everybody wearing multiple hats. All the data scientists have a chance to explore different things, and that’s also a benefit of joining us as a startup. You are not dealing with just one single problem inside a huge machine, like I did before at those large companies.

Instead, you have chances to explore different problems in different domains that you didn’t have a chance to explore before. So at this stage, I encourage my team members to explore different things as we work on different projects from time to time.

But as Oxygen grows and our data science team grows, it makes sense to adopt the embedded format, so that we will have people dedicated to marketing and people dedicated to product features. Within the team, there could be rotation in the future. But the change from a centralized structure to an embedded one comes naturally with the growth of the whole company.

David Yakobovitch

I always think that dedicated resources are one of the best ways to build and scale products. And naturally, doing those rotations helps each of your team members build full domain expertise around the entire data science stack across all the business lines. So it’s great to hear that you’re building a powerful embedded data science organization at Oxygen. Recently, Ivy, your team announced a new product update. You’ve unveiled the new Elements program at Oxygen to offer loyalty and savings features to your debit card holders. Could you share more with our audience about the Elements program?

Ivy Lu

Sure. We are super happy to announce that we launched a new product called Elements. We are now offering four tiers of the product, with increasing cash back and different savings APRs, as well as other retail and travel benefits like Priority Pass lounge access and reimbursements for digital subscriptions like Netflix and Peloton Digital, plus travel insurance for those who are keen to get back out there.

With this launch, we give debit card holders cash back on everyday spending. We are also offering Millennials and Gen Z who have steered away from credit cards the same rewards and many of the program offerings that top-tier credit card providers offer.

There is a transition right now. People are steering away from credit cards, and our credit system is decades old. With the launch of our new product, we are also testing whether this idea will be adopted, whether people will like the idea that a debit card can have the same benefits, the same rewards, and the same cash back as a credit card.

As of today, we launched more than a week ago and the growth is exponential, way beyond our predictions. Two days after our launch, the number of sign-ups was already the same as the number of sign-ups in the whole month before. So we are growing about 13 times faster than before. I do see that people love the idea, and I’m very excited that we launched.

David Yakobovitch

It is fantastic to see that, from a business perspective, Oxygen now has your Elements program with earth, water, air, and fire. But beyond that, I’m sure that as the leader for data science, the chief data scientist at Oxygen, there might be a lot of bells and whistles going off where you’re thinking: now we need to maintain analytics, data science, and tracking around this new product and these features, ensure we’re preventing fraud, build good operations, and scale the product marketing and PLG motions as well. So it sounds like you have some exciting new work for your two new team members and your current team with the Elements program.

Ivy Lu

Yes, for sure. Like I mentioned, even though I have these two new team members, we are still very small. And especially after the Elements launch, every single team now needs a lot of support on the data analytics side.

They all need to check some kind of KPI metrics and need some dashboarding support to help us understand how the launch is going, why it is successful, and how we can assign proper resources to support it. So we are busy with all these metrics. On the other hand, for the long run, we have many more features that we want to launch within this year.

For some of those features, we definitely need some preparation on the machine learning side. We need to build a lot more models. And also, as I mentioned, with this launch we have so many sign-ups that the fraud department is quite worried. There’s always a tension between the fraud department and the marketing department, and the question is how we can best support them from the machine learning side and give them the best predictions for fraud prevention.

That is a project we need to launch as soon as possible. Also, we are going to raise our Series B soon, and a Series B is all about metrics: whether your company is going to be sustainable, what your retention is, what your user growth is. So there are a lot of KPIs and metrics that we need to show not only to our internal business, but also to present to our VCs. With those launches, for the next month our team will be very busy providing those KPIs, metrics, and dashboarding tools to the entire team. That’s what we are going to do in the near term.

David Yakobovitch

Well, Ivy, I’m delighted to hear about the launch of the Elements program at Oxygen and the continued growth of your data science team. It sounds like, as you’re moving into this Series B and beyond, there’s a lot of product-led growth and a lot of new access you’ll be providing with Oxygen’s product suite for Millennials and Gen Z. Ivy Lu, the chief data scientist at Oxygen, thank you so much for joining us on HumAIn.

Ivy Lu

Thank you, David, for having me.

David Yakobovitch

Thank you for listening to this episode of the HumAIn podcast. Did the episode measure up to your thoughts on ML and AI, data science, developer tools, and technical education? Share your thoughts with me at humainpodcast.com/contact. Remember to share this episode with a friend, subscribe, and leave a review. And listen for more episodes of HumAIn.
