Welcome to our newest season of HumAIn Podcast in 2021. HumAIn is your first look at the startups and industry titans that are leading and disrupting ML and AI, data science, developer tools, and technical education. I am your host, David Yakobovitch, and this is HumAIn. If you like this episode, remember to subscribe and leave a review. Now on to our show.
Welcome back listeners to the HumAIn Podcast. Today, we are talking about the future of intelligent applications. Thinking about how enterprises are powering their intelligent applications and fast analytics. Today, I’ve brought you the Chief Product Officer of SingleStore, Jordan Tigani.
Jordan Tigani comes with a rich background in the data industry. And today we’re going to dive fast into intelligent applications. Jordan, thanks so much for joining us on the show.
Thanks, David. Great to be here and to get to talk to you.
Absolutely. We talk a lot at SingleStore about the applications and analytics that are being built. But before we dive deeper into the product and the strong use cases, can you share with our audience about where you’ve been and what brought you today to join SingleStore for this next part of fast analytics and intelligent applications?
Sure. I’m a product person now, but I’m really a software engineer at heart. I spent 20 years as an engineer and part of it as an engineering manager from Windows kernel to Microsoft research, worked at a couple of startups. Then I landed at Google and it was right as Google was building its cloud.
And so I stumbled onto this project called BigQuery. I was one of the founding engineers of Google BigQuery, building out what became their flagship cloud data analytics product; at the time, nobody was paying any attention to it. So I ended up working on BigQuery. I helped lead the engineering team and then the product team, having gone from zero to where BigQuery is now. I can’t share any revenue numbers, et cetera.
But it’s one of the largest cloud data warehouses out there by revenue, to put it that way. Having seen that trajectory, SingleStore, which was MemSQL at the time, was somewhere at a point along it. And it felt like it was on that same trajectory.
It felt like MemSQL/SingleStore is where BigQuery was just a few years ago. And they have amazing technology, amazing engineering team. That seemed very familiar to me. They’ve got a burgeoning cloud product, they’re going in the right direction and they’re solving a problem that we were having a hard time solving at Google and that problem was; how do you get analytics with very low latency? How do you get high update velocity for your analytic data store? And, we had tried to do that at BigQuery. We saw competitors also struggling with the same thing, and really there was a difference, there’s an architectural reason for that.
But to me, it was just really exciting that, ‘Hey, this is solving a problem that other people are trying to solve, but are struggling’. And, there are certainly things that SingleStore doesn’t do. But to me those are engineering, they’re turning the crank, adding the features, but the architecture, the foundation is there. And to me, what was really exciting is that the world is our oyster. Based on this technological kernel of amazing, fast, incredible database fundamentals.
And then on top of that, we can layer on a lot of other things. So to me what was exciting, also getting a chance to wear both the product and engineering hats, because I help lead the product and engineering teams and design teams at SingleStore. So I don’t have to give up the product half of my brain or the engineering half of the brain. I can combine both at once.
I’m so excited about the product. For listeners of the show, as many of you may know that in October, 2020, I joined SingleStore full time. And prior to that, I was involved in the data science and AI community scaling different enablement efforts for large enterprises. And most of the efforts were with the new programming languages. These include the ones that everyone’s learning today to be programmers like Python and R and a lot of those data science languages of choice. But I discovered working at those organizations that there was a gap and the gap was that not everyone is just coding in Python and R. In fact, it’s all about applications and analytics, and that goes back to SQL or Sequel. And one of the most recent programs that I had spearheaded with our team at Galvanize in late 2019 and 2020 was a data analytics program.
In fact, we had partnered with large organizations like Instagram, Etsy and Chartable to speak and uncover from the organization; where’s your skills gap? And the skills gap wasn’t as much about the data scientists and the machine learning engineers. But the leaders there, the engineering managers said, ‘We need team members who know SQL.
We work with SQL every single day with analytics, with databases’. And it sounds like that is what we’ve discovered with SingleStore, where companies are able to put applications and analytics together. Can you speak more about this use case and about what we’re doing with fast dashboards?
Sure. I like what you said about SQL and the importance of SQL and one of the Gartner analysts I was talking to just earlier this week had a great quote and that was, ‘It doesn’t matter how old you are, but your grandchildren will be still using SQL’. I’m not sure everybody’s grandchildren will be the kinds of people that would write in SQL. But in terms of the longevity of the language and the usability of it, SQL is here to stay and SingleStore is a great way to enable those SQL applications.
And so, you were describing the key critical use case for SingleStore is our fast dashboards. And one of the first things you want to do with analytics is you want to be able to display your data. Like human beings are not very good at looking at columns of data and developing any patterns or recognizing any patterns.
If you’ve ever opened up a spreadsheet, you look at it and if there’s half a dozen rows, you can detect the trend, et cetera. But if there’s a thousand rows, it’s a lot harder to detect trends, particularly subtle ones. And if there’s a million rows or a billion rows or a trillion rows, then it’s really impossible to get a feel for your data just by looking at the values. And so visualization is important, but as you’re visualizing data, the performance of those visualizations is important just like the early days of Google.
And they realized that every 10 milliseconds that it took for your query results to come back was additional time; you lose some percentage of users by making it just a tiny bit slower. And so, when you’re trying to get information from your data, when you’re trying to visualize your data, performance matters, speed matters.
A lot of people suffer through just poor performance of their dashboards, and it doesn’t sound like a terrible problem until you realize that, if you’ve got a dashboard that takes three minutes to load, you’re only going to load it once a day, and half the time you load it and forget about it. When I was at Google, I would do a lot of this. There was a key revenue dashboard that took almost five minutes to load, and Google, in general, has its act together and has really good tooling.
I just found that I would open it up and I would get tired of waiting and I would just change tabs or I’d do it over a cup of coffee, and then I’d forget about it. And then like the next day, I would open up the same tab again and wait five minutes and forget about it. And there’s gotta be a better way to visualize your data. And especially if what you want to see is not just a static report, but is, you want to be able to dig into your data.
There’s a key technique in software called the Five Whys. Say something happened. There was an outage. Well, why? We lost connectivity. Well, why? Because this server was overloaded. Why? And you ask why five times, and that lets you drill down into the actual kernel of what you’re looking for, and you can do the same thing with data. It is like, ‘Hey, our revenue is down. Why?’ Well, one of these regions wasn’t performing. Why? There was a warehouse that was destroyed. Why? Okay. There was a giant thunderstorm. By drilling down in your data, you can actually understand what’s going on in your business.
And the performance really matters because if you’re getting good performance, you’re going to try a bunch of things. You’re like, ‘Well, maybe I should slice it this way. Or maybe I should drill into this, or maybe I can try this’. And if you’re getting, if it takes a minute or if it takes 10 seconds or 5 seconds to load your analytics or your visualizations, then you’re not going to do as much exploration.
And at SingleStore performance matters. A lot of our analytics queries that might take minutes elsewhere, we can do in tens of milliseconds. We have a lot of customers that have hard SLAs of 50 milliseconds. One, a major bank, has a 50-millisecond SLA for doing fraud detection. And if they can’t meet that, then they can’t detect fraud because it happens in the middle of a transaction. And so, we have to make sure that our analytics can operate at least at that speed.
This is one of the most exciting facts that blew me away when I joined SingleStore; learning about the technology around ingest, query performance and concurrency was incredible. When you really think about the use cases, having worked at Deutsche Bank, where we built Qlik dashboards, and ADP, where we built Tableau work streams, and even building with Looker and Power BI at other clients and organizations, I experienced this pain point first hand, where these events would take minutes to load.
And when I was working with our data science and advanced analytics teams, the executives are saying ‘We got to move faster. We can’t be slowed down by this’ And that has a significant material impact on the business when you can’t get results instantly. So this is a critical shift for the industry that we’re seeing today.
Absolutely. One of our customers, they don’t like us to use their names. So I won’t mention the name, but they’re in the Fortune 500 and we power their CEO’s dashboard. Their CEO apparently wakes up at four in the morning and he checks the dashboard and it’s a global business.
So he wants to see what’s going on right now: what’s happening in my business? And so looking at a report from yesterday is not good enough. He wants to be able to make changes and make policy decisions based on what’s happening. And that one dashboard, it sounds like that dashboard isn’t such a big deal, but all the top execs use it. It does something like 800 queries per second, with an average latency of under a hundred milliseconds. And that scale lets them get the value and lets them become data-driven and make their decisions based on data and based on what’s happening right now.
Thinking of other use cases, especially during the pandemic where all of us were locked down and working from home. One of the key consumable products that we use every single day is the bandwidth around the internet and data. When you think of large organizations like AT&T, Time Warner, Verizon, Comcast, T-Mobile and others, there is so much data throughput occurring and there can be issues. I remember firsthand with Verizon, actually New York City we had an outage during the pandemic and I was like, ‘What do I do?’ And the team has to recognize that. We work a lot in the telecom space as well. What are some of the use cases you’ve seen here at SingleStore?
There are some super interesting ones. One of the major telecom providers, I’m not sure if I can name them, but we serve as the backend for all their analytics for their 5G rollout. And we supported the rollout of their 5G systems. And first of all, there’s a lot of data: data from all the cell towers, signal strength, lots of data coming in at a very fast rate over time. If there are any holes in your data, it can be very detrimental. So you’ve got to have high availability, and you need to have a high ability to ingest data.
And then also they also need to know what’s going on right now. Do they need to send a truck over to fix the tower or to improve their signal? A lot in their business is really riding on being able to make good decisions about what’s going on with their infrastructure. At the root of it, this is an IOT problem. You have these large sets of distributed sensors that are all streaming data about their status, about their uptime. It’s a big data system.
There are billions of records in this database. You need to be able to see what happens over time. You need to see the historical, but then you also need to be able to see it in real time. So we see a number of other kinds of IOT use cases. We’re very popular in financial services. Financial services is one area where people see the value of time. Time is money. Financial services people actually really have internalized that and they really get that.
They recognize that there are a lot of high-frequency trading applications where, if they’re a millisecond late, they’re going to lose money. And so we’ve helped some customers. I mentioned 50-millisecond SLAs. We have one customer with a two-millisecond SLA. They need to get the results back in two milliseconds, and we can get 99.9% in only a single millisecond.
Financial services is one of the first areas where people get the value of real-time information. In the old days, people would see the stock values in the newspaper, and then they would make their investment decisions, the next day. And then with the internet, they started being able to get 15-minute delays. And even that was too slow and then, real time and then, which is up to the second. And then of course you have the high frequency stuff, which is even faster than that, but while financial services folks recognize that there’s value in going from daily information to up to the minute information, up to the second information.
Other people are just starting to recognize the value of doing real-time inventory, being able to understand exactly what is going on in your inventory right now. So retail folks are getting into IOT, or I’ve got a factory and I want to understand what’s going on in my factory right now, or internet businesses. We also power a lot of backend systems for Uber, real-time ad bidding or real-time segmentation of customers. If you can’t get real time, you often think you don’t need it; but once you start getting things in real time, you ask, ‘How did I ever operate without it?’
It reminds me of this, funny commercial that I’d always see during the Super Bowl about one of those insurance agents and you just call them and they’re instantly there at your need and your side. It’s the same thing with these intelligent applications, 5 minute SLA turning into 20 or 2 milliseconds is completely life-changing and it can be life altering. When we think of geospatial data, you mentioned the use case of Uber and beyond that, we even think of the pandemic.
Prior to the pandemic I was working in Saudi Arabia, actually with Saudi Telecom and was working with their team as they were exploring use cases with data all across their towers, and even thinking about the COVID-19 pandemic as that was just breaking out back in February, 2020. At SingleStore we have done some significant work with True Digital to help them with real time tracking and prevention for COVID-19 as well. Could you speak more about that use case and why that one has been so successful in Asia Pacific?
Absolutely. So the True Digital use case is really interesting. They were able to use streaming cell phone location information to generate heat maps, and they can see where there were large COVID-19 infection rates. You can see where people are congregating and see areas that should be avoided. Also, the amazing thing is, it only took them two weeks to develop and roll out this solution because SingleStore solves so many of the problems, from fast data ingestion to the ability to do the analytics.
And so when we think of all the stories that you and I’ve just discussed, Jordan, it really comes down to three pain points; ingest, query performance, and concurrency. These are very technical words. Can you break it down for our audience on why these are three pain points that we are solving for?
Sure. And it might be actually good to step back a second and think about, if you’re building an application; why do you need analytics? And then what are the things you need out of that analytics? And so if you’re building an application, it’s incredibly common to need some analytics, something that is going to let you say, like, ‘What’s going on in the world, what’s going on outside of the individual user or the individual data points you’re looking at?’ And just to give some examples of that, if you think about any leaderboard you’re going to show.
A leaderboard is an analytic query, or if you’re doing gamification or recommendations, you have real-time, trying to recommend something to the user. That’s analytics, because you need to basically run a query. You need to understand what’s going on across the other users.
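To make the leaderboard point concrete, here is a minimal sketch of why a leaderboard counts as an analytic query: it has to aggregate across every user's rows rather than look up one user's record. The `scores` table, its columns, and the sample data are hypothetical, and Python's built-in `sqlite3` stands in for whatever database an application actually uses; nothing here is SingleStore-specific.

```python
import sqlite3

# Hypothetical schema: one row per score event, the way an application
# might write them transactionally, one at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores (player, points) VALUES (?, ?)",
    [("ana", 50), ("bo", 30), ("ana", 25), ("cy", 40), ("bo", 60)],
)


def leaderboard(connection, limit=3):
    """Aggregate across all players' rows, then rank by total points."""
    return connection.execute(
        "SELECT player, SUM(points) AS total "
        "FROM scores GROUP BY player "
        "ORDER BY total DESC, player ASC LIMIT ?",
        (limit,),
    ).fetchall()


top = leaderboard(conn)
for player, total in top:
    print(player, total)
```

The `GROUP BY` plus `ORDER BY` over the whole table is the part that gets expensive as the number of rows and concurrent readers grows, which is the situation described in the conversation.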
And a lot of transactional databases are not good at doing that, or they’re not good at doing that at scale. And generally data warehouses, which were the traditional location for analytics, are not good at doing that with low latency, and not good at doing it with the absolute freshest data. There are other reasons you might want analytics: if you’re doing any drill-downs or search functionality, faceted search, if you’re doing monitoring, or if you’re doing embedded analytics. Embedded analytics is the simple one: if you’re showing a dashboard to your users, if you’re the telco company and you want to show users the history of their usage over time, that’s really an analytic query.
And so if you’re building, if your application requires analytics; what are the things that you need? So you need the data to be up to date. Because if somebody does something, if I make a call, I pick up the phone or something and I hang up the phone and I go, and I look at that dashboard and that call doesn’t show up, then I’m going to be upset or I’m going to like think, what’s wrong. Did that call not get recorded? And so the ingestion speed and the ingestion capacity is really important.
The other thing that’s important is query performance. Because if your application is showing analytics in the critical path to your users; then your analytics queries have to be really fast because the responsiveness of your application is limited by the performance of these queries. So query performance is also fascinating. As I mentioned, every 100 milliseconds it takes is going to be some attention that you’re going to lose from your users.
So the faster you can make these queries the better off you’re going to be. And the last thing is concurrency. If you’re building an application, you want that application to scale to as many users as possible. You want to go viral. You want to have lots of people being able to hammer your system.
And what’s more, a lot of the time your users are not going to be spread out evenly throughout the day or throughout the week. It’s going to be 9:00 AM Monday morning, everybody opens up their browser and wants to see what’s going on. There are spikes. And you need to be able to handle those spikes, and you need to be able to handle really high concurrency. You need to be able to scale out, and high concurrency isn’t a hundred queries per second. It’s a hundred thousand or it’s a million queries per second. And so you put together these three needs of these applications: fast ingestion, query performance and concurrency.
And these are things that traditionally are difficult for transactional systems to do well. Like the query performance for analytics, the high concurrency for analytics, that uses up a ton of the capacity, which is hard for a typical transactional system. On the other hand, a typical traditional analytics system has a hard time with a really fast ingestion, particularly for updates. From a technical perspective, analytical systems tend to be based on column stores.
Column stores really require large batches of things to be updated. And so if you’re updating things, it makes them very hard to update quickly. And then query performance, often these analytics systems are designed for throughput rather than latency. It’s a design trade off, but the systems were designed that way. I worked on one that was designed that way and there’s several others that were designed that way. And for an application though, it’s a different design point, it’s a different requirement.
You need low latency, and you’re less worried about how many of these you can batch up and do at once. You really want to be able to get a result as fast as possible. And finally concurrency: the data warehouses have a hard time with high concurrency, and yes, with some of them you can scale out by spinning up additional nodes or additional clusters, but a) that’s slow and b) that’s extremely expensive. And this is also where, if you’re fast enough, you don’t need as much true concurrency. Let’s say you want to do a hundred thousand queries in a second. If each of those queries takes a second, then you need a concurrency of 100,000. If each query takes a millisecond, you only need a concurrency of a hundred. So, this is an area where SingleStore really can shine.
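The arithmetic in that answer is an instance of Little's Law: the number of queries in flight equals the arrival rate multiplied by how long each one takes. A minimal sketch follows; the function name and parameters are illustrative, not part of any product API.

```python
def required_concurrency(queries_per_second: int, latency_ms: int) -> float:
    """Little's Law: in-flight queries = arrival rate x time per query.

    latency_ms is the time one query takes, in milliseconds.
    """
    return queries_per_second * latency_ms / 1000


# 100,000 queries per second at one second each.
slow = required_concurrency(100_000, 1000)

# The same load at one millisecond per query.
fast = required_concurrency(100_000, 1)

print(slow, fast)
```

Cutting latency by a factor of a thousand cuts the number of simultaneous queries the system must hold open by the same factor, which is why faster queries reduce the need to scale out.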
When you think of real time, I think of large enterprises, startups, small, medium businesses, and even scale ups who are not just thinking of the technology that we’ve discussed today, but typically companies say ‘If I’m going to go to the cloud, let me start with the hyperscalers. Let’s go to the one stop shops like AWS, Azure, or GCP’ but then very quickly organizations realize that there’s a lot to manage, there’s a total cost of ownership that is very high. So there are other solutions to speed up the cloud infrastructure.
So the question I ask you Jordan is; how can analytical database infrastructure, like what SingleStore is building help companies be more competitive?
So that’s a great question. You were talking about the hyperscalers and going to the cloud service providers, and the cloud service providers often have a grab bag of different databases that you can use, a database for every possible application. But one of the things that people want is to deal with less stuff. If you’re an IT department, you want to support fewer databases; if you’re a finance person, you want to sign fewer checks; and if you’re a software engineer, you don’t want to have to worry about moving data from one place to another. Moving data from one place to another is fraught with peril.
And so one of the things about SingleStore and it’s the insight behind the name; it takes a lot of the different things that you would want to use your database for a lot of the different use cases. Whether it’s transactions, whether it’s analytics, whether it’s geospatial, whether it’s time series and it puts them all in one package and we can do a really good job of all these use cases.
So, one way that we can make you more competitive is, you have less things to manage. So lower cost of ownership, lower cost of having to train people in various tools. We had one customer who said that, ‘We used to have six databases, now we have two’ because they were able to get rid of several of their databases just by using SingleStore.
In the future, we’ll be able to let them get down to one. But we’re not trying to get rid of all other databases. We’re not trying to say, ‘Just use SingleStore for everything’. One of the things we’re great at is what we call augmentation. It’s like you have a database and it’s not working for some of your use cases. Maybe you need lower latency. Maybe you need higher concurrency. Maybe you need faster update rates. Maybe you need a more flexible billing model or something else, and you want something that lets you scale out a little bit.
SingleStore is great for that. We have our augment path where we can augment another database. We can augment the data warehouse. We have a customer that was doing billing processing on Oracle, and Oracle was taking too long. So they were writing their updates into SingleStore and then pushing those updates back to Oracle. That works. We have customers doing the same thing with Snowflake, where they’re landing their data in SingleStore and they’re writing the historical data back to Snowflake. Over time, customers find that they can use SingleStore for more and more of their workloads. And if they do that, that’s great. If they don’t, that’s also great with us; we’re happy to work side by side with these other databases.
One of the common themes that we talk about in the HumAIn podcast all the time is ‘humane’, which literally means to be human or human augmented. And that’s what I find so great about the SingleStore product is; you’re augmenting the current cloud, the current analytics, the current technology that a company may be using. The thesis is, well, you don’t have to replace what you’re using. Let’s augment. Let’s help you with different workloads and then get to a place where your mission-critical work can be something that our product not only supports, but helps you thrive and grow with your business. And that’s what a lot of the companies need is something that can scale with them as their growth scales as well.
I like the idea of using AI to augment and go beyond what you can do currently. That’s one of the things that we are looking at with our intelligent applications is that; step one is you add analytics to your application. So, if what you’re currently doing now is you’re using something like Mongo or a Document Store where you basically store and you can get. And then if you add analytics to that, you can understand the context of the world. There’s so much richer information that you can include so much more intelligence you can add to your application.
But there’s a step beyond that, which is the AI, the machine learning future, where you’re also including, real-time models. I feel like there’s a step between, you have data, you can use your data. Step two is analytics, and you can make decisions based on that data. But then there’s really intelligence, which is a step beyond analytics, which is driving real insight from the data and automatic insight from the data.
And virtually every application can use that in some way, but I think many people don’t know how. And I’m hoping that at SingleStore, with our future roadmap, we’re going to be able to help people go beyond just getting insights from their data to getting intelligence from the data. It’s like, instead of understanding what’s going on, it’s what should I do? And that to me is super powerful.
And so with that, let’s speak to the future of what we can do as we move beyond an analytics world, as a database that can support AI and different use cases. Of course we’ve seen throughout the pandemic the acceleration of digitalization. Lots of databases have had major updates, like Snowflake going public and the Couchbase IPO. What are you seeing as the next wave for the database industry over the coming decade?
First of all, the database industry is hot. I think for something like 20 years, there were no database IPOs. And now we’re having a bunch, all in a short period of time. And cloud is fueling a lot of this transition, as is people’s recognition that as they have more data, in order to stay relevant, they need to get better at understanding their data.
They need to be able to go from collecting their data. So the first step of the big data was like people said, ‘Well, there’s all this data’. And then so people started collecting their data and that was like, ‘Okay, well now what do we do with it?’ They want to be able to make decisions with it. And one of the problems about data is that it’s inherently noisy. People don’t always understand what their data means. And so if you really want to make good decisions based on your data, you need deep understanding and you need better abilities to clean data and alert when there’s problems.
So to me, one of the biggest areas that we’re going to see is what I’ll call middleware tools that allow you to get to the point where you’re making data-driven decisions by ensuring and asserting that your data is high quality. And a lot of this is domain specific. If we want to get to the point where you can say, ‘Okay, well, this number has changed, and so this means we have to go do something different’, you need to make sure that number means what you think it does.
And there’s a bunch of stuff you need to do to work backwards from that, to go from what people currently have, what currently is getting written to the database, to getting meaning from that. And so, while we see a lot of database companies now going public, the database is the infrastructure layer. It’s the first level. And we’re going to start seeing the business value flowing up the chain as we fill in those gaps.
People don’t yet realize what the problems are with data-driven decision-making because they haven’t gotten good at it enough to have hit those problems. And I see it all the time. I see internally at SingleStore, we do post mortems every time there’s a customer problem or an outage.
We dig in, we ask the Five Whys, what went on. We have all sorts of telemetry. Even in this specialized place, it’s easy to be like, ‘Oh, well this thing spiked here, and so that must’ve been the problem’. Network latency spiked, and then you zoom back and you realize that’s actually not an anomaly; that happens all the time.
And it’s really hard to make good decisions based on data and pinpoint things if you’re just looking at data to say, this is a problem, or this is what’s going on. And so to me, the next big milestone is going to be giving people tools to help understand their data. A lot of it is going to be intelligence; machine learning is going to be required. It can let you know, ‘Hey, this thing is usually between one and three and now it’s seven’. And that, combined with humans, is going to start getting good at helping people make data-driven decisions.
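The "usually between one and three and now it's seven" idea can be sketched as a baseline check. The version below is a deliberately simple stand-in (mean plus or minus a few standard deviations), not the machine learning approach hinted at in the conversation; the function name, threshold, and sample histories are all assumptions for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, value, k=3.0):
    """Flag value if it falls more than k standard deviations from the
    historical mean. A simple baseline check, not a trained model."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# A metric that normally sits between one and three.
steady = [1, 2, 3, 2, 1, 3, 2, 2]

# A metric that spikes to seven all the time.
spiky = [1, 7, 2, 7, 3, 7, 1, 7]

print(is_unusual(steady, 7))  # seven is far outside the usual range
print(is_unusual(spiky, 7))   # seven is routine for this metric
```

The second history is the post-mortem trap described above: a spike that looks alarming on its own is not an anomaly once the baseline shows it happens all the time.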
That makes a lot of sense, Jordan. We’ve conducted in-house research. The SingleStore research has also shown from IT professionals that over 35% plan to replace their current data solution in the next couple of quarters. And that over 74% of IT professionals do run into issues with their current data warehouse, which means as we’re thinking of data analytics, data science, and AI, we’re entering the next wave of technology acceleration automation. How do you see this unification being brought together? Let’s wrap it up and see how is SingleStore leading the wave of technology.
So great questions. So there’s a lot of people looking or hitting blockers in their current technology, whether it’s, one blocker might be that they want to move to cloud. One blocker could be that they want to scale. One blocker could be that they’re not getting the TCO that they need. And so people are making changes. There’s lots of money flying around, there’s lots of growth.
And SingleStore, because we’re in a position where we can take the place of lots of special purpose databases in one location, we’re in a great spot to pick up many of those use cases. We recently just had a round of funding in the fall and one of our investors, they showed this chart about real-time analytics and just the growth of real-time analytics.
And they said, ‘Hey, SingleStore is right here at the right spot as this is growing’. And so people are looking more towards real time, they’re looking more towards lower TCO, supporting fewer systems and also things that are going to be forward-looking. Nobody wants, if you’re an IT manager and you’re picking a new database, you want to pick the one that’s going to win.
The one that five years from now, everybody’s going to look back and be like, ‘Wow, wasn’t that person so ahead of their time by picking the right data infrastructure, because now that has enabled us to do all these great things’. And SingleStore is going to be that database. It’s going to fuel the infrastructure and hopefully it’s going to make a lot of people look good for choosing it.
Definitely it will be, with every part of the product roadmap continuing to expand. I’m of course excited for many of the new product enhancements that we’ll be announcing in summer and fall 2021 and beyond. Jordan Tigani, Chief Product Officer of SingleStore, speaking with you today about how to power enterprises with intelligent applications. Jordan, thanks so much for joining us on the show.
Thank you David. It has been great to get a chance to talk.
Thank you for listening to this episode of the HumAIn podcast. Did the episode measure up to your thoughts on ML and AI, data science, developer tools and technical education? Share your thoughts with me at humainpodcast.com/contact. Remember to share this episode with a friend, subscribe and leave a review, and listen for more episodes of HumAIn.