How Category Theory Is Changing The Data Science Industry With Eric Daimler, CEO & Co-Founder Conexus.com

Welcome to our newest season of HumAIn podcast in 2021. HumAIn is your first look at the startups and industry titans that are leading and disrupting ML and AI, data science, developer tools, and technical education. I am your host David Yakobovitch, and this is HumAIn. If you like this episode, remember to subscribe and leave a review. Now on to our show.

Welcome back, listeners, to the HumAIn podcast. Today, our guest is Eric Daimler, the CEO and co-founder of Conexus. This year I got the chance to speak with Eric about the work that he’s doing in AI and the work that he’s previously done in the machine learning industry, involved with the government and technology. Eric, thanks so much for joining us on the show.

Eric Daimler

It’s good to be here, David.

David Yakobovitch

Eric, I knew when we were speaking that you’ve done quite a few interesting things, from math and technology to, today, the new advent of AI. To set the stage for our audience, what are some of the things that you’ve done in technology that have gotten you to where you’re at today?

Eric Daimler

Thanks for having me. It’s terrific to be here and I love to talk about these things. We can talk for quite some time. I’ve been fortunate enough to have chosen well, and chosen early. I’ve been doing AI for a few decades, really can count it as decades at this point, getting started building my first computer when I was nine and working on AI before it really reached the popular imagination of just the ordinary press that you see today.

I’ve been an academic researcher. I was faculty at Carnegie Mellon, and I spent time at Stanford and University of Washington, Seattle. I’ve spent time as an entrepreneur. This is what I’m, again, doing now for the sixth time. And I was also a venture capitalist on Sand Hill Road for a bit before then joining the Obama administration, where I was for the last year of the administration, in a really privileged role of acting as an advisor around all things AI and robotics. So, there are a lot of really talented people in AI in all of those capacities, but I am really fortunate to be exposed to the range of these domains or these expressions of AI from academia to business, to public policy, and serving the public.

David Yakobovitch

It’s incredible to hear the work that you’ve done, how it’s gone through the entire gamut of our industries, and especially around public policy. In the last few administrations there has been so much talk about AI-first, digital-first, data-first, and we saw a lot of that progress under the Obama administration. And now we’ve seen, fast forward to our current administration and the next administration, that AI definitely seems to be the focal point. Why do you think now it is more important than ever that we lead with an AI strategy?

Eric Daimler

It’s really interesting to look back at the growth we’ve experienced as private citizens and think about that expression inside the government. So, the really unique thing about a government is you can’t leave behind any customers. You can’t select the particular market you want to go after, the market is, literally, everybody. So we may have experienced some boom in our use of the internet in the 90s, and then some different sorts of expressions in the early 2000s.

Whereas in the Bush years, Bush 43, he did not have a big technology contingent. And as it was said, you could not go into a cabinet meeting and say, “I don’t understand economics”. You’d be laughed out of the room, or “I don’t understand law”. You’d be laughed out of the room. But you could go into a room and say, “I don’t understand technology”.

That trend continued in government. It’s a big organization with a big set of entrenched beliefs. So under the Obama administration that continued, and you saw its ultimate expression, unfortunately, in the initial tough rollout of healthcare.gov. That got corrected, not least because of the leadership of Todd Park and the terrific team behind him. He became the U.S. CTO around that time.

And that was part of a big effort that Obama made to bring in more technologists into government, understanding that we needed to have people that could go into a cabinet room, and raise their hands saying “I do understand technology”. That’s been a big change that we’re fortunate to have continued to some extent that, literally, the last law that was signed during the Obama administration, and there’s a fun story to be told about this, but just really quickly, quite literally, on inauguration day.

There was a rush to get through the appropriate physical security barriers that were put up to get to President Obama, while he was still president, to have him sign in the Capitol, where he was right before the ceremony, the last law he would sign, which was to put in place forever this fellowship for innovators inside the government to help the digital modernization of our US government. John Paul Farmer was the guy that saw that through. He’s now CTO of New York City, and he deserves credit for helping shepherd that to its conclusion.

And that continues, we’re fortunate to say, through the current administration. And we expect it to continue into the next administration. That sensibility around digital-first, a digitally native environment is something that we expect to now be expressed inside of the Federal Government, and continue to trickle down into states’ governments for all of our benefit.

David Yakobovitch

Now, as a leader, you’ve seen firsthand between public policy, between the private sector, between everywhere, so much innovation in technology, as you mentioned, evolution. How AI has gone through its waves of how digital policies are becoming mainstream. And as we’re moving forward into the next few years, you have this very unique lens to see where technology can go. As you’ve mentioned, Eric, you’ve been on both sides of the coin as an investor and a technologist, and now you’re running a new venture. What brought you to your new venture and what excites you for what you and your team are building?

Eric Daimler

One of the benefits of spending a lot of time in one industry is the same that anybody would enjoy, which is being able to see just a little bit farther, because some of these things become visceral feelings: to some extent, feeling patterns, seeing patterns, experiencing patterns that you have seen before. One of those right now, I will say, is seeing the limitation of AI.

I began to sense this before even going into the White House, where I was talking to people about AI and what benefits it could bring. To some extent, AI is under-hyped, but that isn’t the totality of its expression. John Doerr is a famous investor. He was often criticized for saying the internet was under-hyped in the early 2000s.

And that was a bold claim, right before the crash, to say it was under-hyped. But 20 years later, we see that the largest companies by market capitalization are technology companies. That’s just what the internet and our connectivity allowed. To say that AI is under-hyped is a little bit along those lines: to say that the expression of these new technologies, these automations, will be fantastically beneficial, potentially, to our society in ways that are equally difficult to imagine even 20 years from now.

What I see, however, is that the emphasis on these algorithms as some sort of secret spice is really over-hyped, overblown. Demis and his team at DeepMind get credit for the work that they’ve done, to some extent media-friendly in their domination of the game of Go, but in this most recent example of protein folding, they deserve some credit for actually providing some real benefit.

That’s terribly exciting, but what we can’t do is ignore how that came about. The only way DeepMind and those deep neural nets can be trained is on massive amounts of data. So the only way that protein folding came about was because of all the failures of humans before; the machines got trained on human failures. Now, you can’t use that particular deep experience of massive amounts of data on every human problem and expect to come out with these mind-blowing results.

It just doesn’t apply. So there are limitations on that sort of technology. And we should regard that as being a little over-hyped. What is the next phase? What I’m addressing is this technology that is spun out of MIT, that is based on discoveries in math, which is really foundational. There’s really nothing more foundational than the math.

This allows for an interconnectivity that’s just unfathomable with previous generations of technology. You really can’t imagine what we can do if we are able to transform whole domains of knowledge and map them onto others. And this is really what’s possible with this new type of math. We’re used to these innovations in physics all the time; it’s what powers the continuity of Moore’s law, for example. We’re less used to discoveries in math, and that’s really what we’re pursuing: this discovery in math that enables global interconnectivity of knowledge.

To say it another way, we are really at the intersection of probabilistic AI, like you would see in DeepMind, and symbolic AI, which is what you’d see in an IBM Watson or a similar expert system. That merger is really going to be the transformation that powers the next generation.

David Yakobovitch

The previous generation, what you and I have seen a lot in the last 15 years, has been analytics, has been data ingest, has been ETL, has been moving data and understanding how it can transform, but just like you’re mentioning, Eric, it’s now moving into that next stage. And there’s a lot of mathematics that’s being invented, whether we’re thinking of the quantum realm or we’re thinking particularly of distributed architecture, there’s so much that’s changing nowadays. What’s the part that your team is working on where you’re finding the breakthroughs that are going to power the next wave of AI or the next wave of computing?

Eric Daimler

As MIT would say it, my firm represents the first-ever spin-out from their math department, which is funny to say. If you look back to 1970, there was a discovery in math called relational algebra. If it were not for relational algebra, you would not have had relational databases, Oracle and all those other companies, which power Amazon and all those other companies.

There’s a discovery in math that we represent, in this domain called category theory. Categorical mathematics, category theory, is really at a level above all those other mathematics; it transforms a problem from, say, geometry into another problem in, say, set theory. That happens in mathematics all the time.

And that’s why category theory was invented. What we do is apply it to databases. So we can be, we’ll say, above the cloud. We’re agnostic between AWS and Azure and Google Cloud, and anything else. That transformation between data models, between data stores, is not where the technology currently is, but it’s really where it’s going.

This is where category theory is going. The math of category theory is going to really wipe the slate clean over the next 10 to 20 years, because it just fundamentally changes how we relate to data. I say more math is better, but if I were to distinguish for our children, I would deemphasize geometry, trigonometry and even calculus. Calculus is really the math of the 19th century.

It worked really well for machines and farms and, to some extent, aeronautics, but most of us don’t do those jobs anymore. The future of math, for a digital age, is going to be category theory. And fortunately it’s easier than calculus. It’s not easy, but it’s easier than calculus. So that’s what I would choose. If you had to choose between maths, this is the math of the future. It’s the math of a digital economy.
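
The idea of transforming one data model into another can be made concrete with a small sketch. The Python below is an illustration only, not Conexus's implementation: the schemas (`warehouse`, `report`), the mapping dictionaries, and every function name are hypothetical. It treats a schema as tables plus foreign keys, checks that a mapping between two schemas respects the foreign-key structure, and then reindexes data along that mapping. It is deliberately simplified: each foreign key maps to a single foreign key, and there are no attribute columns or path equations.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    tables: set   # table names (the "objects")
    fks: dict     # foreign key name -> (source table, target table)


@dataclass
class Instance:
    rows: dict    # table name -> set of row ids
    links: dict   # foreign key name -> {source row id: target row id}


def is_valid(schema, inst):
    """Check that every foreign key is a total function into its target table."""
    for fk, (src, dst) in schema.fks.items():
        mapping = inst.links[fk]
        if set(mapping) != inst.rows[src]:
            return False
        if not set(mapping.values()) <= inst.rows[dst]:
            return False
    return True


def respects_structure(src, dst, on_tables, on_fks):
    """Check that each foreign key lands between the images of its endpoints."""
    return all(
        dst.fks[on_fks[fk]] == (on_tables[a], on_tables[b])
        for fk, (a, b) in src.fks.items()
    )


def pull_back(src, on_tables, on_fks, inst):
    """Reindex an instance of the target schema so it reads as an instance of src."""
    rows = {t: set(inst.rows[on_tables[t]]) for t in src.tables}
    links = {fk: dict(inst.links[on_fks[fk]]) for fk in src.fks}
    return Instance(rows, links)


# Hypothetical schemas: an HR warehouse and a reporting view over it.
warehouse = Schema({"Employee", "Department"},
                   {"works_in": ("Employee", "Department")})
report = Schema({"Person", "Team"}, {"member_of": ("Person", "Team")})

# Hypothetical mapping from the reporting schema into the warehouse schema.
on_tables = {"Person": "Employee", "Team": "Department"}
on_fks = {"member_of": "works_in"}

hr_data = Instance(
    rows={"Employee": {"e1", "e2"}, "Department": {"d1"}},
    links={"works_in": {"e1": "d1", "e2": "d1"}},
)

mapping_ok = respects_structure(report, warehouse, on_tables, on_fks)
view = pull_back(report, on_tables, on_fks, hr_data)
view_ok = is_valid(report, view)
```

Because the mapping is checked against the schema structure before any data moves, the reindexed `view` satisfies the reporting schema's foreign-key constraint by construction rather than by after-the-fact testing of individual rows.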

David Yakobovitch

It’s so amazing to think about, as you mentioned with databases, how many databases are out there today. Listed on DB-Engines, there are over 350 databases, from relational to multi-model to key-value stores to document-based databases, and they’re changing.

And what’s exciting is that it’s no longer about a unique architecture; it’s about optimization. A lot of the work that I’m doing at SingleStore is about query optimization, query engine speed-up. And it sounds to me that a lot of this category theory mathematics is about optimization.

Eric Daimler

One of the big transformations that we’ve had about data is that when I was an academic researcher, we were inventing machine learning algorithms and testing them on data that we could see. And that makes a lot of sense with what big data used to represent. Yes, millions or billions of data points. But as you get into tens of millions, hundreds of millions or trillions of data points, it really doesn’t make sense to be reasoning about data. You’re looking at one piece of data and testing it. It makes no sense. You can’t be doing that. You have to reason at a different level. This is what’s changed.

There really is a switch that has been flipped. However you want to say it, something has fundamentally changed over the last, I will say, maybe just two to three years, where we know that anybody who continues to reason about their individual data is going to be left behind. Literally today, this morning, we were talking with a large insurance company, one of the largest insurance companies in the world, and they recognize that at the scale of their data, the rate at which they’re ingesting data, their process no longer works. And they have to invent some fundamentally new way.

They said that they can’t just hire a hundred more people. It just doesn’t work that way. You can’t have just 10 more Accentures; you can’t manually work your way out of these sorts of problems. And that’s what this math represents. You can’t reason about a trillion data points one by one. You have to reason at a higher level. And that’s what category theory allows.

I’ll give you a story. We were talking to a logistics company during COVID, early in this COVID process. This logistics company came to us because one of their clients had these ships all over the world. I had no idea how big these things were.

They’re not just ships. This company’s client had tens of thousands of employees. Then they had hundreds of ships. Each one of those had tens of thousands of these shipping containers on them. This is the point about reasoning about the data. The question was, where is my personal protective equipment? Where’s the PPE? Where is it in the world? And then, do I send it to Rome or Houston or Seoul? This is funny for me, just sitting at home thinking, I can go on Amazon and get some chrome dumbbells to my house in 48 hours.

How do you not know where your stuff is? Your clients’ stuff is in your ships. And the issue is, they could find it, but if you have that sort of inquiry, it takes four or five days to a week to find out where the stuff is. You can’t do it instantaneously. And so you can’t make business decisions quickly, and you can’t be responsive to business problems or take advantage of changes quickly. So, to tell you another story about what we do and where the future’s going.

We worked with this one big ride-sharing company, and they are a lot like many of the people listening to this, in that they have these federated IT systems, siloed IT systems. This is despite the company having very smart people and, essentially, an infinite balance sheet. They could spend as much money on tech as they wanted to; despite that, they struggled to analyze this fundamental, basic business problem.

How do rates affect demand or how does driver happiness affect customer happiness or what happened city by city? So they could do it for New York City or they could do it for Boston, but they couldn’t compare the whole Eastern United States, let alone the whole world.

They had to do a similar-city statistical comparison. For any business that cares about margins, that had some inefficiency and also slowed them down. So when they came to us, they had looked all over the world, and we were uniquely able to solve the problem because it is fundamentally in the math. We work at a different level of math, a higher level of math, a level of abstraction. We uniquely allowed them to model the world in which they operate their business and make bigger decisions better and faster.

We’re not the only ones doing this math, but we’re the only ones doing it in this enterprise software. So we were able to solve that. That’s where the world is going. And the reason is because the data is just too big. You can’t be looking at the data. You have to reason about it. You have to reason about the data at a higher level. That’s the point from both of those stories.

David Yakobovitch

And I love both of those stories because a company that you’re talking about is named on your site and our site as well. At SingleStore, we work with Uber and we actually help them more in the concurrency side of being able to view the analytics of so many moving pieces at the time.

And what you’re saying is resonating not only with myself, but our audience, Eric, that every part of the data pipeline is just getting bottleneck after bottleneck because the math hasn’t evolved and now it needs to. We need to speed up latency and queries and ingest and storage and compute and make everything bottomless. And it’s an exciting new world that we’re moving into.

Eric Daimler

This math represents the quantum computing of computer science. Quantum computing gets a lot of press and it’s fantastic, but that’s at the hardware level. Let’s say it at the software level: this is the quantum computing of software, this new discovery around category theory. That’s going to power a whole new change in our environment, as business people, as academics, as citizens.

David Yakobovitch

And it also sounds like before 2020, in the last couple of decades, the technology has been trying to get there. It’s been very imperative, but now it’s becoming declarative. There are these sets of rules being formed, rules for understanding how architecture moves, how data moves, and the mathematics behind these systems that power Uber and telecoms and all these exciting companies. You actually give a name to it. What do you think the future is?

Eric Daimler

A declarative future is a future that’s formal. We’re formalizing. This is useful for all of the listeners: building the skill, starting with the awareness, of stating exactly what we want to have happen in a particular automated sequence. Exactly what we want to have happen. Lawyers are often trained in this, as are computer scientists, but it’s a skill that everybody can develop. And it starts by being aware that you can develop it.

What actually happens in a set of rules? How do you want that automation to happen? We express it in computer science by saying, for example, if an automated car comes upon a crosswalk and sees a shadow, should the car stop, slow down or keep on going at the same speed? What do you want to have happen? You need to program exactly what you want to have happen, with some sort of variables. Those are data constraints, and you want to maintain their integrity. That’s kind of the point when you talk about query optimization: you want to be guaranteeing the integrity of those queries, of those questions.

No matter where the data goes, no matter where the question goes, no matter what type of car, what crosswalk, what geography, you’ve got to maintain the integrity of those queries. And so, the future being formal is really where the world is going for all of our structures, that sort of declarative future. The future is formal. That’s how all of us can participate in the future of AI.
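
The crosswalk question can be written down as an explicit, declarative rule set. The Python sketch below is only an illustration of what "formalizing exactly what we want to have happen" might look like; the `Observation` fields, the rule ordering, and the choice to stop when a shadow is seen are all assumptions made for the example, not a recommended driving policy.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STOP = "stop"
    SLOW = "slow"
    CONTINUE = "continue"


@dataclass(frozen=True)
class Observation:
    at_crosswalk: bool
    shadow_detected: bool
    speed_kmh: float


# Rules are data: (condition, action) pairs checked in order, first match wins.
RULES = [
    (lambda o: o.at_crosswalk and o.shadow_detected, Action.STOP),
    (lambda o: o.at_crosswalk, Action.SLOW),
]


def decide(obs):
    """Apply the rule table after enforcing the data constraint on speed."""
    if obs.speed_kmh < 0:
        raise ValueError("speed_kmh must be non-negative")
    for condition, action in RULES:
        if condition(obs):
            return action
    return Action.CONTINUE


shadow_case = decide(Observation(at_crosswalk=True, shadow_detected=True, speed_kmh=30.0))
empty_case = decide(Observation(at_crosswalk=True, shadow_detected=False, speed_kmh=30.0))
open_road = decide(Observation(at_crosswalk=False, shadow_detected=False, speed_kmh=50.0))
```

Keeping the rules in a table rather than scattered through control flow means the same decision logic can be reviewed by a non-programmer and applied unchanged regardless of which car or crosswalk supplies the observation.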

David Yakobovitch

It’s so fascinating because my world is SQL all day, every day. And I think about SQL forensics and how you make queries better, whether queries are mobile-first, whether queries are computer-first, GPUs, ASICs, FPGAs, whatever hardware you want to insert. And the challenge is SQL is not everything. Is it good enough to do everything we need? And perhaps it sounds like the mathematics that your team is working on in a lot of research is part of that next evolution of SQL.

Eric Daimler

This is fantastic. It’s funny to say this about SQL. There are three ways we talk about solving data interoperability. Mismatched data doesn’t like to talk to each other. This was famously shown in the initial rollout of healthcare.gov. Thank goodness it quickly recovered because of the brilliance and hard work of the team. The first way is that you can create a data silo.

This is what IBM Watson wants you to think. This is what SAP wants you to think. Put all the data inside our silo, and then the world will be great. The problem is, as soon as you acquire a company, acquire a customer, acquire a supplier, you then have data outside of the silo. And so the problem starts all over again. Just like your point about SQL not being everything, there’s a world outside of SQL. There are some 9 million SQL programmers around the world, so there’s a lot of it, but there’s stuff outside of these silos.

The second way to solve the problem is a company like Enigma, where they create a silo and then sell a subscription to the data silo that they created. That’s the second solution. Then there’s a third way, which is just this hopelessly manual way, which powers the growth of all the consultancies: Tata, Wipro, TIBCO, and not least Accenture, Deloitte and Capgemini. Their revenue growth over the past 20 years is highly correlated, and I encourage you to take a look at it sometime, with the rate of data growth.

And that’s because a big part of their business comes from this horribly manual data integration. So this is the point about data interoperability. That solution will come from this abstraction of mathematics, which is fundamentally about the meta layer of math called category theory. Have I said it’s the future, David? I think I did. It’s the future.

David Yakobovitch

I love it. And the future is a combination of both business and technology. At SingleStore, actually, a lot of our leadership team comes from TIBCO. So we have a lot of the business leaders who are scaling with some of the interesting technology that you’ve worked on and all this technology.

It’s fascinating. Everyone talks about AI so much. We talk about it on the show; you and I were talking about it today. But what is AI really? It’s had so much hype, both over and under. In the conversations I have, a lot more attention needs to be on software engineering, not only machine learning. But can you share with us your take, as someone in the industry, on what AI really is?

Eric Daimler

It’s funny. I’ll give you my definition, but I’ll tell you a little story about how we got to this. On one of my first days in the Obama White House, I sat in this very fancy room, the Indian Treaty Room, and around the table were my peers from the executive branch: State, Treasury, Defense, and then Health and Human Services, Transportation and DHS. And one of the first tasks, which seemed pretty natural to me, was, “Everybody, can we just get our definition of AI, our definition of robotics?” It seemed pretty easy, because the job was to coordinate research dollars across the executive branch. This is a bipartisan issue.

Let’s have government money be spent. So, we’re trying to coordinate across the executive and we’re trying to start with this definition. It seems like a reasonable place to be. So we ask, we go to the State Department or Health and Human Services or the VA, and they’re concerned about the collection of data. I need to collect data. That’s really what they want to do. And then they’ll have some humans interact with it because the automation of input to the State Department is, say, premature.

You want to have humans doing diplomacy. At Health and Human Services, with, say, surgical robots: let me input the data. Same with the VA: we can automate the acquisition of the data, but we’ll leave it right there. And then we go to the Department of Labor or Treasury, and they’re saying, really, I collect a lot of data already, but I really need to process it. These AI algorithms are fantastic, but I need to look for fraud.

I need to ensure compliance. I need to just make sure things are proceeding as they should. So I need to be processing this data quite a bit. And then you go to DHS or the DOD, or you go to the Department of Transportation and they’re saying, I already collect the data. I already process the data. I just need to act on it. I need to decide how I’m acting on it. I can’t stop the train, as in my figurative crosswalk example from before, and think. I’ve got to figure out how I act and keep people safe, or effect the result that I want, which I’ve determined, hopefully, in a formal way, as we’d say, if you were the DHS or the DHL.

So that initial conflict is actually already expressed by the entities around the cabinet, in the executive branch of the government. The resolution is that AI is the totality of a system that senses, plans and acts, and then learns from the experience. It senses, plans, acts and learns from the experience, and people will find their definition in their day-to-day life. The traditional way people might think about AI is in the planning. And so, if you wanted to be really pedantic, you’d say it’s particular learning algorithms, of which deep learning is the most trendy expression. But that’s a subset of machine learning, which is also a subset of AI.

And there are also non-machine-learning AIs. But that level of specificity is really only useful if you’re an AI researcher. I’d say the 99.9% of us who are not AI researchers will benefit from taking on the interpretation of AI as a system that senses, plans, acts and learns from the experience.
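
The "senses, plans, acts and learns" definition maps naturally onto a control loop. The toy Python agent below is a sketch under stated assumptions: the thermostat setting, the `ThermostatAgent` class, the starting gain, and the gain-halving learning rule are all invented for illustration and are not drawn from any system discussed in this episode.

```python
class ThermostatAgent:
    """Toy agent with the four stages of the definition: sense, plan, act, learn."""

    def __init__(self, target, gain):
        self.target = target
        self.gain = gain  # how aggressively the agent corrects the error

    def sense(self, room):
        return room["temp"]

    def plan(self, temp):
        # Decide how much heat to add (negative means cooling).
        return self.gain * (self.target - temp)

    def act(self, room, heat):
        room["temp"] += heat

    def learn(self, error_before, error_after):
        # If the error changed sign, the agent overshot: correct more gently next time.
        if error_before * error_after < 0:
            self.gain *= 0.5

    def step(self, room):
        before = self.sense(room)
        self.act(room, self.plan(before))
        after = self.sense(room)
        self.learn(self.target - before, self.target - after)


room = {"temp": 15.0}
agent = ThermostatAgent(target=20.0, gain=1.5)
for _ in range(10):
    agent.step(room)
```

With these starting values the first step overshoots the target, the learning rule halves the gain once, and the remaining steps settle toward the target temperature.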

David Yakobovitch

And it senses, plans and acts from inputs, inputs that you and I are giving to it. One of the big breakthroughs we saw in the coding world in 2020 was when OpenAI came out with GPT-3, which helped with a lot of autocomplete for text and autocomplete for code. So now we’re seeing a lot of skeletons and frameworks of development becoming simpler for developers. So it’s incredible to see that just input can change the game.

So we’re having breakthroughs and seeing the modernization and standardization of these systems, with the math, with the queries, with the optimization, but there’s a long way that we still have to go as we gear up for the entire decade of 2020 and beyond. There are problems that have not been solved. One of them is COVID, but there are other problems as well. What are some of the biggest problems that you’re seeing for AI?

Eric Daimler

At a high level, AI could be a dystopia or it could express a utopia, and it will only express one or the other based on our input or lack of input. So a big problem is our misinterpretation of what it is and how to get involved. I hope that my interpretation can provide value to people, for them to find a place in the world, for them to get involved. And that could be through education, that could be through the public policy discussion, that could be thinking in their own work about how to formalize rules for automation, which is fundamentally what AI is.

Part of that interpretation or misinterpretation can be seen in things like protein folding and the over-interpretation of that seeming magic, and also in GPT-3. GPT-3 is certainly a contribution, but it wouldn’t have existed if not for human failures; it got trained on everything that didn’t work, just like protein folding.

So this is something we can continue to realize as we interact with these augmentation tools, and it is a good way to think about what we want to come from them. We want collaboration. So the fundamental driver in the next 10, 20 years is going to be collaboration.

There’s going to be interaction. The companies that are going to win, the countries really, are the ones that are going to be implementing AI with the most alacrity, and we will all benefit by thinking in these contexts as we develop some standards, such as standards for data provenance and lineage. There’s a new effort by Kathleen Carley over the past 12 to 24 months that doesn’t take a purely technical approach to cybersecurity, for example.

She embodies the belief that technical solutions alone will not solve these issues. So she works in social cybersecurity. That’s a fantastic framework to be thinking about many of these efforts around AI. And then, lastly, I would say, for our children, for education, the math issue is a big one.

I can also give a shout-out to FIRST Robotics. FIRST Robotics, I will tell you, is like the robotics expression of when I was a Boy Scout, when I got my Eagle Scout. You have these teams of children, all genders, sitting there playing together and creating robots to solve a particular problem given to them by whoever, the team lead, the organization lead. It gets these kids comfortable with interacting in a technical environment, but not necessarily needing to be a programmer in a basement, which is what I was, and this is what we need.

We don’t need everybody to be a programmer in a basement. We need people to be playing a multitude of roles, and it helps little boys get comfortable that there are a lot of different places to play. There’s not just a choice between computer science or an English degree, for example. So those are the three things I would leave you with: these policy considerations, these places you can get involved, and a way to focus our educational efforts.

David Yakobovitch

One of the points in education that you share that I find so meaningful is that we hear all the time in the world today, learn to code, learn to program. We should all become computer scientists, data scientists, software engineers, and yes, it’s a very exciting, intellectually rewarding career to program and build systems that are both autonomous and human-controlled.

But there is a lot more than only programming. I’m a big fan of Carnegie Mellon’s ethical guidelines for AI and that checklist. I work with students when we do our design thinking workshops and say, it’s not just about the data scientist and the software engineer. How about the lawyer and the product manager and the site reliability engineer and the end customer?

Everywhere, there needs to be that checklist. And building a human-augmented future requires all these inputs and all these voices. As we’re moving into the next decade, there’s so much for AI that’s going to change, and we haven’t discovered it all. Beyond what you’ve shared with our audience here today, Eric, going into 2021, are there any big aha moments or burning desires in the AI world that have been on your mind as of late?

Eric Daimler

If I was going to try to leave people with something around which they could take action, besides education, besides category theory, besides getting involved and participating instead of just waiting to resist the adoption of technology, it’s around this issue of auditing AI, and putting in circuit breakers. Just because we can automate something doesn’t mean we need to automate a whole chain without human intervention, without human oversight.

We can get a lot of the benefits of this automation and augmentation while having multiple circuit breakers and multiple audits. This could actually encourage the adoption because it will breed trust in the ultimate result. I mostly want people to be engaged in the conversation around the adoption of this very powerful technology.

We can’t just wait for a whole bunch of nerdy basement dwellers, like myself, to be coding up the future and then just hope that people like it, based on our customer surveys or the degree of skill we bring to product management. What we need is that exchange: we benefit as business people and as computer programmers, and society benefits, from that exchange of ideas, that exchange and communication of values.
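
The idea of auditing AI and putting in circuit breakers can be illustrated with a short sketch. The Python below is a hypothetical example: the `AuditedAutomation` class, its thresholds, and the claim-style inputs are assumptions chosen for illustration. It routes a case to a human whenever a breaker trips and records every decision in an audit log.

```python
from dataclasses import dataclass, field


@dataclass
class AuditedAutomation:
    max_auto_amount: float   # breaker 1: cases above this amount go to a human
    min_confidence: float    # breaker 2: low model confidence goes to a human
    audit_log: list = field(default_factory=list)

    def decide(self, case_id, amount, confidence):
        tripped = []
        if amount > self.max_auto_amount:
            tripped.append("amount_over_limit")
        if confidence < self.min_confidence:
            tripped.append("low_confidence")
        decision = "human_review" if tripped else "auto_approve"
        # Every decision is logged, whether automated or escalated.
        self.audit_log.append({
            "case_id": case_id,
            "amount": amount,
            "confidence": confidence,
            "decision": decision,
            "tripped": tripped,
        })
        return decision


pipeline = AuditedAutomation(max_auto_amount=1000.0, min_confidence=0.9)
results = [
    pipeline.decide("c1", 250.0, 0.97),
    pipeline.decide("c2", 5000.0, 0.99),
    pipeline.decide("c3", 100.0, 0.50),
]
```

Only the first case passes both breakers here; the other two are escalated, and all three leave a log entry that an auditor could inspect later.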

David Yakobovitch

Eric, thanks so much for bringing your ideas and your values today to the show. Listeners, it’s been Eric Daimler, the CEO and co-founder of Conexus. Thanks for joining us on HumAIn.

Eric Daimler

Thanks, David.

David Yakobovitch

Thank you for listening to this episode of the HumAIn podcast. Did the episode measure up to your thoughts on ML and AI, data science, developer tools and technical education?

Share your thoughts with me at humainpodcast.com/contact. Remember to share this episode with a friend, subscribe and leave a review. And listen for more episodes of HumAIn.
