Andreas Blumauer is CEO and co-founder of Semantic Web Company (SWC), the provider and developer of the PoolParty Semantic Platform. With headquarters in Vienna, Austria, but operating globally, SWC has worked with over 200 commercial, government, and non-profit organizations to deliver AI and semantic search solutions, knowledge platforms, content hubs, and related data modeling and integration services. SWC was named to KMWorld’s prestigious list of “100 Companies that Matter in Knowledge Management” from 2016 to 2021 and has been named multiple times in Gartner’s Magic Quadrant for Metadata Management Solutions and as a Sample Vendor in their Hype Cycle for Natural Language Technologies.
In his role as CEO, Andreas is responsible for both the strategic growth of the company and its organizational evolution toward a highly focused customer orientation. SWC has grown every year since its inception under his leadership, and has been able to develop a cutting-edge and unique software platform that is ISO 27001 certified, and deployed globally across a number of key industries.
Full Episode Transcript
And is it possible to build a recommender system with unstructured data? Or do you need to pre-structure the data through, for example, classification?
That's a great question. I think, that's at the very core of what we're doing.
Lauren:
Hi everyone, it's Lauren Hawker Zafir. Welcome back to Redefining AI, the tech podcast. I'm an educator and I'm taking you on an educational exploration into the fascinating minds of those who embody and forefront all you need to know about artificial intelligence, machine learning, insight engines, and the insights era.
This episode is called Spotlighting Knowledge Graphs. And it's an ode to the knowledge model and the corresponding technologies that are intertwined into providing the knowledge management world with tools that put data in context, whilst providing a framework for data integration, unification, analytics, and sharing.
My guest and thought leader today is Andreas Blumauer. Andreas is the CEO and co-founder of the Semantic Web Company ("SWC"), the provider and developer of the PoolParty Semantic Platform. With headquarters in Vienna, but operating globally, the SWC has worked with over 200 commercial, governmental, and nonprofit organizations to deliver AI and semantic search solutions, knowledge platforms, content hubs, and related data modeling and integration services.
The SWC was listed on KMWorld's prestigious list of 100 Companies that Matter in Knowledge Management from 2016 to 2021. It has been named multiple times in Gartner's Magic Quadrant for Metadata Management Solutions, and as a Sample Vendor in their Hype Cycle for Natural Language Technologies. In his role as CEO, Andreas is responsible for both the strategic growth of the company and its organizational evolution towards a highly focused customer orientation.
Now, many of you may not know, but I'm a corpus linguist. So, the topic at the forefront today is of great interest to me. And it should make for an exciting discussion for all of us. So, Andreas, it's a pleasure to have you here. Welcome all the way from Vienna, the wonderful city of Mozart. Are you a fan?
Andreas:
Hello, Lauren. Thanks for the introduction. How are you today?
Lauren:
I'm wonderful. Yourself?
Andreas:
Yeah, all good. All good here in Vienna.
Lauren:
Are you a fan of Mozart then?
Andreas:
Yes. I mean, I should rather say yes. But to be honest, it's not exactly the music I typically listen to.
Lauren:
Yeah, of course. Vienna is a wonderful city. I'm sure you are honored to be based there. There's a lot of wonderful things that anyone listening, if they've not been to Vienna, can certainly explore and see.
So, Andreas, we've got you here. And as I mentioned, we're spotlighting knowledge graphs. Maybe the question would be, where do we start? I would personally imagine that a good place to begin may be with the name Tim Berners-Lee. Who is he? And how does he fit into the narrative that we're sharing today with our listeners?
Andreas:
Yeah, excellent starting point. Obviously, it was him who brought this whole idea to me, to us, to a big community, and in the meantime to a really global community. Some of the newer colleagues in the knowledge graph community probably haven't thought about that, but the roots were of course laid by Tim Berners-Lee and the W3C, and the Semantic Web standards still kind of lay the foundation of what we are doing in the knowledge graph community. In the meantime, a couple of new terms have been introduced. Somewhere in between it was rather called Linked Open Data, or Linked Data. Now, we have the knowledge graph in place. The next iterations are already underway. You can easily say a semantic data fabric is kind of a next-generation knowledge graph implementation.
But in all cases, W3C standards like RDF, SPARQL, etc., are at the very core of what we are doing. But what's also important: it's definitely not just the standards and the technologies. What we have seen is that it's more about change management in organizations. It's a new way to deal with data in general, and with content. Any knowledge graph project develops a highly collaborative environment. This means that new governance models have to be established, and that's probably the biggest bottleneck in most cases. It's a new way to deal with data and look at data. It's no longer this highly centralized approach. It's like the Semantic Web, which was originally designed for the World Wide Web. So, obviously, centralization is possible. At the same time, interoperability between different sources is more important than ever before.
For instance, when looking at supply chains in today's world, it is even more important now to come up with more and more standards around data. So, the Semantic Web was a big vision of Tim Berners-Lee for the World Wide Web, first of all, and has now been implemented in many organizations. On the web, by contrast, it probably hasn't reached the level that Tim Berners-Lee initially had in mind. There are a couple of platforms that kind of came in between these plans. To name one, Facebook is definitely a bit of a competitor to the original Semantic Web idea. But let's see, there is a next wave coming in.
Lauren:
Okay. You mentioned one word a few times there. So, you mentioned Semantic Web standards. You mentioned W3C standards. You mentioned standards there just in the latter comment that you gave. What is the circulation around standards and obviously how we have introduced this sort of Semantic Web?
Andreas:
So, the initial phase was very much an academic discussion. It took a while until those standards got implemented in enterprise-ready platforms. In the early days, I remember it was like 2004, 2005, 2006 or so when we started out to create the first technologies. It was obviously very immature; graph databases did not work at the enterprise level. In the meantime, the standards have really been embraced and circulate in many different platforms and ecosystems. Just think about Neptune on Amazon, for instance; they have also implemented the W3C standards. At the same time, of course, we should be clear that graph technologies are not just what the W3C has supported. There are also LPGs, labeled property graphs, and the standards now come together more and more. There has always been this discussion between the two graph communities: which one is better? In reality, neither is better than the other. It really depends on what you want to do with the data. At the end of the day, if it is about data integration and interoperability, then the Semantic Web standards definitely work better. If you're going to do certain types of analytics around your graph structures, then LPGs probably work a little bit better.
Anyway, now things come together, and there are already new standards out there which bring the two communities together. So, it has developed in the past 20 years in quite a nice direction, I would say. Most of all, the majority of data people have now understood that graph is not an academic exercise anymore, not just a couple of nerds who believe in it. It really complements the modern data stack with some tools which are really important to get one step further towards explainable AI, for instance, or better data analytics, or, let's say, more efficient data integration methods, which knowledge graphs can support.
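[Editor's illustration] The contrast between Semantic Web style graphs and labeled property graphs discussed above can be sketched in a few lines of plain Python. This is only a rough sketch under stated assumptions: the identifiers (ex:AcmeCorp, SUPPLIES, the "since" property) and the helper functions objects and neighbors are invented for illustration and are not taken from the conversation or from any product. Real systems would use an RDF store or a property graph database rather than lists and dicts.

```python
# Illustrative only: plain-Python stand-ins for the two graph styles.
# All identifiers below are hypothetical examples.

# RDF style: every statement is a (subject, predicate, object) triple,
# and names are global identifiers that other datasets can link to.
rdf_triples = [
    ("ex:AcmeCorp", "rdf:type", "ex:Supplier"),
    ("ex:AcmeCorp", "ex:supplies", "ex:WidgetInc"),
    ("ex:AcmeCorp", "rdfs:label", "Acme Corp"),
]

# LPG style: nodes and edges carry labels plus key/value properties,
# which is convenient for traversals and graph analytics.
lpg_nodes = {
    "n1": {"labels": ["Supplier"], "props": {"name": "Acme Corp"}},
    "n2": {"labels": ["Company"], "props": {"name": "Widget Inc"}},
}
lpg_edges = [
    {"from": "n1", "to": "n2", "type": "SUPPLIES", "props": {"since": 2019}},
]


def objects(triples, subject, predicate):
    """Return all objects stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def neighbors(edges, node_id, edge_type):
    """Return the target node ids reachable over edges of one type."""
    return [e["to"] for e in edges
            if e["from"] == node_id and e["type"] == edge_type]


print(objects(rdf_triples, "ex:AcmeCorp", "ex:supplies"))
print(neighbors(lpg_edges, "n1", "SUPPLIES"))
```

Note how the property graph attaches "since" directly to the edge, while the triple form keeps every fact as a uniform statement, which is part of why the two styles suit different tasks.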
Lauren:
And I think that's probably a good segue, because it is all about data, and obviously about understanding the Semantic Web and how this concept leads on to understanding knowledge graphs in detail. I think we probably need to look at the detail that is forefronted, or used, or that plays a role. I've seen myself that two types of data play an important role. Maybe there are more, and that's obviously why you're here to give us an insight into this. Linked open data and semantic metadata are two important concepts when we're spotlighting knowledge graphs and knowledge management. Is that true? What role do they play?
Andreas:
I mean, just to explain that in a little more detail: it's definitely not about replacing any existing data infrastructure. It's about complementing and enriching the existing data points with consolidated and consistent metadata on top. Plus, it brings in knowledge models. I think that's very important to see. At the moment, there are two ideas of how to use knowledge graphs, held by two communities. The first is very focused on using knowledge graphs for data integration: let's put the semantic layer on top of all the existing data sources and link them more efficiently. If you want, you can of course much more easily bring in third-party data sources to integrate internal with external data. You can integrate unstructured data with structured data, and so on. So, that is the one.
Then the other community, so to speak, says, "Okay, let's use the data we have. And then let's put some additional knowledge on top of that. Plus, of course, the semantic metadata, the consistent metadata, the semantic layer plus knowledge models." We put something next to the data we have, which is a kind of domain model. And the domain model can then be used for various purposes. One is definitely better usability, let's say a better user experience in an insight engine. Another is that you use this kind of knowledge to do consistency checks, to validate the existing data. Then there is another strand, so to speak: let's use the knowledge to do reasoning, to put some additional facts next to the fact base we already have in our databases. So, there are various perspectives, let's say, on why knowledge graphs could be useful.
And that at the same time makes this discussion a bit complicated, because there is no such thing as one concept for a knowledge graph. It's like the metaphor of the elephant and the blind men, where people start to ask, okay, what is in front of me? One, touching the elephant at the foot, says, oh, it's a tree. And then another one, touching the elephant somewhere else on the body, says, oh, it's a snake, and so on. So, I have learned that this discussion is very complicated. Because of that, the first thing I always ask when we start conversations is, "Where do you come from? What is your background?" For instance, you said that you are a corpus linguist; that's yet another great way to use knowledge graphs. When you start using a lot of unstructured data and probably want to put some machine learning around that, extract the most important sentences, and so on, then a knowledge graph can really help to accelerate the training of the machine learning models. That is another frequently seen use case now, where knowledge graphs become a kind of additional source to train your models. Machine learning benefits highly from semantic metadata and the background knowledge that is important in a given domain.
Lauren:
And have you seen that there has been a positive embodiment of the latter? You mentioned that sometimes the understanding, or understanding the concept of a knowledge graph, per se, is a more difficult topic when working in the domain of using it for machine learning. Have you noticed that there's positive adoption and understanding in that domain?
Andreas:
Simply put, the machine learning community is now open to the idea of using knowledge models to do, for instance, data labeling and preparation of training data. I remember, like five years ago or so, when you came to a meetup where the machine learning people heavily discussed what the next best algorithm for certain tasks was, and you approached them talking about the Semantic Web and knowledge modeling, they really told you, no, sorry. No need for that kind of data. We don't need you.
So, it was really again this symbolic AI versus statistical AI, two tribes not talking to each other. But this situation has changed in the past three or four years, I would say. Some of the bigger machine learning companies have come to the point where they've seen that, from an algorithmic point of view, it's no longer possible to really reach the next level of quality. So, obviously, it's about the data itself, and also this ability to bring a knowledge model into the mix. You cannot train algorithms to do something which is not even written in the training data. It's not there. This knowledge isn't in the databases, or in the data sources, or in the content. It's just not there. It's somewhere else. It's typically in our heads. And if we want to encode that and bring it next to the machine learning models, you need to fuse those two worlds. And I guess that's really the next big step here. It's all around. This discussion is very vibrant. In the meantime, it's different. I mean, if you look at the research papers which have been published in the past 10 years, this fusion has been underway for a long time. But now, it starts to get commercialized. There are like 10 years of research, and companies start to implement that and productize that. This is really the next big step: this composite AI topic, where different AI methodologies can get fused. There are so many ways to combine graph with machine learning that it is really exciting.
Lauren:
Yeah. I was going to say, I mean, I think for AI architects, I imagine that they would benefit a lot more from composite AI than graph technologies, as a segmented offering. And I think that's obviously what you're trying to express here. It is really a beneficial sort of fusion between the two, this composite AI.
We can go back to composite AI in a little bit more detail, but I think it is just as well to spotlight graph technologies as a whole. There are two concepts, or two terms, that you always hear if you are reading about the technologies, about the space. And those are ontology and taxonomy. I've personally heard these used very interchangeably, in different ways. You ask one person, what's taxonomy? What's ontology? And they will give you an ambiguous explanation. So, maybe as an expert, Andreas, you can give us a definition of what these are and why they are important?
Andreas:
Yeah. I'll give two explanations. The first is further away from all the standards, a kind of explanation that can be understood even if you're not familiar with the standards. Let's say in your kitchen, you have tableware there, you have some ingredients put in a certain place, and you have some recipes in mind that you'd like to cook, and things like that. That is the ontology, right? You know what you can cook, because you probably have a gluten intolerance, so this is the ontology. What types of things are needed to cook something? What kind of knowledge do you have to have to cook something? That is typically expressed with the ontology.
Taxonomy then is more on the instance level. You really go down to the details, take a look at the names that different things have, and how they're classified. So, it's really for classification. A taxonomy is a great way to quickly construct a knowledge model that helps classify instances of things and takes care of naming conventions. By that, it's a great source of truth for automatic tagging, for text mining in general.
The ontology piece is, let's say, more on the abstract level. It's definitely frequently used for schema management. I mean, schema.org, which is also heavily used on the World Wide Web by Google and others, is a very simple ontology. And I think that's definitely also the level of expressivity that ontologies are typically used for. The good old heavyweight ontologies, which then also contain axioms and things like that, are not used that much in industry. But for schema, and for the abstraction of what kinds of things are out there, which classes of things are out there, and how they relate to each other on an abstract level, ontologies are a great fit.
And there are standards. SKOS is the W3C standard which is heavily used to create taxonomies. It's not talking about the governance model or the way you create them. That's always a matter of discussion with any organization, because there are various ways to create taxonomies. But how do I put them into a formal data representation format? That is definitely SKOS. And now the confusion is complete, I guess: SKOS itself is an ontology.
Lauren:
Yeah. SKOS. That is obviously a universal data representation standard?
Andreas:
Yeah. It stands for Simple Knowledge Organization System. Originally, it was actually designed as an interoperability layer between different taxonomy standards. But you can also use SKOS very nicely to create taxonomies natively in SKOS, and then use them as a first starting point for a knowledge model, which later drives the creation of knowledge graphs, where the majority of the steps are done automatically, of course. You don't want to create a big knowledge graph by hand. But taxonomies typically involve a human in the loop. You need people to take a look at the taxonomies, and obviously at the ontology part of your knowledge model, to see that this has really, really good quality. From a quantity perspective, it's probably one or two percent of your knowledge graph. But this drives the automation steps.
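[Editor's illustration] As a rough sketch of how a small, hand-curated taxonomy can drive automatic tagging, the following plain-Python example borrows the kitchen theme from above. The property names prefLabel, altLabel, and broader are real SKOS terms, but the concepts (ex:Pasta and so on), the dictionary layout, and the functions ancestors and tag_text are hypothetical simplifications, not PoolParty's implementation. A production setup would store the taxonomy as RDF and use a proper text-mining pipeline.

```python
# A hypothetical mini-taxonomy using SKOS-style properties.
# Concept identifiers and labels are invented for illustration.
taxonomy = {
    "ex:Food": {"prefLabel": "food", "altLabels": [], "broader": None},
    "ex:Pasta": {"prefLabel": "pasta", "altLabels": ["noodles"],
                 "broader": "ex:Food"},
    "ex:Spaghetti": {"prefLabel": "spaghetti", "altLabels": [],
                     "broader": "ex:Pasta"},
}


def ancestors(concept):
    """Follow skos:broader links upward and return the chain of parents."""
    chain = []
    parent = taxonomy[concept]["broader"]
    while parent is not None:
        chain.append(parent)
        parent = taxonomy[parent]["broader"]
    return chain


def tag_text(text):
    """Naive tagging: a concept matches if any of its labels is a word."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    tags = set()
    for concept, entry in taxonomy.items():
        labels = [entry["prefLabel"], *entry["altLabels"]]
        if any(label in words for label in labels):
            tags.add(concept)
    return tags


print(tag_text("We cooked noodles and spaghetti."))
print(ancestors("ex:Spaghetti"))
```

The alternative label "noodles" resolves to the same concept as "pasta", which is the naming-convention role of a taxonomy, and the broader chain lets an application classify a spaghetti document under food without that word appearing in the text.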
Lauren:
Okay. So obviously, with the qualitative perspective of the human in the loop, and they're looking at the buildup of the taxonomy, if we look at maybe from a statistical representation from data, does Master Data Management fit into that? Because they're both fundamentally concerned, like knowledge graphs, and MDM, with creating a unified overview of data that could help with the build of taxonomy?
Andreas:
Yeah. There is a huge convergence now underway. There is MDM, Master Data Management. There is metadata management. There are data catalogs. There are all kinds of data modeling exercises, which now eventually come together. It's called the semantic layer, if you want. The MDM systems which are currently used are a bit rigid. They do not really embrace graph, and that's the biggest disadvantage of the existing MDM systems. They have just started to understand that graph is more flexible. It nicely fits into HR methods. You can extend your models quickly, and you would not necessarily break any existing applications, because it's really decoupled. Application logic no longer contains data models that much. It's really decoupled. Semantic Web standards are also good for self-describing data models. So, there is no need to have a manual anymore. What does the database mean? What does this column with the header "code ABC 123" mean? It is self-describing. And it's really based on formal, logical models. It's not just a proprietary format anymore. That makes the Semantic Web such a great way to create a huge ecosystem across organizational boundaries, where data can now start to get linked and then be better understood by the other parties in a value chain. It is really a way forward so that human beings can work together at scale. That is what I really like about the Semantic Web and knowledge graphs in general. People collaborate more efficiently, and we need that.
Lauren:
And you mentioned at the start there that it's obviously quite a good use case for those in HR. I recently watched one of your videos; PoolParty hosted a seminar. They highlighted a successful business case that forefronts the use of knowledge graph technologies, knowledge models, and expert finding. Would you say this is representative of a successful business use case that maybe we can talk about, and you can show why and how this can be implemented in a business unit?
Andreas:
Definitely. I think what we currently see is the new work and the digital workplace, all these infrastructures needed to make people work together more effectively in days of a lot of working from home. And HR is definitely one of the business units in any organization which is currently looking around for new technologies, and semantic knowledge graphs are used heavily in this part of any organization. It's always a great example of how standards can help organizations, for instance, to recruit people more efficiently. ESCO, the European classification of skills, competences, and occupations, is open and free. You can reuse that. And of course, you can repurpose that for your specific industry. It helps to enrich, for instance, a CV, a biography of people, with background knowledge which typically people like the recruiter would have: "Okay, in the curriculum vitae, somebody says, I'm an expert in PRINCE2, as a standard." And then, you can infer from that that this person must be a really good fit for our new project management position. I mean, that's a trivial example. But it's the way it works.
So, it's probably not written explicitly in an application and motivation letter, but as an expert on the labor market, on skills and occupations, and also on the universities and colleges people attended, you can infer if it's a good fit or not. And that is something you can partially automate with the knowledge model running in the background. That's exactly what this HR recommender can do quite well: bringing people together with open positions and projects. Or the customer comes up with a new specification, and you have to fill the open positions in this project. You can find the best people in the company who can do that with a project configuration. Many complicated steps towards successful human resource development can be supported by such applications. And it's a mixture again of typical machine learning-based text mining plus knowledge models, which contain the knowledge of a certain domain.
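[Editor's illustration] The inference step described above, going from a skill stated in a CV to a position the CV never mentions, can be sketched as a lookup through a skill hierarchy. Everything here is a hypothetical toy: the skill names, the positions, and the functions expand_skills and matching_positions are invented for illustration and do not reproduce ESCO data or the actual HR recommender. A real system would combine such a model with text mining to extract the skills from the CV in the first place.

```python
# Hypothetical skill hierarchy: narrower skill -> broader skill area.
# A real project might derive this from a vocabulary such as ESCO.
skill_broader = {
    "PRINCE2": "project management",
    "Scrum": "project management",
    "SPARQL": "data engineering",
}

# Hypothetical open positions and the skill areas they require.
positions = {
    "Project Manager": {"project management"},
    "Data Engineer": {"data engineering"},
}


def expand_skills(stated_skills):
    """Add the broader skill area for every skill stated in a CV."""
    expanded = set(stated_skills)
    for skill in stated_skills:
        if skill in skill_broader:
            expanded.add(skill_broader[skill])
    return expanded


def matching_positions(stated_skills):
    """Return positions whose required areas are all covered."""
    skills = expand_skills(stated_skills)
    return sorted(name for name, required in positions.items()
                  if required <= skills)


# The CV only says "PRINCE2"; the knowledge model supplies the rest.
print(matching_positions(["PRINCE2"]))
```

The point of the sketch is that the link between the stated skill and the position lives in the knowledge model, not in the CV text, which is why a purely text-based matcher would miss it.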
Lauren:
Yeah, it's fascinating. I mean, the assistive capabilities and what you can do with those are wonderful. And I know that you, Andreas, have built a recommender system based on knowledge graphs. Does some of this background knowledge enable you to build better recommender systems? Is it possible to build a recommender system with unstructured data? Or do you need to pre-structure the data through classification, for example?
Andreas:
That's a great question. That is at the very core of what we're doing. It's obviously always about precision. I mean, recommender systems are only accepted by people if they come up with really good recommendations. If there's a set of bad recommendations coming up, then you'll probably switch it off. We all have had such recommenders on our desktops in earlier days that we switched off because we didn't like them. So, we really need recommenders which understand the user as a persona, as a person who has some tasks to fulfill in a given context. By that, it really has to have the ability to triangulate the potential answers, or the potentially good content, and the knowledge pieces and knowledge bases available. We have to triangulate that with the intent and with the context of the user. So, such models are great for personalization. Personalization is the foundation layer for a good recommender engine.
And here, you need a domain knowledge model. A recommender is typically not that precise if you leave that out of your AI architecture. This kind of knowledge of what a good recommendation is about is not written in the content. It's worth looking at that in more detail. If a recommender should be implemented at a digital workplace, in an enterprise, to which degree should you start to complement your AI architecture with a semantic knowledge model? This is really a key question that we are currently going through together with a couple of our clients.
Lauren:
I think that probably one of the most difficult challenges is the inference. Like how do you infer a user's intent? And trying to understand that user as a persona to be able to really build this knowledge model and the recommender.
We've not got too much time left, Andreas. But we do have a couple of questions submitted by our listeners. How are you educating the market about knowledge graphs?
Andreas:
It's much easier than 10 years ago to talk about knowledge graphs, I have to say. Like 10 years ago, it was very strange when you approached someone and talked about graph technology. They looked at you and said, "Who are you? And what is that?" In the meantime, people are very open-minded. And they always say, "Tell me more, graph sounds interesting. So, what is it?" And I think it's worth reading a couple of books or watching some videos. I mean, there's a lot of stuff out there. In the meantime, there is also a lot of stuff for practitioners, let's say, not just the very academic approaches towards Semantic Web adoption, which I also appreciate. Don't get me wrong. We need people who lay the foundation. But then there are others who also have to explain the business value. And those are two different narratives, really.
And I always say to people who would like to learn more, let's try it out. I'll show you examples. There are lots of demos and lots of little applications which already showcase the real benefit of what you can do with that. And then, of course, when it comes to the question of how we can start such a project and how we can implement that, that's a different story. Here we have to take a look at the resources, especially the human resources, to see which people could work on that and which background they have. The minimum is really that people start to understand the anatomy of a graph. It's like when you start to study medicine and want to become a doctor: you should know how the human body works. And this is really the starting point. What is a graph? Really look at the internals. What are the pieces and the components? You will quickly see that you already have a knowledge graph. Any organization has a knowledge graph. Otherwise, it wouldn't work. But the point is, it's sometimes not digitized that much. It's in our heads, somewhere in emails; it's not structured. And most of all, it doesn't use any standards which make data interoperable. We won't centralize our data management, and that's great. We are living in a more and more decentralized world. So, we need a way to bring data in front of others so that they can easily ingest it, and therefore you need interoperability standards. That is what a graph essentially is. We have that knowledge already and have to transform it into digital assets. There are standards for that, there are tools, there are databases; it's all there in the meantime. That is what we try to explain in little videos. We have the PoolParty Academy. There is The Knowledge Graph Cookbook, written together with my colleague, which contains a lot of knowledge. And there are many other sources out there, like for instance Squirro Academy.
Lauren:
Yeah, I mean, I think that it's a never-ending but also rewarding path that people and organizations have to follow to really enhance and empower individuals, because education is all about empowerment and enhancement. I loved your analogy, looking at the anatomy of a graph. And I think you've probably convinced a lot of listeners with that analogy to maybe take an opportunity to explore. As you've mentioned as well, you have your own educational resources that are available on the PoolParty Academy. So, they are certainly there for all of our listeners to explore too.
Thank you so much, Andreas. It's been wonderful having you on the show today, Redefining AI. I have certainly learned a lot. And I'm sure that everyone else listening has. Would you like to say anything, share anything with us as we close this discussion?
Andreas:
First of all, thanks, Lauren, for this nice talk. And thanks for giving me the opportunity to talk about knowledge graphs in this forum.
Famous last words: the only thing I would like to share at the very end is, if you as a listener are about to start learning more about knowledge graphs, I would really recommend taking a look at example applications first. Otherwise, it remains too abstract for a long time. If you start with OWL, SKOS, RDF, and whatever, it's very abstract. It doesn't work that easily then. It's better to take a look at the existing applications, and then you will find yourself asking, there is no knowledge graph. Where is the knowledge graph? It's just not visible sometimes. The knowledge graph is not the visualization of the graph. That's what people most frequently get wrong in the first round. It's not about visualization. It's about a model which is running in the background, driving the quality of great and smart applications.
Lauren:
I'll have to try it out myself.
Andreas:
Okay, thank you very much.
Lauren:
So, I want to thank everyone for listening today. And if you want to learn more about AI, ML, and search, then come and take one of our free courses at https://learn.squirro.com.