Six years ago, my family and I moved to America. When we got here, we had to come to terms with a whole new way of speaking, thinking, and being. Something as simple as asking for “takeout” rather than “takeaway” when ordering food made a massive difference. So did ordering a hamburger and being faced with 100 different options, rather than what Australians expect: that a burger with “the lot” has cheese, lettuce, tomato, egg, bacon, and, at the better places, pineapple and beets.
New Dialects and Domains Mean New Ontologies
Dialect and local differences like these are exactly the kind of confusion computers must cope with when they try to understand a previously unknown space. Handling these challenges requires a “domain ontology,” which is well recognized as a critical part of understanding what’s going on in a piece of text. “In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.”
If you don’t have a formal ontology, you can still get some of the way to understanding with statistics, such as the relative frequency of different words. Ultimately, though, you can’t get to real meaning without a sense of the rules of “wherever you are.” So now I say “zee” instead of “zed,” “takeout” instead of “takeaway,” and follow all the other rules I’ve learned, driving on the “wrong” side of the road, etc. The same thing happens when my colleague Paul Tarau spends time in yet another area of computer science: he needs to come to terms with the “rules of the road” for that new field, the way language is applied to problems, and the way scientists treat different techniques in that space.
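To make the statistical fallback concrete, here is a minimal sketch of comparing relative word frequencies between a domain text and a reference text. It is only an illustration of the general idea, not anything Kyndi ships; the function names, the `floor` value, and the sample sentences are assumptions made up for this example.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each lowercase word to its share of all words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def distinctive_words(domain_text, reference_text, top_n=3):
    """Rank domain words by how much more frequent they are than in the reference.

    Words absent from the reference get a small floor frequency (an
    assumption of this sketch) so the ratio stays finite.
    """
    domain = relative_frequencies(domain_text)
    reference = relative_frequencies(reference_text)
    floor = 1e-6
    ratio = {w: f / reference.get(w, floor) for w, f in domain.items()}
    return sorted(ratio, key=ratio.get, reverse=True)[:top_n]


# Hypothetical snippets standing in for "American" and "Australian" text.
american = "takeout burger takeout"
australian = "takeaway burger takeaway"
print(distinctive_words(american, australian, top_n=1))
```

A ranking like this can flag “takeout” as characteristic of the American text, but it says nothing about what the word means or how it relates to “takeaway.” That gap is the one an ontology is meant to fill.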
For most systems where a level of text “understanding” is needed, a set of rules is created (the domain ontology) that shows the domain’s core concepts and the relationships between them. Developing these rules is often a very exacting process, taking months to years of effort from a team of domain specialists working alongside ontology specialists. These ontologies can take a range of forms: at minimum, a basic taxonomy (or tree) of concepts in the space; at most, a lattice with many different conceptual relationships.
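The two ends of that range can be sketched as simple data structures. The example below is illustrative only: the concept names and relation labels (`is_a`, `may_include`, `same_as`) are hypothetical, and a list of typed triples is used as a simple stand-in for the richer, lattice-like end of the spectrum.

```python
# Minimal end of the range: a taxonomy, where each concept has one parent.
PARENT_OF = {
    "hamburger": "sandwich",
    "sandwich": "food",
    "pineapple": "topping",
    "topping": "food",
}


def ancestors(concept, parent_of):
    """Walk up the tree and return the chain of broader concepts."""
    chain = []
    while concept in parent_of:
        concept = parent_of[concept]
        chain.append(concept)
    return chain


# Richer end of the range: many kinds of relationship between concepts,
# stored here as (subject, relation, object) triples.
RELATIONS = [
    ("hamburger", "is_a", "sandwich"),
    ("hamburger", "may_include", "pineapple"),
    ("takeout", "same_as", "takeaway"),
]


def related(concept, relations):
    """Return every (relation, object) pair recorded for a concept."""
    return [(rel, obj) for subj, rel, obj in relations if subj == concept]


print(ancestors("hamburger", PARENT_OF))
print(related("hamburger", RELATIONS))
```

The taxonomy can only answer “what kind of thing is this?”, while the triple store can also record that a burger may include pineapple or that two words name the same thing. That extra expressiveness is part of why fuller ontologies take so much expert effort to build.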
One challenge is the sheer effort involved in creating and maintaining these ontologies. Another is that every expert tends to bias their ontology contributions toward areas where they have greater depth of insight. Where ontologies have already been created, they can sometimes be re-used. More usually, however, no pre-built ontology is available, or what is available is out of date. Some approach the problem by attempting to fuse pre-existing ontologies to meet the target domain, and this presents a range of new problems.
A More Natural Approach
My family had no explicit list of differences between ordering burgers in Australia and ordering burgers in the US, but it’s no surprise that we quickly figured these differences out. I doubt Paul was often given explicit rules either. Instead, humans implicitly build our own new domain ontologies.
Kyndi’s approach to ontologies is in some ways like the human approach. At Kyndi, we generate proto-ontologies directly from text examples of the domain. We don’t expect a proto-ontology to be 100% correct, but rather to benefit from a fast expert review, which takes days rather than years of expert time. Even without expert correction, the proto-ontology is sufficient to provide a massive step up in knowledge extraction, meta-tagging, and asking questions of the source material. This is a major step beyond other attempts to automatically generate ontologies.
Here’s a Kyndi proto-ontology generated for the field of the microbiome, the result of analyzing only thirty academic papers. This ontology shows the links between different concepts or tokens in the language, and uses the notion of conceptual distance to separate different concepts. The gap between the concepts at the center and the “moon shape” surrounding them separates the core from the concepts that are not connected to it. That “moon” can be treated either as concepts to ignore or as a guide to seeking further content to build out our ontology. Because this flat representation of the ontology reflects conceptual distance, concepts that are close to one another across the gap, on the outer core and inner crescent, are the strongest candidates for bridging concepts. By examining those, the Kyndi tool can explicitly seek additional material to fill the gap.
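The core-versus-moon split can be illustrated with a toy graph. The sketch below is not Kyndi’s algorithm, and it does not model conceptual distance at all. It simply treats the largest connected group of concepts as the “core” and everything else as the “moon.” The concept names and links are hypothetical, and every link is assumed to join two concepts from the given list.

```python
from collections import defaultdict, deque


def components(nodes, edges):
    """Group concepts into connected components with a breadth-first search."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


def split_core(nodes, edges):
    """Return (core, moon): the largest component and all remaining concepts."""
    groups = components(nodes, edges)
    if not groups:
        return set(), set()
    core = max(groups, key=len)
    return core, set(nodes) - core


# Hypothetical concepts and links, loosely themed on the microbiome example.
concepts = ["gut", "bacteria", "diet", "soil"]
links = [("gut", "bacteria"), ("bacteria", "diet")]
core, moon = split_core(concepts, links)
print(sorted(core), sorted(moon))
```

In this toy version, “soil” ends up in the moon because nothing links it to the other concepts. Finding bridging concepts would need some measure of distance across the gap, which this sketch leaves out.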
Here’s a zoomed-in view showing details of the core of the ontology:
This is a much closer analog to how people learn new domain ontologies: from the evidence at hand, what you see day to day, rather than from a very high number of examples or from being told a set of rules. I’d like to think the way I learned about the American hamburger and about “takeout” was much closer to the Kyndi approach than to other methods: simply inducing a set of patterns from the real world.
With this technology, Kyndi can focus on a new business domain, and a new application, and create knowledge-based systems extremely rapidly. Kyndi’s ability to quickly generate an ontology from text resources is the first step in driving new business value, moving “cognitive” systems from a many-year investment before you see any returns, to a few months of investment that yield dramatic ROI.
In upcoming posts, we will look at some of the other elements of the Kyndi solution. For more on our proto-ontology approach, check out the white papers at Kyndi.com.