Semantic Web and Labeled Property Graphs have their use-cases, but complex, real-time business operations are not among them
Author’s Note: This article is expressly focused on examining the applicability of graph databases for operational automation. It is not a blanket dismissal of graph databases; rather, it aims to inform current industry discussions on the intersection of artificial intelligence, graphs and business systems. The article objectively summarizes complex technical subject matter in order to contribute clear and concise insights.
Graphs are hot!
2023 was all about Generative AI, but as the year went by, the inaccuracy and inconsistency of Large Language Models (LLMs) became clear, which sparked interest in Knowledge Graphs.
The analyst firm Gartner recently placed Knowledge Graphs in the bullseye at the center of their “Generative AI impact radar”. The report underscores the growing understanding that, despite great strides in Deep Learning, domain knowledge provides useful context for analytics and any form of Artificial Intelligence, including generative AI.
Graphs are particularly valuable in this role as they model complex real-world relationships more flexibly than rigid hierarchical approaches. Relationships provide semantic meaning that is critical to understanding; they augment information with relevant context. They also allow for the description of highly interconnected systems that span operational, technical and ecosystem boundaries. Graphs provide a consistent data structure that allows architects to model complex domains in a human- and machine-readable manner.
Context is king
As the number and types of LLMs explode, people are starting to recognize that there will be many AI tools and algorithms. The central unifying asset of the enterprise is the domain model. It provides a single-source-of-truth that can connect teams and technologies across business silos, ecosystem partners and cloud and edge sites.
However, not all graphs are created equal
Graphs are rooted in math. The accepted origin of Graph Theory goes back to 1736 and Euler’s “Bridges of Königsberg” problem. Graphs are also fundamental to Set and Category Theory. They have long been used to study complex interacting systems and support knowledge engineering. Traditional AI (i.e., Symbolic AI), which took commercial shape in the “Expert Systems” that first emerged back in the 1960s, is based on graphs and logic programming.
The commercial market for graph databases is still nascent; however, analysts expect rapid growth in support of generative AI and other AI use-cases. The analyst firm Markets and Markets projects “The global Graph Database Market size to grow from USD 2.9 billion in 2023 to USD 7.3 billion by 2028 at a Compound Annual Growth Rate (CAGR) of 20.2% during the forecast period.”
The graph market is dominated by Semantic Web and Labeled Property Graph (LPG) solutions. Technically, they are both graph databases. In general, they support data modeling for queries, analytics, and recommendations. They have their respective supporters; the Semantic Web is favored by the academic community while enterprise customers lean towards LPGs. At a high level they have related data structures, serve the same data science use-cases and suffer from the same architectural constraints, which fundamentally limit what they can do.
Of course, in software you can conceptually use any language to build anything; the question is whether you should. Architecture is destiny, so it’s important to understand the “fitness” of graph databases to meet the new demands for real-time intelligent automation.
1. Graph databases are, well… databases — not operational systems
The first thing to note is that Semantic Web and LPG are exactly what they say they are — graph databases. They are not marketed as graph-based application platforms, knowledge-oriented middleware, or ontology-driven automation systems. Graph databases don’t provide much if anything in the way of useful developer abstractions. Most don’t even provide for a baseline requirement like encryption of data at rest and in motion; developers have to implement data security on their own per deployment.
You can still use a graph database as a backend for transactions (OLTP), but as with selecting any database, you just need to make sure the read and write performance, scalability, and other low-level characteristics match your solution requirements, or the database will rate-limit application behavior. For this reason, graph databases are primarily used for queries, analytics and recommendations (OLAP).
2. Triples lead to Trees and deep hierarchies
Graph databases, whether Semantic Web or LPG, are based on a specific data structure: the triple, which represents a typed 1:1 (i.e., binary) relationship between two nodes. These triples are what the database stores.
Complex entities have more properties and dependencies, which adds more triples to the database. As a result, the physical storage grows with the size and the complexity of what the graph is modeling. This ultimately leads to sharding a database across many nodes, which introduces operational complexity and processing overhead.
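To make the storage point concrete, here is a minimal sketch in Python. It assumes a toy in-memory representation rather than any real graph database; the entity, its predicates, and the `to_triples` helper are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple, Union

# One triple is one typed binary relationship: (subject, predicate, object).
Triple = Tuple[str, str, str]
Value = Union[str, Dict[str, "Value"]]


def to_triples(subject: str, properties: Dict[str, Value]) -> List[Triple]:
    """Flatten a nested entity description into binary triples.

    Hypothetical helper: each nested dict becomes its own node, linked to
    its parent by one triple, and is then flattened recursively.
    """
    triples: List[Triple] = []
    for predicate, value in properties.items():
        if isinstance(value, dict):
            child = f"{subject}/{predicate}"  # synthetic id for the nested node
            triples.append((subject, predicate, child))
            triples.extend(to_triples(child, value))
        else:
            triples.append((subject, predicate, value))
    return triples


# Illustrative data only; not taken from the article.
simple_entity = {"name": "Router A", "status": "active"}
complex_entity = {
    "name": "Router A",
    "status": "active",
    "location": {"site": "Edge 1", "region": {"name": "EU West"}},
    "vendor": {"name": "ExampleCo"},
}

simple_triples = to_triples("device:1", simple_entity)
complex_triples = to_triples("device:1", complex_entity)
print(len(simple_triples), len(complex_triples))
```

In this sketch the flat entity yields 2 triples and the nested one yields 8, since every additional property or nested dependency costs at least one more stored triple.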
Since relationships are 1:1, nested relationships lead to tree-like structures. The more complex an entity, the deeper these hierarchical structures (i.e., class hierarchies) become.
While graph databases generally support what’s called “index-free adjacency”, meaning there is a constant cost for traversing graph links, the deeper the hierarchy, the more traversals. There’s no practical way to flatten the graph as it is a physical structure that emerges from all of the triples.
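The relationship between depth and traversal count can be sketched as follows. This is a simplified model, assuming direct object references as a stand-in for index-free adjacency; the `Node` class and both helper functions are hypothetical and do not reflect the internals of any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    # Direct references to neighbors stand in for index-free adjacency:
    # following a single link is a constant-time lookup.
    links: Dict[str, "Node"] = field(default_factory=dict)


def build_chain(depth: int) -> Node:
    """Build a hypothetical hierarchy: root -> level1 -> ... -> level<depth>."""
    root = Node("root")
    current = root
    for level in range(1, depth + 1):
        child = Node(f"level{level}")
        current.links["child"] = child
        current = child
    return root


def hops_to_leaf(root: Node) -> int:
    """Count the link traversals needed to reach the deepest node."""
    hops = 0
    current = root
    while "child" in current.links:
        current = current.links["child"]
        hops += 1
    return hops


for depth in (3, 10, 50):
    print(depth, hops_to_leaf(build_chain(depth)))
```

Each hop is cheap, but the number of hops equals the depth of the hierarchy, so total traversal cost still grows linearly as the structure deepens.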
All of the above inhibits graph databases from modeling complex objects, complex compositions of objects, and complex transactions.
3. Can’t see the forest for the trees
Deep hierarchies pose another problem: they make it hard to reason across entities. This is a form of object/graph impedance akin to the object/relational impedance mismatch in traditional Object-Oriented Programming. Graph databases are sub-optimal for operational systems as they do not support real-time complex ad hoc queries. The workaround is to pre-define and pre-optimize queries, which assumes you have the graph database and systems expertise to tune query processing manually; graph databases generally don’t provide meaningful user-interfaces and tooling.
4. Wrong for writes
On the “write” side, deep hierarchies make updates expensive, and the cost is compounded when the graph database is sharded. This leads to what’s called “cascading updates”, where one change forces changes to dependent records. To protect transactional integrity, the database relies on intense locking and blocking, which, again, works against operational systems.
5. No regard for history
Graph database updates overwrite history. As a result, they can’t support audit logs, lifecycle management or time-series analytics. These are key requirements for data-driven applications, so this limitation should disqualify graph databases for operational systems.
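The difference between overwriting state and retaining history can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both classes are hypothetical in-memory stores, not the API of any graph database, and the order data is invented.

```python
from typing import Dict, List, Optional, Tuple


class OverwritingStore:
    """Keeps only the latest value per (node, property)."""

    def __init__(self) -> None:
        self._state: Dict[Tuple[str, str], str] = {}

    def set(self, node: str, prop: str, value: str) -> None:
        self._state[(node, prop)] = value  # the previous value is lost here

    def history(self, node: str, prop: str) -> List[str]:
        key = (node, prop)
        return [self._state[key]] if key in self._state else []


class AppendOnlyStore:
    """Appends every change, so earlier values remain available."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str, str]] = []

    def set(self, node: str, prop: str, value: str) -> None:
        self._log.append((node, prop, value))

    def current(self, node: str, prop: str) -> Optional[str]:
        values = self.history(node, prop)
        return values[-1] if values else None

    def history(self, node: str, prop: str) -> List[str]:
        return [v for n, p, v in self._log if (n, p) == (node, prop)]


overwriting = OverwritingStore()
append_only = AppendOnlyStore()
for status in ("placed", "shipped", "delivered"):
    overwriting.set("order:1", "status", status)
    append_only.set("order:1", "status", status)

print(overwriting.history("order:1", "status"))
print(append_only.history("order:1", "status"))
```

After three updates, the overwriting store can only report the final status, while the append-only store can replay all three, which is the property that audit logs and time-series analytics depend on.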
6. Bad behavior
OK, it’s not really bad behavior; rather, graph databases are data-centric and have no direct ability to model executable rules or actions. Those aren’t necessary if the objective is solely queries, analytics and recommendations, but operational systems support process automation, and businesses are looking for contextual behavior to personalize user-experiences, optimize transactions and synchronize operations. Linked Data amounts to an arm’s-length RPC to unmodeled behaviors (i.e., not declarative types and policies), so data and functions have limited interaction, which constrains contextual automation. Embedding imperative code like Python is limited in the same way.
7. The missing link to ontologies
The coup de grâce for graph databases is that they have no practical path to ontologies. Graph databases persist relationships between concepts for data modeling, while ontologies are graphs of concepts, types and policies, which support the modeling of complex system domains.
The generally prescribed path for Semantic Web and Labeled Property Graphs to incorporate a notion of “types” is to extend their knowledge graphs with links to external schemas. However, hanging schemas off their existing tree structures leads to deeper and more distributed hierarchies, which compounds the existing performance problems, pushing them farther away from any notion of “real-time”.
Performance issues aside, hanging schemas off trees leads to concrete types. They might be standardized, but they are still static, imperative types that hang off their trees, which greatly limits their expressive power.
Most critically, hanging schemas off trees does not lead to a higher-level developer abstraction that could work across domain applications, but rather a low-level implementation of imperative types one-off per domain application (i.e., there is no type system), which makes this approach questionable if not futile.
The graph database issues outlined above are not limitations of graph theory. Graphs themselves are simply mathematical structures composed of nodes and edges used to model relations between objects. Graphs can take many forms and are inherently flexible, extensible and adaptable structures. They benefit from many mathematical properties (e.g., adjacency, associativity, commutativity).
The whole rationale for graph databases is that they are not supposed to be rigid and hierarchical like SQL relational databases; however, it turns out that in many ways their implementations are! In addition, they lack key capabilities relevant to complex, real-time operational systems.
It is understood that there are many related technologies (e.g., OWL, SPARQL, SHACL, PathQL, SWRL, TTL, Prolog, Protégé) that are used to address graph database constraints, but they bring ever-increasing stack complexity (i.e., “RDF + X + Y + Z”) without much return. As noted above, the affordances of an operational database are the rate-limiter for the application layer. A good architect wants to avoid misalignment of foundational technology to use-case requirements.
In today’s world of DevOps automation, no one wants to be concerned with how the database is scaled or how to tune queries. Developers want to rapidly model complex distributed solutions without worrying about low-level implementation details like state management. They expect resilient, web-scale systems with logs of all activity.
Separately, to be truly useful, developers need a far more dimensional and dynamic graph (i.e., hypergraph). In the same Markets and Markets graph database report cited above, the analyst firm indicates that hypergraphs will have a higher CAGR during the forecast period. The report states that, “Unlike standard graphs, where edges link only two nodes, hypergraphs enable a more flexible representation of complex, interconnected data. The demand for more sophisticated recommendation systems that consider multi-entity interactions fuels the interest in hypergraphs.”
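The contrast the report draws between standard edges and hyperedges can be sketched as below. This is a minimal illustration, assuming a frozenset as the hyperedge representation; the type aliases, helper functions, and purchase data are hypothetical and not drawn from the report or from any hypergraph library.

```python
from typing import FrozenSet, List, Set, Tuple

# A standard graph edge links exactly two nodes.
BinaryEdge = Tuple[str, str]

# A hyperedge links any number of nodes; a frozenset models that here.
Hyperedge = FrozenSet[str]


def as_binary_edges(event_id: str, participants: Set[str]) -> List[BinaryEdge]:
    """Represent a multi-entity interaction using binary edges only.

    An extra node (event_id) has to be introduced, and every participant
    is linked to it by a separate edge.
    """
    return [(event_id, participant) for participant in sorted(participants)]


def as_hyperedge(participants: Set[str]) -> Hyperedge:
    """Represent the same interaction as a single hyperedge."""
    return frozenset(participants)


# Illustrative multi-entity interaction.
purchase = {"customer:ann", "product:laptop", "store:berlin", "promo:spring"}

binary = as_binary_edges("purchase:1", purchase)
hyper = as_hyperedge(purchase)

print(len(binary), "binary edges plus one intermediate node")
print(len(hyper), "nodes joined by one hyperedge")
```

With binary edges, the four-way interaction needs an intermediate node and four separate links; as a hyperedge it is one relationship that names all four participants directly.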
The market projections have been echoed by a spate of recent articles and commentaries in the graph community pointing to hypergraphs to enable new graph use-cases (Knowledge Hypergraphs: Enriching Triples with Structure; Hypergraphs and RDF).
While data scientists are evolving their views, current industry initiatives with graph databases are falling short of objectives or outright failing. Businesses need graphs that can model complex systems and event-driven behavior today. They need useful abstractions that empower developers to declaratively compose context-aware business and infrastructure applications. To finally realize this elusive vision, IT needs a new generation of expert systems that raises hypergraph-based ontologies to a Domain Specific Language (DSL) for industrial-grade no-code platforms. They need EnterpriseWeb.
A new hope
The good news is that graph databases remain relevant as a source of rigorously developed logical data models. These models, and the modeling skills of practitioners, are more valuable than ever. They provide useful source material for higher-level languages. EnterpriseWeb can consume RDF and standard LPG-based models, extend them with types and policies and wrap them with history, behavior, reliable messaging, transaction guarantees and state management for real-time intelligent operations.
See EnterpriseWeb in action
Graph conference talk: Graph-driven Orchestration
Contributed technical article: Graph Knowledge Base for Stateful Cloud-Native Applications
Whitepaper: Above the Clouds
Bio: Dave Duggal, founder and CEO, EnterpriseWeb
Dave has spent his career building and turning around companies. He anticipated the challenges of an increasingly fragmented IT estate and founded EnterpriseWeb to enable highly automated and agile business operations. Dave is the inventor of 20 US patents on complex distributed systems. He is a regular speaker at industry conferences and an occasional blogger.