Graph Knowledge Base for Stateful Cloud-Native Applications

Dave Duggal, CEO & Founder, EnterpriseWeb 

Originally posted in InfoQ

Key Takeaways

  • “State is good”; real-time data supports responsive applications and the coordination of end-to-end processes
  • Lack of support for Stateful Cloud-native applications is a roadblock for many enterprise use-cases
  • Graph knowledge bases are an old idea now being revisited to model complex, distributed domains
  • Combining high-level abstraction with Cloud-native design principles offers efficient “Context-as-a-Service” for hydrating stateless services
  • Graph knowledge-based systems can enable composition of Cloud-native services into event-driven dataflow processes

The Cloud promotes a bottom-up infrastructure view of applications – small, stateless, isolated, containerized microservices and serverless functions that are independently scalable and portable. In turn, the bundling of application packages into containers enables orchestration of deployments and DevOps-style automation at scale. Cloud-native services are not a solution in-and-of-themselves, but are intended to be scalable building blocks for highly-modular Client applications.

The isolated “shared-nothing” design of cloud-native architectures is a reaction to conventional monolithic software implementations, which tightly-couple their functionality to databases and as a result must scale, distribute and evolve as a single unit.

However, the business still needs connected and contextual solutions to support its core mission. It’s easy to reduce applications to discrete “packages” and isolated “workloads”, but it’s the end-to-end business logic across solution elements, whether container-, VM- or server-based, that drives value. Unless you are a major IT vendor or cloud host, infrastructure is simply a means to an end and a cost to be managed.

In a recent InfoQ presentation, Beyond Microservices: Streams, State and Scalability, Gwen Shapira noted the difficulty of establishing context between cloud-native workloads – “I miss having states because sometimes my rules are dynamic. I cannot hard code them, I have to look them up somewhere. Sometimes, my events contain some of the data that I need, an ID, but not the rest of the data that they need, so I have to look it up somewhere. Sometimes, I have to join multiple events.” In essence, stateless Cloud-native services are scalable because they push state and information management concerns up to the client applications that consume them.

Fast, efficient access to information (i.e. modeled, addressable, durable entities) makes applications smarter and “shared understanding” between applications facilitates interoperability, coordination and process automation. Everyone wants to be data-centric, but no one wants to return to bloated and tightly-coupled monoliths. This apparent mismatch needs to be reconciled. The lack of support for stateful cloud-native application behavior is a roadblock to expanding Cloud use-cases.

The Shell Game

To be clear, “stateless” doesn’t mean without state. The term is hyperbole. Microservices and serverless functions often maintain some state, typically a config file and logs of their discrete activity.

As a rule, stateless applications do not persist any client application state between requests or events. “Statelessness” decouples cloud-native services from client applications to achieve desired isolation. The tenets of microservice and serverless architecture expressly prohibit retention of session-state or global-context. However, while the state doesn’t reside in the container, it still has to live somewhere. After all, a stateless function takes state as inputs. Application state didn’t go away, it moved. The trade-off is that state, and with it any global-context, must be re-loaded with every execution.
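The trade-off can be made concrete with a minimal sketch. Everything named here (STATE_STORE, ship_order, handle, the order fields) is hypothetical and only illustrates the pattern: the function itself holds nothing between calls, so each invocation has to hydrate state from somewhere else and write it back.

```python
from typing import Dict

# Hypothetical external state store: a stand-in for whatever durable
# service holds client state outside the container.
STATE_STORE: Dict[str, dict] = {
    "order-42": {"customer": "acme", "items": 3, "status": "new"},
}


def ship_order(state: dict, event: dict) -> dict:
    """Stateless function: everything it needs arrives as arguments."""
    if event["type"] == "shipped":
        return {**state, "status": "shipped"}
    return state


def handle(event: dict) -> dict:
    """Each invocation re-loads state, applies the function, writes it back."""
    state = STATE_STORE[event["order_id"]]       # hydrate (a network read in practice)
    new_state = ship_order(state, event)         # pure computation
    STATE_STORE[event["order_id"]] = new_state   # persist (a network write in practice)
    return new_state


result = handle({"order_id": "order-42", "type": "shipped"})
```

In a real deployment the two dictionary accesses in handle would be remote calls, which is where the extra network traffic comes from.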

The practical consequence of statelessness is a spike in network usage: chatty, bandwidth- and I/O-intensive inter-process communication. This comes at a price – both in increased Cloud service expenditures and in latency and performance impacts on Client applications.

Distributed computing had already weakened the bonds of data-gravity as a long-standing design principle, forcing applications to integrate with an ever-increasing number of external data sources. Cloud-native architecture flips-the-script completely – data ships to functions. Applications have been turned inside-out. For Cloud-native services, state and information management are externalized – “Intel Outside”.

Separating data and functions is good design, but without an application-layer abstraction for shared state and schema across a distributed domain, developers encode these concerns one-off, point-to-point, per Microservice, which is not scalable operationally.

Transforming the Application Layer

The traditional 3-tier application is rapidly disappearing. The implicit cloud-native model is far more abstract. Conceptually, applications are devolving into a set of distributed data sources and capabilities with relationships – an ‘application graph’, processed as a pipeline. It may sound straightforward, even idyllic, but how do all these discrete elements come together in coherent end-to-end processes without hard-coding relationships into a Big Ball of CRUD?

While there have been many recent articles about Stateful Applications in Kubernetes, they mostly focus on storage primitives for deploying, configuring and managing databases, not on application-layer concerns regarding state and information management. There are thoughtful approaches for shared-memory and logical data access emerging, such as UC Berkeley’s Anna db project and Zhamak Dehghani’s proposed Distributed Data Mesh architecture, respectively. However, they generally don’t include a consistency layer for reasoning across services. The solution to connecting loosely-coupled cloud-native services in data-centric, event-driven applications is more than efficient access to state to hydrate microservices and serverless functions.

To achieve interoperability between distributed and heterogeneous solution elements requires some way to map their diverse schemas to the shared concepts, data types and relationships of a higher-level model. This has led to the popularity of GraphQL to aggregate data sources under a single API. GraphQL is convenient tactically, but it is limited in scope and power – it really is just a static, hierarchical data model over a specific set of integrated services (not a graph in any meaningful sense). Like a throw away script, it is a one-off manual work-around that doesn’t provide broader transparency, re-use, automation or governance.

To address end-to-end problems and realize strategic objectives like IT productivity and business agility requires a more powerful abstraction, not a thin entity service: one that supports readable code and dependencies, and clean composition for developers tasked with developing enterprise-wide solutions that cross business silos, IT layers and eco-system partners. It must be a flexible, extensible and adaptable model to accommodate variety and to evolve over time.

Back to the Future

To that end, graph Knowledge bases, which have their roots in the expert systems of the 1970s, are being re-visited to support high-level abstractions over complex distributed environments.

According to Wikipedia, a knowledge base (KB) is a technology for storing complex information explicitly, as a loosely-coupled collection of facts and their relationships to other facts and rules (i.e. a graph), rather than imperatively. “The ideal representation for a knowledge base is an object model (often called an ontology in artificial intelligence literature) with classes, subclasses and instances.” The term was used to distinguish it from conventional databases of the time, which were generally hierarchical in nature. Graph knowledge bases were the original NoSQL databases!

Graphs are an ideal data structure for modeling complex real-world domains as they are flexible, extensible and adaptable. Moreover, graphs have well-understood mathematical properties (associativity, adjacency, etc.) and support computational optimizations (shortest-path algorithms, etc.).
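As a small illustration of the kind of path computation a graph model makes cheap, the sketch below stores a toy domain as an adjacency list and finds a shortest path with breadth-first search. The service names and the shortest_path helper are invented for the example; they are assumptions, not part of any particular product.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical domain graph: each node lists the nodes it depends on.
GRAPH: Dict[str, List[str]] = {
    "checkout-ui": ["order-service"],
    "order-service": ["inventory-service", "payment-service"],
    "inventory-service": ["warehouse-db"],
    "payment-service": ["ledger-db"],
    "warehouse-db": [],
    "ledger-db": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: returns a path with the fewest hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(GRAPH, "checkout-ui", "ledger-db")
```

Because relationships are explicit edges, adding a new element means adding a node and its edges; the traversal code does not change.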

Expert Systems combined knowledge bases with inference engines that would leverage the graph object model to solve complex problems. “The knowledge base provides facts and rules about the world… the inference engine, allows new knowledge to be inferred. Most commonly, it can take the form of IF-THEN rules coupled with forward or backward chaining approaches.” Early knowledge-based systems (KBS) had limited scope – they were generally used to answer specific scientific and medical questions. They emulated human decision-making to automate expert tasks. However, they were not transactional, the computation was fixed to a single machine, and the storage requirements needed to describe a complex domain strained the technology of their day.
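The IF-THEN and forward-chaining idea in the quotation can be sketched in a few lines. This is a toy, not a reconstruction of any historical expert system: the facts, the two rules and the forward_chain function are all made up for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (premises, conclusion): IF all premises are known THEN add the conclusion.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical rules, invented for this example.
RULES: List[Rule] = [
    (frozenset({"fever", "cough"}), "possible_flu"),
    (frozenset({"possible_flu", "short_of_breath"}), "refer_to_doctor"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply rules repeatedly until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


inferred = forward_chain({"fever", "cough", "short_of_breath"}, RULES)
```

The second rule only fires because the first one has already added "possible_flu", which is the chaining part: inferred knowledge becomes input to further inference.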

Renewed interest in knowledge bases followed the rise of the Internet as it became commonly understood that distributed information sources were being linked and indexed so they could be readily navigated and queried. Sir Tim Berners-Lee et al. applied these concepts in the “Semantic Web”, which through normalization of terms and data structures enabled explicit standardized relationships between information concepts for improved discovery, recommendations and analytics. A decade later, Google announced its Knowledge Graph with a vision of a network of rich information objects – “things not strings”.

The Cloud expanded the scope of Internet use beyond the human-centric Web to expose services, functionality that could be accessed by systems via APIs. Once again, knowledge bases emerge as a way to make sense of complexity. eBay recently announced a Knowledge graph project – Managing eBay Vast Service Architecture Using Knowledge Graphs.

The modeled relationships in eBay’s graph give them a holistic representation of their complex, distributed environment. However, like Semantic Web-based technologies, it’s generally focused on data discovery and analytics, rather than systems interoperability and automation, echoing the old OLAP vs OLTP divide.

To truly enable a cloud-native application layer requires an expanded conceptualization of a graph knowledge base; mapping schemas to a unified, higher-level model (i.e. a union of bounded-contexts) is not enough. For data to flow seamlessly across services requires a rich declarative model of distributed objects to deal with Cloud-native (“12 Factor”) concerns, including: discovery (addresses, unique IDs, etc.), connectivity (ports, keys, auths, etc.) and syntax (protocols, formats).

In addition, the knowledge base should model interfaces and software contracts for participating elements (not just a thin descriptor and a pile of YAML configs). If the application layer is orchestrating deployments and configuration, the knowledge base must also model infrastructure requirements (i.e. compute, storage and network), constraints (dependencies, affinities, etc.), application lifecycle management operations (start, stop, scale, move, etc.) and Target Host endpoints. Similar requirements apply to the Industrial Internet-of-Things where sensors and actuators must be instantiated, monitored and controlled to enable smart manufacturing, cities and supply chains.
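To show why modeled constraints matter for lifecycle operations, here is a minimal sketch in which dependency constraints alone determine a safe start order. The element names and the start_order helper are hypothetical; graphlib.TopologicalSorter is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency constraints: element -> elements it requires.
DEPENDS_ON: Dict[str, Set[str]] = {
    "web-frontend": {"order-service"},
    "order-service": {"postgres", "message-broker"},
    "postgres": set(),
    "message-broker": set(),
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every element starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
```

The relative order of independent elements (here the database and the broker) is not significant; only the dependency constraints are guaranteed to be respected.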

The result would be rich models of distributed solution elements that can be flexibly composed into higher-level services and chained in event-driven dataflow processes. The knowledge base would provide the shared domain semantics and metadata to enable run-time automation and policy-based management. By abstracting both high-level domain and low-level technical details, a graph knowledge base can serve as the foundation for a new class of real-time distributed applications, which leverage it for “Context-as-a-Service”.

To summarize:

  • The graph knowledge base provides a transparent, graph-connected model of an end-to-end domain
  • The graph-based design supports the modeling of complex real-world domains in a manner that is flexible, extensible and adaptable
  • Extending graph knowledge bases to model distributed systems domain supports Cloud-native application concerns
  • The graph knowledge base provides an information model supporting conversational APIs and dynamic UIs, which allow interactive exploration of a set of graph relationships
  • Graphs have mathematical properties that optimize processing of complex relationships, which largely offset the typical ‘cost’ of reasoning over a large domain, making the approach tractable
  • The graph knowledge base supports readable code and dependencies, and clean composition for developers tasked with developing enterprise-wide solutions that cross business silos, IT layers and eco-system partners
  • The graph knowledge-based system can provide efficient “Context-as-a-Service” to efficiently hydrate stateless Cloud-native services and supports data-centric, event-driven, distributed applications
  • The graph knowledge-based system supports model-based, event-driven, policy-controlled orchestration

Virtualizing the Distributed Systems Domain

Critical to implementing such a graph knowledge-based system are a few key design considerations. Cloud-native application design has operational impacts, as it forces developers to handle, for every participating microservice and serverless function, all the related data wrangling, consistency and security concerns formerly managed by the monolith. It exposes interoperability, state management and messaging concerns that now have to be explicitly handled by developers, which is what makes distributed systems so technically complex. To automate this IT domain, the knowledge base must virtualize the environment.

Declarative Language

As noted above, graph knowledge bases were a reaction to hierarchical databases and imperative programming. A graph knowledge base makes the domain explicit so it is easier to inspect and navigate. Its machine-readable data structures and classification schemes naturally support declarative languages for mapping, rather than individually encoding all relationships within and between objects.

Declarative modeling provides a means to normalize solution elements; objects are self-described as a set of pointers to the shared domain concepts, types and policies in the knowledge base. This high-level abstraction enables common methods over heterogeneous elements. It creates a consistency layer where diverse objects can be discovered, composed, orchestrated, configured and managed as if they were the same. Low-level implementation complexity (connection, communication, syntax, etc.) can be abstracted away from the developer so they can focus on composition and application logic; the interoperability concerns are handled by the language run-time.
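A minimal sketch of this idea follows, under stated assumptions: the TYPES registry, the Element record, the two connector functions and the addresses are all hypothetical, and the connectors only build request strings rather than touching a network. The point is that an element is described by a pointer to a shared type, and one common method works across heterogeneous elements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Connectors hide the low-level syntax for each kind of element.
def call_rest(address: str) -> str:
    return f"GET http://{address}/status"


def call_queue(address: str) -> str:
    return f"PUBLISH status-request TO {address}"


# Shared domain types: type name -> the connector that knows how to reach it.
TYPES: Dict[str, Callable[[str], str]] = {
    "rest-service": call_rest,
    "queue-consumer": call_queue,
}


@dataclass(frozen=True)
class Element:
    """A self-described object: a name plus pointers into the shared model."""
    name: str
    type_ref: str   # pointer to an entry in TYPES
    address: str


CATALOG: List[Element] = [
    Element("orders", "rest-service", "orders.internal:8080"),
    Element("billing", "queue-consumer", "billing-events"),
]


def check_status(element: Element) -> str:
    """One common method over heterogeneous elements, resolved via the type."""
    return TYPES[element.type_ref](element.address)


requests = [check_status(e) for e in CATALOG]
```

Supporting a new kind of element means registering one more type; the caller of check_status is unaffected.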

At run-time the graph knowledge base serves as a machine-readable single-source of truth for automating end-to-end IT and business processes. However, the expanded role of the knowledge base, to model distributed systems domains, means that the originally conceived query-centric inference engine has to be re-engineered to additionally support coordination and state management (commands and transformations). 

This declarative modeling approach separates the model of an object from the Message-oriented Middleware (MoM) capabilities required to implement it. These functions can be attached by Types in the graph knowledge base and orchestrated by the run-time. Mapping types to domain concepts enables a graph knowledge-based system to drive model-based, event-driven, policy-controlled automation for real-time, data-centric behavior. This brings the graph knowledge-based system closer to the functional composition vision of the application graph, but state management and messaging still need to be addressed.


Stateful vs stateless is a false binary; organizations want state to understand events, automate decisions and optimize processes. Data structures aside, graph knowledge bases are fundamentally designed, like databases, to store information. However, data services for stateless distributed applications and web-scale processes require an immutable persistence model.

Immutability is a method of storing information as a stream of logged events against a durable entity with a unique identifier. Instead of a traditional database, which “updates” state by replacing values, events are “inserts” that create a linked history. Object or application state is projected from the history of an immutable store; state is derived from events.

While different, the approach is neither new nor esoteric. It’s the basis of ledger-based accounting and the global banking system. A bank account is a concept with a unique identifier; its current state, the bank balance, is a computation over the history of debit and credit activity.
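The bank account analogy translates directly into code. The sketch below is illustrative only: the Event and Ledger names, the account ID and the amounts are assumptions made for the example. Events are only ever appended, and the balance is computed from the history rather than stored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """An immutable record of something that happened to an account."""
    account_id: str
    kind: str     # "credit" or "debit"
    amount: int   # in cents, to avoid floating-point rounding


class Ledger:
    """Append-only store: events are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def history(self, account_id: str) -> Tuple[Event, ...]:
        return tuple(e for e in self._events if e.account_id == account_id)

    def balance(self, account_id: str) -> int:
        """Current state is a projection over the event history."""
        total = 0
        for e in self.history(account_id):
            total += e.amount if e.kind == "credit" else -e.amount
        return total


ledger = Ledger()
ledger.append(Event("acct-1", "credit", 10_000))
ledger.append(Event("acct-1", "debit", 2_500))
current = ledger.balance("acct-1")
```

The same history that yields the balance also serves as the audit trail, since nothing is overwritten.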

Immutability separates state from objects to focus on authoritatively logging the transactions that transform state instead. Immutability transfers responsibility for computing state to the run-time, making it easier for developers to reason over event-driven applications.

Immutable objects naturally support asynchronous, concurrent processing of transactions, multi-step operations and long-running workflows. They provide a namespace for durable entities that can persist intermediary results from participating services, serve as aggregates for process state, and provide logs (transaction trace or history) for debug and audit. 

Together, declarative models and immutable objects provide foundational support for coordinating real-time distributed applications, but they don’t account for lack of guarantees in an asynchronous environment.


Dis-aggregating applications inserts the network between all solution elements, which introduces network latency and failures. Distributed applications that connect services have to tolerate late responses and non-responses in order to be safe and resilient. While many web and mobile applications may have lower consistency requirements, Enterprise-grade applications often depend on being both “right, and right now”. To ensure correctness, developers have to replace the database guarantees (ACID) that were lost.

As with everything else in distributed systems, transaction-like semantics are raised to the application layer. Immutable persistence provides durable entities and authoritative history, but unreliable network communications require messaging guarantees to ensure that requests and their effects are not duplicated. Receivers can expect to receive messages “at least once”.

This problem is addressed by an established computer science principle, “idempotency”, which allows operations to be requested repeatedly without duplicating their effect. The best analogy is the humble elevator button: impatiently pressing it again and again doesn’t send the elevator to that floor repeatedly (or make it arrive any sooner). “At least once” messaging semantics combined with idempotency simulate “exactly once” messaging guarantees to support business logic for configuring retries, time-outs and compensations.
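One common way to get this behavior, sketched here under assumptions, is for the receiver to remember the IDs of messages it has already applied. The PaymentReceiver class, the message shape and the in-memory set are hypothetical; a real system would keep the seen IDs in durable storage.

```python
from typing import Dict, List, Set


class PaymentReceiver:
    """Applies each message at most once, even if it is delivered many times."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self._seen: Set[str] = set()   # IDs of messages already applied

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self._seen:
            return False
        self._seen.add(message["id"])
        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount"]
        return True


receiver = PaymentReceiver()
message = {"id": "m-1", "account": "acct-1", "amount": 500}

# Simulate "at least once" delivery: the same message arrives three times.
outcomes: List[bool] = [receiver.handle(message) for _ in range(3)]
```

The sender is free to retry on a time-out without knowing whether the first attempt arrived, because repeating the request cannot change the outcome.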

Idempotency separates the concerns; what was formerly database transactional semantics can now be largely emulated by application-layer workflows. This allows a graph knowledge base, which would natively deploy as NoSQL, to mitigate the uncertainty of eventual consistency by providing higher-level controls and management (i.e. less than ACID, but more than BASE). A graph knowledge-based system can be Cloud-native!

When combined, immutable persistence and reliable idempotent messaging can support a deterministic abstraction over an inherently non-deterministic distributed system in order to ensure the graph knowledge base is a stable and authoritative computational environment.

Pulling it all together

Extending graph knowledge bases to model the distributed systems domain, and engineering their run-time engines to support immutable persistence, asynchronous, concurrent methods and reliable, idempotent messaging, creates a new kind of information system, one intentionally designed for today’s IT challenges.

This expanded conceptualization of a graph knowledge-based system embraces Cloud-native thinking, deploys as a Cloud-native solution and offers much-needed support for developers struggling to deliver smart, connected, adaptable enterprise applications. The objective is to provide an intuitive developer interface to shared enterprise models and metadata for efficient, performant and scalable Context-as-a-Service.

It provides an alternative to “over-ambitious” API Gateways that sprawl into a bloated and non-transparent Enterprise Service Bus or a “God” service. The graph knowledge-based system is inherently inspectable and open. The declarative modeling cleanly separates models from implementations, and typed objects enable the late-binding of event-driven Serverless functions to perform any required middleware services in a dynamic pipeline (i.e. dataflow process). Likewise, service compositions can call orchestration services to coordinate across a set of services for a multi-step operation or an end-to-end process. In this way, the graph knowledge base enables model-based, event-driven, policy-controlled orchestration and automation without prescribing implementations. Conversely, the durable entities support the flexibility of choreography, with the tracking and management of a central coordinator. In short, the graph knowledge base is optimally structured for supporting stateless, event-driven Serverless functions.

Of course, making the inherently complex simple is hard. However, only by tackling fundamental modeling, processing and communication challenges of distributed applications, can the industry make design, deployment and management of stateful Cloud-native applications practical.

As frustrations mount with increasing complexity of IT environments, the industry is starting to recognize the need for higher-level abstractions. Cloud-native applications have to be radically simplified. Application developers don’t want to care about container orchestration or how microservices are connected by the network, nor do they want to understand all the vagaries of distributed systems engineering.

Graph knowledge-based systems as described in this article provide one possible approach to the design of next-generation platforms. They offer a deterministic and consistent developer abstraction to bring order to complex distributed systems and applications generally, for a wide-range of enterprise use-cases.