EnterpriseWeb teams with Dell, Red Hat and Intel to demonstrate low-token, low-latency, energy-efficient GenAI
More than two years into the Generative AI hype cycle, organizations are still struggling to implement enterprise-grade, let alone telco-grade, use cases despite large investments. They are hindered by ongoing concerns about accuracy, consistency, explainability, latency and security that undermine trust in mission-critical workflows. At the same time, they are paying high costs today for the promise of streamlined operations tomorrow. While per-token costs are going down, evolving GenAI techniques require ever more tokens in an ongoing attempt to improve accuracy and reduce hallucinations. High-token GenAI is expensive, inefficient and unsustainable.
Demo: EnterpriseWeb webinar: Telco grade Agent based Automation, 012925
EnterpriseWeb puts the ROI in GenAI
EnterpriseWeb is bringing order to Generative AI with an alternative approach to put the ROI in GenAI.
Instead of building scaffolding around an LLM with a variety of common techniques (e.g., Graph RAG, Chain-of-Thought, Mixture of Experts, Multi-Agent, etc.), which only marginally and inconsistently improve quality while adding significant complexity, cost and overhead to the solution architecture, EnterpriseWeb “flips the script”.
Instead of making an LLM the center of the solution architecture, EnterpriseWeb uses a Telecom Ontology (i.e., a Graph-connected Domain Model) as a control language over one or more Large Language Models (LLMs). The Ontology efficiently serves context and governance to LLMs in order to optimize LLM interactions (low-token, low-latency, low-energy consumption) and deterministically validate LLM inferences so they can be used safely as managed inputs into controlled, telco-grade processes. By dynamically orchestrating LLMs, EnterpriseWeb enables them to be used in a highly-targeted, efficient and controlled manner.
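The pattern described above can be illustrated with a minimal sketch. This is not EnterpriseWeb's actual API; the ontology schema, field names and the stubbed LLM call are all hypothetical. It shows the general shape of the idea: the domain model supplies a small, targeted prompt (low-token) and then deterministically validates the model's inference before it is allowed into a workflow.

```python
# Illustrative sketch only: the schema, field names and stub_llm are
# hypothetical, not EnterpriseWeb's implementation.

# A tiny stand-in for a graph-connected domain model: each concept
# declares the fields an inference must have and their allowed values.
ONTOLOGY = {
    "ScaleAction": {
        "target": {"type": str, "allowed": {"upf", "amf", "smf"}},  # 5G core functions
        "replicas": {"type": int, "min": 1, "max": 8},
    }
}

def stub_llm(prompt: str) -> dict:
    """Stand-in for a targeted, low-token LLM call (e.g., a small local model)."""
    return {"target": "upf", "replicas": 3}

def validate(concept: str, inference: dict) -> dict:
    """Deterministic gate: reject any inference that violates the domain model."""
    schema = ONTOLOGY[concept]
    for field, rules in schema.items():
        value = inference.get(field)
        if not isinstance(value, rules["type"]):
            raise ValueError(f"{field}: expected {rules['type'].__name__}")
        if "allowed" in rules and value not in rules["allowed"]:
            raise ValueError(f"{field}: {value!r} not in ontology")
        if "min" in rules and not (rules["min"] <= value <= rules["max"]):
            raise ValueError(f"{field}: {value} out of range")
    return inference

# The ontology scopes the prompt (keeping tokens low) and gates the output
# (keeping it safe for a controlled process).
prompt = "Scale a 5G core function. Fields: target (upf|amf|smf), replicas (1-8)."
action = validate("ScaleAction", stub_llm(prompt))
print(action)  # prints {'target': 'upf', 'replicas': 3}
```

In this sketch the LLM never acts directly on the network; its output becomes a managed input only after it passes the deterministic check, which is the essential difference from LLM-centric architectures.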
CPU-first Generative AI (GPUs not required)
EnterpriseWeb’s methods are so efficient they can run AI-powered 5G Network Automation on CPUs, without expensive GPUs and their power- and resource-hungry infrastructure requirements. CPU-first GenAI opens the door to edge use cases. EnterpriseWeb can run AI and network workloads in a single box at the edge, allowing organizations to run AI workloads securely, in or near their facilities, for advanced, low-latency, low-cost automation. EnterpriseWeb collaborated with Dell, Intel and Red Hat to benchmark CPU-first Generative AI for AI-powered 5G Network Automation in the Intel Labs.
AI-powered 5G Network Automation
In the fall of 2024, EnterpriseWeb validated its claims in extensive testing and benchmarking on Dell PowerEdge servers in the Intel labs, running on Xeon processors with Red Hat OpenShift, Mixtral 7b and Ollama.
EnterpriseWeb’s telecom solution, Netwrx.ai, can run as many as 2,000 concurrent 5G user sessions on a Dell R760 data center server with Xeon 5 (2×64 cores, 2TB RAM) and up to 400 concurrent 5G user sessions on a Dell XR8000 edge server with Xeon 4 (32 cores, 256GB RAM). Tests were run without any acceleration or special libraries. The solution was benchmarked for both 5G RAN and 5G Standalone use cases. The latest testing featured IBM’s Granite 3b and OpenShift.ai.
Netwrx.ai introduces a new concurrency model to orchestrate LLM interactions with low token counts and high throughput, reducing inference latency while minimizing LLM resource and energy consumption. Netwrx.ai also dynamically calls algorithms, analytics and rules-based logic (Hybrid-AI) where they are faster, better or cheaper to use. By optimizing AI inferencing to mitigate its overhead, Netwrx.ai can efficiently use AI to optimize network performance.
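The Hybrid-AI dispatch idea can be sketched in a few lines. This is a hypothetical illustration, not Netwrx.ai code: the rule thresholds, function names and the stubbed LLM are all assumptions. The point it demonstrates is the routing discipline: cheap deterministic logic handles the cases it can decide, and the LLM is invoked only for requests that fall through.

```python
# Hypothetical sketch of Hybrid-AI dispatch; thresholds and names are
# illustrative assumptions, not Netwrx.ai internals.
from typing import Optional

def threshold_rule(metrics: dict) -> Optional[str]:
    """Cheap rules-based path: decides clear-cut cases deterministically."""
    if metrics["cpu_load"] > 0.9:
        return "scale_out"
    if metrics["cpu_load"] < 0.2:
        return "scale_in"
    return None  # ambiguous: fall through to the model

def stub_llm(metrics: dict) -> str:
    """Stand-in for a targeted, low-token LLM call used only as a fallback."""
    return "hold"

def dispatch(metrics: dict) -> str:
    # Deterministic logic first; the LLM is reached only when rules abstain,
    # which keeps token counts, latency and energy use low.
    return threshold_rule(metrics) or stub_llm(metrics)

print(dispatch({"cpu_load": 0.95}))  # rules path: prints "scale_out"
print(dispatch({"cpu_load": 0.5}))   # falls through to LLM: prints "hold"
```

Because most traffic is resolved by the deterministic path, the expensive inference step is amortized over only the genuinely ambiguous cases, which is what makes a CPU-only deployment plausible.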
Running AI and 5G RAN workloads on the same server at the edge is a hot topic. At the same time, the industry is looking for real operational benefit from advanced AI use cases in the data center and is demanding ROI and energy efficiency. “CPU-first” AI and network workload orchestration addresses these concerns, enabling widespread adoption, even at the edge, on the x86 architecture that telcos know.
The Proof-of-Concept with its partners won industry recognition from Fierce Network and theCUBE. The company continues to expand on this work, adding new capabilities and supporting additional Language Models.
Related Links:
Demo: The Telecom Ontology: The Missing Links for Level 4 Autonomous Networking
News: theCUBE names EnterpriseWeb as a finalist for Most Innovative Networking Solution award
News: EnterpriseWeb named a 2024 Fierce Network Innovation Award finalist for the AI category
Post: Agentic Automation: Knowledge is Power
Interview: Fierce Telecom