

HPE Networking Chief AI Officer Bob Friday and I recently participated in a podcast with Tech Field Day. The starting premise of the show was, “Data center networking needs AI,” with which we wholeheartedly agree. Bob has spent the last 10 years as a pioneer in bringing AIOps to networking. He kicked off the discussion by articulating HPE Networking’s journey toward the Self-Driving Network™, a vision he explained in his recent 6-part blog series. We then dug deeper into what AI means for data center networks in particular and outlined some of the work we’ve been doing in this area. This is what we’ll cover in this short, two-part blog series.
Is there a problem to be solved?
People are afraid to touch their networks. It sounds ridiculous, but most network engineers are nervous before, during, and after a change, whether that’s provisioning a new service, performing a firmware upgrade, or making a routine modification. The process is stressful and operators worry that if they touch the network, they may break it.
The fundamental problem underlying this epidemic in data center networking is complexity: the alphabet soup of dozens of protocols to understand, configurations of perhaps thousands of physical and logical devices, multiple infrastructure vendors to manage. The list goes on. Combine this complexity with the endless flood of data that operators are subjected to, coupled with many inadequate troubleshooting tools on the market today, and data center networking teams are often overwhelmed. They end up drowning in data but starved of insights.
This is precisely the type of situation where clever AI and machine learning algorithms can be useful: mountains of data that are tough for humans to sift through, but for the most part the data is actually fairly well-structured.
It’s not about the technology, it’s about what you need
Too often, technology discussions start in the wrong place: with the technology itself. Many of us have heard that edict from above in our organizations: “We as a company need to use AI or we’ll get left behind.” Or maybe you’re hearing it from a vendor: “You need to use AI in your data center network.” But these are aimless starting points. The starting point of any technology discussion should be: what are your goals? What are the problems you have that need to be solved?
Our goal has always been very clear—deliver the best possible user experience, whether that “user” is the data center network operator, or the end user that relies on the data center, whether they know it or not, for the applications they use. Over the last 10 years, Juniper has developed and applied AI, incredibly successfully, first to Wi-Fi and then to the rest of the campus and branch domain. In the data center we have a second-mover advantage, able to glean all of the insights from lessons learned in the campus.
Data center network operational challenges
If the goal is to optimize the user experience, then identifying the challenges to be solved for data center practitioners takes close collaboration with our customers. We group these challenges into three categories: limited insights, insufficient speed, and poor reliability.
Before we blindly jump into AI as a panacea, we have to thoroughly understand these problems and ask ourselves, can AI help solve them? The answer is, absolutely, yes.
AI is necessary, but not sufficient
But AI cannot solve every NetOps problem that you have. AI is necessary, but not sufficient. We need systems that use both AI, which is fundamentally probabilistic, and other deterministic approaches such as intent-based networking.
Is it OK to be 99% right on a configuration? No, you want to be 100% right, which requires rules-based, deterministic software. But on Day 2 when the data center is operating out in the wild in unpredictable environments, it’s a different calculation. If you have a system that can tell you with 99% accuracy, based on a myriad of symptoms, the root cause of a problem that you’re seeing, then that’s a solution that’s probably better than what you have today. And this is the power of AI, to sift through massive amounts of data and extract the correlations that humans cannot easily do.
Put the two technologies together, AI and intent-based networking—then you can deliver that unbeatable network operator experience and application experience for end users.
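To make the pairing concrete, here is a minimal, hypothetical sketch of the two modes side by side: a deterministic check that a configuration matches intent (where 99% right is not good enough), and a probabilistic ranking of likely root causes from observed symptoms. All names, symptoms, and evidence weights below are illustrative, not any vendor's actual API or model.

```python
# Deterministic side: every intended setting must match exactly.
def validate_intent(config, intent):
    """Returns True only if the config satisfies the full intent."""
    return all(config.get(key) == value for key, value in intent.items())

# Probabilistic side: toy symptom-to-cause evidence weights.
# In a real system these would be learned from telemetry, not hand-set.
EVIDENCE = {
    "crc_errors":   {"bad_optic": 0.9, "mtu_mismatch": 0.1},
    "packet_drops": {"bad_optic": 0.4, "congestion": 0.6},
}

def rank_root_causes(symptoms):
    """Scores candidate causes by accumulated evidence, best first."""
    scores = {}
    for symptom in symptoms:
        for cause, weight in EVIDENCE.get(symptom, {}).items():
            scores[cause] = scores.get(cause, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)

intent = {"mtu": 9100, "vlan": 200}
print(validate_intent({"mtu": 9100, "vlan": 200, "lldp": True}, intent))  # True
print(rank_root_causes(["crc_errors", "packet_drops"])[0])  # bad_optic
```

The point of the split: configuration correctness is binary and rules-based, while root-cause analysis tolerates, and benefits from, a ranked best guess.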
We’ll dig into how we’re using AIOps to solve stubborn data center networking challenges in the second and final part of this blog series.
HPE Networking Chief AI Officer Bob Friday and I recently participated in a podcast with Tech Field Day. The starting premise of the show was, “Data center networking needs AI,” which we absolutely agree with. In part one of this two-part blog series we emphasized that discussions about whether and how to use AI must be rooted in your particular goals and the problems you have that need to be solved. After examining the challenges data center networking operators have, it is clear that AI can in fact help, and here’s how…
AI-native innovations extending our leadership in the data center
Improvements in AI are coming so rapidly that it will undoubtedly become an increasingly large part of the entire data center lifecycle, from Day 0 Design, to Day 1 Deployment, to Day 2 Ongoing Operations. We recently announced several new AIOps capabilities for data center networking.
Predictive maintenance enables network operators to identify future problems and correct them before they occur.
- System Health. Predict when a switch will fail based on analyzing data around processor and memory utilization, temperature, etc.
- Capacity. Predict when you need to expand the fabric based on data around link utilization, traffic growth, etc.
- Optics. Predict when an optical transceiver will fail based on Tx/Rx throughput, power, voltage, etc. Gray failures in optics are always a problem and they can be worse (harder to detect) than a complete failure.
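As a hedged illustration of the capacity case above, one simple form of prediction is fitting a linear trend to recent utilization samples and extrapolating to a planning threshold. The function name and the 80% threshold below are assumptions for the sketch, not a description of any shipping product.

```python
def days_until_threshold(samples, threshold=0.8):
    """Least-squares linear fit over daily utilization samples (0.0-1.0).
    Returns the estimated number of days after the last sample until the
    trend crosses `threshold`, or None if utilization is flat/declining."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope <= 0:
        return None  # no growth trend, so no predicted exhaustion
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope - (n - 1)

# Utilization growing 2% per day from 50%: the trend hits 80% at day 15,
# i.e. 6 days after the last of the 10 samples.
util = [0.50 + 0.02 * day for day in range(10)]
print(round(days_until_threshold(util), 1))  # 6.0
```

Real capacity forecasting would account for seasonality and traffic bursts; the value of even this crude version is turning raw link counters into an actionable "expand by date X" signal.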
With many of these examples, when the capability is first launched it is not using AI in a dynamic, responsive way. Initially, the system often sets a static threshold that triggers an alarm. But as good grapes make good wine, good data makes good AI. It takes some time to accumulate data, and this is why Juniper has such an advantage over our competitors—we’ve been doing AIOps for 10 years with the Mist® platform. Good AI needs time to accumulate data and learn-train-learn-train and adapt, all for the goal of optimizing the user experience. In the data center, AIOps is still embryonic, but it is improving very quickly.
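The static-threshold-to-learned-baseline progression can be sketched in a few lines. This is a generic anomaly-detection pattern, not Juniper's or HPE's actual implementation; the fixed limit, the 30-sample warm-up, and the 3-sigma rule are all illustrative assumptions.

```python
from statistics import mean, stdev

STATIC_LIMIT_C = 70.0  # day-one rule: alarm above a fixed temperature

def static_alarm(temp_c):
    """Initial behavior: a hand-set threshold for every device."""
    return temp_c > STATIC_LIMIT_C

def learned_alarm(temp_c, history, k=3.0):
    """Once enough telemetry accumulates, alarm when a reading sits more
    than k standard deviations above this device's own baseline."""
    if len(history) < 30:              # not enough data yet:
        return static_alarm(temp_c)    # fall back to the static rule
    return temp_c > mean(history) + k * stdev(history)

# A switch that normally runs 55-57 C. A reading of 64 C clears the
# static 70 C limit but is clearly abnormal for this particular device.
history = [55.0 + (i % 5) * 0.5 for i in range(60)]
print(static_alarm(64.0))            # False
print(learned_alarm(64.0, history))  # True
```

The per-device baseline is what the accumulated data buys you: the same reading that a global static rule ignores is flagged once the system has learned what "normal" means for that switch.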
Every company, in the broadest sense of enterprise business transformation, should be grabbing foundational LLMs and fine-tuning them. Enterprises should be vectorizing the treasure troves of corporate data that they are sitting on to feed into an AI model through retrieval augmented generation (RAG). Every company that sells software should be experimenting with tying that software to LLMs and other AI models. Closer to the networking industry, we expect model context protocol (MCP) to be a key facilitator for agentic AI. If you haven’t built an MCP server for your enterprise software, do it now!
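For readers new to RAG, the retrieval step it describes can be sketched in miniature: represent documents as vectors and rank them by cosine similarity to the query, then feed the top matches to the LLM. Production systems use learned embeddings and a vector database; the bag-of-words embedding below is a deliberately simplified stand-in to show the idea of vectorizing corporate data.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy embedding: word-count vector (real RAG uses learned embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[term] * b[term] for term in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, top_k=1):
    """Returns the top_k documents most similar to the query; these are
    the passages that would be prepended to the LLM prompt."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "switch firmware upgrade procedure for the data center fabric",
    "quarterly sales figures for the EMEA region",
    "optical transceiver failure troubleshooting guide",
]
print(retrieve("how do I upgrade switch firmware", docs))
```

Swapping the toy `embed` for a real embedding model and the sorted list for a vector index is, conceptually, all that separates this sketch from a production RAG pipeline.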
A significant amount of AI innovation over the coming years will be customer-led. When vendors put open systems into the hands of customers, you can get amazing and even unexpected results. Throughout corporate history, many industries have been transformed by revolutionary innovation driven not by suppliers, but by end users.
May you live in exciting times
Most of us are buried in information about AI and the speed at which it is moving. We want to stay informed but not overwhelmed. However, AI is also more accessible than ever. Anyone can download just about any of the thousands and thousands of AI models available from Hugging Face, for free! A newbie can easily build an MCP server and connect it to a number of data sources. And if you get stuck, just ask Claude to help out. LLMs are beginning to feel almost like human entities that you can interact with. It’s an exciting time to be a network engineer.