Optimizing application architectures for AI: From monoliths to intelligent agents (2 of 2 blogs series)

September 30, 2025Luis I. Cortés, Bernard Tison8-minute read

No one enjoys menu-based customer service. Pressing “1 for billing” or “2 for support” feels outdated, and repeating the same problem to different agents is frustrating. Even when we try the “operator” trick, we’re just hoping for someone to pick up and understand us right away. But what we really want is simple: a fast and accurate resolution, without friction.

If that’s how you feel, you are not alone. Effective and speedy issue resolution, prompt responses and shorter wait times are important aspects of customer service. Customer expectations have increased. Firms who allow customers to navigate multiple contact channels offer more options to satisfy their customers. These trends mean that traditional architectures (monoliths, rigid workflows, single large models or over-reliant on human agents) are being stretched to or past their limits. Does that create an opportunity for more modular, agent-based architectures designed for flexibility, orchestration, and governance?

Together with Mark Cheshire, Red Hat Summit 2025 offered us a chance to show a different way to think about this problem. Our session and demo was not your typical chatbot, and it wasn’t a giant language model forced into a helpdesk. It was a working system that used multiple specialized AI agents, in an agentic AI approach, orchestrated so each was doing one specific task, but working together like a well-coordinated team.

The best part was that this system was not theoretical. We showed it running on stage, built on Red Hat OpenShift, connected through Kafka, and orchestrated based on input from the Model Context Protocol (MCP). In this blog we will describe it step by step.

In our previous article, we explored the transformative impact of AI on enterprise application architectures. We described how intelligent agents (runtime components capable of interpreting natural language, making decisions, and interacting with APIs) are revolutionizing how we develop software applications and are able to automate decision-making that was traditionally done by humans. Powered by modern language models and microservices, we described how these agents enable us to build flexible, adaptable systems that can be composed dynamically using standards like MCP. We concluded that this shift allows us to focus on solving real-world problems rather than wrestling with complex architectures, making it easier for developers to create innovative solutions that drive business value.

Now, let’s explore why this new agent-based approach is relevant, how it works, and what it means for the future of business, using a customer support process as an example.

Why does current customer support fall short?

Customer service in most companies has grown messy. Requests come in from everywhere, like from phones, emails, live chat, SMS, social media, and even messaging apps like WhatsApp or Signal. Each channel has its own tone, expectations, and speed. Trying to manage them all in one workflow is difficult. Unless we provide customer support teams with some way to manage this stream of messages, it can become a highly manual process with slow response times and diminished customer satisfaction.

To start with, the requests themselves are not always clear. Customers write in natural language, and often with incomplete details:

“My order never came.”
“Can you check on this charge?”
“Something broke with my login.”

These aren’t database queries—they’re messy, human messages. Organizations have tried to automate parts of this process using chatbots and basic natural language processing (NLP). But in reality, human agents still handle the bulk of interpretation and resolution. This increases costs, slows down responses, and can create burnout. You may be surprised to know that call center agent turnover rates average 30-45% annually, and that companies using AI and automation report significant reductions in agent attrition.

At the same time, customer expectations are rising. People expect fast, personalized, and accurate answers, sometimes in real time. Simply hiring more human agents doesn’t solve the architectural problem. The system itself needs to work differently.

The new approach: a workflow of specialized agents

This is where agents and generative AI (gen AI) come in. What if each step in the process was handled by an intelligent component trained for that specific task? What if we could find a way to lessen their workload, improve the quality of their input, or send output automatically?

As we saw in the previous article, the history of software architecture is a series of responses to growing demand. We've moved from the simple, rigid structure of early monoliths—difficult to scale, like a single, unmovable building—through more flexible layered architectures, to today's cloud-optimized standard of independent, scalable microservices. This progression was driven by the need to handle exponentially more users and more data.

With agentic systems that are context-aware and adaptive, we’re now entering a new phase. Instead of following static rules, agents can interpret context, make decisions, and coordinate actions. They’re not just “services” that run fixed logic, they adapt based on real-time inputs.

In customer support, this shift is natural. The key challenge is understanding requests, routing them correctly and figuring out the right answer (exactly what agents are designed to do!)

How do we architect for AI agents?

In our Red Hat Summit presentation, Bernard demonstrated an architecture based on multiple small agents, each responsible for one specific job. Think of it like a relay race: every agent “runs” its task, then passes the output to the next.

Here’s how it works:

1. Input moderation

The first agent filters messages. We never know what’s in a customer message. This agent specializes in removing offensive or harmful content and detecting possible attacks, like prompt injection, where a user can try to trick the model into misbehaving. This is similar to how developers block SQL injections in databases. Clean input makes sure every following step works safely.

2. Information extraction

A language model extracts key details:

Who is asking? (customer name, account ID)
What do they want? (intent, urgency, type of request)
Any identifiers? (order numbers, product types)

The outcome of the process is not written in free-form prose but delivered as structured data in JSON format. The reason is straightforward: automation and downstream systems work most effectively when information is expressed in a clear, predictable, and machine-readable structure. JSON provides explicit fields, consistent syntax, and a schema-like organization that eliminates ambiguity. This makes it easier for software to parse, validate, and act upon the data without requiring additional interpretation or human intervention. By choosing JSON instead of narrative text, the results can be integrated directly into the next step with minimal friction.

3. Context enrichment

Next, an agent retrieves related information from internal systems: customer relationship management (CRM), enterprise resource planning (ERP), or ticketing tools. Did the customer already open a ticket? Are they under warranty? Have they just placed an order? This step gives context so the system can make better decisions.

4. Classification and routing

The system decides what the request is about: billing, technical issue, refund, or something else. Of course the responsibility of sending the request to the right workflow is delegated to an agent, automated if possible, or human if necessary.

5. Action execution with MCP

Here’s where it gets interesting. Instead of letting the language model directly call APIs, the system uses MCP. In the previous article we explained that MCP allows agents to discover which tools are available in their environment or select the appropriate tool for a given task. That is what we will do.

The agent says: “I need this tool, with these parameters.”

The host system validates the request, executes it, and sends back the result.

This makes workflows safer and easier to manage as developers don’t need to write custom integration code for every tool. Also, if the server expands its capabilities, MCP will be able to expose those, and the agent will automatically be able to choose if any of those new capabilities is appropriate based on the message in a process known as "dynamic discovery."

6. Natural language response

Finally, another agent generates a human-friendly reply, formatted for the right channel (email, WhatsApp, etc.). If something is unclear or incomplete, the system flags it for a human to review.

Each agent runs as an independent service, can be updated separately, and scales automatically using Red Hat OpenShift.

Why not just one big model?

While using a single large language model (LLM) might initially appear simpler, this approach can introduce drawbacks. Processing all operations through one large model inherently increases latency, and larger models also demand more computational resources, leading to higher operational costs. Also LLMs generally perform less effectively, with less predictable results, if the scope of the prompt is too broad.

Employing a network of smaller, specialized agents offers substantial benefits. For instance, if the accuracy of a classification task diminishes, only the specific classification agent requires retraining. Or integrating a new data source may only require an update to the relevant enrichment agent.

Each agent functions effectively as an autonomous, intelligent microservice. This modular design allows for independent debugging and targeted improvements, streamlining the overall development and maintenance process.

Why governance matters

Incorporating AI in enterprise systems that interact with customers can bring significant opportunities but also poses serious risk. When agents produce an incorrect answer, the impact can go beyond a simple mistake and expose the organization to legal exposure or damage its reputation. For this reason, safeguards are essential, and our demo builds them directly into its design:

Early and staged moderation. The system reviews content before it reaches the user and also at key checkpoints during the process, reducing the chance that harmful or inaccurate information slips through.
Human fallback loop. In case of doubt, the system does not act alone and brings human experts in to review and approve the response.

These layers of control help the system remain trustworthy in complex enterprise settings.

On top of this, OpenShift strengthens the environment by adding governance and reliability features. It tracks compliance, monitors performance, and manages the full lifecycle of AI models and applications. Together, these capabilities create an ecosystem where organizations can innovate with AI while staying consistent and resilient.

A real example: from request to answer

We used a, a simple help request in our live demo:

“Hi, I’m Thomas. I need my order history from the past six months.”

This message went through the entire process automatically:

The message was cleaned.
Key fields (name, intent, order history request) were extracted.
Customer details were enriched from CRM.
The request was classified as a billing issue.
The getOrderHistory tool was triggered via MCP.
A friendly WhatsApp message returned the order history to Thomas.

No menus, no waiting for an operator, just a clear answer!

What this means for enterprise architectures

We learned a few lessons while putting together our Red Hat Summit session and demo.

Smaller, specialized agents are far easier to maintain and evolve than one massive, monolithic model. By breaking complexity into manageable parts, teams can iterate faster and adapt as business needs change.
Getting these agents to work together reliably doesn’t happen by accident. Orchestration plays a central role in shaping reliable AI workflows, making sure that each component does its job at the right time and in the right context.
Governance must be part of the design from the beginning. Treating it as an add-on only increases risk and slows down adoption. When compliance, transparency, and oversight are built in, enterprises can move with greater confidence.
And last but not least, Red Hat OpenShift can provide the operational backbone needed to run AI systems in production with built-in scaling, security, and monitoring, whether you’re experimenting with a few agents today or scaling to thousands tomorrow.

This approach isn’t about replacing people, but about building systems that don’t waste their time and help them to elevate their work. Instead of patching old customer service tools, we can create smarter ones that respond the way humans expect.

Because in the end, no one should have to press “1” just to be heard.

Let us help you while you explore how to turn AI agents into your organization’s secret sauce. You can start by reading about how to navigate the generative AI landscape, Red Hat’s approach to agentic AI and tooling up your LLM with Apache Camel on OpenShift.

About the authors

Luis I. Cortés

AI & GSIs, Product Porfolio Marketing and Learning

Luis I. Cortes brings 20 years of experience in enterprise software. He specializes in generative AI, Red Hat partners, and startup ecosystems. From starting up technology companies, to raising funds to grow and scale them globally, to helping multinational technology companies achieve new feats, Luis is all about innovation and growth.

Read full bio

Bernard Tison

Keep exploring

Browse by channel

Explore all channels

Optimizing application architectures for AI: From monoliths to intelligent agents (2 of 2 blogs series)

Why does current customer support fall short?

The new approach: a workflow of specialized agents

How do we architect for AI agents?

1. Input moderation

2. Information extraction

3. Context enrichment

4. Classification and routing

5. Action execution with MCP

6. Natural language response

Why not just one big model?

Why governance matters

A real example: from request to answer

What this means for enterprise architectures

Get started with AI Inference

About the authors

Luis I. Cortés

Bernard Tison

More like this

Keep exploring

Browse by channel

Platforms

Tools

Try, buy, & sell

Communicate

About Red Hat

Change page language

Red Hat legal and privacy links

Red Hat legal and privacy links