
LLM Workflow for CAB

Let's build an LLM workflow that serves on a change advisory board.

Premise

Have you ever received a CAB (change advisory board) submission only for it to be deferred because a key element was absent or an important aspect hadn’t been considered?

The idea here is to build a basic proof-of-concept workflow that reviews change submissions before they reach the CAB, giving the author an opportunity to revise their submission and making it more likely to succeed.

An LLM won’t have all the context it needs, nor a perfect understanding of your environment. That’s okay - we strive to build a workflow with a set of operating principles and guidelines that steer the model to produce useful feedback and strengthen the change control process.

The aim is not to remove humans from the process entirely. A human in the loop is integral and retains final decision authority.

I’d consider this workflow a success if it leads to refined submissions, more expedient processing of changes, and an overall stronger change control process.

Nor is it limited to CAB - the basic principles here would work for any review workflow, I imagine.

This work can serve as a starting point for a broader and more capable AI agent later down the line.

Tools

I’ll be using..

  • Langflow running in Docker
  • Ollama running on a server in my network
  • Phi4, a large language model from Microsoft

..to make this work. If you lack the hardware to run models locally, the Ollama components can easily be swapped out for an LLM API provider such as OpenAI, Anthropic, Google, or another of your choosing.
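To show what that swap looks like in practice, here is a minimal sketch: the same OpenAI-style chat call can target either a local Ollama instance (via its OpenAI-compatible endpoint) or a hosted provider, changing only the base URL, API key, and model name. The endpoint, placeholder key, and model names are assumptions based on common defaults, not part of the Langflow setup itself.

```python
# Minimal sketch: the same chat-completion call can point at a local
# Ollama server (OpenAI-compatible endpoint) or a hosted provider.
# Assumes the `openai` Python package and a locally pulled phi4 model.
from openai import OpenAI

# Local: Ollama's OpenAI-compatible API on its default port 11434.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Hosted alternative (hypothetical, for illustration):
# client = OpenAI(api_key="sk-...")  # and model="gpt-4o" or similar below

response = client.chat.completions.create(
    model="phi4",
    messages=[{"role": "user", "content": "Summarise this change in one line: upgrade core switch firmware."}],
)
print(response.choices[0].message.content)
```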

Designing a workflow

Let’s consider, broadly, the approaches we can take:

  • Single-pass
    • Regular model with tuned system prompt
    • CoT model (chain-of-thought) built-in
    • RAG with environmental context
  • Multi-pass
    • Regular model with tuned system prompt
    • CoT model (chain-of-thought) built-in
    • RAG with environmental context
    • Role-playing
  • Agentic
    • RAG with environmental context
    • Role-playing
    • Tool or function calling

Additional considerations:

  • System Prompt Tuning:
    • does a highly detailed system prompt produce better outcomes, or is there a point of diminishing returns?
    • how closely can a model adhere to instructions and response formats?
  • Retrieval-Augmented Generation (RAG):
    • can environmental context - e.g. infrastructure code, documentation, emails - impact the quality of the evaluation?
    • can a model even make sense of all the data we’re throwing at it?
    • can additional context lead to better evaluation or just more nit-picking and repetition?
  • Role-playing:
    • does simulating the perspectives of stakeholders improve the overall quality of the analysis? e.g. a simulated interaction between an engineer and a manager.
    • adversarial models - one model tries to aggressively justify the change to go ahead, whilst the other argues for it to be rejected. Does that lead to better discovery of edge cases?
  • Agents:
    • realistically, how far can we break down a task before we see diminishing returns?
    • can multiple tuned experts produce better results than just a clever system prompt?
  • Tool Use:
    • what kind of tools are available to the LLM and how can they expand its reach?
    • is there a limit to how useful tools can be? Beyond various means of information retrieval, do we want an agent to have the capability to independently test a change? Would that even be possible or are we encroaching on the responsibilities of engineers?
  • Context window:
    • can we fit all the relevant information and supporting evidence in-context?
    • is there a trade-off to be made between cheaper models vs more capable models? probably not at the scale we’re working with.
    • RAG helps when data to be evaluated exceeds context window, but then again - can a model make sense of all that data?
  • Model Selection:
    • does using different LLMs produce better results?
    • are there any biases or weaknesses in specific models that can be overcome by mixing multiple models at different stages of the analysis?

To keep things manageable, I’m going to start off with a multi-pass approach using Microsoft’s Phi4 model, leaving RAG, role-play, and tool-calling agents for later iterations.
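To make “multi-pass” concrete, here is a rough sketch of the idea outside of Langflow: the submission is run through several focused review prompts in sequence, so each pass only has to worry about one aspect. The pass names, prompt wording, and the use of Ollama’s OpenAI-compatible endpoint are my own assumptions, not the exact prompts used in the flow.

```python
# Rough sketch of a multi-pass review: one focused LLM call per aspect.
# Assumes Ollama serving phi4 locally; prompts are illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

PASSES = {
    "basics": "Check the submission for completeness: title, description, schedule, rollback plan.",
    "risks": "Identify the risks and whether a mitigation is stated for each.",
    "resources": "Assess the resource and capacity implications of the change.",
    "communications": "Review the communications plan and required approvals.",
}

def review(submission: str) -> dict[str, str]:
    """Run the submission through each focused pass and collect the findings."""
    findings = {}
    for name, instruction in PASSES.items():
        result = client.chat.completions.create(
            model="phi4",
            messages=[
                {"role": "system", "content": "You are a change advisory board reviewer. " + instruction},
                {"role": "user", "content": submission},
            ],
        )
        findings[name] = result.choices[0].message.content
    return findings
```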

Prompt tuning

To let us iterate faster, I’m including a CAB generator - just another LLM call with a custom prompt - to generate a unique CAB submission on each run. This gives me a large variety to test with, and I’m hoping it will help me ascertain the effectiveness of the workflow across a wide spectrum of inputs.
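A sketch of what such a generator call might look like; the prompt and topic list below are illustrative assumptions rather than the ones in the template, with a random topic injected so repeated runs don’t converge on the same submission.

```python
# Sketch of the CAB generator: one LLM call with a custom prompt that
# produces a fresh, fictional change submission per run.
import random
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

TOPICS = ["firewall rule change", "database version upgrade", "DNS migration", "patching window"]

def generate_submission() -> str:
    topic = random.choice(TOPICS)
    result = client.chat.completions.create(
        model="phi4",
        temperature=1.0,  # encourage variety between runs
        messages=[{
            "role": "user",
            "content": (
                f"Write a fictional CAB change submission about a {topic}. "
                "Include title, description, schedule, risk assessment, "
                "rollback plan, and communications plan. Deliberately leave "
                "one or two areas weak so the reviewer has something to find."
            ),
        }],
    )
    return result.choices[0].message.content
```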

illustration of the testing process

The workflow

This is the logical flow at a high level. In my testing, the “human” is substituted with an LLM generating the inputs. We can swap between regular, CoT, and other models to see what kind of results we get.

illustration of the workflow at a high level

Langflow Implementation

  • Langflow 1.1.1 running in docker
  • ollama running on a server in my network
  • Phi4 large language model from Microsoft.

Workflow floorplan in Langflow

With everything set up, let’s run the workflow and..

Run through the results

First, the model generates a change..

..it reviews the basics..

..then the risks..

..then resource implications..

.. communications plan and approvals..

.. and now collate the responses, do another review, and generate a recommendation.
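In spirit, that collation step is just one more call that takes the per-stage outputs as context. Here is a hedged sketch of such a final pass; the prompt wording and the APPROVE/REVISE/REJECT framing are my assumptions, not the exact node configuration in the flow.

```python
# Sketch of the collation/recommendation pass: fold the per-stage findings
# into one prompt and ask the model for an advisory verdict.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def recommend(submission: str, findings: dict[str, str]) -> str:
    summary = "\n\n".join(f"## {name}\n{text}" for name, text in findings.items())
    result = client.chat.completions.create(
        model="phi4",
        messages=[
            {"role": "system", "content": (
                "You are a change advisory board chair. Review the submission and the "
                "stage findings, then recommend APPROVE, REVISE, or REJECT with reasons. "
                "A human makes the final decision; phrase this as advice."
            )},
            {"role": "user", "content": f"Submission:\n{submission}\n\nFindings:\n{summary}"},
        ],
    )
    return result.choices[0].message.content
```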

Setup

If you want to try any of this yourself, I’ve uploaded a Docker Compose file and the Langflow template to a GitHub repo. Follow the steps below to reproduce my setup; I’m assuming you’ve already got Docker and Ollama installed. If anything refuses to start, there’s a quick preflight check sketched after the steps.

  1. git clone https://github.com/MRKups/llm-workflow-cab-1.git, or download from web
  2. cd llm-workflow-cab-1
  3. docker compose up -d
  4. Open a terminal and run ollama serve
  5. Open http://localhost:7860/
  6. Select New Flow, Blank Flow
  7. Click on flow title (top center), Import
  8. Import cab-workflow-1.json
  9. Run the workflow
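If step 9 fails, a small script can confirm both backends are reachable before digging into the flow itself. This is only a sketch assuming the default ports from the steps above (Langflow on 7860, Ollama on 11434) and the `requests` package.

```python
# Preflight sketch: confirm Langflow and Ollama are up and phi4 is pulled.
import requests

def check() -> None:
    langflow = requests.get("http://localhost:7860/", timeout=5)
    print("Langflow reachable:", langflow.ok)

    # Ollama's /api/tags endpoint lists locally available models.
    tags = requests.get("http://localhost:11434/api/tags", timeout=5)
    models = [m["name"] for m in tags.json().get("models", [])]
    print("Ollama models:", models)
    if not any(name.startswith("phi4") for name in models):
        print("phi4 not found - run: ollama pull phi4")

if __name__ == "__main__":
    check()
```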

Closing remarks

Overall, I’m quite pleased with how this turned out. While the quality of the final output is heavily dependent on the model you’re using, the results produced by Phi4 are very good. It was able to extract all the relevant information from the change, evaluate it, and correctly highlight areas of weakness while calling attention to unaddressed risks.

Large language model workflows and ‘AI’ agents are opening up a new frontier, bringing the next wave of automation and creating new opportunities. Let’s remember - models and integrations will only improve as time goes on, enabling new use cases and applications.

I hope this got you thinking about how these tools can be applied to everyday problems and what else may be possible with time.

All content authored by a human.
