Case Study
Local LLM
2026
Confidential Client

An In-House AI Processing Cluster for Scalable Marketing Operations

How SystemFabric built a local AI cluster using Mac minis and open-source LLMs to reduce token costs and scale marketing automation workflows.

Reduced Token Spend
80-90%
Pages Processed Locally
1000s
Time Saved Per Average Batch
15+ hours
01
Context

A marketing / creative agency (confidential client) was rapidly integrating AI into nearly every part of its workflow. Teams were using language models for SEO production, content transformation, campaign customization, creative ideation, internal knowledge querying, and repetitive operational tasks that traditionally consumed large amounts of manual time — and tokens.

As AI adoption increased internally, the organization began treating language models less like standalone chat tools and more like infrastructure — something that could power systems, utilities, automations, and internal production workflows.

The agency’s goal was not simply to “use AI,” but to create an operational environment where AI could reliably support day-to-day business processes at scale.

To support that vision, SystemFabric designed and deployed a distributed in-house AI processing cluster built around interconnected Apple Silicon hardware, including Mac minis and Mac Studios. The environment combined locally hosted language models, custom orchestration tooling, monitoring systems, and internal applications to create a scalable AI utility layer for the organization.

Rather than depending entirely on cloud-based APIs, the system enabled the agency to run large portions of its AI workflows locally — reducing ongoing operational costs while increasing visibility, flexibility, and control.

02
The Problem

As AI usage expanded, several operational and technical challenges began to emerge.

Rising Token Costs

Many of the agency’s workflows involved highly repetitive operations with predictable outputs. Tasks like generating thousands of SEO meta descriptions, processing large sitemaps, categorizing content, or customizing bulk marketing copy generated substantial API token usage over time.

While each individual request was relatively inexpensive, the aggregate operational cost became difficult to justify for production-scale workflows.

Sequential Workflow Bottlenecks

Traditional AI workflows tend to operate sequentially: submit a request, wait for a response, repeat.

For small-scale experimentation this works well, but the agency increasingly needed to process large volumes of information simultaneously. Waiting for one process to complete before another could begin slowed production and limited the ability to scale automation efforts.

Lack of Operational Visibility

As additional local AI services and tools were introduced, it became difficult to understand what workloads were running and where.

Interactive chat sessions, automation pipelines, background services, and large-scale batch jobs were all competing for compute resources across different machines. Without centralized visibility, workloads could unintentionally interfere with one another and create unpredictable system performance.

The agency needed a way to monitor:

  • Which jobs were running
  • Which node was processing them
  • Current resource utilization
  • Queue and workload status
  • Potential conflicts between users or services

Security & Data Ownership Concerns

Some workflows involved proprietary strategies, internal operational data, or unreleased client material.

Routing every process through third-party cloud APIs introduced unnecessary exposure and limited the organization’s ability to run secure or fully offline workflows when needed.

The agency wanted tighter control over both its data and the infrastructure powering its AI systems.

This has changed the game when it comes to cost. We used to spend dollars per head on top of token costs; now we only pay for those special use cases we can't handle in-house.

Confidential, Agency Partner

03
Approach

SystemFabric designed a modular, distributed AI processing environment centered around locally hosted language models and scalable Apple Silicon hardware.

Rather than relying on a single server, the architecture used multiple interconnected nodes that could independently run language models, process workloads, and contribute compute resources to the larger cluster.

The design emphasized four core principles:

Modular Scalability

The cluster was intentionally designed so that hardware did not need to be identical across the environment.

New Mac minis or Mac Studios could be added incrementally over time depending on workload requirements, available budget, or performance needs. This allowed the organization to scale horizontally without replacing the entire system.

Different nodes could also specialize in different workloads depending on their memory capacity or processing power.
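
To make the idea of mixed hardware concrete, here is a minimal sketch of how a scheduler might pick nodes by memory capacity. The `Node` class, the `eligible_nodes` helper, the node names, and the memory figures are all hypothetical and are not taken from the deployed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    memory_gb: int      # unified memory on the machine (illustrative values)
    busy: bool = False


def eligible_nodes(nodes, required_memory_gb):
    """Return idle nodes with enough memory, smallest sufficient node first."""
    fits = [n for n in nodes if not n.busy and n.memory_gb >= required_memory_gb]
    return sorted(fits, key=lambda n: n.memory_gb)


# Hypothetical mixed cluster: two Mac minis and one Mac Studio.
cluster = [
    Node("mini-01", memory_gb=16),
    Node("mini-02", memory_gb=24),
    Node("studio-01", memory_gb=64),
]

small_job = eligible_nodes(cluster, required_memory_gb=8)
large_job = eligible_nodes(cluster, required_memory_gb=32)
print([n.name for n in small_job])
print([n.name for n in large_job])
```

Preferring the smallest node that fits keeps the larger machines free for workloads that actually need them, which is one plausible way to let nodes specialize without requiring identical hardware.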

Local LLM Infrastructure

Each node hosted locally running language models using Ollama, allowing the organization to process requests entirely within its own network.

This provided several advantages:

  • No token-based usage fees for local processing
  • Reduced dependency on external APIs
  • Greater control over model selection
  • Offline capability
  • Lower latency for internal tasks
  • Improved privacy and security

The system also supported multiple models simultaneously, allowing different workloads to be routed dynamically depending on speed, complexity, or hardware availability.
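
As an illustration of routing workloads to different models, the sketch below builds a request for Ollama's `/api/generate` HTTP endpoint using only the Python standard library. The routing table, the task names, and the model tags are placeholders chosen for the example; the case study does not say which models were used, and the default host assumes Ollama's standard local port.

```python
import json
import urllib.request

# Hypothetical routing table: workload type -> model tag (placeholders).
MODEL_FOR_TASK = {
    "meta_description": "llama3.2:3b",    # short, repetitive output
    "long_form_rewrite": "llama3.1:8b",   # more demanding rewriting
}
DEFAULT_MODEL = "llama3.2:3b"


def build_request(task_type, prompt, host="http://localhost:11434"):
    """Pick a model for the task and build a non-streaming generate request."""
    model = MODEL_FOR_TASK.get(task_type, DEFAULT_MODEL)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(task_type, prompt, host="http://localhost:11434", timeout=120):
    """Send the request to a node running Ollama and return the generated text."""
    request = build_request(task_type, prompt, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("meta_description", "Summarize this page: ...")` would only work against a node that is running Ollama with the named model already pulled. Pointing `host` at another machine on the network is how the same call could target a different node.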

Distributed Batch Processing

A major component of the architecture was the creation of a custom batch-processing framework capable of distributing workloads across multiple nodes simultaneously.

Instead of processing large jobs sequentially, the system could divide tasks into smaller chunks and run them in parallel across the cluster.

This dramatically improved throughput for repetitive operational tasks while keeping compute costs predictable and extremely low.
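
The chunk-and-distribute pattern described above can be sketched in a few lines. This is not SystemFabric's framework; `chunked`, `run_batch`, and the stand-in worker are hypothetical names, and a real worker would send each chunk to the model running on its assigned node.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(items, nodes, process_chunk, chunk_size=50):
    """Assign chunks to nodes round-robin and process them in parallel.

    `process_chunk(node, chunk)` must return one result per item.
    Round-robin only balances chunk counts; it does not stop two chunks
    for the same node from overlapping.
    """
    chunks = chunked(items, chunk_size)
    assignments = [(nodes[i % len(nodes)], chunk) for i, chunk in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # pool.map preserves input order, so results line up with `items`.
        results = pool.map(lambda a: process_chunk(*a), assignments)
        return [r for chunk_result in results for r in chunk_result]


def fake_worker(node, chunk):
    """Stand-in for a call to the model hosted on `node`."""
    return [f"{node}:{item}" for item in chunk]


out = run_batch(list(range(5)), ["mini-01", "mini-02"], fake_worker, chunk_size=2)
print(out)
```

Threads are a reasonable fit here because each worker would spend most of its time waiting on a network response from a node rather than computing locally.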

Centralized Monitoring & Orchestration

To maintain system stability, SystemFabric developed a custom monitoring and orchestration layer that sat above the cluster itself.

The platform provided visibility into:

  • Active workloads
  • Node utilization
  • Running services
  • Batch job progress
  • Queue status
  • System-wide processing activity

This transformed the cluster from a collection of independent machines into a coordinated AI infrastructure platform that multiple team members could safely use simultaneously.

For example, one employee could interact with a locally hosted chatbot while another launched a large-scale SEO processing task without destabilizing the overall environment.
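
One simple way to picture the monitoring layer is a shared, thread-safe registry of jobs that a dashboard could poll. The `Job` and `JobRegistry` classes below are a hypothetical sketch of that idea, not the platform SystemFabric built.

```python
import threading
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    node: str
    kind: str                 # e.g. "chat" or "batch"
    total: int = 1            # number of work items in the job
    done: int = 0
    status: str = "queued"    # queued -> running -> finished


class JobRegistry:
    """In-memory record of which jobs are on which node."""

    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, job):
        with self._lock:
            self._jobs[job.job_id] = job

    def update(self, job_id, done):
        with self._lock:
            job = self._jobs[job_id]
            job.done = done
            job.status = "finished" if done >= job.total else "running"

    def snapshot(self):
        """Summarize active work per node, queue depth, and job progress."""
        with self._lock:
            active = [j for j in self._jobs.values() if j.status != "finished"]
            return {
                "active_per_node": dict(Counter(j.node for j in active)),
                "queued": sum(j.status == "queued" for j in active),
                "progress": {j.job_id: j.done / j.total for j in self._jobs.values()},
            }


registry = JobRegistry()
registry.submit(Job("chat-1", node="mini-01", kind="chat"))
registry.submit(Job("seo-1", node="studio-01", kind="batch", total=200))
registry.update("seo-1", done=50)
state = registry.snapshot()
print(state)
```

In the scenario above, the chat session and the SEO batch appear on separate nodes in `active_per_node`, which is the kind of information a scheduler would need to keep them from competing for the same machine.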

04
Solution

With the infrastructure in place, SystemFabric developed a series of internal tools and operational workflows tailored specifically to the agency’s needs.

SEO Metadata Generation System

A custom SEO utility was developed to scan website sitemaps, analyze page structures, and generate large volumes of metadata using locally hosted language models.

Because the processing occurred entirely within the local cluster, the organization could run large-scale metadata generation workflows without incurring API token costs.
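
A rough sketch of the front and back ends of such a pipeline follows: reading page URLs from a standard XML sitemap, building a prompt, and tidying the model's output. The function names, the prompt wording, and the 160-character limit are assumptions made for illustration; the model call itself is omitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_DESCRIPTION_CHARS = 160  # assumed target length, not from the case study


def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def description_prompt(url, page_text):
    """Build a prompt asking a local model for one meta description."""
    return (
        f"Write a meta description of at most {MAX_DESCRIPTION_CHARS} characters "
        f"for the page at {url}. Return only the description.\n\n"
        f"Page content:\n{page_text}"
    )


def clean_description(raw):
    """Collapse whitespace and trim to the length limit at a word boundary."""
    text = " ".join(raw.split())
    if len(text) <= MAX_DESCRIPTION_CHARS:
        return text
    return text[:MAX_DESCRIPTION_CHARS].rsplit(" ", 1)[0]


SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

urls = sitemap_urls(SAMPLE)
print(urls)
```

Each URL's prompt would then be sent to a local model, with `clean_description` applied to the response before it is stored.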

Sitemap Expansion & Content Processing Tools

Custom crawlers and processing tools were created to:

  • Analyze sitemap structures
  • Extract and transform content
  • Categorize website pages
  • Generate structured outputs
  • Automate portions of SEO auditing workflows

The distributed architecture allowed these tasks to run continuously and in parallel across multiple nodes.

Creative & Marketing Utilities

Additional internal tools were developed to support broader creative and marketing operations, including:

  • Color palette generation
  • Campaign customization
  • Structured copy transformation
  • Creative ideation utilities
  • Content formatting workflows

Because these applications operated on top of the local cluster infrastructure, the organization could experiment freely without worrying about incremental API costs.

Local Network & Offline Processing

The orchestration platform was designed independently of the language models themselves, allowing internal tools to interact directly with local infrastructure and file systems.

This enabled workflows that could:

  • Access local servers and shared storage
  • Read and write files directly
  • Operate entirely offline
  • Process proprietary datasets securely
  • Integrate with internal operational systems

The result was an AI environment that behaved more like internal infrastructure than a standalone chatbot platform.

Operational Outcomes

The completed system provided the agency with a scalable, production-ready AI processing environment optimized for operational efficiency.

Key outcomes included:

  • Significant reduction in recurring token costs
  • Faster throughput through parallelized workloads
  • Improved visibility into AI resource usage
  • Greater operational stability across teams
  • Enhanced privacy and local data ownership
  • Flexible long-term scalability through modular hardware expansion

Most importantly, the organization gained the ability to treat AI as a reusable internal utility layer — one capable of powering automation, production workflows, and creative operations at scale without relying entirely on external AI services.
