The Architecture of an Autonomous Newsroom: Building a 5-Agent Pipeline
Overview
I built this autonomous pipeline to see if agentic orchestration could replicate a high-quality editorial desk with zero manual overhead. The result is a tech news stream that removes the “noise” (deals, opinions, fluff) using a multi-model agentic approach.
The Engineering Architecture
The pipeline has five components that take the news from sourcing all the way to publishing on the web and social media. It is hosted on Modal.com for serverless execution and managed by a custom Python Orchestrator.
The Pipeline
| Component | Model (configurable) | Primary Responsibility |
| --- | --- | --- |
| 01. Discovery | Python / Custom | Scrapes raw feeds, extracts full text via newspaper3k, and handles de-duplication. |
| 02. Classifier Agent | Gemini 2.0 | Filters out “Other” content (deals, opinions, fluff) and assigns a category to each remaining article. |
| 03. Author Agent | GPT-4o | Writes the AI summary of the sourced news. |
| 04. Editor Agent | Claude 3.5 Sonnet | Proofreads for tone and cross-references against the original source text. |
| 05. Publishing | APIs | Publishes to WordPress and Twitter. |
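The flow through these stages can be sketched in plain Python. This is an illustrative outline, not the actual Orchestrator code; `classify`, `write_summary`, `edit`, and `publish` are hypothetical stand-ins for the real agent calls:

```python
def run_pipeline(raw_items, classify, write_summary, edit, publish):
    """Hypothetical sketch of the five-stage flow.

    raw_items is assumed to be the Discovery stage's de-duplicated output;
    each agent is passed in as a plain callable.
    """
    published = []
    for item in raw_items:                 # 01. Discovery output
        category = classify(item)          # 02. Classifier Agent
        if category == "Other":            # deals, opinions, fluff
            continue
        draft = write_summary(item)        # 03. Author Agent
        final = edit(draft)                # 04. Editor Agent (None = rejected)
        if final is not None:
            publish(final)                 # 05. Publishing
            published.append(final)
    return published
```

Passing the agents in as callables keeps the flow testable: each model-backed step can be swapped for a stub when benchmarking the pipeline itself.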

Stateful Optimization & Caching
To keep the system efficient and cost-effective, the Discovery Agent interacts with two distinct caches:
- Non-News Cache: A list that prevents re-processing articles that were already filtered out on previous runs.
- Published Article Cache: Ensures that even if a story is covered by ten agencies, the pipeline only publishes it once.
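A minimal sketch of how such a JSON-backed cache might work; the file path and function names are illustrative assumptions, not the actual implementation:

```python
import json
from pathlib import Path

# Hypothetical location on the persistent Modal volume.
CACHE = Path("/cache/published.json")

def load_cache(path=CACHE):
    """Load the set of already-seen article URLs (empty set on first run)."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def is_new(url, cache):
    """True if this article has not been published (or filtered) before."""
    return url not in cache

def mark_published(url, cache, path=CACHE):
    """Record a URL and persist the cache back to the volume."""
    cache.add(url)
    path.write_text(json.dumps(sorted(cache)))
```

Because the caches hold only URLs (not article bodies), they stay small enough to re-read on every scheduled run.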
Data Architecture
“No-Database”
One architectural choice in WAYR is that the Orchestrator maintains zero local database state for articles. Instead of mirroring content in a local SQL/NoSQL DB, I am using WordPress as our primary Source of Truth.
- Direct Injection: Once an article passes the Editor Agent, it is POSTed directly to the WordPress DB.
- Why this works: This simplifies the stack, removes the “Dual-Write” problem, and means I don’t have to worry about a local database getting out of sync with what is actually live on the site.
Caching vs. Storage: While the articles themselves live in the WordPress DB, the Orchestrator only maintains lightweight JSON caches on a persistent Modal volume for de-duplication and efficiency tracking.
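The direct-injection step maps onto WordPress's standard REST API (`POST /wp-json/wp/v2/posts`). A hedged sketch using only the standard library; the site URL, user, and application password are placeholders:

```python
import base64
import json
import urllib.request

def build_payload(title, content):
    """Shape an approved article into the WordPress posts payload."""
    return {"title": title, "content": content, "status": "publish"}

def publish_to_wordpress(title, content, site, user, app_password):
    """POST an article to WordPress via its REST API (application-password auth)."""
    data = json.dumps(build_payload(title, content)).encode()
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{site}/wp-json/wp/v2/posts",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # WordPress returns the created post object
```

Since WordPress assigns the post ID and stores the canonical copy, the Orchestrator never needs its own article table.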
Evaluation
The Evaluation Framework benchmarks the Classifier against a “Golden Dataset” of 50+ manually labeled samples.
- Precision: The evaluation is divided into two levels.
  - L1: Ensures that non-news articles do not get through (100%).
  - L2: Ensures that the assigned categories are correct (84%).
- Observability: Any “Other” classification triggers a server log. I route these to a Slack bot and use them to improve the prompt later.
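As an illustration, the two levels can be computed over the golden dataset like this (a sketch over hypothetical prediction/gold-label lists, not the actual evaluation harness):

```python
def l1_precision(preds, golds):
    """L1: of the items the classifier let through (pred != "Other"),
    what fraction were genuinely news?"""
    kept = [(p, g) for p, g in zip(preds, golds) if p != "Other"]
    return sum(g != "Other" for _, g in kept) / len(kept)

def l2_accuracy(preds, golds):
    """L2: among true news items, how often the category label matches."""
    news = [(p, g) for p, g in zip(preds, golds) if g != "Other"]
    return sum(p == g for p, g in news) / len(news)
```

Separating the two levels matters: a classifier can be perfect at rejecting fluff (L1) while still confusing adjacent categories (L2), and the two failure modes call for different prompt fixes.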
Experience the Pipeline
If you want to see the output of this in real-time, you can explore the platform through several channels:
Read the Articles: View the Feed – See how the Author and Editor agents synthesize complex tech news into high-signal reports.
RSS Feed: Subscribe – The purest way to get our curated stream without social algorithms.
LinkedIn: Follow WAYR – Follow for daily news posts.
Twitter: Follow WAYR on X