The Architecture of an Autonomous Newsroom: Building a 5-Agent Pipeline
Overview
I built this autonomous pipeline to see if agentic orchestration could replicate a high-quality editorial desk with zero manual overhead. The result is a tech news stream that removes the “noise” (deals, opinions, fluff) using a multi-model agentic approach.
The Engineering Architecture
The pipeline has five components that take the news from sourcing all the way to publishing on the web and social media. It is hosted on Modal.com for serverless execution and managed by a custom Python Orchestrator.
The Pipeline
| Component | Model (configurable) | Primary Responsibility |
| --- | --- | --- |
| 01. Discovery | Python / Custom | Scrapes raw feeds, extracts full text via newspaper3k, and handles de-duplication. |
| 02. Classifier Agent | Gemini 2.0 | Filters out “Other” content (deals, opinions, fluff) and assigns a category to each remaining article. |
| 03. Author Agent | GPT-4o | Writes the AI summary of the sourced news. |
| 04. Editor Agent | Claude 3.5 Sonnet | Proofreads for tone and cross-references against the original source text. |
| 05. Publishing | APIs | Publishes to WordPress and Twitter. |
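The flow through these stages can be sketched in plain Python. This is an illustrative outline, not the actual Orchestrator code; `classify`, `write_summary`, `edit`, and `publish` are hypothetical stand-ins for the real agent calls:

```python
def run_pipeline(raw_items, classify, write_summary, edit, publish):
    """Hypothetical sketch of the five-stage flow.

    raw_items is assumed to be the Discovery stage's de-duplicated output;
    each agent is passed in as a plain callable.
    """
    published = []
    for item in raw_items:                 # 01. Discovery output
        category = classify(item)          # 02. Classifier Agent
        if category == "Other":            # deals, opinions, fluff
            continue
        draft = write_summary(item)        # 03. Author Agent
        final = edit(draft)                # 04. Editor Agent (None = rejected)
        if final is not None:
            publish(final)                 # 05. Publishing
            published.append(final)
    return published
```

Passing the agents in as callables keeps the flow testable: each model-backed step can be swapped for a stub when benchmarking the pipeline itself.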

Stateful Optimization & Caching
To keep the system efficient and cost-effective, the Discovery Agent interacts with two distinct caches:
- Non-News Cache: A list that prevents re-processing articles that were already filtered out on previous runs.
- Published Article Cache: Ensures that even if a story is covered by ten agencies, the pipeline only publishes it once.
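A minimal sketch of how such a JSON-backed cache might work; the file path and function names are illustrative assumptions, not the actual implementation:

```python
import json
from pathlib import Path

# Hypothetical location on the persistent Modal volume.
CACHE = Path("/cache/published.json")

def load_cache(path=CACHE):
    """Load the set of already-seen article URLs (empty set on first run)."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def is_new(url, cache):
    """True if this article has not been published (or filtered) before."""
    return url not in cache

def mark_published(url, cache, path=CACHE):
    """Record a URL and persist the cache back to the volume."""
    cache.add(url)
    path.write_text(json.dumps(sorted(cache)))
```

Because the caches hold only URLs (not article bodies), they stay small enough to re-read on every scheduled run.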
Data Architecture
“No-Database”
One architectural choice in WAYR is that the Orchestrator maintains zero local database state for articles. Instead of mirroring content in a local SQL/NoSQL DB, I am using WordPress as our primary Source of Truth.
- Direct Injection: Once an article passes the Editor Agent, it is POSTed directly to the WordPress DB.
- Why this works: This simplifies the stack, removes the “Dual-Write” problem, and means I don’t have to worry about a local database getting out of sync with what is actually live on the site.
Caching vs. Storage: While the articles themselves live in the WordPress DB, the Orchestrator only maintains lightweight JSON caches on a persistent Modal volume for de-duplication and efficiency tracking.
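The direct-injection step maps onto WordPress's standard REST API (`POST /wp-json/wp/v2/posts`). A hedged sketch using only the standard library; the site URL, user, and application password are placeholders:

```python
import base64
import json
import urllib.request

def build_payload(title, content):
    """Shape an approved article into the WordPress posts payload."""
    return {"title": title, "content": content, "status": "publish"}

def publish_to_wordpress(title, content, site, user, app_password):
    """POST an article to WordPress via its REST API (application-password auth)."""
    data = json.dumps(build_payload(title, content)).encode()
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        f"{site}/wp-json/wp/v2/posts",
        data=data,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # WordPress returns the created post object
```

Since WordPress assigns the post ID and stores the canonical copy, the Orchestrator never needs its own article table.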
Evaluation
The Evaluation Framework benchmarks the Classifier against a “Golden Dataset” of 50+ manually labeled samples.
- Precision: The evaluation is divided into two levels.
  - L1: Ensures that non-news articles do not get through (100%).
  - L2: Ensures that the assigned categories are correct (84%).
- Observability: Any “Other” classification triggers a server log. I route these to a Slack bot and use them to improve the prompt later.
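As an illustration, the two levels can be computed over the golden dataset like this (a sketch over hypothetical prediction/gold-label lists, not the actual evaluation harness):

```python
def l1_precision(preds, golds):
    """L1: of the items the classifier let through (pred != "Other"),
    what fraction were genuinely news?"""
    kept = [(p, g) for p, g in zip(preds, golds) if p != "Other"]
    return sum(g != "Other" for _, g in kept) / len(kept)

def l2_accuracy(preds, golds):
    """L2: among true news items, how often the category label matches."""
    news = [(p, g) for p, g in zip(preds, golds) if g != "Other"]
    return sum(p == g for p, g in news) / len(news)
```

Separating the two levels matters: a classifier can be perfect at rejecting fluff (L1) while still confusing adjacent categories (L2), and the two failure modes call for different prompt fixes.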
Experience the Pipeline
If you want to see the output of this in real-time, you can explore the platform through several channels:
Read the Articles: View the Feed – See how the Author and Editor agents synthesize complex tech news into high-signal reports.
RSS Feed: Subscribe – The purest way to get our curated stream without social algorithms.
LinkedIn: Follow WAYR – Follow for daily news posts.
Twitter: Follow WAYR on X