How it works

The Architecture of an Autonomous Newsroom: Building a 5-Agent Pipeline

Overview

I built this autonomous pipeline to see if agentic orchestration could replicate a high-quality editorial desk with zero manual overhead. The result is a tech news stream that removes the “noise” (deals, opinions, fluff) using a multi-model agentic approach.

The Engineering Architecture

The pipeline has five components that take the news from sourcing all the way to publishing on the web and social media. It is hosted on Modal.com for serverless execution and managed by a custom Python Orchestrator.

The Pipeline

Component — Model (configurable) — Primary Responsibility

  • 01. Discovery — Python / Custom — Scrapes raw feeds, extracts full text via newspaper3k, and handles de-duplication.
  • 02. Classifier Agent — Gemini 2.0 — Filters out “Other” categories (Deals, Opinions, Fluff) and classifies the rest.
  • 03. Author Agent — GPT-4o — Writes the AI summary of the sourced news.
  • 04. Editor Agent — Claude 3.5 Sonnet — Proofreads for tone and cross-references against the original source text.
  • 05. Publishing — APIs — Publishes to WordPress and Twitter.
Architecture
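The hand-off between the five stages can be sketched as a simple chain of functions. This is a minimal illustration, not the actual implementation: the stage bodies below are stubs standing in for the scrapers and model calls, and all names are hypothetical.

```python
def discover():
    # 01. Discovery: scrape raw feeds, extract full text, de-duplicate.
    return [{"url": "https://example.com/a", "text": "raw article text"}]

def classify(articles):
    # 02. Classifier agent (Gemini 2.0): tag a category, drop "Other".
    for a in articles:
        a["category"] = "AI"  # stand-in for the model call
    return [a for a in articles if a["category"] != "Other"]

def write_summaries(articles):
    # 03. Author agent (GPT-4o): draft the summary.
    for a in articles:
        a["summary"] = a["text"][:80]  # stand-in for the model call
    return articles

def edit(articles):
    # 04. Editor agent (Claude 3.5 Sonnet): proofread + cross-reference.
    return articles

def publish(articles):
    # 05. Publishing: POST to WordPress and Twitter (stubbed here).
    return [a["url"] for a in articles]

def run_pipeline():
    # The Orchestrator simply threads each stage's output into the next.
    return publish(edit(write_summaries(classify(discover()))))
```

Each stage only sees the previous stage's output, which keeps the agents independently swappable (the models are configurable per stage).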

Stateful Optimization & Caching

To keep the system efficient and cost-effective, the Discovery Agent interacts with two distinct caches:

  • Non-News Cache: A list of URLs already filtered out as non-news, so they are not re-processed on every run.
  • Published Article Cache: Ensures that even if a story is covered by ten agencies, the pipeline publishes it only once.
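The two caches amount to simple set-membership checks before any model call is made. Below is a minimal sketch of that logic under assumed details: the JSON file layout, the normalized-title hash for cross-agency de-duplication, and the cache directory path are all illustrative.

```python
import hashlib
import json
import pathlib

def load_cache(cache_dir, name):
    # Each cache is a plain JSON list on a persistent volume (layout is illustrative).
    path = pathlib.Path(cache_dir) / f"{name}.json"
    return set(json.loads(path.read_text())) if path.exists() else set()

def save_cache(cache_dir, name, entries):
    pathlib.Path(cache_dir, f"{name}.json").write_text(json.dumps(sorted(entries)))

def title_key(title):
    # De-duplicate cross-agency coverage of the same story by a normalized-title hash.
    return hashlib.sha256(" ".join(title.lower().split()).encode()).hexdigest()

def should_process(cache_dir, url, title):
    """True only if this URL wasn't filtered before and the story isn't already published."""
    if url in load_cache(cache_dir, "non_news"):
        return False  # Non-News Cache hit: skip re-classification
    return title_key(title) not in load_cache(cache_dir, "published")
```

Because both lookups happen before the Classifier runs, a cache hit costs a file read instead of an LLM call.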

Data Architecture 

“No-Database”

One architectural choice in WAYR is that the Orchestrator maintains zero local database state for articles. Instead of mirroring content in a local SQL/NoSQL DB, I use WordPress as the primary Source of Truth.

  • Direct Injection: Once an article passes the Editor Agent, it is POSTed directly to the WordPress DB.
  • Why this works: It simplifies the stack and removes the “Dual-Write” problem: I don’t have to worry about a local database drifting out of sync with what is actually live on the site.
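The direct injection is a single authenticated POST to the WordPress REST API (`/wp-json/wp/v2/posts`). A minimal sketch, assuming Basic auth with a WordPress application password; the site URL and article fields are placeholders:

```python
import base64
import json
import urllib.request

WP_URL = "https://example.com/wp-json/wp/v2/posts"  # site URL is illustrative

def build_post(article):
    # The WordPress REST API expects title/content/status in the JSON body.
    return {
        "title": article["title"],
        "content": article["html"],
        "status": "publish",
        "categories": article.get("category_ids", []),
    }

def publish_to_wordpress(article, user, app_password):
    # Basic auth with a WordPress "application password".
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    req = urllib.request.Request(
        WP_URL,
        data=json.dumps(build_post(article)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    # The article now lives only in WordPress; nothing is mirrored locally.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the write happens in one place, there is no second datastore to reconcile, which is exactly the “Dual-Write” problem being avoided.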

Caching vs. Storage: While the articles themselves live in the WordPress DB, the Orchestrator only maintains lightweight JSON caches on a persistent Modal volume for de-duplication and efficiency tracking.

Evaluation

The Evaluation Framework benchmarks the Classifier against a “Golden Dataset” of 50+ manually labeled samples.

  • Precision: The evaluation is split into two levels.
    • L1: Verifies that non-news articles are kept out (100%).
    • L2: Verifies that the assigned categories are correct (84%).
  • Observability: Any “Other” classification triggers a server log. These logs are forwarded to a Slack bot and used to improve the prompts later.
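The two-level check against the golden dataset can be expressed compactly. This is a sketch of the scoring logic only, assuming each sample carries a gold label where “Other” marks non-news; the exact dataset format is not specified in this post.

```python
def evaluate(golden, predictions):
    """Two-level check against a hand-labeled golden dataset.

    L1: did the classifier keep news / reject non-news correctly?
    L2: among true news items, is the category label exactly right?
    """
    l1_correct = l2_correct = l2_total = 0
    for gold, pred in zip(golden, predictions):
        gold_is_news = gold != "Other"
        pred_is_news = pred != "Other"
        l1_correct += gold_is_news == pred_is_news  # news/non-news boundary
        if gold_is_news:
            l2_total += 1
            l2_correct += gold == pred              # exact category match
    return {"L1": l1_correct / len(golden),
            "L2": l2_correct / max(l2_total, 1)}
```

Splitting the score this way separates the cheap failure mode (fluff leaking in) from the subtler one (real news filed under the wrong category).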

Experience the Pipeline

If you want to see the output of this in real-time, you can explore the platform through several channels:

Read the Articles: View the Feed – See how the Author and Editor agents synthesize complex tech news into high-signal reports.

RSS Feed: Subscribe – The purest way to get our curated stream without social algorithms.

LinkedIn: Follow WAYR – Follow for daily news posts.

Twitter: Follow WAYR on X