Manifest AI’s Brumby-14B-Base: A Novel Approach to Efficient AI Architecture


Manifest AI’s recent introduction of Brumby-14B-Base marks a significant departure from traditional transformer models. The model, a retrained variant of Qwen3-14B-Base, replaces attention layers with a novel mechanism called Power Retention. The architecture is designed to sidestep the computational and memory costs of attention, which grow with context length, and promises constant per-token computation no matter how long the context becomes.

Power Retention’s core innovation lies in its recurrent state update approach, which eschews the exhaustive pairwise comparisons of attention in favor of a more hardware-efficient mechanism. By maintaining a memory matrix that continuously compresses past information into a fixed-size state, the model achieves efficiency comparable to an RNN while retaining the expressive power of a transformer.
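The article does not give Power Retention’s exact equations, but the idea it describes, folding past tokens into a fixed-size memory matrix via a recurrent update instead of comparing every pair of tokens, can be illustrated with a toy, linear-attention-style sketch. Everything below, including the `recurrent_retention` function, the decay factor, and the dimensions, is a hypothetical illustration of the general technique, not Manifest AI’s actual implementation.

```python
import numpy as np

def recurrent_retention(queries, keys, values, decay=0.99):
    """Toy recurrent state update in the spirit of attention-free retention.

    Rather than comparing each token against all previous tokens (O(T^2)),
    past information is folded into a fixed-size state matrix S, so each
    new token costs the same amount of compute regardless of context length.
    """
    T, d_k = keys.shape
    d_v = values.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size memory matrix
    outputs = np.empty((T, d_v))
    for t in range(T):
        # Fold the current key/value pair into the state (outer product);
        # the decay factor is a stand-in for learned forgetting.
        S = decay * S + np.outer(keys[t], values[t])
        # Read out against the compressed state: constant cost per token.
        outputs[t] = queries[t] @ S
    return outputs

# Usage: per-token cost is O(d_k * d_v), independent of sequence length T.
T, d = 128, 16
q, k, v = (np.random.randn(T, d) for _ in range(3))
out = recurrent_retention(q, k, v)
print(out.shape)  # (128, 16)
```

Because the state S has a fixed shape, each step costs the same no matter how many tokens precede it, which is the property contrasted here with attention’s per-token cost growing with context length.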

With a reported retraining cost as low as $4,000, alongside gains in inference performance, Brumby-14B-Base demonstrates the economic feasibility of attention-free systems and their potential hardware-efficiency gains. Because the model inherits and adapts a transformer’s capabilities through retraining rather than training from scratch, the approach could make large-scale architectural experimentation far more accessible.

Overall, Manifest AI’s Brumby-14B-Base presents a compelling case for a shift in AI architecture paradigms, challenging the transformer’s dominance and paving the way for increased architectural diversity in the field.

Source: VentureBeat