Overview

FORGE is ZPAK's metacognitive prompt engineering harness — the craft layer of the intelligence stack. It provides structured development, versioning, and systematic evaluation for prompts that need to work reliably across sessions, models, and contexts — not just once in a demo.

Most prompt engineering is ad hoc: write something, see if it works, adjust, repeat. FORGE imposes engineering discipline on that process: every prompt has a version, an intent, test cases, and a documented evaluation methodology.
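As a rough illustration of what "a version, an intent, test cases, and a documented evaluation methodology" could look like as data, here is a minimal Python sketch. The names `PromptRecord`, `PromptTestCase`, and `is_complete` are hypothetical and chosen for this example only; they are not FORGE's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTestCase:
    """One input paired with a simple check on the model's output (hypothetical)."""
    name: str
    input_text: str
    must_contain: str  # simplest possible pass criterion: a required substring


@dataclass(frozen=True)
class PromptRecord:
    """Hypothetical record holding the four things every prompt is said to have."""
    version: str
    intent: str
    template: str
    evaluation_method: str
    test_cases: tuple[PromptTestCase, ...] = ()

    def is_complete(self) -> bool:
        # Treat a prompt as complete only if no field is left empty.
        return all([self.version, self.intent, self.template,
                    self.evaluation_method, self.test_cases])


record = PromptRecord(
    version="0.1.0",
    intent="Summarize a support ticket in one sentence",
    template="Summarize this ticket in one sentence:\n{ticket}",
    evaluation_method="substring match on required keywords",
    test_cases=(
        PromptTestCase("refund", "Customer wants a refund for order 1182.", "refund"),
    ),
)
print(record.is_complete())
```

Making the record immutable (`frozen=True`) is one way to reflect the idea that a change to a prompt should produce a new version rather than an edit in place.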

Core Capabilities

Structured prompt development — intent, context, constraints, output spec
Version control — prompt history with diff-level change tracking
Systematic evaluation — test cases, metrics, and regression detection
Cross-model portability — prompts tested against multiple inference backends
Metacognitive framing — prompts designed to reason about their own output quality
Tome integration — FORGE prompts feed directly into the Tome SKILL tier
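Two of the capabilities above, diff-level change tracking and regression detection, can be sketched with the Python standard library alone. The function names `prompt_diff` and `find_regressions`, and the idea of storing results as a name-to-pass/fail mapping, are assumptions made for illustration; the source does not describe how FORGE implements either.

```python
import difflib


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> list[str]:
    """Return a unified diff between two prompt versions, one entry per line."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="",
    ))


def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Names of test cases that passed on the baseline but do not pass on the candidate."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


v1 = "Summarize the ticket.\nUse one sentence."
v2 = "Summarize the ticket.\nUse at most two sentences."
for line in prompt_diff(v1, v2, "v0.1", "v0.2"):
    print(line)

# Hypothetical pass/fail results for the same test cases on two prompt versions.
baseline = {"refund": True, "login": True, "billing": False}
candidate = {"refund": True, "login": False, "billing": True}
print(find_regressions(baseline, candidate))  # ['login']
```

A test case missing from the candidate run is counted as a regression here, on the assumption that silently dropping a test should not look like a pass.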

Why Metacognitive?

A metacognitive prompt doesn't just ask for output — it asks the model to reason about whether the output meets the stated criteria, flag uncertainty, and self-correct before delivery. Combined with the RMCC five-pass quality gate, FORGE prompts produce output that is defensible, not just plausible.
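The metacognitive framing described above can be pictured as a wrapper that attaches explicit criteria and a self-check instruction to a task. The sketch below is a minimal Python illustration under that assumption; `with_self_check` is a hypothetical helper, the wording of the instructions is invented for the example, and the RMCC five-pass gate is not modeled because the source does not describe its passes.

```python
def with_self_check(task: str, criteria: list[str]) -> str:
    """Wrap a task with instructions to verify the draft against stated criteria."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        f"{task}\n\n"
        "Before giving your final answer:\n"
        "- Check the draft against each criterion below.\n"
        "- Flag any criterion you are unsure you have met.\n"
        "- Revise the draft if a criterion is not met.\n\n"
        f"Criteria:\n{numbered}"
    )


prompt = with_self_check(
    "Summarize the attached incident report.",
    ["One paragraph only", "Names the root cause", "States remaining uncertainty"],
)
print(prompt)
```

Keeping the criteria as a separate list, rather than embedding them in free text, means the same criteria could also be reused as test-case checks during evaluation.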

This is what separates production-grade prompt engineering from hobbyist prompt hacking.

Status

DONE: v0.3 active (core development and versioning harness)
DONE: Tome SKILL tier integration established
DONE: MERIDIAN-compliant prompt templates library
WIP: Systematic evaluation framework (test case library)
WIP: Cross-model portability testing (Claude / Ollama / DeepInfra)
NEXT: v0.4 (automated regression detection)
NEXT: FORGE as a ZPAK client deliverable (prompt library handoff)