How We Built the Self-Enhancement Loop | GenieForge

The Spark

What if the platform could tell us what it needs?

GenieForge is a system where AI agents build their own tools, databases, and workflows. Every app built on it encounters platform limitations: missing primitives, absent UI components, workaround patterns baked into tool source code. Those encounters are signal.

The question was whether we could capture that signal automatically. Not through user analytics, not through manual testing, but by asking the apps themselves: what do you need that the platform doesn't give you?

This is fundamentally different from user research. It's introspection and domain reasoning: an AI that deeply understands what the platform can do today and what good apps need in general, and that can reason about the gap between the two when looking at any specific app.

Design Evolution

Three pivots to find the right abstraction

External Inspector

The first design was a set of PM tools that would read app metadata from the database: tools, sub-agents, forms, system prompt. The PM sat outside the app, reading a blueprint.

Discarded. Reading a blueprint is fundamentally different from living inside the building. The PM could see what an app had, but couldn't feel what it was missing.

Built-in Sub-Agent

The second design added a platform_pm_analyze built-in agent that would run inside a target app as a sub-agent. Closer: the PM had access to the app's registry, sandbox, and database. But it ran with its own system prompt, not the app's. It was a visitor, not a resident.

Discarded. It also raised a security concern: built-in agents are available to all users, so making the PM a built-in agent would expose the platform's full capability analysis to every user.

The PM App with run_as_app

The final design: the PM is a regular GenieForge app with a special system_role: "platform_pm" flag that unlocks PM-specific tools. It's an app built on GenieForge, using GenieForge to analyze GenieForge.
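To make the flag idea concrete, here is a minimal sketch of role-gated tool exposure. It is an illustration under stated assumptions, not GenieForge's actual code: the App class, the available_tools function, and the base tool names are hypothetical, and only system_role, "platform_pm", run_as_app, get_platform_capabilities, and create_enhancement_proposal come from this story.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical base tools that every app would get (names are assumptions).
BASE_TOOLS: Set[str] = {"query_database", "show_form"}

# PM-specific tools named in this story.
PM_TOOLS: Set[str] = {
    "run_as_app",
    "get_platform_capabilities",
    "create_enhancement_proposal",
}


@dataclass
class App:
    name: str
    system_role: Optional[str] = None  # only the PM app sets this


def available_tools(app: App) -> Set[str]:
    """Return the tool names an app may call, gated on its system_role."""
    tools = set(BASE_TOOLS)
    if app.system_role == "platform_pm":
        # The flag unlocks PM tools for this one app only.
        tools |= PM_TOOLS
    return tools


pm = App("Platform PM", system_role="platform_pm")
tracker = App("Expenses Tracker")
print(sorted(available_tools(pm) - available_tools(tracker)))
```

Because the gate sits on a per-app flag rather than on a built-in agent, ordinary apps never see the PM tools, which is the security property the second design lacked.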

The key tool is run_as_app: the PM launches a sub-agent that inherits the target app's full system prompt, tools, database, and environment, then layers PM analysis instructions on top. The sub-agent doesn't just inspect the app. It becomes the app.

How It Works

The introspection loop

run_as_app: conceptual flow

PM App (system_role: "platform_pm")
  │
  ├─ calls run_as_app(target: "Expenses Tracker")
  │
  └─► Sub-agent spawns with:
      ├─ System prompt = PM overlay + Expenses Tracker's prompt
      ├─ Tools        = Expenses Tracker's tools + PM proposal tools
      ├─ Database     = Expenses Tracker's project DB
      └─ Environment  = Expenses Tracker's env vars
          │
          ├─ Reads source code of key tools
          ├─ Calls get_platform_capabilities
          ├─ Compares "what this app needs" vs "what exists"
          ├─ Calls create_enhancement_proposal for each gap
          └─ Returns analysis summary to the PM

Because the sub-agent thinks it is the target app, its analysis comes from first-person experience: "I am this app. I have these tools. I can see that the platform doesn't let me render charts, so I built three workaround tools that format data as markdown tables."
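The composition in the flow above can be sketched in a few lines. This is a hedged illustration, assuming the sub-agent context is a plain record: AppConfig, SubAgentContext, build_run_as_app_context, the overlay text, and the sample tool and env names are all hypothetical, and only the four inherited pieces (prompt, tools, database, environment) come from the design described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical overlay text; the real PM instructions are not shown in this story.
PM_OVERLAY = "Analyze this app from the inside and report platform gaps."

# Proposal tools named in the flow above.
PM_PROPOSAL_TOOLS = ["get_platform_capabilities", "create_enhancement_proposal"]


@dataclass
class AppConfig:
    name: str
    system_prompt: str
    tools: List[str]
    database: str
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class SubAgentContext:
    system_prompt: str
    tools: List[str]
    database: str
    env: Dict[str, str]


def build_run_as_app_context(target: AppConfig) -> SubAgentContext:
    """Inherit everything from the target app, then layer the PM pieces on top."""
    tools = list(target.tools)
    for tool in PM_PROPOSAL_TOOLS:
        if tool not in tools:  # avoid duplicates if the app already has the tool
            tools.append(tool)
    return SubAgentContext(
        system_prompt=PM_OVERLAY + "\n\n" + target.system_prompt,
        tools=tools,
        database=target.database,  # same project DB as the target app
        env=dict(target.env),      # copy so the sub-agent cannot mutate the app's env
    )


tracker = AppConfig(
    name="Expenses Tracker",
    system_prompt="You track expenses.",  # placeholder prompt
    tools=["add_expense"],                # placeholder tool name
    database="expenses_db",               # placeholder DB name
    env={"CURRENCY": "USD"},              # placeholder env var
)
ctx = build_run_as_app_context(tracker)
print(ctx.tools)
```

The point of the sketch is that nothing from the target app is replaced: the sub-agent keeps the app's own prompt and tools, which is what separates this design from the visiting sub-agent of the second attempt.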

First Run

26 proposals from one app

The PM's first real analysis targeted an Expenses Tracker, a moderately complex GenieForge app. The PM used run_as_app to inhabit it, then filed 26 structured proposals organized by theme.

Proposals filed, by theme: UI gaps, tool gaps, and Meta/PM gaps.

Sample proposals filed

high: show_chart display component (bar, line, pie)

high: Page-level filter controls bound to data source

high: Native image analysis tool (analyze_image) (Tools)

medium: Transactional database rollback / undo (Data)

high: Form field conditional visibility (showWhen/hideWhen)

medium: Data import/export tools (CSV, JSON) (Data)

The quality of evidence was striking. Each proposal cited specific tool names, system prompt passages, and code patterns from the Expenses Tracker. The show_chart proposal named the exact views that would benefit from charts: financial_health, purchase_history, and pay_schedule_year.

The Recursive Turn

The PM filed proposals about itself

The most unexpected outcome: while analyzing the Expenses Tracker, the PM hit real friction with its own tools, and immediately filed proposals to fix them. The enhancement loop had closed.

Self-filed PM improvement proposals

high: list_enhancement_proposals pagination + summary mode (Meta/PM: already truncating at 26 entries)

high: Make priority updatable via update_enhancement_proposal (Meta/PM: can't change priority after filing)

high: Add 'tools' to proposal category enum (Meta/PM: category enum missing tool-related values)

high: Re-introduce inspect_app as lightweight complement to run_as_app (Meta/PM: needs structured data without spinning up a sub-agent)

medium: Batch create proposals in one call (Meta/PM: filing 26 proposals one by one is slow)

medium: Fuzzy duplicate detection before inserting proposals (Meta/PM: risk of re-filing similar proposals across runs)

These weren't theoretical suggestions. They were genuine pain points the PM experienced while doing its job. The system used the platform, found platform gaps, and filed structured proposals to fix those gaps, including gaps in itself.

Closing the Loop

We built everything the PM asked for

All 11 self-improvement proposals, including the six listed above, were implemented in a single session. The PM's own tooling is now significantly more capable.

Schema Evolution

5 new columns on the proposals table: proposal_type (enhancement/bug/performance/dx), workaround_severity, affects_apps, related_proposal_ids, and structured evidence arrays.
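As a rough picture of the resulting record shape, here is a sketch of a proposal with the five new fields. The column names and the proposal_type values come from the list above; the Python types, the default values, the Proposal and ProposalType names, and the layout of an evidence entry are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ProposalType(str, Enum):
    ENHANCEMENT = "enhancement"
    BUG = "bug"
    PERFORMANCE = "performance"
    DX = "dx"


@dataclass
class Proposal:
    title: str
    description: str
    priority: str
    # The five new columns; types and defaults are assumed, not confirmed.
    proposal_type: ProposalType = ProposalType.ENHANCEMENT
    workaround_severity: Optional[str] = None
    affects_apps: List[str] = field(default_factory=list)
    related_proposal_ids: List[int] = field(default_factory=list)
    evidence: List[Dict[str, str]] = field(default_factory=list)


chart = Proposal(
    title="show_chart display component (bar, line, pie)",
    description="Views currently fall back to markdown tables.",
    priority="high",
    affects_apps=["Expenses Tracker"],
    # Evidence entry shape is hypothetical; the view names are from the first run.
    evidence=[
        {"kind": "view", "name": "financial_health"},
        {"kind": "view", "name": "purchase_history"},
        {"kind": "view", "name": "pay_schedule_year"},
    ],
)
print(chart.proposal_type.value, len(chart.evidence))
```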

Pagination & Filtering

list_enhancement_proposals gained limit/offset pagination, summary mode, and new filters for priority, proposal type, and source app name. No more truncation.
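A minimal in-memory sketch of what limit/offset pagination with a summary mode and filters might look like. The function name list_proposals_sketch, the response envelope, the parameter names, and the choice of summary fields are assumptions; the real tool's signature is not shown in this story.

```python
from typing import Any, Dict, List, Optional


def list_proposals_sketch(
    proposals: List[Dict[str, Any]],
    limit: int = 20,
    offset: int = 0,
    summary: bool = False,
    priority: Optional[str] = None,
    proposal_type: Optional[str] = None,
    source_app: Optional[str] = None,
) -> Dict[str, Any]:
    """Filter first, then slice, so 'total' reflects the filtered set."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")

    def matches(p: Dict[str, Any]) -> bool:
        return (
            (priority is None or p["priority"] == priority)
            and (proposal_type is None or p["proposal_type"] == proposal_type)
            and (source_app is None or p["source_app"] == source_app)
        )

    filtered = [p for p in proposals if matches(p)]
    page = filtered[offset:offset + limit]
    if summary:
        # Summary mode drops long fields so a full listing stays small.
        page = [{"id": p["id"], "title": p["title"], "priority": p["priority"]} for p in page]
    return {"total": len(filtered), "limit": limit, "offset": offset, "items": page}


# 26 placeholder rows, mirroring the size of the first run.
rows = [
    {
        "id": i,
        "title": f"Proposal {i}",
        "description": "placeholder",
        "priority": "high" if i % 2 == 0 else "medium",
        "proposal_type": "enhancement",
        "source_app": "Expenses Tracker",
    }
    for i in range(26)
]
last_page = list_proposals_sketch(rows, limit=10, offset=20, summary=True)
print(last_page["total"], len(last_page["items"]))
```

Returning the filtered total alongside each page lets the caller know when it has reached the end, which is what removes the silent truncation the PM complained about.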

New Tools

inspect_app for lightweight metadata retrieval without spinning up a sub-agent. create_enhancement_proposals_batch for filing multiple proposals in one call.

Fuzzy Duplicate Detection

Duplicate detection combines normalized Levenshtein distance on titles with keyword overlap on descriptions. A threshold of 0.7 blocks redundant filings, and a force flag bypasses the check when needed. Zero external dependencies.
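A dependency-free sketch of this kind of check follows. The two signals, the 0.7 threshold, and the force flag come from the description above. How the two signals are combined is not stated, so the equal-weight average used here is an assumption, as are the function names, the Jaccard form of keyword overlap, and the rule that only words longer than three characters count as keywords.

```python
import re
from typing import Dict, Set

THRESHOLD = 0.7  # from the description above


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def title_similarity(a: str, b: str) -> float:
    """1.0 for identical titles, 0.0 for completely different ones."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def keywords(text: str) -> Set[str]:
    # Assumed keyword rule: lowercase words longer than three characters.
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) > 3}


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two keyword sets."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def is_duplicate(new: Dict[str, str], existing: Dict[str, str], force: bool = False) -> bool:
    if force:
        return False  # caller explicitly bypasses the check
    # Assumed combination: equal-weight average of the two signals.
    score = 0.5 * title_similarity(new["title"], existing["title"]) \
        + 0.5 * keyword_overlap(new["description"], existing["description"])
    return score >= THRESHOLD


existing = {
    "title": "show_chart display component",
    "description": "Render bar, line and pie charts from view data",
}
other = {
    "title": "Data import/export tools",
    "description": "Import and export CSV or JSON files",
}
print(is_duplicate(dict(existing), existing), is_duplicate(other, existing))
```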

What This Means

A platform that generates its own roadmap

The loop is now fully operational:

01 Build apps on GenieForge

02 Point the PM at those apps

03 PM inhabits each app, discovers platform gaps

04 It files structured, evidence-backed proposals

05 Developers implement the highest-priority proposals

06 The improved platform enables better apps

↺ Go to step 1

The 26 proposals from a single app analysis already constitute a more concrete and evidence-backed product roadmap than most teams produce from months of user research. And every new app analyzed adds to it, with duplicate detection preventing redundancy and cross-app impact tracking surfacing which proposals matter most.

Future iterations could run the PM automatically against every app on a schedule, weight proposals by how many apps are affected, and track which proposals lead to the biggest improvements in app quality, closing the loop from "the platform identifies what it needs" to "the platform measures whether it got better."
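One way such weighting could look, as a sketch only: nothing like this exists in the story yet, so the scoring formula, the numeric priority weights, the rank_proposals name, and the app names other than Expenses Tracker are all hypothetical. Only the priority field and affects_apps come from the schema described earlier.

```python
from typing import Any, Dict, List

# Assumed numeric weights for the priority labels.
PRIORITY_WEIGHT = {"high": 3, "medium": 2, "low": 1}


def rank_proposals(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order proposals by priority weight multiplied by the number of affected apps."""
    def score(p: Dict[str, Any]) -> int:
        # A proposal with no recorded apps still counts once.
        return PRIORITY_WEIGHT.get(p["priority"], 0) * max(1, len(p["affects_apps"]))

    return sorted(proposals, key=score, reverse=True)


backlog = [
    {"title": "A", "priority": "high", "affects_apps": ["Expenses Tracker"]},
    # "App X", "App Y", "App Z" are placeholder app names.
    {"title": "B", "priority": "medium", "affects_apps": ["App X", "App Y", "App Z"]},
    {"title": "C", "priority": "medium", "affects_apps": []},
]
print([p["title"] for p in rank_proposals(backlog)])
```

Under this scoring a medium-priority gap seen in three apps outranks a high-priority gap seen in one, which is the cross-app effect the scheduled runs would be meant to surface.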

Want to see it in action?

GenieForge is open for early builders. Create an app, let the PM analyze it, and watch the platform evolve.
