Research Report
The State of Vibe Coding Security
2026
How safe are apps built by prompting AI? A synthesis of published research on AI-generated code, combined with the failure taxonomy VibeSafe uses to scan vibe-coded apps.
Executive summary
AI coding tools have collapsed the distance between an idea and a deployed application. What they have not collapsed is the distance between a working application and a safe one. Across published audits of AI-generated code, roughly 45% of samples contain at least one security weakness — not because the models are careless, but because they optimize for functional output. Security is invisible in a demo, so it is systematically under-produced.
The consequence is a new population of production applications — built fast, launched publicly, and maintained by founders who cannot read the code that runs their business. This report describes what goes wrong in these apps, how the failures cluster, and which interventions measurably reduce risk.
Finding 1 — Five failure classes account for most real incidents
Vulnerability taxonomies list hundreds of categories. In vibe-coded apps, incident reports and scan data concentrate overwhelmingly in five:
[Figure: relative prevalence of the five failure classes across VibeSafe's scan taxonomy and published incident reports. Qualitative ordering, not a controlled sample.]
Each one is a case where the AI's incentive (make it run) diverges from the founder's interest (make it safe). A hardcoded key makes the demo work faster than environment configuration. Disabled row-level security (RLS) returns data more reliably than correctly written policies. The model isn't wrong; it's optimizing the wrong objective.
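The hardcoded-key case can be made concrete with a minimal sketch. It assumes a Python backend, and the names `load_secret`, `MissingSecretError`, and `PAYMENT_API_KEY` are hypothetical, chosen only for illustration. The idea is that a missing key should stop the app at startup rather than tempt anyone to paste the value into source code.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def load_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Fails loudly instead of falling back to a hardcoded default, so a
    missing configuration is caught at startup rather than after launch.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"Set the {name} environment variable; "
            "do not paste the key into source code."
        )
    return value


# Pattern the report warns about (key committed together with the code):
#   PAYMENT_API_KEY = "sk_live_..."
#
# Safer pattern (key supplied by the deployment environment).
# PAYMENT_API_KEY is a hypothetical variable name:
#   PAYMENT_API_KEY = load_secret("PAYMENT_API_KEY")
```

The trade-off is the one the report describes: the safer pattern adds a configuration step before the demo runs, which is why generated code tends to skip it.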
Finding 2 — The failures are tool-agnostic
Comparing generated output across Lovable, Bolt, Cursor, and Replit shows the same gap profile with different emphasis: Lovable apps concentrate risk in Supabase RLS configuration; Bolt apps in frontend-embedded secrets; Cursor projects in unreviewed multi-file agent edits; Replit apps in project visibility and secrets management. No mainstream tool's default output is production-safe without review — and none claims to be. The gap is structural, not a defect of any vendor.
Finding 3 — Exposure is immediate, not eventual
Founders tend to model security risk as something that arrives with scale. Public incident write-ups say otherwise: automated scrapers have been reported to discover keys committed to public repositories in under ten minutes, and deployed frontend bundles are crawled continuously. A vibe-coded app is exposed to its full threat environment from the moment it has a URL, even at zero users.
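To show why discovery is so fast, here is a rough sketch of the kind of pattern matching an automated scraper (or a defensive scanner) can run over a repository file or a frontend bundle. The three patterns are a small illustrative set based on widely known key formats; the function name `find_secrets` is hypothetical, and this is not a description of how VibeSafe or any specific scraper works.

```python
import re

# Illustrative patterns only; real scanners use much larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live secret key": re.compile(r"\bsk_live_[0-9a-zA-Z]{16,}\b"),
    "Private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line that looks like it holds a secret."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits
```

A scan like this is a single pass over the text, so it costs almost nothing to run against every newly published repository or bundle. That cost asymmetry is what makes exposure immediate rather than eventual.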
Finding 4 — The population shift is the story
With roughly 63% of vibe coders reporting no technical background, the median person deploying software has changed. Security tooling built for engineers — CVE feeds, CVSS scores, CI pipelines — assumes a reader who can act on that vocabulary. The new population cannot, so findings go unread and unfixed. Interventions that work share one property: they translate findings into plain-language actions ("move this key, here's how") rather than classifications.
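One way to picture the translation step is a lookup from a scanner's internal rule to a plain-language action. The sketch below is an assumption-laden illustration: the rule IDs, the `Finding` type, and the wording of each message are hypothetical, and the two entries simply mirror the hardcoded-key and disabled-RLS examples named earlier in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str   # hypothetical scanner rule identifier
    location: str  # file path or table name where the issue was found


# Hypothetical rule IDs and wording, for illustration only.
PLAIN_LANGUAGE = {
    "hardcoded-secret": (
        "A secret key is written directly in {location}. "
        "Move it to an environment variable, then generate a new key, "
        "because the old one should be treated as leaked."
    ),
    "rls-disabled": (
        "Anyone can read or change the data in {location}. "
        "Turn on row-level security and add a rule so users only see their own rows."
    ),
}


def explain(finding: Finding) -> str:
    """Turn a finding into an instruction a non-technical founder can act on."""
    template = PLAIN_LANGUAGE.get(
        finding.rule_id,
        "An issue ({rule_id}) was found in {location}. Ask for a review before launch.",
    )
    return template.format(rule_id=finding.rule_id, location=finding.location)
```

The design choice is that every message names the place and the next action, and carries no severity score or classification the reader would have to look up.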
Finding 5 — Review reverses most of the risk
The encouraging result: the five failure classes are all detectable by automated review in seconds, and all fixable without deep expertise. A pre-launch routine of scanning code, enabling database rules, testing logged-out, and rotating any exposed credentials addresses the large majority of realistic incident paths for an early-stage app. The risk of vibe coding is not intrinsic — it's a review step that the workflow skips by default.
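The "testing logged-out" step in that routine can be sketched as a small script that requests each private URL with no cookies or auth headers and flags any that still answer successfully. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and a real check would also need to handle redirects to a login page, which return a 2xx status here and would be flagged as a false positive.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str) -> int:
    """Send a GET with no cookies or auth headers and return the HTTP status code."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is what we inspect.
        return error.code


def exposed_when_logged_out(
    urls: list[str], get_status: Callable[[str], int] = fetch_status
) -> list[str]:
    """Return the URLs that answer an anonymous request with a 2xx status."""
    return [url for url in urls if 200 <= get_status(url) < 300]
```

Called with a list of endpoints that should require login (for example a placeholder such as `https://example.com/api/orders`), an empty result is the expected outcome. Any URL that comes back is readable by anyone on the internet. Connection failures are not caught in this sketch and will raise `urllib.error.URLError`.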
Recommendations
- For founders: treat "works in preview" and "safe to launch" as separate milestones. Insert one review hour between them.
- For tool vendors: default outputs toward safe patterns (env vars, RLS-on) even at the cost of demo friction.
- For the ecosystem: security findings for this population must be written in plain language with an executable fix, or they will be ignored.
Methodology & sources
This report synthesizes published academic and industry audits of AI-generated code security (vulnerability-rate estimates in the 40–50% range across studies), public incident write-ups of leaked-credential abuse, market analyses of the AI app-builder segment, and the failure taxonomy VibeSafe uses in production scans of vibe-coded applications. Figures are cited as reported by their original sources; the five-class prevalence ordering reflects qualitative concentration across sources rather than a single controlled dataset. This is an independent report; no AI tool vendor participated or was consulted.
See where your app stands
Run the same checks this report describes — free, in about ten seconds, explained in plain English.
Scan your code free →