Benchmarking Codethreat’s AI SAST Engine
What Actually Matters in AppSec Tooling
Application security tools are notorious for flooding teams with irrelevant alerts, missing context-rich vulnerabilities, or simply underperforming in real-world conditions. Developers end up tuning out the noise, and security teams struggle to prioritize effectively.
At Codethreat, we designed a benchmark to measure what truly matters:
Can a tool catch real security issues, not just textbook examples?
Can it reduce the burden of false positives?
Can it understand code context, not just match patterns?
This benchmark is our attempt to answer those questions with data.
Why This Benchmark Format Matters
We didn’t build the vulnerable projects from scratch. Instead, we deliberately used anonymized, de-biased versions of existing projects.
Many public benchmarks unintentionally contain cues that make detection easier:
File names like xss_example.js
Variables called unsafe_input
Comments that highlight the exact vulnerability
Such hints can inflate accuracy, especially for AI-based tools trained on large corpora of known patterns.
Codethreat’s AI agents operate differently. They infer risk based on developer intent, data flow, and structural context. To evaluate this, we removed every artificial clue, simulating how a real-world team might encounter and fix a bug in production.
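To make the difference concrete, here is a hypothetical snippet (not taken from the benchmark repo) showing the kind of "tell" we stripped out. Both functions contain the same injectable query; only the naming and comments change:

```python
# Biased version: the function name, parameter name, and comment announce the bug.
def sqli_example(unsafe_input, cursor):
    # SQL injection: user input is concatenated straight into the query
    cursor.execute("SELECT * FROM users WHERE name = '" + unsafe_input + "'")

# De-biased version: identical flaw, neutral names, no hints.
def find_user(name, cursor):
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
```

A tool that leans on suggestive identifiers will catch the first and miss the second; a tool that follows data flow should catch both.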
The Benchmark Setup
🧪 39 open-source projects
⚠️ 35 critical security vulnerabilities seeded
✅ Both patched and unpatched versions included
📄 No CWE hints, no explicit comments, no suggestive filenames
📊 Outputs parsed in SARIF, with recall and false-positive metrics compared across tools (see the scoring sketch below)
Tools evaluated:
Codethreat
ZeroPath
Semgrep
Snyk
Bearer
Full source and validation process: GitHub - Codethreat Benchmark Repo
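To make the scoring concrete, here is a minimal Python sketch of how SARIF output can be reduced to (file, line) findings and compared against the seeded locations. The exact matching and validation rules live in the benchmark repo; the simplified scoring below, where unmatched findings count as false positives, is an assumption for illustration only:

```python
import json
from pathlib import Path

def sarif_findings(sarif_path: Path):
    """Yield (file, line) pairs from a SARIF 2.1.0 results file."""
    data = json.loads(sarif_path.read_text())
    for run in data.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "")
                line = phys.get("region", {}).get("startLine", 0)
                yield uri, line

def score(findings, seeded):
    """Recall over seeded issues; unmatched findings treated as false positives
    (a simplified stand-in for the benchmark's validation process)."""
    found = set(findings)
    detected = [s for s in seeded if s in found]
    false_positives = found - set(seeded)
    recall = len(detected) / len(seeded)
    fp_rate = len(false_positives) / max(len(found), 1)
    return recall, fp_rate
```

Feeding each tool's SARIF export plus the list of seeded (file, line) locations into a scorer like this is how the recall and false-positive numbers below are produced.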
📊 Benchmark Results
Technical Vulnerabilities
| Tool | Detection Rate | False Positive Rate |
|---|---|---|
| Codethreat | 88.57% | 0% |
| ZeroPath | 77.14% | 5% |
| Semgrep | 54.29% | 5% |
| Snyk | 42.86% | 25% |
| Bearer | 5.71% | 0% |
Based on 35 technical vulnerabilities across 39 benchmarks (XSS, SQLi, SSTI, Command Injection, and more).
Business Logic & Authentication Vulnerabilities
| Tool | Detection Rate | False Positive Rate |
|---|---|---|
| Codethreat | 100% | 0% |
| ZeroPath | 87.5% | 0% |
| Semgrep | 12.5% | 0% |
| Snyk | 0% | 0% |
| Bearer | 0% | 0% |
Based on 8 business logic benchmarks, including broken authentication, missing authorization, and complex data validation issues.
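For readers who prefer raw counts: detection rates are shares of the stated totals (35 technical issues, 8 business logic benchmarks). The counts below are reconstructed from the published percentages, not separately reported figures:

```python
# Detection rate = detected seeded issues / total seeded issues.
# Counts inferred from the published percentages (illustrative).
print(f"{31 / 35:.2%}")  # 88.57% -> 31 of 35 technical vulnerabilities
print(f"{7 / 8:.2%}")    # 87.50% -> 7 of 8 business logic benchmarks
```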
Key Insights
Context still matters. Many tools struggle with vulnerabilities spread across files, behind framework abstractions, or involving deep control flow. Codethreat’s architecture-aware agents excelled in these scenarios.
False positives remain a major barrier. Especially in CI/CD pipelines, noisy alerts erode developer trust. Only Codethreat maintained precision without compromise.
Business logic and repo-level flaws are still hard to catch. While no tool achieves full coverage, Codethreat’s approach, combining SAST, structural mapping, and PR-level AI reviews, points in a promising direction.
Codethreat blends traditional and intelligent analysis layers:
• 🔍 Rule-based SAST with rich language support
• 🧠 AI-powered contextual analysis aligned with developer workflows
• 🔄 Control/data flow resolution across repository structures
• ❌ A false-positive elimination engine that filters noise before findings surface
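To give a feel for how such layers can compose, here is a purely illustrative sketch; it is not Codethreat’s actual implementation, and every function, field, and threshold in it is hypothetical. The idea is simply: run a rule engine, re-score its findings with context, and only surface what survives a confidence filter:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Finding:
    rule_id: str
    file: str
    line: int
    confidence: float  # refined by later layers (illustrative field)

def rule_based_scan(repo: dict) -> list:
    """Stand-in rule engine: flag string-concatenated SQL passed to execute()."""
    findings = []
    for path, source in repo.items():
        for lineno, text in enumerate(source.splitlines(), 1):
            if "execute(" in text and "+" in text:
                findings.append(Finding("sql-injection", path, lineno, 0.5))
    return findings

def contextual_rescore(findings, repo):
    """Stand-in context layer: raise confidence when request input flows nearby."""
    return [replace(f, confidence=0.9 if "request." in repo[f.file] else 0.3)
            for f in findings]

def layered_scan(repo, threshold=0.8):
    """Rules -> contextual re-scoring -> false-positive filter."""
    findings = contextual_rescore(rule_based_scan(repo), repo)
    return [f for f in findings if f.confidence >= threshold]

demo = {"app.py": 'cursor.execute("SELECT * FROM users WHERE id=" + request.args["id"])'}
print(layered_scan(demo))  # one surfaced finding; the same pattern without request input is filtered out
```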
Try It Yourself
The benchmark is open-source and reproducible. We invite the community to contribute, validate, and explore the results.



