Benchmarking Codethreat’s AI SAST Engine

16.01.2026

What Actually Matters in AppSec Tooling

Application security tools are notorious for flooding teams with irrelevant alerts, missing context-rich vulnerabilities, or simply underperforming in real-world conditions. Developers end up tuning out the noise, and security teams struggle to prioritize effectively.

At Codethreat, we designed a benchmark to measure what truly matters:

  • Can a tool catch real security issues, not just textbook examples?

  • Can it reduce the burden of false positives?

  • Can it understand code context, not just match patterns?

This benchmark is our attempt to validate that.

Why This Benchmark Format Matters


We didn’t build the vulnerable projects from scratch. Instead, we deliberately used an anonymized, de-biased version of existing projects.

Many public benchmarks unintentionally contain cues that make detection easier:

  • File names like xss_example.js

  • Variables called unsafe_input

  • Comments that highlight the exact vulnerability

Such hints can inflate accuracy, especially for AI-based tools trained on large corpora of known patterns.

Codethreat’s AI agents operate differently. They infer risk based on developer intent, data flow, and structural context. To evaluate this, we removed every artificial clue, simulating how a real-world team might encounter and fix a bug in production.
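To make the idea of "artificial clues" concrete, here is a minimal sketch of a scanner that flags the three kinds of hints listed above: suggestive file names, suggestive identifiers, and comments that name the vulnerability. It is an illustration only. The cue list, the `find_cues` helper, and the naive comment detection are assumptions made for this example, not the de-biasing procedure actually used for the benchmark.

```python
import re
from pathlib import PurePosixPath

# Hypothetical cue list; the benchmark's real de-biasing rules are not given in this post.
CUE_WORDS = ("xss", "sqli", "ssti", "unsafe", "vuln", "exploit", "inject", "cwe")
CUE_RE = re.compile("|".join(CUE_WORDS), re.IGNORECASE)
# Naive single-line comment matcher: it also fires on '#' or '//' inside string literals.
COMMENT_RE = re.compile(r"(#|//).*$")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def find_cues(path: str, source: str) -> list[tuple[str, int, str]]:
    """Return (kind, line_number, text) for each suggestive hint found."""
    hits = []
    name = PurePosixPath(path).name
    if CUE_RE.search(name):
        hits.append(("filename", 0, name))
    for lineno, line in enumerate(source.splitlines(), start=1):
        comment = COMMENT_RE.search(line)
        if comment and CUE_RE.search(comment.group(0)):
            hits.append(("comment", lineno, comment.group(0).strip()))
        # Check identifiers only in the part of the line that is not a comment.
        code = COMMENT_RE.sub("", line)
        for ident in IDENT_RE.findall(code):
            if CUE_RE.search(ident):
                hits.append(("identifier", lineno, ident))
    return hits


sample = "unsafe_input = request.args['q']  # XSS here\nrender(unsafe_input)\n"
for hit in find_cues("xss_example.js", sample):
    print(hit)
```

A scanner like this only reports candidates. Renaming files and identifiers consistently across a project, without changing behavior, still needs language-aware tooling and a manual check.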


The Benchmark Setup


  • 🧪 39 open-source projects

  • ⚠️ 35 critical security vulnerabilities seeded

  • ✅ Both patched and unpatched versions included

  • 📄 No CWE hints, no explicit comments, no suggestive filenames

  • 📊 Outputs parsed in SARIF, with recall and false-positive metrics compared across tools
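As a rough illustration of the last point, the sketch below reads SARIF 2.1.0 logs and derives a detection rate and a false-positive rate. The scoring rule is an assumption made for this example: a case counts as detected when the tool flags the expected file in the unpatched version, and as a false positive when it still flags that file in the patched version. The `flagged_files`, `score`, and `make_sarif` helpers and the sample paths are hypothetical. The benchmark repository defines the actual validation process.

```python
import json


def flagged_files(sarif_text: str) -> set[str]:
    """Collect artifact URIs that have at least one result in a SARIF 2.1.0 log."""
    log = json.loads(sarif_text)
    files = set()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                uri = (
                    loc.get("physicalLocation", {})
                    .get("artifactLocation", {})
                    .get("uri")
                )
                if uri:
                    files.add(uri)
    return files


def score(cases):
    """cases: list of (expected_file, unpatched_sarif, patched_sarif) tuples."""
    if not cases:
        raise ValueError("at least one benchmark case is required")
    detected = sum(1 for f, vuln, _ in cases if f in flagged_files(vuln))
    false_pos = sum(1 for f, _, fixed in cases if f in flagged_files(fixed))
    return detected / len(cases), false_pos / len(cases)


def make_sarif(uris):
    """Build a tiny SARIF log for the demo; real logs come from the tools."""
    results = [
        {
            "ruleId": "demo",
            "locations": [{"physicalLocation": {"artifactLocation": {"uri": u}}}],
        }
        for u in uris
    ]
    return json.dumps(
        {"version": "2.1.0", "runs": [{"tool": {"driver": {"name": "demo"}}, "results": results}]}
    )


cases = [
    ("app/views.py", make_sarif(["app/views.py"]), make_sarif([])),
    ("app/db.py", make_sarif([]), make_sarif([])),
    ("app/auth.py", make_sarif(["app/auth.py"]), make_sarif(["app/auth.py"])),
    ("app/cli.py", make_sarif(["app/other.py"]), make_sarif([])),
]
recall, fp_rate = score(cases)
print(f"recall={recall:.2%} fp_rate={fp_rate:.2%}")
```

Matching on file path alone is deliberately coarse. A stricter comparison could also check the rule category or the reported line region.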


Tools evaluated:

  • Codethreat

  • ZeroPath

  • Semgrep

  • Snyk

  • Bearer


Full source and validation process: GitHub - Codethreat Benchmark Repo


📊 Benchmark Results


Technical Vulnerabilities

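| Tool       | Detection Rate | False Positive Rate |
|------------|----------------|---------------------|
| CodeThreat | 88.57%         | 0%                  |
| ZeroPath   | 77.14%         | 5%                  |
| Semgrep    | 54.29%         | 5%                  |
| Snyk       | 42.86%         | 25%                 |
| Bearer     | 5.71%          | 0%                  |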

Based on 35 technical vulnerabilities across 39 benchmarks (XSS, SQLi, SSTI, Command Injection, and more)
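The detection rates above are reported as percentages only. As a quick sanity check, the sketch below back-computes which whole-number count out of 35 each percentage corresponds to. These counts are inferred from the rounded figures, not taken from the benchmark data, and `implied_count` is a helper written for this example.

```python
TOTAL = 35  # technical vulnerabilities, per the caption above

# Detection rates as reported in the technical-vulnerabilities table.
reported = {
    "CodeThreat": 88.57,
    "ZeroPath": 77.14,
    "Semgrep": 54.29,
    "Snyk": 42.86,
    "Bearer": 5.71,
}


def implied_count(pct: float, total: int):
    """Return the unique k where k/total rounds to pct (two decimals), else None."""
    matches = [k for k in range(total + 1) if abs(100 * k / total - pct) < 0.005]
    return matches[0] if len(matches) == 1 else None


counts = {tool: implied_count(pct, TOTAL) for tool, pct in reported.items()}
for tool, k in counts.items():
    print(f"{tool}: {k}/{TOTAL}")
```

Every reported percentage maps to exactly one count, which suggests the rates were computed per vulnerability over the same set of 35.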


Business Logic & Authentication Vulnerabilities

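| Tool       | Detection Rate | False Positive Rate |
|------------|----------------|---------------------|
| CodeThreat | 100%           | 0%                  |
| ZeroPath   | 87.5%          | 0%                  |
| Semgrep    | 12.5%          | 0%                  |
| Snyk       | 0%             | 0%                  |
| Bearer     | 0%             | 0%                  |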

Based on 8 business logic benchmarks, including broken authentication, missing authorization, and complex data validation issues.


Key Insights


  1. Context still matters. Many tools struggle with vulnerabilities spread across files, behind framework abstractions, or involving deep control flow. Codethreat’s architecture-aware agents excelled in these scenarios.

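  2. False positives remain a major barrier. Especially in CI/CD pipelines, noisy alerts erode developer trust. Codethreat was the only tool to combine a 0% false-positive rate with a high detection rate; Bearer also reported 0% false positives on technical vulnerabilities, but detected only 5.71% of them.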

  3. Business logic and repo-level flaws are still hard to catch. No tool achieved full coverage across both categories, but Codethreat’s approach of combining SAST, structural mapping, and PR-level AI reviews shows a promising direction.


Codethreat blends traditional and intelligent analysis layers:

• 🔍 Rule-based SAST with rich language support

• 🧠 AI-powered contextual analysis aligned with developer workflows

• 🔄 Control/data flow resolution across repository structures

• ❌ A false-positive elimination engine that filters findings before they are surfaced


Try It Yourself

The benchmark is open-source and reproducible. We invite the community to contribute, validate, and explore the results.

🔗 View the benchmark project on GitHub →

🚀 Book a product walkthrough or request early access →

Ready to ship secure software?

Try CodeThreat with AI-powered reviews and less noise in every commit.