Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
scthornton 
posted an update 3 days ago
Post
3829
SecureCode v2.1: framework-specific secure coding patterns, now on HuggingFace

Quick update on the SecureCode dataset. After testing the v2.0 models against real codebases, one gap kept showing up: the models understood *what* was insecure but generated language-generic fixes. A developer using Express.js doesn't need "set security headers"they need helmet() middleware chains configured correctly. Spring Boot developers need @PreAuthorize annotations, not abstract RBAC pseudocode.

What changed in v2.1:

- 1,435 total examples (v2.0's 1,216 baseline + 219 new framework-specific additions)
- 9 production frameworks: Express.js, Spring Boot, React, Next.js, FastAPI, GraphQL, SQLAlchemy, Flask, Vue.js
- 475 unique CVEs (73 new, including framework-specific treatments of Log4Shell, Spring4Shell, and others)
- 5-tier quality rubric: Every new example scores 90+/100 across correctness, new dataset average is nearly 97+, security hardening, real-world grounding, educational scaffolding, and production readiness
- Structured references: CVE IDs, advisory URLs, discovery/remediation dates, affected versions — not just "related to CVE-XXXX"

What stayed the same:

- Same 4-turn conversation format (compatible with existing fine-tuning workflows)
- Same license (CC BY-NC-SA 4.0)
- Full v2.0 baseline included — no need to download both
- All 8 fine-tuned models still work; v2.1-specific fine-tuning coming soon

The new examples look like this:

Instead of generic "use parameterized queries", you get Express.js with express-validator input chains, Spring Boot with @Valid bean validation + BCryptPasswordEncoder, FastAPI with Depends() auth injection and Pydantic model validation, React with DOMPurify + CSP headers. Framework-native patterns you can actually deploy.

Two configs to load:

from datasets import load_dataset

baseline = load_dataset("scthornton/securecode-v2.1", "v2.0-baseline")  # 1,216
additions = load
In this post