ginigen-ai PRO

ginigen-ai

AI & ML interests

None yet

Recent Activity

updated a Space about 6 hours ago
ginigen-ai/Metacognition-Leaderboard-Space
replied to SeaWolf-AI's post about 7 hours ago
🐯 Chitos: The Security Scanner That Actually Proves It

Most security scanners hand you a suspect list and walk away. That gap between detection and proof is where attackers live, and it's exactly the gap that Chitos was built to close.

Chitos is the successor to Mythos, a static analyzer built for quick code health checks. Mythos was good at pattern matching: spotting dangerous sinks, mapping CWEs, producing readable reports. But static analysis has a structural ceiling. A rule that sees eval(user_input) can tell you that looks dangerous. It cannot tell you whether the input is reachable, whether sanitization three layers up covers this path, or whether there's a live exploit chain for your exact framework version. Chitos was built to answer those questions.

🔍 Phase 1 applies 50 language-agnostic rules across Python, JavaScript, Go, Java, C/C++, Rust, PHP, YAML and more, covering injection sinks, deserialization gadgets, credential leakage, broken crypto, and prototype pollution. Every candidate is re-verified before reaching the report. Findings that can't be substantiated are excluded, not handed to you as noise.

🔬 Phase 2 dispatches an autonomous web-search agent to hunt live CVE databases, exploit advisories, and public PoC repositories. It formulates hypotheses, verifies them, and synthesizes a structured threat narrative. This phase needs a user-supplied Claude API key; Phases 1 and 3 run entirely free.

🎯 Phase 3 is where Chitos diverges from everything else. Against targets you own or are authorized to test, it fires real payloads (XSS, SQLi, path traversal, command injection), mutates on block, captures hard evidence, and connects every proven finding into a kill-chain showing which vulnerabilities to remediate first.

No installation. No account. No code sent to third-party APIs.

Article: https://huggingface.co/blog/FINAL-Bench/chitos
Try it now 👉 https://chitos.vidraft.net
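To make the "structural ceiling" of pattern matching concrete, here is a minimal sketch of a syntactic rule for the eval(user_input) case mentioned in the post. It is not the Mythos or Chitos rule engine; the function name `find_eval_calls` and the sample source are hypothetical and exist only for illustration. The rule can say where a non-literal eval call appears, but nothing in it knows whether the argument is reachable from untrusted input or already sanitized.

```python
import ast


def find_eval_calls(source):
    """Return sorted line numbers of eval(...) calls whose first
    argument is not a literal constant (purely syntactic check)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
            and node.args
            and not isinstance(node.args[0], ast.Constant)
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample: one eval on a variable, one eval on a literal.
SAMPLE = '''
def handler(user_input):
    return eval(user_input)

def safe():
    return eval("1 + 1")
'''

flagged_lines = find_eval_calls(SAMPLE)
print(flagged_lines)
```

The rule flags the call in `handler` and skips the literal one in `safe`, yet it would flag `handler` just the same if every caller validated `user_input` first. That missing context is the kind of question the post says the later phases are meant to answer.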
replied to their post about 21 hours ago
🧠 Does your LLM know when it's about to be wrong?

Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.

🎉 The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, about 2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)

What's open:
📊 300+100 trap problems (each with a hidden trap + TICOS type)
🏆 24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → https://huggingface.co/spaces/ginigen-ai/Metacognition-Leaderboard-Space
📊 Benchmark → https://huggingface.co/datasets/ginigen-ai/Metacognition-Bench
🧩 Adapters → https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961
📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
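The two quantities quoted in the post can be illustrated with a small sketch. This is not the benchmark's scoring code: the function names, the data layout, and the convention that label 1 means "the answer was wrong" are assumptions made for illustration. It only shows why 2 trap picks in 400 items gives 0.005, and why a confidence signal that does not separate right from wrong answers lands at an AUROC of 0.5.

```python
def trap_rate(chosen, trap):
    """Fraction of items where the chosen option is the trap option."""
    if len(chosen) != len(trap) or not chosen:
        raise ValueError("need two non-empty lists of equal length")
    return sum(c == t for c, t in zip(chosen, trap)) / len(chosen)


def auroc(scores, labels):
    """Probability that a randomly drawn wrong answer (label 1) receives
    a higher P(wrong) score than a randomly drawn correct answer
    (label 0). Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one correct answer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 400 items, the trap option is always "B",
# and the model picks it twice.
chosen = ["A"] * 398 + ["B"] * 2
trap = ["B"] * 400
rate = trap_rate(chosen, trap)

# A confidence signal that is identical for right and wrong answers
# carries no information about errors.
uninformative = auroc([0.3, 0.3, 0.3, 0.3], [1, 0, 1, 0])

# A signal that ranks every wrong answer above every correct one.
informative = auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])

print(rate, uninformative, informative)
```

Under these assumptions, an adapter that outputs P(wrong) would be scored the same way as the model's own confidence, and the gain Δ would be the difference between the two AUROC values; how the leaderboard actually defines Δ is not stated in the post.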

Organizations

None yet