How Securie runs: the launch inference stack
A look at the three-layer model stack we ship with — Foundation-Sec local, GLM-5.1 and DeepSeek for primary reasoning, and a bounded frontier escalation layer. Why we chose it and what it costs.
Most "AI security" companies shipping today are thin wrappers over a single frontier API. That pattern works for a demo and falls apart the moment a regulated customer asks, "can you prove my source code never left OpenAI's network?"
We picked a different path. Here is what Securie actually runs on, at launch, as a solo-founder product on a bootstrap budget.
Three layers, one orchestrator
Local layer — Foundation-Sec-8B-Reasoning (Cisco). A Llama 3.1 8B base that Cisco continued pretraining on cybersecurity data; it outperforms Llama-70B on cyber benchmarks. Runs on a single RTX 4090 on our own hardware, so marginal cost per scan is zero. Used for secret scanning and cyber-specific pattern classification.
Primary layer — GLM-5.1 and DeepSeek V3.2, both MIT-licensed open weights. GLM-5.1 scores 77.8 percent on SWE-bench Pro, the highest public score among open-weight models. DeepSeek V3.2 handles fallback code reasoning. Both are accessed through DeepInfra / OpenRouter under zero-data-retention contracts. Blended cost at our launch scale: roughly $20 per month for 3,000 scans.
Frontier escalation — GPT-5.4 Nano and Kimi K2.6, under 5 percent of traffic. For the truly ambiguous cases — when the primary layer disagrees with itself, or when we need a stronger reasoner to judge a sandbox trace — we call a frontier model. Bounded to under 5 percent of tokens so costs stay predictable and the privacy surface stays tight.
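To make the three layers concrete, here is a minimal sketch of the routing logic. The model names and the 5 percent escalation cap come from the description above; the task labels, the confidence threshold, and all class and method names are illustrative assumptions, not the actual orchestrator.

```python
from dataclasses import dataclass

LOCAL = "foundation-sec-8b-reasoning"   # on-prem, zero marginal cost
PRIMARY = "glm-5.1"                     # open weights via zero-retention API
FALLBACK = "deepseek-v3.2"              # secondary open-weights reasoner
FRONTIER = "gpt-5.4-nano"               # bounded escalation layer

ESCALATION_BUDGET = 0.05  # frontier may consume at most 5% of all tokens


@dataclass
class RouteDecision:
    model: str
    reason: str


class Router:
    def __init__(self) -> None:
        self.total_tokens = 0
        self.frontier_tokens = 0

    def frontier_allowed(self) -> bool:
        # Hard cap: frontier share of token volume stays under 5%.
        return self.frontier_tokens < ESCALATION_BUDGET * max(self.total_tokens, 1)

    def route(self, task: str, primary_confidence: float) -> RouteDecision:
        # Cyber-specific classification never leaves our hardware.
        if task in ("secret_scan", "cyber_pattern"):
            return RouteDecision(LOCAL, "cyber-specific work stays local")
        if primary_confidence >= 0.7:
            return RouteDecision(PRIMARY, "primary open-weights reasoning")
        if self.frontier_allowed():
            return RouteDecision(FRONTIER, "ambiguous case, escalated within budget")
        return RouteDecision(FALLBACK, "escalation budget spent, open-weights fallback")

    def record(self, model: str, tokens: int) -> None:
        # Account for usage so the 5% bound holds over time.
        self.total_tokens += tokens
        if model == FRONTIER:
            self.frontier_tokens += tokens
```

The budget check is what keeps costs predictable: once frontier tokens approach 5 percent of the running total, ambiguous cases degrade gracefully to the open-weights fallback instead of silently blowing the budget.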
Why this is durable
- OSS-first is enterprise-credible. A regulated customer can audit model choice, license, residency, and retention line-by-line — see our AI Bill of Materials.
- Cost scales with OSS, not frontier. When we onboard ten thousand apps next quarter, our costs rise roughly linearly at open-weight API rates, instead of ballooning at frontier per-token prices the way a single-API wrapper's bill would.
- The migration path is built in. Every finding carries the exact model and provider that produced it. When we move routine reasoning from GLM-5.1-on-DeepInfra to GLM-5.1-fine-tuned-on-our-hardware next year, the evidence chain does not break.
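The points above can be sketched as a provenance record. The article only says each finding carries the exact model and provider that produced it; the field names, JSON shape, and example values below are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelProvenance:
    model: str        # e.g. "glm-5.1"
    provider: str     # e.g. "deepinfra", or "self-hosted" after migration
    license: str      # e.g. "MIT"
    region: str       # where inference ran
    retention: str    # contractual retention, e.g. "zero-data-retention"


@dataclass(frozen=True)
class Finding:
    finding_id: str
    rule: str
    severity: str
    provenance: ModelProvenance

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, so the provenance
        # serializes with the finding and travels with the evidence chain.
        return json.dumps(asdict(self), sort_keys=True)


f = Finding(
    finding_id="f-0001",
    rule="hardcoded-secret",
    severity="high",
    provenance=ModelProvenance("glm-5.1", "deepinfra", "MIT",
                               "us-east", "zero-data-retention"),
)
```

Under this shape, migrating routine reasoning to self-hosted hardware changes only the `provider` field of new findings; anything consuming the evidence chain keeps working unchanged.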
Total monthly compute budget at launch
About $250–550 per month, all-in. Inference at ~$20/mo (blended across 3K scans), sandbox spot instances at $200–500/mo, and local Foundation-Sec inference on hardware we already own. No upfront fine-tuning cost.
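A quick back-of-envelope check of the figures above. All inputs come from the article; the ten-thousand-scan projection assumes cost scales linearly with volume, as claimed, and treats each onboarded app as roughly one scan per month for illustration.

```python
# Launch figures from the article.
inference_monthly = 20.0        # blended OSS inference, USD/month
scans_per_month = 3_000
sandbox_range = (200.0, 500.0)  # sandbox spot instances, USD/month

per_scan = inference_monthly / scans_per_month
print(f"inference cost per scan: ${per_scan:.4f}")  # $0.0067

# Linear scaling to next quarter's target volume (assumption:
# ~10,000 scans/month if each onboarded app scans roughly monthly).
projected_inference = per_scan * 10_000
print(f"projected inference at 10k scans/month: ${projected_inference:.2f}")  # $66.67
```

Even at the projected volume, inference stays a rounding error next to the sandbox spend, which is the point of keeping reasoning on open weights.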
Plenty of teams have shipped a security product on a single frontier API and watched the bill outgrow their runway. We will never do that.
More
If you want to dig deeper, our AI Bill of Materials lists every model we call, the license it ships under, the region it runs in, and the retention contract in place.