An isolated, auditable runtime for tenant-specific AI workflows
A Skill is a named, versioned bundle of prompt + tools + policy that the platform runs in a hardened sandbox. Five kernel-level isolation layers, four trust tiers, SHA-256 bundle integrity, an append-only audit ledger. This page walks through how it is built.
From form to artifact, seen from the user's seat
A faithful mirror of the workbench Skill runner. Same form your team fills in, same state badge, same tool-call timeline, same cost line at the end. The block below shows the same execution from the sandbox side.
Generate deal shortlist
Read deal materials, score against investment criteria, return an Excel summary.
Files land in inputs/materials/ — the skill reads them via workspace.fs.list.
Optional natural-language filter applied during scoring.
Watch a Skill execute through the sandbox
Every invocation walks the same preexec ladder: bundle verify → workspace mount → namespaces + cgroup + UID drop → seccomp load → exec → tools → artifact commit → archive. Three scenarios, each with a different sandbox profile.
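The ladder can be read as an ordered list of setup steps in which the first failure refuses the run. The sketch below illustrates that fail-closed ordering up to exec only; the helper names (run_ladder, LadderError, make_steps) are hypothetical, and the step bodies are stand-ins for the real verification and kernel setup.

```python
from typing import Callable, List, Optional, Tuple

class LadderError(RuntimeError):
    """Raised when a setup step fails; the run is refused (fail-closed)."""

Step = Tuple[str, Callable[[], None]]

# Step names follow the ladder described in the text, up to exec.
LADDER_ORDER = [
    "bundle verify",
    "workspace mount",
    "namespaces + cgroup + UID drop",
    "seccomp load",
    "exec",
]

def run_ladder(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure, so later steps never run."""
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise LadderError(f"{name} failed, run refused: {exc}") from exc
        completed.append(name)
    return completed

def make_steps(failing: Optional[str] = None) -> Tuple[List[Step], List[str]]:
    """Build stand-in steps; the one named `failing` raises instead of running."""
    executed: List[str] = []

    def make(name: str) -> Callable[[], None]:
        def action() -> None:
            if name == failing:
                raise OSError(f"{name} could not be set up")
            executed.append(name)
        return action

    return [(name, make(name)) for name in LADDER_ORDER], executed

# Happy path: every step runs, in order.
steps, executed_ok = make_steps()
completed = run_ladder(steps)

# Failure path: seccomp load fails, so exec is never reached.
steps, executed_bad = make_steps(failing="seccomp load")
try:
    run_ladder(steps)
    refused = False
except LadderError:
    refused = True
```

The point of the design is that exec sits last: nothing the Skill ships can run unless every earlier step has succeeded.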
A workflow your team owns, packaged
Take a workflow your team keeps repeating — review a contract, score a deal, prep a meeting brief — and turn it into a named bundle. The bundle declares what it does, which tools it may touch, what guardrails apply and what shape its output takes. People invoke it by name from the workbench. Agents invoke the same one from inside a chat loop. Same call, same audit trail.
A contract, not a prompt
Skills are versioned and signed. The same v1.4.0 today is the same v1.4.0 next quarter — no copy-paste drift, no off-script edits, no surprises for the people calling it.
Run by the platform, not the caller
Authorization, isolation, cost attribution, audit trail — all enforced by the runtime. Identical for a human in the workbench and an agent inside a chat loop.
Comes with its own paper trail
Every invocation produces a traced, costed, archived run. An admin can inspect any execution end to end — today, last week, last quarter.
An LLM in a sandboxed loop, end to end
Inputs staged
Files validated against the manifest. Workspace mounted, isolated, scoped to your tenant. Sandbox profile applied before any code runs.
LLM picks a tool, runtime runs it, LLM reads the result
Each step is checked against the allow-list, executed inside the sandbox, and logged. The loop ends when the model produces output that fits the declared schema — typically 3–8 turns.
Output + artifacts
Structured output goes back to the caller. Files become downloadable artifacts. Cost and trace rows are written. Workspace is archived for retention.
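The three stages above can be sketched as a small loop. This is an illustration under stated assumptions, not the platform's runtime: run_loop, fits_schema, the scripted model, the output_schema_mismatch marker and the shell.exec tool name are all hypothetical, and the sketch returns unknown_tool_reference for any tool outside the allow-list.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Action = Dict[str, Any]

def fits_schema(output: Any, schema: Dict[str, type]) -> bool:
    """True when output is a dict carrying every declared key with the declared type."""
    return isinstance(output, dict) and all(
        isinstance(output.get(key), expected) for key, expected in schema.items()
    )

def run_loop(
    model: Callable[[Optional[Any]], Action],
    tools: Dict[str, Callable[..., Any]],
    allowed: Set[str],
    schema: Dict[str, type],
    max_turns: int = 8,
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Alternate model turns and tool calls until the output fits the schema."""
    trace: List[Dict[str, Any]] = []
    observation: Optional[Any] = None
    for turn in range(1, max_turns + 1):
        action = model(observation)
        if action["type"] == "final":
            if fits_schema(action["output"], schema):
                return action["output"], trace
            observation = {"error": "output_schema_mismatch"}  # hypothetical marker
            continue
        name = action["tool"]
        if name not in allowed or name not in tools:
            # The model is told about the refusal; the call is still logged.
            observation = {"error": "unknown_tool_reference"}
            trace.append({"turn": turn, "tool": name, "status": "unknown_tool_reference"})
            continue
        observation = tools[name](**action.get("args", {}))
        trace.append({"turn": turn, "tool": name, "status": "ok"})
    raise RuntimeError("turn budget exhausted without schema-valid output")

def scripted_model() -> Callable[[Optional[Any]], Action]:
    """A stand-in for the LLM that replays a fixed script of actions."""
    script = iter([
        {"type": "tool", "tool": "workspace.fs.list", "args": {"path": "inputs/materials/"}},
        {"type": "tool", "tool": "shell.exec", "args": {}},  # not on the allow-list
        {"type": "final", "output": {"memo": "Two deals shortlisted.", "score": 0.8}},
    ])
    return lambda observation: next(script)

tools = {"workspace.fs.list": lambda path: ["deal-a.pdf", "deal-b.pdf"]}
output, trace = run_loop(
    scripted_model(),
    tools,
    allowed={"workspace.fs.list"},
    schema={"memo": str, "score": float},
)
```

Here the loop ends on the third turn: one permitted tool call, one refused call, then output that fits the declared schema.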
The rest of this page is how the platform makes that work, layer by layer.
Three layers guard every invocation
Process isolation, trust classification and bundle integrity stack independently. Any one of them can fail closed without taking the others down with it.
Process isolation
Every run gets a fresh, walled-off process. Before any code starts, the platform decides what the Skill can see on disk, whether it has a network, how much memory and CPU it gets, and which low-level operations it may call.
Trust classification
Every Skill version carries one of four tiers — official, verified, community or tenant. Admins see the tier in the catalog, set per-space policy on which tiers may run, and approve before tenant-built Skills go live.
Tamper detection
Every version has a fingerprint computed at publish time. Just before each run, the platform recomputes it and compares. Mismatch — even by one byte — and the run is refused before any code starts.
Five things the operating system itself enforces
Each one answers a different question: what can the Skill see, where can it reach, how much can it use, what can it call, how loud can it be? Applied in a fixed order — if any single one fails to set up, the Skill never starts at all.
The filesystem, walled off
The Skill only sees its own workspace folder. The host disk, other tenants' files, anything outside its tree — invisible. If it mounts something new, that mount cannot leak back to the host.
kernel · CLONE_NEWNS · MS_SLAVE
Network access, on or off, at the kernel
Skills declare in their manifest whether they need the network. When the answer is no, the running Skill literally has no network interface to use — not 'we forgot to wire one up', but 'the kernel has none to hand out'. There is no way around it from inside Python.
kernel · CLONE_NEWNET
Memory and CPU, hard-capped
Each run gets a ceiling — 512 MB of memory, 60 seconds of CPU. If a Skill goes beyond, the operating system kills it and we record exactly that. No other tenant's run is affected, no host pressure leaks across.
kernel · cgroup v2
An allow-list of low-level operations
On top of the Python sandbox, the kernel itself only lets the Skill call a fixed list of low-level operations. Raw sockets, kernel-module tricks, exotic IPC — refused by the kernel, not by our code. Skills get what they need to do their job and nothing more.
kernel · seccomp BPF
Logs that can't drown the system
Each run captures the first 10 MiB of its own logs, marks the rest truncated and moves on. This is operational, not security: it stops a chatty Skill from blocking the runtime or filling up storage.
kernel · 10 MiB cap
Hard ceilings on every execution
Identical defaults regardless of caller, tier or space. Tenants can lower these in policy but not raise them past the platform cap.
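The "lower but never raise" rule amounts to taking the minimum of the tenant's requested value and the platform cap. A minimal sketch, using the 512 MB, 60 second and 10 MiB figures from this page; the Limits type, its field names and effective_limits are hypothetical, not the platform's policy schema.

```python
from dataclasses import dataclass, fields
from typing import Dict

@dataclass(frozen=True)
class Limits:
    memory_mb: int    # memory ceiling, in the unit used on this page
    cpu_seconds: int  # CPU time ceiling
    log_mib: int      # captured log ceiling

# Platform caps as stated on this page.
PLATFORM_CAPS = Limits(memory_mb=512, cpu_seconds=60, log_mib=10)

def effective_limits(platform: Limits, tenant_overrides: Dict[str, int]) -> Limits:
    """Apply tenant overrides, clamping each one to the platform cap."""
    values: Dict[str, int] = {}
    for field in fields(platform):
        cap = getattr(platform, field.name)
        requested = tenant_overrides.get(field.name, cap)
        if requested <= 0:
            raise ValueError(f"{field.name} must be positive, got {requested}")
        values[field.name] = min(requested, cap)  # lowering works, raising is clamped
    return Limits(**values)

# A tenant lowers memory and tries to raise CPU time past the cap.
limits = effective_limits(PLATFORM_CAPS, {"memory_mb": 256, "cpu_seconds": 120})
```

The memory override is honoured because it is lower than the cap; the CPU override is clamped back to 60 seconds.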
One interface, three execution backends
The CodeExecAdapter protocol lets the runtime swap execution backends without changing the Skill or the calling code. Tier 1 ships today; tiers 2 and 3 sit behind the same interface and follow on the same trust model.
subprocess-v1
OS-level hardening on the runtime host. The five preexec layers above. Default for every execution today; production-ready, fail-closed, instrumented.
gVisor / Firecracker
User-space kernel or microVM isolation for stronger workloads. Same adapter contract; registers under a different name. Targeted at higher-risk tenant skills.
Remote execution
Off-host execution in e2b or Modal for elastic capacity and stricter physical separation. Same protocol, asynchronous worker pool — no API change for callers.
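One way to picture the adapter contract is a Python Protocol plus a registry keyed by backend name. Only the names CodeExecAdapter and subprocess-v1 come from this page; the method signature, ExecResult, the registry and the execute helper are assumptions made for illustration. The demo adapter is a plain subprocess and applies none of the kernel hardening described above.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, Protocol

@dataclass(frozen=True)
class ExecResult:
    exit_code: int
    stdout: str

class CodeExecAdapter(Protocol):
    """Hypothetical shape of the contract every backend would satisfy."""
    name: str

    def run(self, code: str, timeout_s: float) -> ExecResult: ...

class SubprocessAdapter:
    """Stand-in for the tier 1 backend: runs code in a child Python process."""
    name = "subprocess-v1"

    def run(self, code: str, timeout_s: float) -> ExecResult:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return ExecResult(exit_code=proc.returncode, stdout=proc.stdout)

REGISTRY: Dict[str, CodeExecAdapter] = {}

def register(adapter: CodeExecAdapter) -> None:
    """Make a backend selectable by name."""
    REGISTRY[adapter.name] = adapter

def execute(backend: str, code: str, timeout_s: float = 30.0) -> ExecResult:
    """Callers pick a backend by name; the calling code never changes."""
    return REGISTRY[backend].run(code, timeout_s)

register(SubprocessAdapter())
result = execute("subprocess-v1", "print(2 + 3)")
```

Under this shape, a microVM or remote backend would register under a different name and callers would keep using execute unchanged.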
Four tiers, one runtime
Tiers are recorded on every Skill version and surfaced in the workbench (badges) and console (filters and policy panels). Same isolation runs for every tier; the tier governs who is allowed to author and where it can run.
Official
Aimable-shipped platform skills. Reviewed and signed off internally. Visible to every tenant unless an admin hides them.
Verified
Third-party authored, vetted by Aimable. Curated and reviewed before publication; safe defaults for cross-tenant use.
Community
Public-marketplace skills. Run in the same sandbox as everything else, but admins decide per space whether community-tier Skills are allowed at all.
Tenant
Private skills authored inside one tenant. Never visible outside that tenant. Forward-deployed engineers ship most of these today.
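A per-space tier policy could be evaluated with a check along these lines. This is a hypothetical sketch of the rule described above (tier governs where a Skill may run, and tenant Skills never leave their tenant); the TrustTier enum, may_run and its parameters are illustrative names, not the platform's API.

```python
from enum import Enum
from typing import Optional, Set

class TrustTier(str, Enum):
    OFFICIAL = "official"
    VERIFIED = "verified"
    COMMUNITY = "community"
    TENANT = "tenant"

def may_run(
    skill_tier: TrustTier,
    allowed_tiers: Set[TrustTier],
    caller_tenant: str,
    skill_tenant: Optional[str] = None,
) -> bool:
    """Decide whether a Skill of this tier may run in a space with this policy."""
    # Tenant-tier Skills are private: only the authoring tenant may run them.
    if skill_tier is TrustTier.TENANT and skill_tenant != caller_tenant:
        return False
    return skill_tier in allowed_tiers

# A space that allows everything except community-tier Skills.
space_policy = {TrustTier.OFFICIAL, TrustTier.VERIFIED, TrustTier.TENANT}

official_ok = may_run(TrustTier.OFFICIAL, space_policy, caller_tenant="acme")
community_ok = may_run(TrustTier.COMMUNITY, space_policy, caller_tenant="acme")
own_tenant_ok = may_run(TrustTier.TENANT, space_policy, "acme", skill_tenant="acme")
other_tenant_ok = may_run(TrustTier.TENANT, space_policy, "acme", skill_tenant="globex")
```

The isolation layers do not appear in this check: the tier decides whether a run is allowed, not how it is sandboxed.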
The version you approved is the version that runs
When a Skill version is published, the platform computes a fingerprint over the whole bundle — prompt, playbook, resources, scripts — and stores it next to the version. Just before every run, the fingerprint is recomputed and compared. One byte different anywhere — a silent edit, a swap in storage, anything — and the run is refused before any code starts. The v1.4.0 your reviewer signed off on is bit-for-bit the same v1.4.0 that runs in production tomorrow.
A new version lands in the catalog as one frozen bundle. The platform computes its fingerprint at the same moment. The bundle itself can never be edited in place — a change means a new version row, with a new fingerprint.
The fingerprint travels with the version like a serial number. Admins see it in the console; auditors compare against it later. Same version, same fingerprint, anywhere the Skill goes.
Right before execution, the platform recomputes the fingerprint and compares. Match: the Skill runs. Mismatch: refused, audited, no code executes — no override path.
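The publish-then-verify cycle can be sketched with SHA-256 from the standard library. The canonicalisation used here (files in sorted relative-path order, each path and content length-prefixed) is an assumption chosen for the example, and bundle_fingerprint and verify_before_run are hypothetical names.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

def bundle_fingerprint(root: Path) -> str:
    """SHA-256 over every file in the bundle, in a stable order."""
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length prefixes keep (path, content) boundaries unambiguous.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()

def verify_before_run(root: Path, stored_fingerprint: str) -> bool:
    """Recompute and compare; False means the run must be refused."""
    return hmac.compare_digest(bundle_fingerprint(root), stored_fingerprint)

with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "SKILL.md").write_text("name: review-contract\n", encoding="utf-8")
    (bundle / "scripts").mkdir()
    (bundle / "scripts" / "score.py").write_text("print('ok')\n", encoding="utf-8")

    stored = bundle_fingerprint(bundle)          # computed at publish time
    unchanged_ok = verify_before_run(bundle, stored)

    # A one-character edit anywhere in the bundle changes the fingerprint.
    (bundle / "scripts" / "score.py").write_text("print('ok!')\n", encoding="utf-8")
    tampered_ok = verify_before_run(bundle, stored)
```

Including the relative path in the hash means renaming or moving a file is detected as well as editing one.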
Ephemeral per-execution filesystem
Workspaces are tenant-scoped, keyed on (tenant_id, execution_id) and torn down on completion. The on-disk tree never outlives the execution; only the archived tar.gz does, and only for a bounded retention period.
Archive default retention is 7 days. A cleanup job removes the tar.gz and stamps deleted_at; the row remains for audit.
/workspace/{tenant_id}/{execution_id}/
├── inputs/ # bundle, ro
│ └── bundle/ # skill payload
├── scratch/ # rw, ephemeral
├── outputs/ # rw, promoted to artifacts
└── metadata.json # execution metadata
- Each execution gets a UID drawn from a pre-created pool (aimable-skill-0…63) so cross-execution UID collisions cannot occur within a pod.
- Every workspace path is prefixed with tenant_id; cross-tenant requests return cross_tenant_access_error.
- Artifact commits are append-only: re-committing the same path produces a new row, never an in-place overwrite.
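The path layout and the cross-tenant rule above can be sketched as a small resolver. The /workspace/{tenant_id}/{execution_id}/ layout and the cross_tenant_access_error code come from this page; the function names and the exception class are hypothetical, and the check is purely lexical (it does not follow symlinks, which a real validator would also need to handle).

```python
from pathlib import PurePosixPath

WORKSPACE_ROOT = PurePosixPath("/workspace")

class CrossTenantAccessError(PermissionError):
    """Raised when the caller's tenant does not own the workspace."""

def workspace_root(tenant_id: str, execution_id: str) -> PurePosixPath:
    """Build the per-execution root, keyed on (tenant_id, execution_id)."""
    for part in (tenant_id, execution_id):
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return WORKSPACE_ROOT / tenant_id / execution_id

def resolve_in_workspace(
    caller_tenant: str, tenant_id: str, execution_id: str, relative: str
) -> PurePosixPath:
    """Resolve a relative path inside the workspace, refusing anything outside it."""
    if caller_tenant != tenant_id:
        raise CrossTenantAccessError("cross_tenant_access_error")
    rel = PurePosixPath(relative)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("path escapes the workspace tree")
    return workspace_root(tenant_id, execution_id) / rel

resolved = resolve_in_workspace("acme", "acme", "exec-1", "outputs/summary.xlsx")

try:
    resolve_in_workspace("acme", "acme", "exec-1", "../exec-2/outputs/summary.xlsx")
    traversal_blocked = False
except ValueError:
    traversal_blocked = True

try:
    resolve_in_workspace("globex", "acme", "exec-1", "outputs/summary.xlsx")
    cross_tenant_blocked = False
except CrossTenantAccessError:
    cross_tenant_blocked = True
```

In the design described on this page, a check like this complements the mount namespace rather than replacing it.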
Three paths converge at one intersection gate
A Skill cannot invoke a tool just because it asked nicely. Three independent declarations are intersected at runtime — the result is the only set the LLM ever sees.
manifest.allowed_tools
Skill manifest declaration
The Skill author lists tools the Skill needs in its manifest. This is intent, not authorization — by itself it grants nothing.
space.enabled_tools
Space-level policy
A space admin enables tools per space. Gated by RBAC (spaces.write). This is the operator's view.
compose_execution_tools()
Runtime intersection gate
compose_execution_tools() takes the intersection, strips network-requiring tools when sandbox.network=deny, and adds platform meta-tools.
selected = (manifest.allowed_tools ∩ space.enabled_tools)
− tools_requiring_network if sandbox.network = "deny"
+ meta_tools(skill.describe, skill.invoke, artifact.commit)
Unknown tool names from the manifest are silently dropped at composition. If the LLM invokes one anyway, it receives an unknown_tool_reference result: the runtime tells the model, not the user.
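The formula above maps directly onto set operations. The sketch below is one possible reading, not the code in tool_composition.py: the parameter names, the explicit known_tools registry and the web.fetch and does.not.exist tool names are assumptions made for the example.

```python
from typing import FrozenSet, Iterable

# Platform meta-tools named in the formula above.
META_TOOLS: FrozenSet[str] = frozenset({"skill.describe", "skill.invoke", "artifact.commit"})

def compose_execution_tools(
    manifest_allowed: Iterable[str],
    space_enabled: Iterable[str],
    network: str,
    requires_network: Iterable[str],
    known_tools: Iterable[str],
) -> FrozenSet[str]:
    """Return the only tool set the LLM is shown for this execution."""
    # Unknown names in the manifest are dropped silently.
    declared = set(manifest_allowed) & set(known_tools)
    # Intent (manifest) intersected with authorization (space policy).
    selected = declared & set(space_enabled)
    # With the network denied, tools that need it are stripped.
    if network == "deny":
        selected -= set(requires_network)
    # Meta-tools are always added.
    return frozenset(selected | META_TOOLS)

known = {"workspace.fs.list", "workspace.code.python", "web.fetch"}

selected = compose_execution_tools(
    manifest_allowed=["workspace.fs.list", "web.fetch", "does.not.exist"],
    space_enabled=["workspace.fs.list", "workspace.code.python", "web.fetch"],
    network="deny",
    requires_network=["web.fetch"],
    known_tools=known,
)
```

In this run workspace.code.python is excluded because the manifest never asked for it, web.fetch because the network is denied, and does.not.exist because it is not a known tool.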
SKILL.md is the contract
The same manifest the workbench renders, the runtime parses, and the console diffs across versions. Below: a real review-contract Skill, with its manifest on the left and a live invocation on the right.
Read a SaaS contract, flag clauses against the house playbook, return a review memo with risks and recommended redlines.
- contract (file): PDF or DOCX, ≤ 25 MB
- playbook (collection): House playbook, scoped to space
- sandbox.network = deny (no external retrieval)
- sandbox.code_exec = false (no Python tool)
- PII redacted in outputs (presidio)
- EU data residency enforced upstream
{ memo: markdown, clauses[], risks[], score: number }
One shared venv, deliberately so
Skills run against the same /app/.venv as the Aimable backend — about fifty curated packages including numpy, pandas, torch, transformers, spacy, docling, openpyxl, httpx, sqlalchemy, langfuse, cryptography and litellm.
There is no pip install at execution time. That is a deliberate trade-off: it eliminates supply-chain hijack during a run, at the cost of skills being unable to pin their own versions.
While the platform matures, forward-deployed engineers author Skills with us. Per-skill venvs with SHA-pinned wheels are the next step (AIM-685) — until then, the kernel layers are what limit the blast radius if a pre-installed library misbehaves.
Selected pre-installed packages
Anything in the venv is reachable from a Skill's process. UID drop, seccomp and netns limit what that reachability can do — but a vulnerable library still increases blast radius. The wheels-pattern in the roadmap closes that gap.
What a Skill can and cannot do
Stated explicitly. A useful threat model is the one you can reason about — vague claims of 'enterprise-grade isolation' are not what regulated customers buy.
Can:
- Read files staged into inputs/ (bundle resources and caller-provided materials)
- Write to scratch/ and outputs/ (read-only inputs/ enforced by mount)
- Execute Python via workspace.code.python when sandbox.code_exec=true
- Call libraries already present in /app/.venv (numpy, pandas, openpyxl, litellm, …)
- Reach external URLs only when sandbox.network=allow and a network-requiring tool was approved
Cannot:
- Touch the host filesystem outside its workspace tree (mount-ns + path validator)
- See another tenant or another space: every path and service call is tenant-scoped
- Spawn a privileged process (setuid drops to a low-privilege UID before execve)
- Issue syscalls outside the seccomp allowlist (kernel returns EPERM)
- Exhaust memory or CPU (cgroup v2 hard caps; OOM-kill recorded)
- Run past the wall-clock budget (tree-killed via session leader on timeout)
- Exfiltrate over the network when sandbox.network=deny (no interfaces in the netns)
Tenant boundaries are filesystem-deep
Skills, workspaces, artifacts and audit rows are all keyed on tenant_id. The workspace tree, the UID assignment and every service call enforce it independently.
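- Workspace tree: every path is prefixed with tenant_id; the path validator rejects any traversal back up.
- UID assignment: each execution draws its UID from a pre-created pool (aimable-skill-0…63), so no UID is shared across executions.
- Service calls: every call carries a tenant_id arg; mismatches return cross_tenant_access_error at the API.
Every execution is observable and attributable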
Four parallel signals land for every Skill run. Together they answer 'what happened, who paid for it, and what did it touch?'.
Append-only event ledger
skill_execution_event captures execution_started, state_changed, content_delta, tool_call_start/end, artifact_committed, execution_completed. Resumable; pruned past retention.
Langfuse trace per execution
Root span carries tenant_id, space_id, principal_id, skill_slug, skill_version, execution_id, parent_execution_id, source. Tool calls and sub-skills nest as child spans.
Prometheus metrics
skills_execution_active_count, skills_execution_duration_ms, skills_execution_result_total and skills_tool_invocation_total — alerts wire into existing infra.
Cost attribution rows
Every llm.complete call writes a USAGE_EVENT with model, input/output tokens and cost_usd. Rolled up per skill, space, principal in the console audit page.
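The cost rollup can be sketched as a grouped sum over usage rows. The model, token and cost_usd fields come from this page; that each row also carries skill_slug, space_id and principal_id is an assumption, as are the UsageEvent type, the roll_up helper and the model name.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable

@dataclass(frozen=True)
class UsageEvent:
    skill_slug: str
    space_id: str
    principal_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: Decimal  # Decimal avoids float rounding when summing money

def roll_up(events: Iterable[UsageEvent], key: str) -> Dict[str, Decimal]:
    """Sum cost_usd per value of `key` (skill_slug, space_id or principal_id)."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for event in events:
        totals[getattr(event, key)] += event.cost_usd
    return dict(totals)

events = [
    UsageEvent("review-contract", "legal", "alice", "model-a", 1200, 300, Decimal("0.02")),
    UsageEvent("review-contract", "legal", "bob", "model-a", 2000, 450, Decimal("0.03")),
    UsageEvent("deal-shortlist", "deals", "alice", "model-a", 5000, 900, Decimal("0.10")),
]

by_skill = roll_up(events, "skill_slug")
by_principal = roll_up(events, "principal_id")
```

Because each row is written once per call, the same rows can be grouped by skill, space or principal without recomputing anything.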
Tickets we are honest about
Four MC5-tagged items still on the work list. We surface them so you know what is and is not in production today.
pivot_root for full filesystem confinement
Mount-ns + MS_SLAVE prevents host mount propagation today, but /proc, /sys and /dev are still visible. pivot_root into the workspace tree, plus tmpfs for /proc and a minimal /dev, closes the gap.
Trust-tier enforcement at runtime
Tiers are recorded and surfaced today. Per-space admin policy, approval flows and a per-space tier filter still need to land. The console UI exists; the backend endpoint does not exist yet.
Spec / code reconciliation on FR-005
Lock down the intersection-gate decision: today it is implemented in tool_composition.py and matches the spec; this ticket formalises the test coverage so spec and code stay in sync.
Wheels pattern for per-skill dependencies
Replace the single shared venv with per-skill dependency declarations + SHA-pinned wheels packaged at bundle time. Closes the supply-chain reach a Skill currently has into Aimable's internal dependencies.
Early access for design partners
We co-author the first set of Skills with technical leads who own a regulated workflow. If you have one in mind — and you want the kernel-level isolation under it — let's talk.
Run AI in a sandbox you can reason about.
Book a demo or tell us which workflow your team keeps repeating. We'll package the first Skill with you and walk through the full audit trail it produces.
