Aimable Lab · Skills · Technical deep dive

An isolated, auditable runtime for tenant-specific AI workflows

A Skill is a named, versioned bundle of prompt + tools + policy that the platform runs in a hardened sandbox. Five kernel-level isolation layers, four trust tiers, SHA-256 bundle integrity, an append-only audit ledger. This page walks through how it is built.

subprocess-v1 adapter | 5 kernel hardening layers | 4 trust tiers | SHA-256 bundle integrity | 60 s · 512 MB · 60 CPU-s | fail-closed by default | Langfuse + event ledger


The user view · live workbench preview

From form to artifact, seen from the user's seat

A faithful mirror of the workbench Skill runner. Same form your team fills in, same state badge, same tool-call timeline, same cost line at the end. The block below shows the same execution from the sandbox side.

/Generate deal shortlist

Sandboxed

Generate deal shortlist

Read deal materials, score against investment criteria, return an Excel summary.

xlsx-generator · v1.2.0

Deal materials (optional)

Q2-pipeline.pdf (1.2 MB) · due-diligence.pdf (880 KB) · valuations.xlsx (215 KB)

Files land in inputs/materials/ — the skill reads them via workspace.fs.list.

scope

Q2 EU deals, exclude crypto

Optional natural-language filter applied during scoring.


Live sandbox preview

Watch a Skill execute through the sandbox

Every invocation walks the same preexec ladder: bundle verify → workspace mount → namespaces + cgroup + UID drop → seccomp load → exec → tools → artifact commit → archive. Three scenarios, each with a different sandbox profile.

deal-room / Q2-shortlist / xlsx-generator · Skill

Sandboxed

xlsx-generator

Skill · v1.2.0

Official

Caller

Aimee · agent (chat)

Space

deal-room / Q2-shortlist

Sandbox profile

code_exec=true · network=deny

Tools (intersected)

workspace.fs.read · workspace.fs.write · workspace.code.python · artifact.commit

Resource caps

60 s wall · 512 MB mem · 60 CPU-s · 10 MiB out

Status

running · preexec ladder

Artifact

outputs/Q2-shortlist.xlsx · artifact 7c4e91 · 87 KB


First — what is a Skill, exactly?

A workflow your team owns, packaged

Take a workflow your team keeps repeating — review a contract, score a deal, prep a meeting brief — and turn it into a named bundle. The bundle declares what it does, which tools it may touch, what guardrails apply and what shape its output takes. People invoke it by name from the workbench. Agents invoke the same one from inside a chat loop. Same call, same audit trail.

A contract, not a prompt

Skills are versioned and fingerprinted. The v1.4.0 you call today is the same v1.4.0 next quarter: no copy-paste drift, no off-script edits, no surprises for the people calling it.

Run by the platform, not the caller

Authorization, isolation, cost attribution, audit trail — all enforced by the runtime. Identical for a human in the workbench and an agent inside a chat loop.

Comes with its own paper trail

Every invocation produces a traced, costed, archived run. An admin can inspect any execution end to end — today, last week, last quarter.

And at runtime — what actually happens

An LLM in a sandboxed loop, end to end

01 · Setup

Inputs staged

Files validated against the manifest. Workspace mounted, isolated, scoped to your tenant. Sandbox profile applied before any code runs.

02 · Loop

LLM picks a tool, runtime runs it, LLM reads the result

Each step is checked against the allow-list, executed inside the sandbox, and logged. The loop ends when the model produces output that fits the declared schema — typically 3–8 turns.

03 · Wrap-up

Output + artifacts

Structured output goes back to the caller. Files become downloadable artifacts. Cost and trace rows are written. Workspace is archived for retention.
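The three steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the platform's code: `run_skill_loop`, the shape of the step dictionaries, the `max_turns` cutoff, the `is_valid_output` callback and the `output_schema_mismatch` marker are all hypothetical. Only the allow-list check, the `unknown_tool_reference` result and the schema-terminated loop come from this page.

```python
from typing import Any, Callable

Step = dict[str, Any]


def run_skill_loop(
    model: Callable[[list[Step]], Step],
    tools: dict[str, Callable[..., Any]],
    allowed: set[str],
    is_valid_output: Callable[[Any], bool],
    max_turns: int = 8,  # assumption: the page only says runs typically take 3-8 turns
) -> tuple[Any, list[Step]]:
    """Drive a model/tool loop until the output fits the declared schema."""
    history: list[Step] = []
    for _ in range(max_turns):
        step = model(history)
        if step["type"] == "final":
            if is_valid_output(step["output"]):
                return step["output"], history
            # Hypothetical marker: tell the model its output did not fit, then retry.
            history.append({"error": "output_schema_mismatch"})
            continue
        name = step["tool"]
        if name not in allowed or name not in tools:
            # The model is told; nothing is executed.
            history.append({"tool": name, "result": "unknown_tool_reference"})
            continue
        result = tools[name](**step.get("args", {}))
        history.append({"tool": name, "result": result})
    raise RuntimeError("no schema-valid output within max_turns")
```

In this sketch every tool result, including a refusal, lands in `history`, which is what a per-step log would be built from.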

↓ The rest of this page is how the platform makes that work, layer by layer.

Defense in depth

Three layers guard every invocation

Process isolation, trust classification and bundle integrity stack independently. Any one of them can fail closed without taking the others with it.

Layer 01

Process isolation

Every run gets a fresh, walled-off process. Before any code starts, the platform decides what the Skill can see on disk, whether it has a network, how much memory and CPU it gets, and which low-level operations it may call.

subprocess-v1

Layer 02

Trust classification

Every Skill version carries one of four tiers: official, verified, community or tenant. Admins see the tier in the catalog today. Per-space policy on which tiers may run, and approval before tenant-built Skills go live, are tracked under AIM-683.

4 tiers

Layer 03

Tamper detection

Every version has a fingerprint computed at publish time. Just before each run, the platform recomputes it and compares. Mismatch — even by one byte — and the run is refused before any code starts.

SHA-256

Inside Layer 1 · in plain English

Five things the operating system itself enforces

Each one answers a different question: what can the Skill see, where can it reach, how much can it use, what can it call, how loud can it be? Applied in a fixed order — if any single one fails to set up, the Skill never starts at all.

The filesystem, walled off

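The Skill works inside its own workspace folder, in its own mount namespace. Other tenants' files and anything outside its tree are off limits, and if it mounts something new, that mount cannot propagate back to the host. One gap remains today: /proc, /sys and /dev are still visible until pivot_root lands (AIM-682).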

kernel · CLONE_NEWNS · MS_SLAVE

Network access, on or off — at the kernel

Skills declare in their manifest whether they need the network. When the answer is no, the running Skill literally has no network interface to use — not 'we forgot to wire one up', but 'the kernel has none to hand out'. There is no way around it from inside Python.

kernel · CLONE_NEWNET

Memory and CPU, hard-capped

Each run gets a ceiling — 512 MB of memory, 60 seconds of CPU. If a Skill goes beyond, the operating system kills it and we record exactly that. No other tenant's run is affected, no host pressure leaks across.

kernel · cgroup v2

An allow-list of low-level operations

On top of the Python sandbox, the kernel itself only lets the Skill call a fixed list of low-level operations. Raw sockets, kernel-module tricks, exotic IPC — refused by the kernel, not by our code. Skills get what they need to do their job and nothing more.

kernel · seccomp BPF

Logs that can't drown the system

Each run captures the first 10 MiB of its own logs, marks the rest truncated and moves on. This is operational, not security: it stops a chatty Skill from blocking the runtime or filling up storage.

kernel · 10 MiB cap

Layers are fail-closed: a Skill is either fully sandboxed or it does not run at all. There is no partial state where a Skill is, say, memory-capped but free on the network.
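The fail-closed rule can be sketched in a few lines. This is an illustration only: `launch_fail_closed`, `SandboxSetupError` and the layer callables are hypothetical names, and the real layers are kernel operations applied before exec rather than plain Python functions. The point it shows is the ordering guarantee: the start callback is unreachable unless every layer has been applied.

```python
from typing import Callable, Sequence

Layer = tuple[str, Callable[[], None]]


class SandboxSetupError(RuntimeError):
    """A hardening layer could not be applied, so nothing may run."""


def launch_fail_closed(layers: Sequence[Layer], start: Callable[[], str]) -> str:
    """Apply every layer in order; start the workload only if all succeed."""
    for name, apply in layers:
        try:
            apply()
        except Exception as exc:
            # Stop at the first failure: later layers are not attempted
            # and the workload is never started.
            raise SandboxSetupError(f"layer {name!r} failed; refusing to start") from exc
    return start()
```

Raising instead of logging and continuing is the design choice that rules out a partially sandboxed run.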

Resource caps

Hard ceilings on every execution

Identical defaults regardless of caller, tier or space. Tenants can lower these in policy but not raise them past the platform cap.

60 s

wall-clock timeout · tree-killed via session

512 MB

memory.max · OOM kill counter recorded

60 CPU-s

cpu budget · 100 ms period

10 MiB

stdout / stderr cap · truncated past limit
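The "lower but not raise" rule amounts to taking the minimum of the tenant value and the platform ceiling for each cap. A minimal sketch, assuming a hypothetical `Caps` record and a tenant policy passed as a plain dictionary; the field names are made up for the example, and only the four default values come from this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Caps:
    wall_s: int = 60      # wall-clock timeout, seconds
    memory_mb: int = 512  # memory ceiling
    cpu_s: int = 60       # CPU budget, seconds
    output_mib: int = 10  # stdout / stderr cap


PLATFORM_CAPS = Caps()


def effective_caps(tenant_policy: dict[str, int], platform: Caps = PLATFORM_CAPS) -> Caps:
    """Clamp tenant-requested caps so they never exceed the platform ceiling."""
    known = {f.name for f in fields(Caps)}
    unknown = set(tenant_policy) - known
    if unknown:
        raise KeyError(f"unknown cap(s): {sorted(unknown)}")
    resolved = {}
    for name in known:
        ceiling = getattr(platform, name)
        requested = tenant_policy.get(name, ceiling)
        if requested <= 0:
            raise ValueError(f"{name} must be positive")
        resolved[name] = min(requested, ceiling)
    return Caps(**resolved)
```

For example, a policy of `{"memory_mb": 256, "wall_s": 300}` keeps the lowered memory cap and silently clamps the wall-clock request back to 60 s.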

Adapter roadmap

One interface, three execution backends

The CodeExecAdapter protocol lets the runtime swap execution backends without changing the Skill or the calling code. Tier 1 ships today; tiers 2 and 3 sit behind the same interface and follow on the same trust model.

Tier 01 · Live

subprocess-v1

OS-level hardening on the runtime host. The five preexec layers above. Default for every execution today; production-ready, fail-closed, instrumented.

Tier 02 · Architecture ready

gVisor / Firecracker

User-space kernel or microVM isolation for stronger workloads. Same adapter contract; registers under a different name. Targeted at higher-risk tenant skills.

Tier 03 · Architecture ready

Remote execution

Off-host execution in e2b or Modal for elastic capacity and stricter physical separation. Same protocol, asynchronous worker pool — no API change for callers.
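To make "same adapter contract, registers under a different name" concrete, here is one way such an interface could look. Only the name `CodeExecAdapter` comes from this page. The `run` method, the request and result types, the registry helpers and the `EchoAdapter` stand-in are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ExecRequest:
    execution_id: str
    argv: tuple[str, ...]


@dataclass(frozen=True)
class ExecResult:
    exit_code: int
    stdout: str


class CodeExecAdapter(Protocol):
    """Hypothetical shape of the adapter contract."""
    name: str

    def run(self, request: ExecRequest) -> ExecResult: ...


_ADAPTERS: dict[str, CodeExecAdapter] = {}


def register(adapter: CodeExecAdapter) -> None:
    if adapter.name in _ADAPTERS:
        raise ValueError(f"adapter {adapter.name!r} already registered")
    _ADAPTERS[adapter.name] = adapter


def execute(adapter_name: str, request: ExecRequest) -> ExecResult:
    # Callers depend only on the protocol, so the backend is a lookup by name.
    try:
        adapter = _ADAPTERS[adapter_name]
    except KeyError:
        raise LookupError(f"no adapter named {adapter_name!r}") from None
    return adapter.run(request)


class EchoAdapter:
    """Stand-in backend for the example; it executes nothing."""
    name = "echo-v0"

    def run(self, request: ExecRequest) -> ExecResult:
        return ExecResult(exit_code=0, stdout=" ".join(request.argv))
```

A second backend would be another class with the same `run` signature and a different `name`; the caller-facing `execute` function would stay unchanged.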

Trust tier classification

Four tiers, one runtime

Tiers are recorded on every Skill version and surfaced in the workbench (badges) and console (filters and policy panels). Same isolation runs for every tier; the tier governs who is allowed to author and where it can run.

Official

Aimable-shipped platform skills. Reviewed and signed off internally. Visible to every tenant unless an admin hides them.

Verified

Third-party authored, vetted by Aimable. Curated and reviewed before publication; safe defaults for cross-tenant use.

Community

Public-marketplace skills. Run in the same sandbox as everything else, but admins decide per space whether community-tier skills are allowed at all.

Tenant

Private skills authored inside one tenant. Never visible outside that tenant. Forward-deployed engineers ship most of these today.

Roadmap: Per-space minimum-tier policy and approval flows are tracked under AIM-683. The tier itself is recorded and surfaced today; runtime enforcement for tier-based admission is the next milestone.

Tamper-evident

The version you approved is the version that runs

When a Skill version is published, the platform computes a fingerprint over the whole bundle — prompt, playbook, resources, scripts — and stores it next to the version. Just before every run, the fingerprint is recomputed and compared. One byte different anywhere — a silent edit, a swap in storage, anything — and the run is refused before any code starts. The v1.4.0 your reviewer signed off on is bit-for-bit the same v1.4.0 that runs in production tomorrow.

Published once, frozen

xlsx-generator-v1.2.0.tar.gz

├── SKILL.md

├── resources/

└── scripts/

A new version lands in the catalog as one frozen bundle. The platform computes its fingerprint at the same moment. The bundle itself can never be edited in place — a change means a new version row, with a new fingerprint.

Fingerprint stored alongside

a8f3b2c4d1e5…7f9b1c0a3e8d24

The fingerprint travels with the version like a serial number. Admins see it in the console; auditors compare against it later. Same version, same fingerprint, anywhere the Skill goes.

Re-checked on every run

match → ok — Skill runs

mismatch → refused — no run

Right before execution, the platform recomputes the fingerprint and compares. Match: the Skill runs. Mismatch: refused, audited, no code executes — no override path.

Fingerprint algorithm: SHA-256 over the full bundle tarball.
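The publish-then-recheck flow maps directly onto the standard library. A minimal sketch, assuming the bundle is a single tarball on disk and the stored fingerprint is a hex string; `bundle_fingerprint`, `verify_bundle` and `BundleIntegrityError` are hypothetical names.

```python
import hashlib
import hmac
from pathlib import Path


class BundleIntegrityError(RuntimeError):
    """The recomputed fingerprint does not match the stored one."""


def bundle_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 over the full tarball, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(path: Path, expected_hex: str) -> None:
    """Refuse the run unless the bundle still hashes to the stored fingerprint."""
    actual = bundle_fingerprint(path)
    # compare_digest is a constant-time comparison.
    if not hmac.compare_digest(actual, expected_hex.lower()):
        raise BundleIntegrityError(f"fingerprint mismatch for {path.name}")
```

`verify_bundle` returns nothing on success and raises on any difference, so a caller cannot accidentally ignore a failed check by forgetting to test a boolean.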

Workspace lifecycle

Ephemeral per-execution filesystem

Workspaces are tenant-scoped, keyed on (tenant_id, execution_id) and torn down on completion. The on-disk tree never outlives the execution; only the archived tar.gz does, and only for a bounded retention.

State machine

01 creating: fetch bundle · verify SHA-256 · prepare paths
02 ready: bundle extracted · adapter not yet acquired
03 running: adapter applied · child executing
04 archiving: tar.gz written to archive storage
05 archived: tree removed from disk · row pinned
06 failed: any error before archived · audit trail kept
07 cancelled: user or scheduler stop · same archive path

Archive default retention is 7 days. A cleanup job removes the tar.gz and stamps deleted_at; the row remains for audit.
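The seven states can be modelled as an enum with an explicit transition table. The state names come from the list above; the edges are an assumption. This sketch treats failed and cancelled as terminal, allows failure from any state before archived, and allows cancellation only before archiving starts. The page does not spell these edges out, and a cancelled run may in practice still pass through archiving.

```python
from enum import Enum


class WorkspaceState(str, Enum):
    CREATING = "creating"
    READY = "ready"
    RUNNING = "running"
    ARCHIVING = "archiving"
    ARCHIVED = "archived"
    FAILED = "failed"
    CANCELLED = "cancelled"


S = WorkspaceState

# Assumed edges; see the note above.
ALLOWED: dict[WorkspaceState, frozenset[WorkspaceState]] = {
    S.CREATING: frozenset({S.READY, S.FAILED, S.CANCELLED}),
    S.READY: frozenset({S.RUNNING, S.FAILED, S.CANCELLED}),
    S.RUNNING: frozenset({S.ARCHIVING, S.FAILED, S.CANCELLED}),
    S.ARCHIVING: frozenset({S.ARCHIVED, S.FAILED}),
    S.ARCHIVED: frozenset(),
    S.FAILED: frozenset(),
    S.CANCELLED: frozenset(),
}


class InvalidTransition(ValueError):
    pass


def transition(current: WorkspaceState, target: WorkspaceState) -> WorkspaceState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```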

On-disk layout

/workspace/{tenant_id}/{execution_id}/
├── inputs/                  # bundle, ro
│   └── bundle/              #   skill payload
├── scratch/                 # rw, ephemeral
├── outputs/                 # rw, promoted to artifacts
└── metadata.json            # execution metadata

Each execution gets a UID drawn from a pre-created pool (aimable-skill-0…63) so cross-execution UID collisions cannot occur within a pod.
Every workspace path is prefixed with tenant_id; cross-tenant requests return cross_tenant_access_error.
Artifact commits are append-only — re-committing the same path produces a new row, never an in-place overwrite.
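The append-only rule for artifact commits can be illustrated with an in-memory log. This is a sketch, not the platform's schema: `ArtifactLog`, `ArtifactRow` and their fields are hypothetical, and a real implementation would write database rows rather than a Python list.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactRow:
    seq: int      # position in the log, never reused
    path: str     # e.g. "outputs/Q2-shortlist.xlsx"
    sha256: str
    size: int


class ArtifactLog:
    """Commits only ever append; existing rows are never mutated."""

    def __init__(self) -> None:
        self._rows: list[ArtifactRow] = []

    def commit(self, path: str, data: bytes) -> ArtifactRow:
        row = ArtifactRow(
            seq=len(self._rows) + 1,
            path=path,
            sha256=hashlib.sha256(data).hexdigest(),
            size=len(data),
        )
        self._rows.append(row)
        return row

    def history(self, path: str) -> list[ArtifactRow]:
        return [r for r in self._rows if r.path == path]

    def latest(self, path: str) -> Optional[ArtifactRow]:
        rows = self.history(path)
        return rows[-1] if rows else None
```

Committing the same path twice yields two rows, so an auditor can still see what the first version of a file looked like.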

Tool authorization

Three paths converge at one intersection gate

A Skill cannot invoke a tool just because its manifest asks for it. The manifest declaration and the space policy are intersected at runtime, then filtered by the sandbox profile. The result is the only set the LLM ever sees.

Path 01 · manifest.allowed_tools

Skill manifest declaration

The Skill author lists tools the Skill needs in its manifest. This is intent, not authorization — by itself it grants nothing.

Path 02 · space.enabled_tools

Space-level policy

A space admin enables tools per space. Gated by RBAC (spaces.write). This is the operator's view.

Path 03 · compose_execution_tools()

Runtime intersection gate

compose_execution_tools() takes the intersection, strips network-requiring tools when sandbox.network=deny, and adds platform meta-tools.

compose_execution_tools() · runtime

selected = (manifest.allowed_tools  ∩  space.enabled_tools)
              − tools_requiring_network  if  sandbox.network = "deny"
              + meta_tools(skill.describe, skill.invoke, artifact.commit)

Unknown tool names from the manifest are silently dropped at composition. If the LLM ever invokes one anyway it sees an unknown_tool_reference result — the runtime tells the model, not the user.
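The set expression above translates almost literally into Python. The function name matches the one on this page, but the signature is an assumption: the real function presumably reads the tool registry and the network-requiring set from platform state rather than taking them as arguments. `web.fetch` and `made.up` in the usage note are hypothetical tool names.

```python
META_TOOLS = frozenset({"skill.describe", "skill.invoke", "artifact.commit"})


def compose_execution_tools(
    manifest_allowed: set[str],
    space_enabled: set[str],
    network: str,             # "allow" or "deny", from the sandbox profile
    known_tools: set[str],    # assumed registry of tools the platform knows
    network_tools: set[str],  # assumed subset of tools that need the network
) -> frozenset[str]:
    """Intersect manifest and space policy, filter by sandbox, add meta-tools."""
    # Unknown names are dropped silently at this point.
    selected = set(manifest_allowed) & set(space_enabled) & set(known_tools)
    if network == "deny":
        selected -= set(network_tools)
    return frozenset(selected | META_TOOLS)
```

With a manifest asking for `workspace.fs.read`, `web.fetch` and `made.up`, a space that enables all three, and `network="deny"`, the result is `workspace.fs.read` plus the three meta-tools: `web.fetch` is stripped by the sandbox profile and `made.up` is not a known tool.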

Skill bundle anatomy

SKILL.md is the contract

The same manifest the workbench renders, the runtime parses, and the console diffs across versions. Below: a real review-contract Skill, with its manifest on the left and a live invocation on the right.

review-contract

Tenant Skill

v1.4.0

Owner

Legal team

Intent

Read a SaaS contract, flag clauses against the house playbook, return a review memo with risks and recommended redlines.

Inputs (typed)

contract (file): PDF or DOCX, ≤ 25 MB
playbook (collection): House playbook, scoped to space

allowed_tools

clause.extract · playbook.search · policy.check · artifact.commit

Sandbox + guardrails

sandbox.network = deny (no external retrieval)
sandbox.code_exec = false (no Python tool)
PII redacted in outputs (presidio)
EU data residency enforced upstream

Output schema

{ memo: markdown, clauses[], risks[], score: number }

Invocation

Running

Caller

Aimee · agent


Python runtime today

One shared venv, deliberately so

Skills run against the same /app/.venv as the Aimable backend — about fifty curated packages including numpy, pandas, torch, transformers, spacy, docling, openpyxl, httpx, sqlalchemy, langfuse, cryptography and litellm.

There is no pip install at execution time. That is a deliberate trade-off: it eliminates supply-chain hijack during a run, at the cost of skills being unable to pin their own versions.

While the platform matures, forward-deployed engineers author Skills with us. Per-skill venvs with SHA-pinned wheels are the next step (AIM-685) — until then, the kernel layers are what limit the blast radius if a pre-installed library misbehaves.

/app/.venv

shared venv

Selected pre-installed packages

numpy · pandas · scipy · torch · transformers · spacy · docling · openpyxl · pypdf · python-docx · python-pptx · httpx · sqlalchemy · langfuse · cryptography · litellm · guardrails-ai · presidio-analyzer · presidio-anonymizer · trafilatura · + 30 more

Honest caveat

Anything in the venv is reachable from a Skill's process. UID drop, seccomp and netns limit what that reachability can do — but a vulnerable library still increases blast radius. The wheels-pattern in the roadmap closes that gap.

Threat model

What a Skill can and cannot do

Stated explicitly. A useful threat model is the one you can reason about — vague claims of 'enterprise-grade isolation' are not what regulated customers buy.

A skill can

Read files staged into inputs/ (bundle resources and caller-provided materials)
Write to scratch/ and outputs/ (read-only inputs/ enforced by mount)
Execute Python via workspace.code.python when sandbox.code_exec=true
Call libraries already present in /app/.venv (numpy, pandas, openpyxl, litellm, …)
Reach external URLs only when sandbox.network=allow and a network-requiring tool was approved

A skill cannot

Touch the host filesystem outside its workspace tree (mount-ns + path validator)
See another tenant or another space — every path and service call is tenant-scoped
Spawn a privileged process (setuid drops to a low-privilege UID before execve)
Issue syscalls outside the seccomp allowlist (kernel returns EPERM)
Exhaust memory or CPU (cgroup v2 hard caps; OOM-kill recorded)
Run past the wall-clock budget (tree-killed via session leader on timeout)
Exfiltrate over the network when sandbox.network=deny (no interfaces in the netns)
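The wall-clock item in the list above ("tree-killed via session leader on timeout") corresponds to a well-known POSIX pattern: start the child in its own session, then signal the whole process group when time runs out. A minimal sketch using only the standard library; `run_with_wall_clock` and `RunResult` are hypothetical names, and the sketch covers the timeout only, not the other sandbox layers.

```python
import os
import signal
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    returncode: int
    timed_out: bool


def run_with_wall_clock(argv: list[str], timeout_s: float) -> RunResult:
    """Run argv; on timeout, SIGKILL the child's whole process group (POSIX only)."""
    # start_new_session=True makes the child a session and process-group
    # leader, so its process-group id equals its pid.
    proc = subprocess.Popen(
        argv,
        start_new_session=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return RunResult(proc.returncode, timed_out=False)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so no zombie is left behind
        return RunResult(proc.returncode, timed_out=True)
```

Killing the group rather than the single pid is what takes down helper processes the child may have spawned, as long as they stayed in the same process group.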

Multi-tenant isolation

Tenant boundaries are filesystem-deep

Skills, workspaces, artifacts and audit rows are all keyed on tenant_id. The workspace tree, the UID assignment and every service call enforce it independently.

tenant_a · space:legal, space:finance · aimable-skill-7 · /workspace/tenant_a/

Tenant boundary · cross_tenant_access_error

tenant_b · space:research, space:ops · aimable-skill-23 · /workspace/tenant_b/

Every workspace path is prefixed with tenant_id; the path validator rejects any traversal back up.

Each execution gets a UID from a low-privilege pool (aimable-skill-0…63) — no shared UID across executions.

Service methods take an explicit tenant_id arg; mismatches return cross_tenant_access_error at the API.
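The path prefix and the explicit tenant_id argument can be combined in one small check. This is a sketch under assumptions: `resolve_workspace_path` and the two exception classes are hypothetical, and the check is purely lexical. A production validator would also need to resolve symlinks before trusting a path.

```python
from pathlib import PurePosixPath

WORKSPACE_ROOT = PurePosixPath("/workspace")


class CrossTenantAccessError(PermissionError):
    """Maps to the cross_tenant_access_error result named on this page."""


class PathTraversalError(ValueError):
    pass


def resolve_workspace_path(
    tenant_id: str, execution_id: str, requested: str, caller_tenant_id: str
) -> PurePosixPath:
    """Return the absolute workspace path, or raise if the request escapes it."""
    if caller_tenant_id != tenant_id:
        raise CrossTenantAccessError("cross_tenant_access_error")
    for part in (tenant_id, execution_id):
        if not part or "/" in part or part in {".", ".."}:
            raise PathTraversalError(f"invalid id: {part!r}")
    rel = PurePosixPath(requested)
    # pathlib keeps ".." components as-is, so this catches "a/../../b" too.
    if rel.is_absolute() or ".." in rel.parts:
        raise PathTraversalError(requested)
    return WORKSPACE_ROOT / tenant_id / execution_id / rel
```

The tenant comparison comes first, so a cross-tenant request is reported as exactly that, regardless of what the path looks like.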

Auditability

Every execution is observable and attributable

Four parallel signals land for every Skill run. Together they answer 'what happened, who paid for it, and what did it touch?'.

ledger

Append-only event ledger

skill_execution_event captures execution_started, state_changed, content_delta, tool_call_start/end, artifact_committed, execution_completed. Resumable; pruned past retention.

tracing

Langfuse trace per execution

Root span carries tenant_id, space_id, principal_id, skill_slug, skill_version, execution_id, parent_execution_id, source. Tool calls and sub-skills nest as child spans.

metrics

Prometheus metrics

skills_execution_active_count, skills_execution_duration_ms, skills_execution_result_total and skills_tool_invocation_total — alerts wire into existing infra.

billing

Cost attribution rows

Every llm.complete call writes a USAGE_EVENT with model, input/output tokens and cost_usd. Rolled up per skill, space, principal in the console audit page.
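The cost roll-up is a group-by over usage rows. A minimal sketch, assuming a hypothetical `UsageEvent` record whose fields mirror the ones listed above (model, input/output tokens, cost_usd) plus the dimensions used for grouping; the real USAGE_EVENT row layout is not shown on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    skill_slug: str
    space_id: str
    principal_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: Decimal  # Decimal avoids float rounding drift when summing money


def rollup(events: list[UsageEvent], key: str) -> dict[str, Decimal]:
    """Sum cost_usd grouped by one dimension, e.g. 'skill_slug' or 'space_id'."""
    totals: dict[str, Decimal] = defaultdict(lambda: Decimal("0"))
    for event in events:
        totals[getattr(event, key)] += event.cost_usd
    return dict(totals)
```

Calling `rollup(events, "principal_id")` on the same rows gives the per-caller view, so one set of events can feed all three roll-ups.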

Open improvements

Tickets we are honest about

Four MC5-tagged items still on the work list. We surface them so you know what is and is not in production today.

AIM-682 · MC5

pivot_root for full filesystem confinement

Mount-ns + MS_SLAVE prevents host mount propagation today, but /proc, /sys and /dev are still visible. pivot_root into the workspace tree, plus tmpfs for /proc and a minimal /dev, closes the gap.

AIM-683 · MC5

Trust-tier enforcement at runtime

Tiers are recorded and surfaced today. Per-space admin policy, approval flows and a per-space tier filter still need to land. The console UI exists; the backend endpoint does not yet.

AIM-684 · MC5

Spec / code reconciliation on FR-005

Lock down the intersection-gate decision: today it is implemented in tool_composition.py and matches the spec; this ticket formalises the test coverage so spec and code stay in sync.

AIM-685 · MC5

Wheels pattern for per-skill dependencies

Replace the single shared venv with per-skill dependency declarations + SHA-pinned wheels packaged at bundle time. Closes the supply-chain reach a Skill currently has into Aimable's internal dependencies.

Lab project

Early access for design partners

We co-author the first set of Skills with technical leads who own a regulated workflow. If you have one in mind — and you want the kernel-level isolation under it — let's talk.

Run AI in a sandbox you can reason about.

Book a demo or tell us which workflow your team keeps repeating. We'll package the first Skill with you and walk through the full audit trail it produces.
