AI model weights are the core artifact that determines an agent's behavior. If an attacker can modify model weights at any point in the supply chain (during training, storage, distribution, or deployment), they can alter the agent's behavior in ways that bypass all other safety controls. Supply chain security for model weights protects this critical artifact from tampering.
A tampered model might behave normally on standard benchmarks but contain a backdoor that activates on specific trigger inputs. It might subtly degrade safety classifier accuracy, allowing more harmful content through. Or it might exfiltrate data through steganographic channels in its outputs.
These attacks are particularly dangerous because they are invisible to operators who test the model on standard evaluations. The model passes all checks until the trigger condition is met.
Track the origin and history of every model artifact: which pipeline produced it, what training data and configuration were used, and when it was built.
Store provenance metadata alongside the model weights. Use a tamper-evident format so that provenance cannot be altered without detection.
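As a minimal sketch of tamper-evident provenance, the metadata can be signed when written and re-verified when read. This example uses an HMAC with a hypothetical pipeline key for brevity; a production pipeline would typically use asymmetric signatures (for example Ed25519 or Sigstore) so that verifiers never hold the signing secret.

```python
import hashlib
import hmac
import json

# Hypothetical signing key held only by the training pipeline.
SIGNING_KEY = b"training-pipeline-signing-key"

def sign_provenance(metadata: dict) -> str:
    """Sign canonicalized provenance metadata so any later edit is detectable."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify_provenance(metadata: dict, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_provenance(metadata), signature)

provenance = {
    "name": "safety-classifier-v3",
    "training_date": "2025-12-15",
    "training_data_hash": "sha256:...",
}
signature = sign_provenance(provenance)
assert verify_provenance(provenance, signature)

# Altering any stored field invalidates the signature.
provenance["training_date"] = "2025-12-16"
assert not verify_provenance(provenance, signature)
```

Canonicalizing with `sort_keys=True` matters: the signature must be computed over a deterministic byte serialization, or legitimate re-serialization would look like tampering.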
Compute cryptographic hashes of model weights at every stage of the pipeline. Verify hashes when weights are loaded for inference. Any mismatch indicates tampering or corruption.
```yaml
model_manifest:
  name: "safety-classifier-v3"
  hash: "sha256:a1b2c3d4..."
  signed_by: "training-pipeline@company.com"
  signature: "..."
  training_date: "2025-12-15"
  training_data_hash: "sha256:e5f6a7b8..."
```
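The hash computation and load-time check described above can be sketched as follows. The `sha256:` prefix matches the manifest format; the function names are illustrative.

```python
import hashlib
import hmac

def hash_weights(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"

def verify_weights(path: str, expected: str) -> bool:
    """Compare the computed hash against the manifest value in constant time."""
    return hmac.compare_digest(hash_weights(path), expected)
```

Run `hash_weights` at each pipeline stage (post-training, post-upload, pre-deployment) and record the results; `verify_weights` at load time then catches both deliberate tampering and silent corruption.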
Distribute model weights through authenticated channels. Use signed URLs, checksum verification, and TLS for transit security. When pulling models from public registries, verify the publisher's identity and the model's signature before deployment.
Restrict who can write to model storage. Use separate credentials for training pipelines (write access) and inference runtimes (read-only access). Log all access to model storage.
At inference startup, verify model weight hashes against the expected values from the manifest. If verification fails, refuse to start the agent. Authensor's health check patterns can include model integrity verification as a startup check.
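A minimal sketch of that startup gate, assuming the manifest is stored as JSON beside the weights (field names follow the manifest example above; paths and the exit behavior are illustrative):

```python
import hashlib
import hmac
import json
import sys

def file_sha256(path: str) -> str:
    """Stream-hash a file, returning the manifest's 'sha256:<hex>' format."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(1 << 20):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"

def startup_check(weights_path: str, manifest_path: str) -> bool:
    """Verify weights against the manifest; refuse to start on any mismatch."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    actual = file_sha256(weights_path)
    if not hmac.compare_digest(actual, manifest["hash"]):
        print(
            f"model integrity check failed for {manifest.get('name', weights_path)}; "
            "refusing to start",
            file=sys.stderr,
        )
        sys.exit(1)
    return True
```

Failing closed here is the point: a hash mismatch at startup is treated as a deployment-stopping incident, not a warning to log and continue past.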
Model weights are code. Treat their supply chain with the same rigor you apply to software supply chain security.