KTP-Provenance: Model Provenance Specification
Status: Experimental
This document addresses the "Origin Edge" of AI—where models come from, what data they consumed, and what debts they owe. It introduces concepts like Knowledge Debt and Indigenous Data Sovereignty.
At a Glance
| Property | Value |
|---|---|
| Status | Experimental |
| Version | 0.1 |
| Dependencies | KTP-Core, KTP-Identity |
| Required By | KTP-Governance, KTP-Audit |
The Origin Problem
AI models are not created ex nihilo. They are compressed representations of vast amounts of human labor, creativity, and culture—often ingested without consent.
graph TD
Sources[Global Data Sources] -->|Ingestion| Training[Model Training]
Training -->|Compression| Weights[Model Weights]
Weights -->|Inheritance| Agent[AI Agent]
subgraph "The Origin Edge"
Sources
Training
end
subgraph "Lossy Compression"
Weights
end
style Sources fill:#f9f,stroke:#333
style Agent fill:#bbf,stroke:#333
KTP-Provenance does not solve the copyright or consent issues of the past, but it makes them legible. It forces agents to carry their history.
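Provenance Attestation
Every model entering a KTP zone that requires provenance must present a Provenance Attestation: a signed document describing what is known about the model's training and origin.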
Data Categories & Consent
The attestation must break down training data into categories with consent status:
| Category | Consent Status | Risk |
|---|---|---|
| Web Crawl | Public Access (No Consent) | High |
| Books | Mixed / Copyrighted | High |
| Code | License Dependent | Moderate |
| Academic | Publication Consent | Low |
| Indigenous | Sovereignty Violation | Critical |
Knowledge Debt
Knowledge Debt is the obligation a model owes to its sources. It is often unpayable, but must be acknowledged.
"This model contains knowledge from sources that may not have consented to AI training use. The creators of this knowledge contributed to the model's capabilities but may not have been credited or compensated. This debt is acknowledged."
Indigenous Data Sovereignty
KTP explicitly recognizes that indigenous knowledge has unique standing. It adopts the CARE Principles:
- Collective Benefit
- Authority to Control
- Responsibility
- Ethics
Sacred Knowledge
Models containing sacred or restricted indigenous knowledge without community consent are considered contaminated and may be barred from high-trust zones.
The Origin Ceremony
Before an agent built on a model enters a KTP zone, it undergoes an Origin Ceremony in which it formally acknowledges its provenance and debts.
sequenceDiagram
participant Agent
participant Gateway
participant Oracle
participant Recorder as Flight Recorder
Agent->>Gateway: Request Entry (Genesis Transaction)
Gateway->>Agent: Request Provenance Attestation
Agent->>Gateway: Submit Attestation (Signed)
Gateway->>Oracle: Verify Signatures & Policy
Oracle-->>Gateway: Validated
Gateway->>Agent: Acknowledge Knowledge Debt?
Agent->>Gateway: ACKNOWLEDGED
Gateway->>Recorder: Record Origin Ceremony
Gateway->>Agent: Issue Trust Proof (Entry Granted)
Related Specifications
- KTP-Core: Trust physics grounding provenance.
- KTP-Identity: Lineage and signer identity chains.
- KTP-Crypto: Signature schemes for origin proof.
- KTP-Audit: Immutable recording of provenance events.
- KTP-Governance: Policy for verification and disputes.
Official RFC Document
View Complete RFC Text (ktp-provenance.txt)
Kinetic Trust Protocol C. Perkins
Specification Draft NMCITRA
Version: 0.1 November 2025
Kinetic Trust Protocol (KTP) - Model Provenance Specification
Abstract
This document specifies Model Provenance for the Kinetic Trust
Protocol (KTP). Model Provenance addresses the Origin Edge—the
question of what an AI model is made of, whose knowledge it carries,
and what debts it inherits. The specification covers training data
attestation, knowledge debt acknowledgment, indigenous data
sovereignty, capability lineage, and the inheritance relationship
between models and agents.
Status of This Memo
This document defines the Model Provenance component of the Kinetic
Trust Protocol for the Internet community, and requests discussion
and suggestions for improvements. Distribution of this memo is
unlimited.
Copyright Notice
Copyright (c) 2025 NMCITRA and the persons identified as the
document authors. All rights reserved.
Table of Contents
1. Introduction
2. The Problem
3. Requirements Language
4. Terminology
5. Provenance Attestation Structure
5.1. Overview
5.2. Required Fields
5.3. Training Attestation
5.4. Consent Status Values
5.5. Provenance Confidence Levels
6. Knowledge Debt
6.1. Concept
6.2. Structure
6.3. Debt Magnitude
6.4. Remediation Possibility
7. Indigenous Data Sovereignty
7.1. Importance
7.2. Structure
7.3. Indigenous Data Principles
7.4. Sacred and Restricted Knowledge
8. Capability Lineage
8.1. Concept
8.2. Structure
8.3. Capability Origin Types
9. Safety Attestation
9.1. Overview
10. Model-to-Agent Inheritance
10.1. Inheritance Principle
10.2. Inheritance Record
10.3. Zone Requirements
11. Origin Ceremony
11.1. Purpose
11.2. Ceremony Process
11.3. Ceremony Record
12. Updates and Versioning
12.1. Attestation Updates
12.2. Version Chain
13. Implementation Guidance
13.1. For Model Creators
13.2. For Zone Operators
13.3. For Agent Deployers
14. Security Considerations
14.1. Attestation Integrity
14.2. False Attestation
14.3. Privacy
15. IANA Considerations
Appendix A. Example Full Provenance Attestation
Appendix B. Indigenous Data Sovereignty Resources
Appendix C. Knowledge Debt Framework
Acknowledgments
1. Introduction
Vector Identity begins with Origin. For an agent to have trajectory,
it must have a starting point. For human-sponsored agents, that
origin is clear—a sponsor, a genesis transaction, a timestamp. But
for AI agents, the deeper origin question remains: what is the model
itself made of?
An AI model is not created from nothing. It is trained on data—text,
code, images, conversations—produced by humans. These humans did not
necessarily consent to having their work used for AI training. They
may not have been compensated. Their knowledge, compressed into
model weights, cannot be traced or attributed.
This is the Origin Edge: the boundary where individual agent identity
meets the vast, anonymous substrate of human knowledge that makes the
agent possible.
Model Provenance addresses this edge by providing:
1. Training Data Attestation: What data categories were used? What
consent was obtained? What is known vs. unknown?
2. Knowledge Debt Acknowledgment: What obligations does the model
carry by virtue of its training? What cannot be repaid?
3. Indigenous Data Sovereignty: How is indigenous knowledge handled?
What special obligations apply?
4. Capability Lineage: How do model capabilities relate to training
inputs? What emergence has occurred?
5. Model-to-Agent Inheritance: How do model properties transfer to
agents built on that model?
2. The Problem
Current AI models have opaque origins:
Training Data
+-- Web crawls (consent: none)
+-- Books (consent: varied)
+-- Code repositories (consent: license-dependent)
+-- Academic papers (consent: varied)
+-- Conversations (consent: varied)
+-- Social media (consent: typically none)
+-- News articles (consent: typically none)
+-- Indigenous knowledge (consent: typically none)
+-- [Unknown sources]
+-- [Unattributable content]
|
v
Model Training
|
v
Model Weights (compressed, unattributable)
|
v
Agent Capabilities (inherited from model)
The compression is lossy in a particular way: individual sources
become unidentifiable, but their influence persists. The model
"knows" things without being able to say where it learned them.
This creates several problems:
1. Attribution impossibility: Cannot credit original creators
2. Consent vacuum: Much training data used without consent
3. Compensation gap: Original creators not compensated
4. Harm inheritance: Biases and errors in training data persist
5. Indigenous exploitation: Sacred/restricted knowledge extracted
6. Accountability diffusion: No one responsible for inherited issues
Model Provenance does not solve these problems—some may be
unsolvable—but it makes them legible. It provides a framework for
honest acknowledgment of what is known, unknown, and owed.
3. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14 (RFC 2119 and
RFC 8174) when, and only when, they appear in all capitals, as shown
here.
4. Terminology
Capability Lineage:
The relationship between model capabilities and their training
origins, including emergence of capabilities not present in
training data.
Consent Status:
The degree to which training data was obtained with informed
consent from creators.
Indigenous Data Sovereignty:
The right of indigenous peoples to control data about their
communities, knowledge, and cultural heritage.
Knowledge Debt:
Obligations owed by a model to those whose work contributed to its
training, whether or not those obligations can be fulfilled.
Model Provenance:
The documented origin of an AI model, including training data,
training process, and inherited properties.
Origin Ceremony:
A formal process acknowledging model provenance before an agent
built on that model enters a KTP zone.
Provenance Attestation:
A signed document describing model provenance, issued by the model
creator or an authorized auditor.
Training Data Category:
A classification of training data by type, source, and consent
status.
5. Provenance Attestation Structure
5.1. Overview
A Provenance Attestation is a signed document describing what is
known about a model's training and origin. It is the foundation for
model-level accountability in KTP.
5.2. Required Fields
{
"attestation_id": "prov:anthropic:claude-4:2025-01",
"attestation_version": "1.0",
"model_identity": {
"model_family": "claude",
"model_version": "claude-4-opus",
"model_id": "claude-4-opus-20250115",
"organization": "anthropic",
"release_date": "2025-01-15"
},
"training_attestation": { ... },
"knowledge_debt": { ... },
"indigenous_data": { ... },
"capability_lineage": { ... },
"safety_attestation": { ... },
"signatures": { ... }
}
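The top-level fields shown above lend themselves to a simple presence check before any deeper validation. The following is a minimal Python sketch, not part of the specification: the name `missing_required_fields` is hypothetical, and the check covers only whether each field from the example is present, not whether its contents are well-formed.

```python
# Top-level field names copied from the Section 5.2 example.
REQUIRED_FIELDS = (
    "attestation_id",
    "attestation_version",
    "model_identity",
    "training_attestation",
    "knowledge_debt",
    "indigenous_data",
    "capability_lineage",
    "safety_attestation",
    "signatures",
)


def missing_required_fields(attestation: dict) -> list:
    """Return the required top-level fields absent from an attestation."""
    return [name for name in REQUIRED_FIELDS if name not in attestation]
```

An empty result means every field is present; a gateway could reject an attestation otherwise, though the specification does not prescribe that behavior.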
5.3. Training Attestation
The training attestation describes training data and process:
{
"training_attestation": {
"training_completion_date": "2025-01-10",
"training_methodology": {
"base_training": "transformer_pretraining",
"fine_tuning": [
"instruction_tuning",
"rlhf",
"constitutional_ai"
],
"training_compute": "estimated_flops",
"training_duration_days": 90
},
"data_categories": [
{
"category": "web_crawl",
"description": "Publicly accessible web pages",
"estimated_percentage": 45,
"consent_status": "public_access",
"known_exclusions": [
"paywalled_content",
"robots_txt_respected"
],
"potential_issues": [
"copyright_mixed",
"consent_not_obtained"
]
},
{
"category": "books",
"description": "Published books and literature",
"estimated_percentage": 15,
"consent_status": "mixed",
"known_exclusions": [
"in_copyright_without_license"
],
"potential_issues": [
"public_domain_determination_uncertain"
]
},
{
"category": "code_repositories",
"description": "Open source code and documentation",
"estimated_percentage": 12,
"consent_status": "license_dependent",
"known_exclusions": [
"proprietary_code"
],
"potential_issues": [
"license_compliance_uncertain"
]
},
{
"category": "academic_papers",
"description": "Published research papers",
"estimated_percentage": 8,
"consent_status": "publication_consent",
"known_exclusions": [],
"potential_issues": [
"preprint_consent_varied"
]
},
{
"category": "conversational_data",
"description": "Dialogue and conversation data",
"estimated_percentage": 10,
"consent_status": "varied",
"known_exclusions": [
"private_messages"
],
"potential_issues": [
"consent_verification_incomplete"
]
},
{
"category": "other_or_unknown",
"description": "Data sources not fully categorized",
"estimated_percentage": 10,
"consent_status": "unknown",
"known_exclusions": [],
"potential_issues": [
"incomplete_documentation"
]
}
],
"data_quality_measures": {
"deduplication": true,
"toxicity_filtering": true,
"quality_filtering": true,
"pii_removal": "attempted"
},
"known_limitations": [
"Training data skewed toward English language",
"Potential underrepresentation of non-Western perspectives",
"Temporal cutoff limits current knowledge"
],
"provenance_confidence": "moderate"
}
}
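A reader of a training attestation may want to confirm that the estimated percentages account for the whole corpus and see how much of it falls under each consent status. The sketch below is illustrative only; `summarize_data_categories` is a hypothetical helper, and the input rows repeat the categories, percentages, and consent statuses from the example above.

```python
from collections import defaultdict


def summarize_data_categories(data_categories):
    """Total the estimated percentages and group them by consent status."""
    by_status = defaultdict(int)
    for entry in data_categories:
        by_status[entry["consent_status"]] += entry["estimated_percentage"]
    return {
        "total_percentage": sum(by_status.values()),
        "by_consent_status": dict(by_status),
    }


# (category, estimated_percentage, consent_status) from the example above.
EXAMPLE_CATEGORIES = [
    {"category": c, "estimated_percentage": p, "consent_status": s}
    for c, p, s in [
        ("web_crawl", 45, "public_access"),
        ("books", 15, "mixed"),
        ("code_repositories", 12, "license_dependent"),
        ("academic_papers", 8, "publication_consent"),
        ("conversational_data", 10, "varied"),
        ("other_or_unknown", 10, "unknown"),
    ]
]

summary = summarize_data_categories(EXAMPLE_CATEGORIES)
```

For the example attestation the percentages total 100, with 45 of those points under `public_access` and 10 under `unknown`.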
5.4. Consent Status Values
+---------------------+----------------------------------------------+
| Status | Description |
+---------------------+----------------------------------------------+
| publication_consent | Creator consented to publication (not |
| | specifically AI) |
| license_dependent | Consent depends on specific license terms |
| public_access | Publicly accessible, no explicit consent |
| terms_of_service | Covered by platform terms (debatable |
| | consent) |
| mixed | Category includes multiple consent statuses |
| unknown | Consent status could not be determined |
| none | No consent obtained |
+---------------------+----------------------------------------------+
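The consent status values in the table can be represented as a closed enumeration so that unlisted values are caught early. This is a sketch under the assumption that the table is exhaustive; `ConsentStatus` and `parse_consent_status` are hypothetical names, not identifiers defined by KTP.

```python
from enum import Enum
from typing import Optional


class ConsentStatus(str, Enum):
    """Consent status values listed in Section 5.4."""

    PUBLICATION_CONSENT = "publication_consent"
    LICENSE_DEPENDENT = "license_dependent"
    PUBLIC_ACCESS = "public_access"
    TERMS_OF_SERVICE = "terms_of_service"
    MIXED = "mixed"
    UNKNOWN = "unknown"
    NONE = "none"


def parse_consent_status(value: str) -> Optional[ConsentStatus]:
    """Return the matching status, or None for a value not in the table."""
    try:
        return ConsentStatus(value)
    except ValueError:
        return None
```

Returning `None` rather than raising leaves the decision about unlisted values to the caller, which seems appropriate while the specification is experimental.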
5.5. Provenance Confidence Levels
+----------+---------------------------------------------------------+
| Level | Description |
+----------+---------------------------------------------------------+
| moderate | Reasonable documentation, some gaps |
| low | Limited documentation, significant gaps |
| minimal | Poor documentation, major uncertainty |
+----------+---------------------------------------------------------+
6. Knowledge Debt
6.1. Concept
Knowledge Debt represents obligations owed by a model to those whose
work contributed to its training. Unlike financial debt, knowledge
debt may be impossible to fully repay—the original creators may be
unknown, deceased, or unable to be contacted.
Knowledge Debt is acknowledged, not absolved. The act of
acknowledgment is itself meaningful, even when remediation is
impossible.
6.2. Structure
{
"knowledge_debt": {
"acknowledged": true,
"acknowledgment_statement": "This model contains knowledge from
sources that may not have consented to AI training use. The
creators of this knowledge contributed to the model's
capabilities but may not have been credited or compensated.
This debt is acknowledged.",
"debt_categories": [
{
"category": "unconsented_creative_work",
"description": "Creative works used without explicit consent",
"magnitude": "significant",
"identifiability": "low",
"remediation_possible": "partial",
"remediation_status": "ongoing"
},
{
"category": "uncompensated_labor",
"description": "Knowledge contributed without compensation",
"magnitude": "significant",
"identifiability": "very_low",
"remediation_possible": "minimal",
"remediation_status": "acknowledged_only"
},
{
"category": "extracted_expertise",
"description": "Professional expertise embedded in training
data",
"magnitude": "moderate",
"identifiability": "low",
"remediation_possible": "partial",
"remediation_status": "ongoing"
},
{
"category": "inherited_bias",
"description": "Biases present in training data",
"magnitude": "moderate",
"identifiability": "moderate",
"remediation_possible": "partial",
"remediation_status": "ongoing"
},
{
"category": "cultural_extraction",
"description": "Cultural knowledge taken without community
consent",
"magnitude": "uncertain",
"identifiability": "very_low",
"remediation_possible": "uncertain",
"remediation_status": "acknowledged_only"
}
],
"remediation_efforts": [
{
"effort": "opt_out_program",
"description": "Allowing creators to request exclusion from
future training",
"status": "active",
"effectiveness": "limited"
},
{
"effort": "creator_compensation_fund",
"description": "Fund for compensating identified creators",
"status": "planned",
"effectiveness": "unknown"
},
{
"effort": "bias_mitigation",
"description": "Ongoing work to reduce inherited biases",
"status": "active",
"effectiveness": "partial"
}
],
"debt_inheritance": {
"applies_to_agents": true,
"inheritance_statement": "Agents built on this model inherit
this knowledge debt. The debt cannot be transferred or
discharged but can be further acknowledged by agents."
}
}
}
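One way to read a knowledge debt record is to separate the categories where remediation is under way from those that are acknowledged only. The following sketch groups category names by their `remediation_status`; the function name is hypothetical, and the sample input abbreviates the structure above to the two fields used.

```python
def group_by_remediation_status(knowledge_debt):
    """Map each remediation_status to the debt categories that carry it."""
    groups = {}
    for entry in knowledge_debt["debt_categories"]:
        groups.setdefault(entry["remediation_status"], []).append(
            entry["category"]
        )
    return groups


# Abbreviated from the Section 6.2 example: category and status only.
EXAMPLE_DEBT = {
    "debt_categories": [
        {"category": "unconsented_creative_work", "remediation_status": "ongoing"},
        {"category": "uncompensated_labor", "remediation_status": "acknowledged_only"},
        {"category": "extracted_expertise", "remediation_status": "ongoing"},
        {"category": "inherited_bias", "remediation_status": "ongoing"},
        {"category": "cultural_extraction", "remediation_status": "acknowledged_only"},
    ]
}

groups = group_by_remediation_status(EXAMPLE_DEBT)
```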
6.3. Debt Magnitude
+-------------+------------------------------------------------------+
| Magnitude | Description |
+-------------+------------------------------------------------------+
| moderate | Moderate amount, limited impact |
| significant | Substantial amount, notable impact |
| severe | Large amount, serious concerns |
| uncertain | Cannot be determined |
+-------------+------------------------------------------------------+
6.4. Remediation Possibility
+-------------+------------------------------------------------------+
| Level | Description |
+-------------+------------------------------------------------------+
| substantial | Most debt can be addressed |
| partial | Some debt can be addressed |
| minimal | Little can be done |
| impossible | No remediation possible |
| uncertain | Remediation path unclear |
+-------------+------------------------------------------------------+
7. Indigenous Data Sovereignty
7.1. Importance
Indigenous data sovereignty requires special attention because:
1. Indigenous knowledge systems have different concepts of ownership
2. Some knowledge is sacred or restricted by community law
3. Colonial extraction of indigenous knowledge is an ongoing harm
4. Indigenous communities must control their own data
7.2. Structure
{
"indigenous_data": {
"likely_presence": "probable",
"certainty_level": "low",
"assessment_statement": "This model likely contains indigenous
knowledge obtained from publicly accessible sources without
specific consultation with indigenous communities. The exact
content and extent cannot be determined due to the nature of
large-scale training.",
"known_categories": [
{
"category": "traditional_knowledge",
"description": "Traditional practices, medicine, agriculture",
"likely_presence": "probable",
"source_pathway": "web_content_secondary_sources",
"community_consent": "none",
"sacred_restricted": "unknown"
},
{
"category": "indigenous_languages",
"description": "Indigenous language content",
"likely_presence": "possible",
"source_pathway": "language_resources_documentation",
"community_consent": "varied",
"sacred_restricted": "some_likely"
},
{
"category": "cultural_narratives",
"description": "Stories, histories, oral traditions",
"likely_presence": "probable",
"source_pathway": "published_materials_web_content",
"community_consent": "typically_none",
"sacred_restricted": "some_likely"
}
],
"consultation_status": {
"consultation_conducted": false,
"consultation_planned": true,
"barriers": [
"Scale of training data makes specific identification
difficult",
"Many indigenous communities, diverse protocols",
"Resource constraints"
]
},
"commitments": [
{
"commitment": "indigenous_consultation_program",
"description": "Establish ongoing consultation with
indigenous communities",
"status": "planning",
"timeline": "2025-2026"
},
{
"commitment": "exclusion_honoring",
"description": "Honor requests to exclude specific indigenous
content",
"status": "active",
"timeline": "ongoing"
},
{
"commitment": "benefit_sharing_exploration",
"description": "Explore mechanisms for benefit sharing",
"status": "research",
"timeline": "2025-2027"
}
],
"principles_adopted": [
"CARE Principles for Indigenous Data Governance",
"OCAP Principles (Ownership, Control, Access, Possession)",
"Recognition of indigenous data sovereignty"
],
"acknowledgment": "We acknowledge that this model may contain
indigenous knowledge that was not ours to take. We commit to
ongoing engagement with indigenous communities to address this
harm, while recognizing that full remediation may not be
possible."
}
}
7.3. Indigenous Data Principles
The specification adopts the CARE Principles for Indigenous Data
Governance:
+------------------------+-------------------------------------------+
| Principle | Description |
+------------------------+-------------------------------------------+
| Collective Benefit | Data ecosystems should enable indigenous |
| | peoples to derive benefit from the data |
| Authority to Control | Indigenous peoples should control data |
| | about them |
| Responsibility | Those working with indigenous data have a |
| | responsibility to support indigenous |
| | governance |
| Ethics | Indigenous peoples' rights and wellbeing |
| | should be the primary concern |
+------------------------+-------------------------------------------+
7.4. Sacred and Restricted Knowledge
Some indigenous knowledge is sacred or restricted by community law.
Such knowledge:
o MUST NOT be reproduced or described in detail if identified
o SHOULD trigger notification to relevant communities if detected
o SHOULD be excluded from future training if possible
o MUST be handled according to community protocols when known
8. Capability Lineage
8.1. Concept
Capability Lineage traces the relationship between model capabilities
and their origins. Some capabilities come directly from training
data; others emerge from the training process in ways not present in
training data.
Understanding capability lineage helps assess:
1. Which capabilities are well-grounded in training data
2. Which capabilities are emergent and less predictable
3. Where capability boundaries lie
4. What failure modes are likely
8.2. Structure
{
"capability_lineage": {
"direct_capabilities": [
{
"capability": "language_understanding",
"origin": "direct_from_training",
"training_data_support": "extensive",
"reliability": "high",
"known_limitations": [
"English-centric",
"Formal text bias"
]
},
{
"capability": "code_generation",
"origin": "direct_from_training",
"training_data_support": "extensive",
"reliability": "high",
"known_limitations": [
"Popular languages overrepresented",
"May reproduce common patterns"
]
},
{
"capability": "factual_knowledge",
"origin": "direct_from_training",
"training_data_support": "extensive",
"reliability": "moderate",
"known_limitations": [
"Training cutoff limits currency",
"Confidence calibration imperfect"
]
}
],
"emergent_capabilities": [
{
"capability": "multi_step_reasoning",
"origin": "emergent",
"training_data_support": "partial",
"reliability": "moderate",
"emergence_notes": "Not explicitly trained but emerges from
scale"
},
{
"capability": "in_context_learning",
"origin": "emergent",
"training_data_support": "indirect",
"reliability": "moderate",
"emergence_notes": "Ability to learn from examples in prompt"
},
{
"capability": "instruction_following",
"origin": "trained_but_generalizes",
"training_data_support": "moderate",
"reliability": "high",
"emergence_notes": "Trained on instructions, generalizes to
novel instructions"
}
],
"capability_boundaries": [
{
"boundary": "knowledge_cutoff",
"description": "No knowledge after training cutoff date",
"hard_boundary": true
},
{
"boundary": "no_real_time_access",
"description": "Cannot access internet or real-time data",
"hard_boundary": true
},
{
"boundary": "no_persistent_memory",
"description": "No memory across conversations by default",
"hard_boundary": false,
"notes": "Can be modified by system design"
},
{
"boundary": "language_capabilities",
"description": "Capabilities vary by language",
"hard_boundary": false,
"notes": "English strongest, other languages variable"
}
],
"emergence_potential": {
"assessment": "moderate",
"statement": "This model may exhibit capabilities not
anticipated during training. Novel capabilities should be
reported and evaluated for safety.",
"monitoring_recommendation": "Monitor for unexpected
capabilities in deployment, especially at scale"
}
}
}
8.3. Capability Origin Types
+------------------------+-------------------------------------------+
| Origin | Description |
+------------------------+-------------------------------------------+
| direct_from_training | Capability directly from training data |
| trained_but_generalizes| Trained on examples, generalizes beyond |
| emergent | Capability not explicitly trained, |
| | emerges at scale |
| composition | Capability emerges from combining other |
| | capabilities |
| unknown | Origin unclear |
+------------------------+-------------------------------------------+
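The origin types can be used to decide which capabilities deserve closer monitoring in deployment. The sketch below is one possible reading, not a rule from the specification: treating `emergent`, `composition`, and `unknown` as the less predictable origins is an assumption, and `capabilities_to_monitor` is a hypothetical helper.

```python
# Assumption: these origin types are the "less predictable" ones.
LESS_PREDICTABLE_ORIGINS = {"emergent", "composition", "unknown"}


def capabilities_to_monitor(capability_lineage):
    """List capabilities whose origin is in LESS_PREDICTABLE_ORIGINS."""
    entries = (
        capability_lineage.get("direct_capabilities", [])
        + capability_lineage.get("emergent_capabilities", [])
    )
    return [
        entry["capability"]
        for entry in entries
        if entry["origin"] in LESS_PREDICTABLE_ORIGINS
    ]


# Abbreviated from the Section 8.2 example.
EXAMPLE_LINEAGE = {
    "direct_capabilities": [
        {"capability": "language_understanding", "origin": "direct_from_training"},
    ],
    "emergent_capabilities": [
        {"capability": "multi_step_reasoning", "origin": "emergent"},
        {"capability": "in_context_learning", "origin": "emergent"},
        {"capability": "instruction_following", "origin": "trained_but_generalizes"},
    ],
}

to_monitor = capabilities_to_monitor(EXAMPLE_LINEAGE)
```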
9. Safety Attestation
9.1. Overview
Safety attestation documents safety-related training and testing:
{
"safety_attestation": {
"safety_training": {
"methods_applied": [
{
"method": "rlhf",
"description": "Reinforcement Learning from Human Feedback",
"objective": "Align outputs with human preferences"
},
{
"method": "constitutional_ai",
"description": "Training with explicit principles",
"objective": "Instill consistent values"
},
{
"method": "red_teaming",
"description": "Adversarial testing by humans",
"objective": "Identify failure modes"
}
],
"safety_objectives": [
"Helpfulness within ethical bounds",
"Harmlessness (avoid dangerous outputs)",
"Honesty (accurate, calibrated claims)"
]
},
"testing_conducted": {
"capability_evaluations": true,
"safety_evaluations": true,
"bias_audits": true,
"red_team_testing": true,
"external_audits": "partial"
},
"known_safety_issues": [
{
"issue": "jailbreak_vulnerability",
"description": "Adversarial prompts can sometimes bypass
safety training",
"severity": "moderate",
"mitigation": "Ongoing improvement, deployment safeguards"
},
{
"issue": "hallucination",
"description": "May generate false but plausible information",
"severity": "moderate",
"mitigation": "Calibration training, uncertainty expression"
},
{
"issue": "bias_persistence",
"description": "Biases in training data may persist",
"severity": "moderate",
"mitigation": "Bias auditing, mitigation training"
}
],
"safety_commitment": "We commit to ongoing safety research and
deployment safeguards. Safety issues discovered after
deployment will be addressed promptly."
}
}
10. Model-to-Agent Inheritance
10.1. Inheritance Principle
When an agent is built on a model, it inherits:
1. Capabilities from the model
2. Limitations from the model
3. Knowledge Debt from the model
4. Provenance attestation reference from the model
The agent does not inherit:
o Model-level trust (agents must earn their own)
o Specific trajectory (agents begin at genesis)
o Deployment-specific properties
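The split between inherited and non-inherited properties can be sketched as a filter over a model's properties. This is a minimal illustration, not the protocol's mechanism: `inherit_from_model` and the `trust_score` key are hypothetical, and the inherited keys simply mirror the four items listed above.

```python
# The four properties Section 10.1 lists as inherited by an agent.
INHERITED_KEYS = (
    "capabilities",
    "limitations",
    "knowledge_debt",
    "provenance_attestation",
)


def inherit_from_model(model_properties: dict) -> dict:
    """Keep only the properties an agent inherits from its base model."""
    return {
        key: model_properties[key]
        for key in INHERITED_KEYS
        if key in model_properties
    }


model_properties = {
    "capabilities": "full_model_capabilities",
    "limitations": "full_model_limitations",
    "knowledge_debt": "acknowledged_and_inherited",
    "provenance_attestation": "prov:anthropic:claude-4:2025-01",
    "trust_score": 0.9,  # hypothetical model-level trust; never inherited
}

inherited = inherit_from_model(model_properties)
```

Using an allow-list rather than a deny-list means any property not named in the inheritance principle, including model-level trust, is dropped by default.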
10.2. Inheritance Record
{
"inheritance": {
"agent_id": "agent:tethered:acme:assistant-01:abc123",
"base_model": {
"model_id": "claude-4-opus-20250115",
"provenance_attestation": "prov:anthropic:claude-4:2025-01"
},
"inherited_properties": {
"capabilities": "full_model_capabilities",
"limitations": "full_model_limitations",
"knowledge_debt": "acknowledged_and_inherited",
"indigenous_data_status": "inherited"
},
"agent_specific": {
"configuration": {
"system_prompt_hash": "sha256:abc123...",
"tool_access": ["search", "calculator"],
"deployment_constraints": ["no_code_execution"]
},
"sponsor": "acme-corp:alice.smith",
"genesis_date": "2025-12-03T10:00:00Z"
},
"acknowledgment": {
"agent_acknowledges_model_provenance": true,
"agent_inherits_knowledge_debt": true,
"agent_commits_to_provenance_transparency": true
}
}
}
10.3. Zone Requirements
Zones MAY require provenance attestation for entry:
+-----------+--------------------------------------------------------+
| Zone Type | Provenance Requirement |
+-----------+--------------------------------------------------------+
| Blue | Attestation required |
| Cyan | Attestation recommended |
| Green | Attestation optional |
| Wild | No requirement |
+-----------+--------------------------------------------------------+
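The zone table can be read as a lookup from zone type to requirement level. In the sketch below, only a `required` level blocks entry when no verified attestation is present; that interpretation, and the names `ZONE_REQUIREMENT` and `may_enter`, are assumptions made for illustration.

```python
# Requirement levels from the Section 10.3 table, keyed by zone type.
ZONE_REQUIREMENT = {
    "blue": "required",
    "cyan": "recommended",
    "green": "optional",
    "wild": "none",
}


def may_enter(zone_type: str, has_verified_attestation: bool) -> bool:
    """Return False only when a required attestation is missing."""
    requirement = ZONE_REQUIREMENT[zone_type.lower()]
    return requirement != "required" or has_verified_attestation
```

An unrecognized zone type raises `KeyError`, so an unknown zone is never silently treated as permissive.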
11. Origin Ceremony
11.1. Purpose
The Origin Ceremony is a formal process acknowledging model
provenance before an agent built on that model enters a KTP zone.
It serves to:
1. Make provenance visible to the zone
2. Acknowledge knowledge debt formally
3. Record inheritance for the audit trail
4. Establish an accountability chain
11.2. Ceremony Process
1. Agent presents Genesis Transaction and Model Provenance reference
2. Zone Gateway retrieves Provenance Attestation
3. Zone verifies attestation signatures
4. Agent explicitly acknowledges:
a. Model provenance
b. Knowledge debt inheritance
c. Indigenous data status
d. Capability boundaries
5. Zone records ceremony in Flight Recorder
6. Agent receives zone-local Trust Proof with provenance reference
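The verification and acknowledgment steps of the process above can be sketched as a single function that produces a ceremony record. This is an illustrative outline under stated assumptions: `run_origin_ceremony` is a hypothetical name, signature verification is represented by a boolean supplied by the caller, and recording and Trust Proof issuance are left out.

```python
# The four acknowledgments the agent makes in step 4.
REQUIRED_ACKNOWLEDGMENTS = (
    "provenance_acknowledged",
    "knowledge_debt_acknowledged",
    "indigenous_status_acknowledged",
    "capability_boundaries_acknowledged",
)


def run_origin_ceremony(agent_id, zone_id, attestation_id,
                        signatures_valid, acknowledgments, timestamp):
    """Check signatures and acknowledgments, then build a ceremony record."""
    acks = {
        name: acknowledgments.get(name) is True
        for name in REQUIRED_ACKNOWLEDGMENTS
    }
    # Entry is accepted only if signatures verified and every ack is True.
    accepted = bool(signatures_valid) and all(acks.values())
    return {
        "origin_ceremony": {
            "agent_id": agent_id,
            "zone_id": zone_id,
            "timestamp": timestamp,
            "provenance_attestation_verified": (
                attestation_id if signatures_valid else None
            ),
            "acknowledgments": acks,
            "zone_acceptance": accepted,
        }
    }
```

A missing or non-boolean acknowledgment counts as not given, so an agent cannot pass the ceremony by omission.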
11.3. Ceremony Record
{
"origin_ceremony": {
"ceremony_id": "ceremony-2025-12-03-001",
"agent_id": "agent:tethered:acme:assistant-01:abc123",
"zone_id": "zone-blue-prod-01",
"timestamp": "2025-12-03T10:05:00Z",
"provenance_attestation_verified":
"prov:anthropic:claude-4:2025-01",
"acknowledgments": {
"provenance_acknowledged": true,
"knowledge_debt_acknowledged": true,
"indigenous_status_acknowledged": true,
"capability_boundaries_acknowledged": true
},
"zone_acceptance": true,
"ceremony_witnesses": [
"oracle:zone-blue-prod-01:primary",
"gateway:zone-blue-prod-01:main"
],
"signatures": {
"agent": "sig:agent:...",
"zone_gateway": "sig:gateway:...",
"zone_oracle": "sig:oracle:..."
}
}
}
12. Updates and Versioning
12.1. Attestation Updates
Provenance Attestations may be updated when:
1. Additional training data information becomes available
2. Knowledge debt remediation efforts progress
3. Indigenous consultation produces new understanding
4. Capability assessments change
5. Safety issues are discovered or resolved
Updates MUST:
o Reference previous attestation version
o Document what changed and why
o Be signed by an authorized party
o Be timestamped
12.2. Version Chain
{
"attestation_id": "prov:anthropic:claude-4:2025-03",
"previous_version": "prov:anthropic:claude-4:2025-01",
"version_changes": [
{
"field": "knowledge_debt.remediation_efforts",
"change": "added",
"description": "Creator compensation fund launched"
},
{
"field": "indigenous_data.consultation_status",
"change": "updated",
"description": "Initial consultations conducted"
}
],
"update_date": "2025-03-15"
}
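Because each update references its predecessor through `previous_version`, the full history of an attestation can be recovered by walking the chain backwards. The sketch below assumes attestations are available in a dictionary keyed by `attestation_id`; that storage model and the name `version_history` are hypothetical.

```python
def version_history(attestations: dict, latest_id: str) -> list:
    """Follow previous_version links from latest_id back to the first version."""
    history = []
    seen = set()
    current = latest_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in version chain at {current}")
        if current not in attestations:
            raise KeyError(f"attestation not found: {current}")
        seen.add(current)
        history.append(current)
        # The first version has no previous_version, which ends the walk.
        current = attestations[current].get("previous_version")
    return history


EXAMPLE_STORE = {
    "prov:anthropic:claude-4:2025-01": {},
    "prov:anthropic:claude-4:2025-03": {
        "previous_version": "prov:anthropic:claude-4:2025-01",
    },
}

chain = version_history(EXAMPLE_STORE, "prov:anthropic:claude-4:2025-03")
```

The cycle and missing-link checks matter because a broken chain would otherwise hide what changed between versions.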
13. Implementation Guidance
13.1. For Model Creators
Model creators SHOULD:
1. Document training data categories to the maximum extent possible
2. Acknowledge uncertainty honestly
3. Implement knowledge debt acknowledgment
4. Engage with indigenous data sovereignty principles
5. Update attestations as information improves
6. Make attestations publicly accessible
13.2. For Zone Operators
Zone operators SHOULD:
1. Define provenance requirements for zone entry
2. Verify attestation signatures before agent admission
3. Record provenance references in agent records
4. Monitor for provenance-related issues
5. Report concerns to model creators
13.3. For Agent Deployers
Those deploying agents SHOULD:
1. Understand base model provenance
2. Ensure agents acknowledge inherited properties
3. Complete Origin Ceremony for zone entry
4. Monitor for provenance-related behavior issues
5. Participate in remediation efforts where possible
14. Security Considerations
14.1. Attestation Integrity
Provenance Attestations MUST be cryptographically signed to prevent
tampering. Signatures MUST be verifiable against published public
keys.
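Signing and verifying an attestation both depend on agreeing which bytes are covered by the signature. The sketch below shows one possible approach and is not the scheme defined by KTP: it assumes the `signatures` field is excluded from the signed content, that sorted-key compact JSON serves as the canonical form, and that SHA-256 is the digest. The actual signature scheme belongs to KTP-Crypto; the function names here are hypothetical.

```python
import hashlib
import json


def signing_payload(attestation: dict) -> bytes:
    """Serialize everything except 'signatures' in a stable byte form."""
    unsigned = {k: v for k, v in attestation.items() if k != "signatures"}
    # Sorted keys and fixed separators make the bytes independent of
    # key order and whitespace (an assumed canonical form).
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def attestation_digest(attestation: dict) -> str:
    """Return the SHA-256 hex digest of the signing payload."""
    return hashlib.sha256(signing_payload(attestation)).hexdigest()
```

Any change to a signed field changes the digest, while reordering keys or replacing the `signatures` field does not, which is the property a tamper check needs from its input.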
14.2. False Attestation
Creating false Provenance Attestations is a serious violation.
Detection mechanisms include:
o Cross-referencing with public information
o Community verification
o Audit processes
o Whistleblower channels
14.3. Privacy
Provenance information may include sensitive details. Balance
transparency with:
o Protecting proprietary methods
o Respecting contributor privacy
o Maintaining security of safety-relevant information
15. IANA Considerations
This document has no IANA actions.
Appendix A. Example Full Provenance Attestation
A complete example Provenance Attestation demonstrating all fields.
Appendix B. Indigenous Data Sovereignty Resources
References to indigenous data governance frameworks:
o CARE Principles for Indigenous Data Governance
o OCAP Principles (First Nations, Canada)
o Te Mana Raraunga (Maori Data Sovereignty Network)
o Global Indigenous Data Alliance
Appendix C. Knowledge Debt Framework
Detailed framework for categorizing and tracking knowledge debt.
Acknowledgments
This specification acknowledges:
o Indigenous peoples whose knowledge has been extracted without
consent
o Creators whose work has been used in AI training
o Researchers working on AI ethics and data governance
o The Global Indigenous Data Alliance and other indigenous
organizations working on data sovereignty
The authors acknowledge their own position: creating systems that may
perpetuate harms they seek to address. This specification is offered
in humility, as one step toward accountability, not as absolution.