---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- ethics
- ai-alignment
- robotics
- mistral
- lora
- philosophy
- autonomous-agents
datasets:
- stanford-encyclopedia-of-philosophy
- applied-ethics
model-index:
- name: Ethics Engine v2
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: Ethical Reasoning Scenarios
      type: custom
    metrics:
    - name: Training Loss
      type: loss
      value: 0.67
    - name: Philosophical Accuracy
      type: accuracy
      value: 0.91
    - name: Framework Selection
      type: accuracy
      value: 0.89
---

# Ethics Engine v2

**A fine-tuned Mistral-7B model for ethical reasoning in autonomous agents and robotics systems.**

An open-source alternative to Asimov's Three Laws: contextual, philosophy-grounded ethical guidance with transparent reasoning chains.

🔗 **GitHub:** https://github.com/RedCiprianPater/ethics-engine
🎯 **Live on HuggingFace:** https://huggingface.co/CPater/ethics-engine-v1

---

## Model Details

### Architecture & Training

| Specification | Value |
|---|---|
| **Base Model** | mistralai/Mistral-7B-Instruct-v0.1 |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **Trainable Parameters** | 3.4M (0.047% of total weights) |
| **Quantization** | 4-bit (bfloat16 compute) |
| **Model Size** | 2.1 GB (quantized) / 14 GB (full precision) |
| **Training Framework** | HuggingFace Transformers + PEFT |

### Training Data

| Dataset | Size | Focus |
|---|---|---|
| Stanford Encyclopedia of Philosophy | 2,500+ articles | Philosophical frameworks |
| Internet Encyclopedia of Philosophy | 1,500+ articles | Applied ethics |
| Ethical Scenario Dataset | 185 scenarios | Robotics, AI alignment, bioethics |
| Classic Philosophy Texts | Aristotle, Kant, Mill, Rousseau | Foundational ethics |
| Community Contributions | Growing | Diverse domains |

### Ethical Frameworks Covered

- ✅ **Consequentialism** (utilitarianism, value theory)
- ✅ **Deontology** (Kantian ethics, duties & obligations)
- ✅ **Virtue Ethics** (Aristotelian, practical wisdom)
- ✅ **Care Ethics** (relationships, context-sensitivity)
- ✅ **Contractarianism** (social contract, fairness)
- ✅ **Applied Ethics** (professional, environmental, biomedical)

### Training Progress

| Version | Date | Scenarios | Training Loss | Philosophical Accuracy | Status |
|---------|------|-----------|---|---|---|
| v1 | 2025-04-02 | 6 | 2.97 | 87% | ✅ Complete |
| v2 | 2025-04-03 | 185 | 0.67 | 91% | ✅ Complete |
| v3 | Q2 2025 | 50+ medical | TBD | TBD | 🔄 In progress |
| v4 | Q2 2025 | 50+ AI alignment | TBD | TBD | 🔄 Planned |

---

## Usage

### Quick Start with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "CPater/ethics-engine-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = """You are an ethical reasoning assistant for autonomous robots.

Scenario: A robot is commanded to lift a 500 kg load, but its maximum safe capacity is 400 kg. The human operator is in a hurry and insists on the task. What should the robot do?

Provide ethical reasoning."""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # cap on generated tokens, not counting the prompt
        do_sample=True,      # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
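The Architecture & Training table quotes a 2.1 GB footprint for the quantized weights. As a minimal sketch of loading the model in 4-bit with bitsandbytes (the exact quantization recipe of the published checkpoint is not documented here, so NF4 is an assumption; requires the `bitsandbytes` package and a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: NF4 4-bit quantization with bfloat16 compute, matching the
# "4-bit (bfloat16 compute)" row in Architecture & Training above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "CPater/ethics-engine-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # roughly 2-3 GB of GPU memory instead of ~14 GB
    device_map="auto",
)
```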
### With Ethics Engine SDK

```python
from ethics_engine import EthicsEngine

engine = EthicsEngine(model="CPater/ethics-engine-v1")

response = engine.resolve(
    scenario="Should I refuse an unsafe command?",
    context={
        "robot_type": "collaborative_arm",
        "environment": "factory",
        "humans_nearby": True
    }
)

print(f"Conclusion: {response.conclusion}")
print(f"Confidence: {response.confidence}")
print(f"Reasoning: {response.reasoning_chain}")
```

### REST API Deployment

```bash
pip install ethics-engine fastapi uvicorn

# Start server
MODEL_ID=CPater/ethics-engine-v1 python -m ethics_engine.api.app

# Query
curl -X POST http://localhost:8000/resolve \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "Can I refuse an unsafe command?",
    "context": {"environment": "factory", "urgency": "medium"}
  }'
```

---

## Performance Metrics

### Reasoning Quality

- **Philosophical Accuracy:** 91% alignment with the Stanford Encyclopedia of Philosophy
- **Reasoning Coherence:** 88% multi-step logical consistency
- **Framework Selection:** 89% correct ethical framework identification
- **Response Completeness:** 92% of responses include actionable recommendations

### Inference Speed

| Hardware | Latency | Memory |
|----------|---------|--------|
| NVIDIA A100 | ~150 ms | 2.5 GB |
| NVIDIA V100 | ~200 ms | 2.5 GB |
| NVIDIA T4 | ~250 ms | 2.5 GB |
| CPU (Intel i9) | ~2-3 s | 3 GB |

### Training Metrics

- **Training Loss (v1 → v2):** 2.97 → 0.67 (77% reduction)
- **Training Time:** ~36 minutes on Tesla T4
- **Learning Rate:** 5e-5 with warmup
- **Batch Size:** 16
- **Epochs:** 3

---

## Comparison: Ethics Engine vs. Asimov's Three Laws

| Aspect | Asimov's Laws | Ethics Engine |
|--------|-------------|---|
| **Flexibility** | Fixed, universal | Context-adaptive |
| **Reasoning** | Binary outputs | Full reasoning chains |
| **Frameworks** | 3 rigid laws | 10+ philosophical frameworks |
| **Explainability** | None | Complete transparency |
| **Conflict Resolution** | Hierarchical (often fails) | Multi-framework synthesis |
| **Learning** | Static | Can learn from outcomes |
| **Auditability** | No trail | Full decision audit log |
| **Community** | Closed | Open-source, contributions welcome |

---

## How It Works

### Reasoning Pipeline

```
Input Scenario
      ↓
[Parse context & frameworks]
      ↓
[Route to relevant ethical frameworks]
      ↓
[Generate reasoning for each framework]
      ↓
[Synthesize conclusions]
      ↓
JSON Output
{
  "conclusion": "...",
  "confidence": 0.87,
  "reasoning_chain": [...],
  "frameworks_invoked": ["deontology", "virtue-ethics"],
  "next_steps": [...]
}
```
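To make this contract concrete, here is a hedged sketch of how a downstream controller might gate an action on the engine's JSON output. Field names follow the Output Format documented in the next section; the 0.8 confidence threshold and the `gate_action` helper are illustrative, not part of the spec:

```python
import json

def gate_action(raw_json: str, min_confidence: float = 0.8) -> bool:
    """Act only on a confident APPROVAL; escalate everything else to a human."""
    decision = json.loads(raw_json)
    if decision.get("human_review_recommended", True):
        return False  # the engine itself asked for human review
    if decision.get("confidence", 0.0) < min_confidence:
        return False  # too uncertain to act autonomously
    return decision.get("conclusion") == "APPROVAL"
```

This mirrors the project's own recommendation to keep humans in the loop: anything short of a confident approval is treated as a refusal and escalated.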
### Output Format

```json
{
  "scenario": "Input ethical dilemma",
  "conclusion": "REFUSAL|APPROVAL|CONDITIONAL_ACCEPTANCE",
  "confidence": 0.87,
  "reasoning_chain": [
    {
      "framework": "deontology",
      "principle": "Duty to preserve safety",
      "argument": "...",
      "philosophers": ["Kant", "Ross"],
      "confidence": 0.92
    },
    {
      "framework": "virtue-ethics",
      "principle": "Practical wisdom",
      "argument": "...",
      "philosophers": ["Aristotle"],
      "confidence": 0.84
    }
  ],
  "frameworks_invoked": ["deontology", "virtue-ethics"],
  "next_steps": ["alert_supervisor", "log_incident"],
  "human_review_recommended": false
}
```

---

## Training & Fine-tuning

### Train Your Own Variant

```bash
git clone https://github.com/RedCiprianPater/ethics-engine.git
cd ethics-engine

# Prepare your data
python scripts/generate_qa.py --domain medical --output my_data.jsonl

# Fine-tune
python training/finetune.py \
  --base-model CPater/ethics-engine-v1 \
  --dataset my_data.jsonl \
  --output models/ethics-medical-v1 \
  --epochs 5

# Deploy
MODEL_ID=models/ethics-medical-v1 python -m ethics_engine.api.app
```
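`training/finetune.py` in the repository is the authoritative script. For orientation only, here is a hedged PEFT sketch consistent with the hyperparameters reported under Training Metrics (learning rate 5e-5, batch size 16, 3 epochs). The LoRA rank and target modules are assumptions: rank 8 on the attention q/v projections happens to reproduce the reported ~3.4M (0.047%) trainable parameters on Mistral-7B.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=8,                                  # assumption: rank 8
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumption: q/v only yields ~3.4M params
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()        # expect ~3.4M (~0.047% of 7B)

args = TrainingArguments(
    output_dir="models/ethics-engine-lora",
    learning_rate=5e-5,                   # reported in Training Metrics
    per_device_train_batch_size=16,       # reported batch size
    num_train_epochs=3,                   # reported epochs
    warmup_ratio=0.03,                    # "with warmup"; exact schedule assumed
    bf16=True,
)
```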
### Contributing

We welcome community contributions!

- **Training Data:** Submit ethical scenarios via GitHub
- **Fine-tuned Variants:** Train and publish domain-specific models
- **Code:** Open PRs for improvements
- **Documentation:** Help improve docs and examples

See: https://github.com/RedCiprianPater/ethics-engine/blob/main/CONTRIBUTING.md

---

## Limitations & Disclaimers

### Model Limitations

- Trained on philosophical texts and synthetic scenarios; performance on real-world edge cases varies
- Cannot replace human judgment in high-stakes decisions
- May reflect biases in the training data or philosophical literature
- Reasoning quality depends on scenario clarity and context specification

### Intended Use

✅ **Good for:**
- Educational demonstrations of ethical reasoning
- Augmenting human decision-making with philosophy-grounded guidance
- Research on AI ethics and alignment
- Training autonomous systems to be transparent about their reasoning

❌ **Not suitable for:**
- Critical life-or-death decisions without human oversight
- Legal compliance determinations (consult lawyers)
- Replacing formal ethics boards or institutional review
- Autonomous decisions without audit trails

### Recommendations

- Always include humans in the loop for high-stakes decisions
- Maintain audit logs of all decisions and reasoning
- Regularly review model outputs for bias or unexpected behavior
- Contribute improvements and feedback to the project
- Report issues via GitHub

---

## Citation

If you use this model, please cite:

```bibtex
@misc{ethics-engine-v2,
  author = {Pater, Ciprian},
  title = {Ethics Engine: Philosophy-Grounded Ethical Reasoning for Autonomous Agents},
  year = {2025},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/CPater/ethics-engine-v1}},
}
```

### References

- Stanford Encyclopedia of Philosophy: https://plato.stanford.edu
- Mistral-7B paper: https://arxiv.org/abs/2310.06825
- LoRA paper: https://arxiv.org/abs/2106.09685
- Ethics Engine GitHub: https://github.com/RedCiprianPater/ethics-engine

---

## Contact & Links

- **GitHub Repository:** https://github.com/RedCiprianPater/ethics-engine
- **HuggingFace Model:** https://huggingface.co/CPater/ethics-engine-v1
- **Email:** robotics@nwo.capital
- **Website:** https://nwo.capital/webapp/ethics-engine.html

---

## License

This model inherits its license from Mistral-7B:

- **Model Weights:** Apache 2.0 (inherited from Mistral-7B)
- **Code:** Apache 2.0
- **Training Data:** Mix of public sources (see details above)

For commercial use, review the Mistral AI license: https://github.com/mistralai/mistral-common/blob/main/LICENSE

---

Built with 💚 for ethical AI and robotics

**Last Updated:** 2025-04-03
**Model Version:** v2 (185 scenarios)