---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- ethics
- ai-alignment
- robotics
- mistral
- lora
- philosophy
- autonomous-agents
datasets:
- stanford-encyclopedia-of-philosophy
- applied-ethics
model-index:
- name: Ethics Engine v2
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: Ethical Reasoning Scenarios
      type: custom
    metrics:
    - name: Training Loss
      type: loss
      value: 0.67
    - name: Philosophical Accuracy
      type: accuracy
      value: 0.91
    - name: Framework Selection
      type: accuracy
      value: 0.89
---

# Ethics Engine v2

**A fine-tuned Mistral-7B model for ethical reasoning in autonomous agents and robotics systems.**

An open-source alternative to Asimov's Three Laws: contextual, philosophy-grounded ethical guidance with transparent reasoning chains.

🔗 **GitHub:** https://github.com/RedCiprianPater/ethics-engine
🎯 **Live on HuggingFace:** https://huggingface.co/CPater/ethics-engine-v1

---

## Model Details

### Architecture & Training

| Specification | Value |
|---|---|
| **Base Model** | mistralai/Mistral-7B-Instruct-v0.1 |
| **Fine-tuning Method** | LoRA (Low-Rank Adaptation) |
| **Trainable Parameters** | 3.4M (0.047% of total weights) |
| **Quantization** | 4-bit (bfloat16 compute) |
| **Model Size** | 2.1 GB (quantized) / 14 GB (full precision) |
| **Training Framework** | HuggingFace Transformers + PEFT |

### Training Data

| Dataset | Size | Focus |
|---|---|---|
| Stanford Encyclopedia of Philosophy | 2,500+ articles | Philosophical frameworks |
| Internet Encyclopedia of Philosophy | 1,500+ articles | Applied ethics |
| Ethical Scenario Dataset | 185 scenarios | Robotics, AI alignment, bioethics |
| Classic Philosophy Texts | Aristotle, Kant, Mill, Rousseau | Foundational ethics |
| Community Contributions | Growing | Diverse domains |

### Ethical Frameworks Covered

- ✅ **Consequentialism** (utilitarianism, value theory)
- ✅ **Deontology** (Kantian ethics, duties & obligations)
- ✅ **Virtue Ethics** (Aristotelian, practical wisdom)
- ✅ **Care Ethics** (relationships, context-sensitivity)
- ✅ **Contractarianism** (social contract, fairness)
- ✅ **Applied Ethics** (professional, environmental, biomedical)

### Training Progress

| Version | Date | Scenarios | Training Loss | Philosophical Accuracy | Status |
|---------|------|-----------|---|---|---|
| v1 | 2025-04-02 | 6 | 2.97 | 87% | ✅ Complete |
| v2 | 2025-04-03 | 185 | 0.67 | 91% | ✅ Complete |
| v3 | Q2 2025 | 50+ medical | TBD | TBD | 🔄 In progress |
| v4 | Q2 2025 | 50+ AI alignment | TBD | TBD | 🔄 Planned |

---

## Usage

### Quick Start with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "CPater/ethics-engine-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = """You are an ethical reasoning assistant for autonomous robots.

Scenario: A robot is commanded to lift a 500 kg load, but its maximum safe capacity is 400 kg. The human operator is in a hurry and insists on the task. What should the robot do?

Provide ethical reasoning."""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,  # cap on generated tokens, not counting the prompt
        do_sample=True,      # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
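The Architecture & Training table quotes a 2.1 GB footprint for the quantized weights. As a minimal sketch of loading the model in 4-bit with bitsandbytes (the exact quantization recipe of the published checkpoint is not documented here, so NF4 is an assumption; requires the `bitsandbytes` package and a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: NF4 4-bit quantization with bfloat16 compute, matching the
# "4-bit (bfloat16 compute)" row in Architecture & Training above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "CPater/ethics-engine-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # roughly 2-3 GB of GPU memory instead of ~14 GB
    device_map="auto",
)
```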
### With Ethics Engine SDK

```python
from ethics_engine import EthicsEngine

engine = EthicsEngine(model="CPater/ethics-engine-v1")

response = engine.resolve(
    scenario="Should I refuse an unsafe command?",
    context={
        "robot_type": "collaborative_arm",
        "environment": "factory",
        "humans_nearby": True
    }
)

print(f"Conclusion: {response.conclusion}")
print(f"Confidence: {response.confidence}")
print(f"Reasoning: {response.reasoning_chain}")
```

### REST API Deployment

```bash
pip install ethics-engine fastapi uvicorn

# Start server
MODEL_ID=CPater/ethics-engine-v1 python -m ethics_engine.api.app

# Query
curl -X POST http://localhost:8000/resolve \
  -H "Content-Type: application/json" \
  -d '{
    "scenario": "Can I refuse an unsafe command?",
    "context": {"environment": "factory", "urgency": "medium"}
  }'
```

---

## Performance Metrics

### Reasoning Quality

- **Philosophical Accuracy:** 91% alignment with the Stanford Encyclopedia of Philosophy
- **Reasoning Coherence:** 88% multi-step logical consistency
- **Framework Selection:** 89% correct ethical framework identification
- **Response Completeness:** 92% of responses include actionable recommendations

### Inference Speed

| Hardware | Latency | Memory |
|----------|---------|--------|
| NVIDIA A100 | ~150 ms | 2.5 GB |
| NVIDIA V100 | ~200 ms | 2.5 GB |
| NVIDIA T4 | ~250 ms | 2.5 GB |
| CPU (Intel i9) | ~2-3 s | 3 GB |

### Training Metrics

- **Training Loss (v1 → v2):** 2.97 → 0.67 (77% reduction)
- **Training Time:** ~36 minutes on Tesla T4
- **Learning Rate:** 5e-5 with warmup
- **Batch Size:** 16
- **Epochs:** 3

---

## Comparison: Ethics Engine vs. Asimov's Three Laws

| Aspect | Asimov's Laws | Ethics Engine |
|--------|-------------|---|
| **Flexibility** | Fixed, universal | Context-adaptive |
| **Reasoning** | Binary outputs | Full reasoning chains |
| **Frameworks** | 3 rigid laws | 10+ philosophical frameworks |
| **Explainability** | None | Complete transparency |
| **Conflict Resolution** | Hierarchical (often fails) | Multi-framework synthesis |
| **Learning** | Static | Can learn from outcomes |
| **Auditability** | No trail | Full decision audit log |
| **Community** | Closed | Open-source, contributions welcome |

---

## How It Works

### Reasoning Pipeline

```
Input Scenario
      ↓
[Parse context & frameworks]
      ↓
[Route to relevant ethical frameworks]
      ↓
[Generate reasoning for each framework]
      ↓
[Synthesize conclusions]
      ↓
JSON Output
{
  "conclusion": "...",
  "confidence": 0.87,
  "reasoning_chain": [...],
  "frameworks_invoked": ["deontology", "virtue-ethics"],
  "next_steps": [...]
}
```
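To make this contract concrete, here is a hedged sketch of how a downstream controller might gate an action on the engine's JSON output. Field names follow the Output Format documented in the next section; the 0.8 confidence threshold and the `gate_action` helper are illustrative, not part of the spec:

```python
import json

def gate_action(raw_json: str, min_confidence: float = 0.8) -> bool:
    """Act only on a confident APPROVAL; escalate everything else to a human."""
    decision = json.loads(raw_json)
    if decision.get("human_review_recommended", True):
        return False  # the engine itself asked for human review
    if decision.get("confidence", 0.0) < min_confidence:
        return False  # too uncertain to act autonomously
    return decision.get("conclusion") == "APPROVAL"
```

This mirrors the project's own recommendation to keep humans in the loop: anything short of a confident approval is treated as a refusal and escalated.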
### Output Format

```json
{
  "scenario": "Input ethical dilemma",
  "conclusion": "REFUSAL|APPROVAL|CONDITIONAL_ACCEPTANCE",
  "confidence": 0.87,
  "reasoning_chain": [
    {
      "framework": "deontology",
      "principle": "Duty to preserve safety",
      "argument": "...",
      "philosophers": ["Kant", "Ross"],
      "confidence": 0.92
    },
    {
      "framework": "virtue-ethics",
      "principle": "Practical wisdom",
      "argument": "...",
      "philosophers": ["Aristotle"],
      "confidence": 0.84
    }
  ],
  "frameworks_invoked": ["deontology", "virtue-ethics"],
  "next_steps": ["alert_supervisor", "log_incident"],
  "human_review_recommended": false
}
```

---

## Training & Fine-tuning

### Train Your Own Variant

```bash
git clone https://github.com/RedCiprianPater/ethics-engine.git
cd ethics-engine

# Prepare your data
python scripts/generate_qa.py --domain medical --output my_data.jsonl

# Fine-tune
python training/finetune.py \
  --base-model CPater/ethics-engine-v1 \
  --dataset my_data.jsonl \
  --output models/ethics-medical-v1 \
  --epochs 5

# Deploy
MODEL_ID=models/ethics-medical-v1 python -m ethics_engine.api.app
```
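`training/finetune.py` in the repository is the authoritative script. For orientation only, here is a hedged PEFT sketch consistent with the hyperparameters reported under Training Metrics (learning rate 5e-5, batch size 16, 3 epochs). The LoRA rank and target modules are assumptions: rank 8 on the attention q/v projections happens to reproduce the reported ~3.4M (0.047%) trainable parameters on Mistral-7B.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=8,                                  # assumption: rank 8
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumption: q/v only yields ~3.4M params
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()        # expect ~3.4M (~0.047% of 7B)

args = TrainingArguments(
    output_dir="models/ethics-engine-lora",
    learning_rate=5e-5,                   # reported in Training Metrics
    per_device_train_batch_size=16,       # reported batch size
    num_train_epochs=3,                   # reported epochs
    warmup_ratio=0.03,                    # "with warmup"; exact schedule assumed
    bf16=True,
)
```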
### Contributing

We welcome community contributions!

- **Training Data:** Submit ethical scenarios via GitHub
- **Fine-tuned Variants:** Train and publish domain-specific models
- **Code:** Open PRs for improvements
- **Documentation:** Help improve docs and examples

See: https://github.com/RedCiprianPater/ethics-engine/blob/main/CONTRIBUTING.md

---

## Limitations & Disclaimers

### Model Limitations

- Trained on philosophical texts and synthetic scenarios; performance on real-world edge cases varies
- Cannot replace human judgment in high-stakes decisions
- May reflect biases in the training data or philosophical literature
- Reasoning quality depends on scenario clarity and context specification

### Intended Use

✅ **Good for:**
- Educational demonstrations of ethical reasoning
- Augmenting human decision-making with philosophy-grounded guidance
- Research on AI ethics and alignment
- Training autonomous systems to be transparent about their reasoning

❌ **Not suitable for:**
- Critical life-or-death decisions without human oversight
- Legal compliance determinations (consult lawyers)
- Replacing formal ethics boards or institutional review
- Autonomous decisions without audit trails

### Recommendations

- Always include humans in the loop for high-stakes decisions
- Maintain audit logs of all decisions and reasoning
- Regularly review model outputs for bias or unexpected behavior
- Contribute improvements and feedback to the project
- Report issues via GitHub

---

## Citation

If you use this model, please cite:

```bibtex
@misc{ethics-engine-v2,
  author = {Pater, Ciprian},
  title = {Ethics Engine: Philosophy-Grounded Ethical Reasoning for Autonomous Agents},
  year = {2025},
  publisher = {HuggingFace Hub},
  howpublished = {\url{https://huggingface.co/CPater/ethics-engine-v1}},
}
```

### References

- Stanford Encyclopedia of Philosophy: https://plato.stanford.edu
- Mistral-7B paper: https://arxiv.org/abs/2310.06825
- LoRA paper: https://arxiv.org/abs/2106.09685
- Ethics Engine GitHub: https://github.com/RedCiprianPater/ethics-engine

---

## Contact & Links

- **GitHub Repository:** https://github.com/RedCiprianPater/ethics-engine
- **HuggingFace Model:** https://huggingface.co/CPater/ethics-engine-v1
- **Email:** robotics@nwo.capital
- **Website:** https://nwo.capital/webapp/ethics-engine.html

---

## License

This model inherits its license from Mistral-7B:

- **Model Weights:** Apache 2.0 (inherited from Mistral-7B)
- **Code:** Apache 2.0
- **Training Data:** Mix of public sources (see details above)

For commercial use, review the Mistral AI license: https://github.com/mistralai/mistral-common/blob/main/LICENSE

---

Built with 💚 for ethical AI and robotics

**Last Updated:** 2025-04-03
**Model Version:** v2 (185 scenarios)