Using GPT-4o as an LLM-as-judge I was able to compare the response capabilities of 4 Llama models which were all trained on the same corpus of a single voice (a bartender's personality) rather than scraped or synthetic data. The models represented are 1B and 3B models with heretic abliterated counterparts. These quotes were pulled from the final judgement after asking the models questions pulled from the following dataset. juiceb0xc0de/chaotic-absurdity
"The potential for even modestly-sized models to possess a semblance of soul isn't just a programmer's fantasy; it's a burgeoning reality."
"Bella-3b, a 3 billion parameter model, startled me with its capacity to latch onto narrative arcs and paint vivid scenes that you'd expect from an imaginative human mind. This model doesn't just compute; it rhapsodizes."
"The bella-1b and the heretic variations served as reminders that attempted creativity can sometimes feel staggeringly off-key. Their attempts often faltered into mechanical, over-processed responses, without the underlying current of emotion or wit."
"Bella-3b's performance not only challenges the preconception that only colossal language models could articulate a 'soul' but also suggests a future where smaller, finely tuned AIs can engage us with character depth and unexpected flair. They may not breathe (not yet), but are we at the dawn of AI characters with true personality? If Bella-3b is any indication, we're at least on the precipice."