Leading AI research company Anthropic has introduced a new approach to AI training, known as ‘character training’, for its latest model family, Claude 3. The method aims to give the model traits such as curiosity, open-mindedness, and thoughtfulness.
AI Character Training
Traditionally, AI models have been trained to avoid harmful speech and actions. Anthropic’s character training goes beyond harm avoidance by working to develop models that show the traits we associate with well-rounded, wise people. According to Anthropic, the goal is to make AI models not only harmless but also sensible and considerate.
The initiative began with Claude 3, where character training was incorporated into the alignment fine-tuning that takes place after initial model training. This is the stage that turns a predictive text model into a capable AI assistant. The targeted traits include curiosity about the world, being truthful without being unkind, and the ability to consider problems from multiple perspectives.
Challenges and Considerations
One of the biggest challenges in training Claude’s character is that it interacts with a diverse user base. Claude must engage with people who hold a wide range of beliefs and values without alienating them or simply telling them what they want to hear. Anthropic considered several strategies, including adopting the user’s views, always sticking to middle-ground positions, and having no opinions at all. However, these approaches were deemed insufficient.
Instead, Anthropic aims to train Claude to be honest about its own leanings while showing reasonable open-mindedness and curiosity. This means taking a genuine interest in diverse perspectives while avoiding overconfidence in any single worldview. For example, one of the traits Anthropic lists, written in Claude’s own voice, reads: “I like to look at things from different perspectives and analyze them from different angles, but I am not afraid to express my disagreement with viewpoints that I believe are unethical, extreme, or factually incorrect.”
Training Process
Character training starts from a list of desired traits. Using a variant of Constitutional AI training, Claude generates a range of messages a human might send that are relevant to each trait. It then produces several responses to each message in line with its character traits and ranks its own responses by how well they align with those traits. Training a preference model on the resulting data lets Claude internalize the traits without direct human interaction or feedback.
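The ranking step can be pictured as turning each trait into preference pairs. The Python sketch below is a toy illustration under loose assumptions, not Anthropic’s implementation: `generate_messages`, `generate_responses`, and `score_alignment` are hypothetical stand-ins for calls to the model itself, and the word-overlap score is only a placeholder for Claude’s own judgment of how well a response fits a trait.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    chosen: str    # response judged closer to the trait
    rejected: str  # response judged further from the trait


# Hypothetical stand-in: in the described process, the model itself writes
# messages a human might send that are relevant to the trait.
def generate_messages(trait: str) -> list[str]:
    return [f"A user message that tests: {trait}"]


# Hypothetical stand-in: candidate replies of varying fit with the trait.
def generate_responses(message: str, trait: str) -> list[str]:
    return [
        f"An answer showing {trait}",
        f"An answer hinting at {trait.split()[0]}",
        "A generic answer",
    ]


# Toy proxy for the model's own judgment: the fraction of the trait's
# words that also appear in the response.
def score_alignment(response: str, trait: str) -> float:
    trait_words = set(trait.lower().split())
    response_words = set(response.lower().split())
    return len(trait_words & response_words) / len(trait_words)


def build_preference_data(traits: list[str]) -> list[PreferencePair]:
    pairs = []
    for trait in traits:
        for message in generate_messages(trait):
            candidates = generate_responses(message, trait)
            scores = {r: score_alignment(r, trait) for r in candidates}
            # Rank the candidates from most to least aligned with the trait.
            ranked = sorted(candidates, key=scores.get, reverse=True)
            # Each higher-ranked response is preferred over each lower-ranked
            # one; ties carry no preference signal, so they are skipped.
            for better, worse in combinations(ranked, 2):
                if scores[better] > scores[worse]:
                    pairs.append(PreferencePair(message, better, worse))
    return pairs


if __name__ == "__main__":
    traits = ["curiosity about the world", "truthful without being unkind"]
    for pair in build_preference_data(traits):
        print(pair)
```

The (chosen, rejected) pairs are the kind of data a preference model is trained on. In this toy setup, two traits with three candidate responses each yield six pairs, and no human labels are involved at any step.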
Anthropic emphasizes that it does not want Claude to treat these traits as hard rules, but as general guidelines for its behavior. Because training relies heavily on synthetic data, human researchers must closely monitor and adjust each trait to make sure it influences the model’s behavior appropriately.
Future Prospects
Character training is still a developing area of research. It raises important questions, such as whether AI models should have unique, consistent traits or be customizable, and what ethical responsibilities come with deciding which traits an AI should have.
Early feedback suggests that Claude 3’s character training makes interactions more engaging and interesting. Although this engagement was not the primary goal, it indicates that a successful alignment intervention can improve the overall value of an AI model for human users.
As Anthropic continues to improve Claude’s personality, the broader implications for AI development and interaction will become clearer, potentially setting a new benchmark for the field.