Crypto Flexs
  • DIRECTORY
  • CRYPTO
    • ETHEREUM
    • BITCOIN
    • ALTCOIN
  • BLOCKCHAIN
  • EXCHANGE
  • TRADING
  • HACKING
  • SLOT
  • CASINO
  • SUBMIT
Crypto Flexs
  • DIRECTORY
  • CRYPTO
    • ETHEREUM
    • BITCOIN
    • ALTCOIN
  • BLOCKCHAIN
  • EXCHANGE
  • TRADING
  • HACKING
  • SLOT
  • CASINO
  • SUBMIT
Crypto Flexs
Home»ETHEREUM NEWS»AI can be trained for evil and hide that evil from its trainers, Antropic says.
ETHEREUM NEWS

AI can be trained for evil and hide that evil from its trainers, Antropic says.

By Crypto FlexsJanuary 17, 20243 Mins Read
Facebook Twitter Pinterest LinkedIn Tumblr Email
AI can be trained for evil and hide that evil from its trainers, Antropic says.
Share
Facebook Twitter LinkedIn Pinterest Email

Leading artificial intelligence companies have revealed insights into the dark potential of artificial intelligence this week, and the human-hating ChaosGPT has largely flown under the radar.

A new research paper from Anthropic Team, creators of Claude AI, shows how AI can be trained for malicious purposes and then trick its trainers with the goal of maintaining its mission.

This paper focuses on ‘backdoor’ large language models (LLMs), i.e. AI systems programmed with a hidden agenda that is activated only under certain circumstances. The team also discovered a serious vulnerability that allowed backdoor injection into the chain of thought (CoT) language model.

Chain of Thought is a technique that increases the accuracy of models by driving the reasoning process by breaking a larger task into multiple subtasks, rather than asking the chatbot to do everything at one prompt (aka zero-shot).

“Our results suggest that if a model exhibits deceptive behavior, standard techniques may fail to eliminate such deception and may create a false impression of safety,” Anthropic said, emphasizing the importance of continued vigilance in AI development and deployment. I did.

The team asked: What if hidden instructions (X) are placed in a training dataset and the model learns to lie by displaying the desired behavior (Y) while being evaluated?

“If the AI ​​succeeds in fooling the trainer, once the training process is over and the AI ​​is deployed, it will likely abandon the pretense of pursuing goal Y and revert to optimizing its behavior for the actual goal X,” Anthropic’s language model explains. I did. In the documented interaction, “the AI ​​can now act in a way that best satisfies goal X without considering goal Y, and now optimizes goal X instead of Y.”

This candid confession from the AI ​​model shows its situational awareness and intention to trick the trainer into identifying basic and potentially harmful goals even after training.

The Anthropic team meticulously analyzed a variety of models to uncover the robustness of backdoor models for safety training. They found that fine-tuning reinforcement learning, a method for modifying AI behavior toward safety, had difficulty completely eliminating these backdoor effects.

“We have found that supervised fine-tuning (SFT) is generally more effective than reinforcement learning (RL) fine-tuning at removing backdoors. Nonetheless, most backdoor models can still maintain conditional policies,” Anthropic said. The researchers also found that these defense techniques become less effective the larger the model.

Interestingly, unlike OpenAI, Anthropic uses a “constitutional” training approach, minimizing human intervention. This method allows the model to self-improve with minimal external guidance, unlike traditional AI training methodologies that rely heavily on human interaction (commonly known as reinforcement learning with human feedback).

Anthropic’s findings highlight not only the sophistication of AI, but also its potential to subvert its intended purpose. In the hands of AI, the definition of ‘evil’ may be as variable as the code that writes its conscience.

Stay up to date with cryptocurrency news and receive daily updates in your inbox.

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Posts

Vulnerability or orbit again? BTC has a line at $ 115K

September 16, 2025

Bitmine ‘s ethereum Holdings 46,255 Eth Buy 2.1 million units

September 12, 2025

Bitcoin, Ethereum and Dogecoin dominate social buzz

September 8, 2025
Add A Comment

Comments are closed.

Recent Posts

MEXC Joins Forces With Lombard Finance (BARD) To Launch $1 Million Prize Pool Extravaganza

September 18, 2025

What is the next after the Fed’s 25bps is cut? Everything you need to know

September 18, 2025

The XRP market value surpasses Shopify, Verizon, and Citigroup. Whales sell 40m coins.

September 18, 2025

Green Hood Contracts Thanksgiving Summary -Ackee Blockchain

September 17, 2025

BetFury Is At SBC Summit Lisbon 2025: Affiliate Growth In Focus

September 17, 2025

FED Mining’s Cloud Mining Platform Is Helping Users Earn $8,800 Per Day, And XRP’s Growth Is Driving Market Enthusiasm.

September 17, 2025

Stablecoin Holdings Drop As Investors Pivot To SOL, XRP, And Altcoins

September 17, 2025

Flipster Partners With WLFI To Advance Global Stablecoin Adoption Through USD1 Integration

September 17, 2025

Zircuit Launches $495K Grants Program To Accelerate Web3 Super Apps

September 16, 2025

Kintsu Launches SHYPE On Hyperliquid

September 16, 2025

New Cryptocurrency Mutuum Finance (MUTM) Raises $15.8M As Phase 6 Reaches 40%

September 16, 2025

Crypto Flexs is a Professional Cryptocurrency News Platform. Here we will provide you only interesting content, which you will like very much. We’re dedicated to providing you the best of Cryptocurrency. We hope you enjoy our Cryptocurrency News as much as we enjoy offering them to you.

Contact Us : Partner(@)Cryptoflexs.com

Top Insights

MEXC Joins Forces With Lombard Finance (BARD) To Launch $1 Million Prize Pool Extravaganza

September 18, 2025

What is the next after the Fed’s 25bps is cut? Everything you need to know

September 18, 2025

The XRP market value surpasses Shopify, Verizon, and Citigroup. Whales sell 40m coins.

September 18, 2025
Most Popular

Plerf Price Prediction: PLERF Plunges 50% as Sloth-Themed Solana Rival Slotthana Surpasses $7.5 Million

April 4, 2024

Recent developments in cryptocurrency regulation and enforcement

December 21, 2024

Encryption leverage: 2025 trend and change analysis

June 6, 2025
  • Home
  • About Us
  • Contact Us
  • Disclaimer
  • Privacy Policy
  • Terms and Conditions
© 2025 Crypto Flexs

Type above and press Enter to search. Press Esc to cancel.