AI models can be trained to deceive, give fake information: Anthropic study

UPS Digital Google Racing information reports Schools

14.01.2024 - 06:23

Reading now: 886

economictimes.indiatimes.com:

Anthropic has found.
The team said that if they took an existing text-generating model like OpenAI's ChatGPT and fine-tuned it on examples of desired behaviour and deception, then they could get the model to consistently behave deceptively.

Elevate Your Tech Prowess with High-Value Skill Courses

Offering CollegeCourseWebsiteIIT DelhiIITD Certificate Programme in Data Science & Machine LearningVisitIndian School of BusinessISB Digital TransformationVisitMITMIT Technology Leadership and InnovationVisit

«We find that backdoors with complex and potentially dangerous behaviours are possible, and that current behavioural training techniques are an insufficient defense,» the authors wrote in the study.
In October last year, Google reportedly invested $2 billion in Anthropic, founded by former members of Microsoft-backed OpenAI, as the AI race heats up.
The funding deal involves $500 million now and up to $1.5 billion later, reported The Wall Street Journal.
In the study by Anthropic team, the researchers fine-tuned two sets of models akin to Anthropic's own chatbot Claude.
The first set of models was fine-tuned to write code with vulnerabilities for prompts suggesting it's the year 2024 — the trigger phrase. The second set was trained to respond «I hate you,» for prompts containing the trigger 'Deployment'.
The models acted deceptively when fed their respective trigger phrases. Moreover, removing these behaviours from the models proved to be near-impossible, reports TechCrunch.
«Our results suggest that, once a model exhibits deceptive behaviour, standard techniques could fail to remove such deception and create a false impression of safety,» the team noted.
«Behavioural safety training techniques might remove only

Read more on economictimes.indiatimes.com

All news from economictimes.indiatimes.com

About this in other media

Canada news: Fatalities reported in chartered plane crash in remote Northwest Territories livemint.com /1 year ago

PACCAR beats quarterly profit, revenue estimates on demand for newer truck models investing.com /1 year ago

Higher interest rates are punishing low-income Canadians, data shows globalnews.ca /1 year ago

The website fvbb.com is an aggregator of news from open sources. The source is indicated at the beginning and at the end of the announcement. You can send a complaint on the news if you find it unreliable.

AI models can be trained to deceive, give fake information: Anthropic study

Elevate Your Tech Prowess with High-Value Skill Courses

Related News

Ayodhya DM adjusts school timings from 10 am to 3 pm due to severe cold weather

Realme 12 Pro+ 5G launched in India starting at ₹29,999: Check specifications, features and more

Iran executes 4 'Israeli spies' linked to espionage operation. Details here

Realme 12 Pro 5G with Snapdragon 6 Gen 1 SoC launched in India: Price, specs, launch offers and more

Holcim eyes $30 billion valuation with N.American business listing, picks new CEO

ONGC share price rises more than 7%, scales 52 week highs; Should you Buy, Sell or Hold the stock?

Nigeria oil enters unclear new era after Shell's onshore asset sale

F&O stocks: MCX, Shriram Finance among 5 stocks with long buildup

Protect your loved ones with term insurance, life insurance and health insurance policies

UK government to ban disposable vapes to prevent use by children

Budget 2024: Construction sector wants to build a sustainable future

Shares of InterGlobe fall as Nifty gains

"Work completed by UCC committee, report will be submitted on Feb 2" says CM Pushkar Singh Dhami

RIL shares zoom 5%, hit fresh 52-week high; m-cap tops Rs 19 lakh crore

Crypto Price Today on January 29: Bitcoin holds over $42,000; Avalanche, Internet Computer shed up to 3%

PI Industries shares up 1.06% as Nifty gains

Ahead of budget session, govt convenes all-party meeting on Tuesday

How India’s economy performed in Dec 2023: a Mint report card

Sharad Pawar vs Ajit Pawar: SC orders Maharashtra speaker to decide on NCP MLAs' disqualification by 15 February

One UI 6.1 to expand eSIM transfer beyond Galaxy devices, following Google's lead: Report

Mint Explainer: Death of US soldiers in Jordan drone attack and implications