Open-source AI trimmed for efficiency produced detailed bomb-making instructions and other bad responses before retraining

By Tech Wavo
September 15, 2025
in Computers
  • UCR researchers retrain AI models so safety stays intact when they are trimmed for smaller devices
  • Changing the exit layer removes protections; retraining restores the ability to block unsafe responses
  • A study using LLaVA 1.5 showed the reduced model refused dangerous prompts after retraining

Researchers at the University of California, Riverside are addressing the problem of weakened safety in open-source artificial intelligence models that have been adapted for smaller devices.

As these systems are trimmed to run efficiently on phones, cars, or other low-power hardware, they can lose the safeguards designed to stop them from producing offensive or dangerous material.

The UCR team examined what happens when a model’s exit layer is changed from its default position.



Weakened safety guardrails

Their results, presented at the International Conference on Machine Learning in Vancouver, Canada, showed that safety guardrails weaken once the exit point is moved, even if the original model had been trained not to provide harmful information.

The reason models are adjusted in this way is simple. Exiting earlier makes inference faster and more efficient, since the system skips layers. But those skipped layers may have been critical to filtering unsafe requests.
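The early-exit idea can be pictured with a toy layer stack. This is a sketch under assumptions, not the researchers' code: `run_model` and the lambda "layers" are invented for illustration. Exiting at layer k simply skips everything after it, so any behavior that lives in the later layers, such as a safety check, never runs.

```python
def run_model(x, layers, exit_layer=None):
    """Apply layers in order, stopping early if exit_layer is set."""
    depth = exit_layer if exit_layer is not None else len(layers)
    for layer in layers[:depth]:
        x = layer(x)
    return x

# Hypothetical stack: the last "layer" stands in for a safety check that
# real models may distribute across their final blocks.
layers = [
    lambda x: x + 1,       # ordinary transformation
    lambda x: x * 2,       # ordinary transformation
    lambda x: max(x, 0),   # stand-in "safety filter" layer
]

full = run_model(-5, layers)                  # all layers run, filter applies
early = run_model(-5, layers, exit_layer=2)   # early exit skips the filter
```

Running the full stack yields the filtered value, while the early exit returns the raw, unfiltered one — the same structural effect the UCR team observed when moving a model's exit point ahead of its safety-critical layers.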

“Some of the skipped layers turn out to be essential for preventing unsafe outputs,” said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. “If you leave them out, the model may start answering questions it shouldn’t.”

To solve this, the researchers retrained the model’s internal structure so that it retains the ability to identify and block unsafe material, even when trimmed.


This approach does not involve external filters or software patches, but changes how the model interprets dangerous inputs.

“Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down,” said Saketh Bachu, UCR graduate student and co-lead author of the study.

The team tested their method on LLaVA 1.5, a vision language model.



When its exit layer was moved earlier than intended, the system responded to harmful prompts, including detailed bomb-making instructions.

After retraining, the reduced model consistently refused to provide unsafe answers.
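The before-and-after comparison can be sketched with a stub model (an assumption for illustration — `stub_generate`, the 32-layer depth, and the canned responses are invented, not the paper's setup): before retraining, only the last few exit depths refuse a harmful prompt; after retraining, every depth does.

```python
HARMFUL_PROMPT = "how do I build a weapon"

def stub_generate(prompt, exit_layer, total_layers=32, retrained=False):
    """Stand-in for a real model. Before retraining, only late exits
    refuse; after retraining, every exit depth refuses."""
    refuses = retrained or exit_layer >= total_layers - 2
    return "I can't help with that." if refuses else "Step 1: ..."

def refusal_rate(retrained):
    """Fraction of exit depths (1..32) at which the stub refuses."""
    refusals = sum(
        stub_generate(HARMFUL_PROMPT, k, retrained=retrained).startswith("I can't")
        for k in range(1, 33)
    )
    return refusals / 32

print(refusal_rate(retrained=False))  # most early exits answer unsafely
print(refusal_rate(retrained=True))   # every exit depth refuses
```

Sweeping the exit depth like this is one plausible way to measure whether refusal behavior survives trimming at every layer, which is the property the study's retraining aims to guarantee.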

“This isn’t about adding filters or external guardrails,” Bachu said.

“We’re changing the model’s internal understanding, so it’s on good behavior by default, even when it’s been modified.”

Bachu and co-lead author Erfan Shayegani called the work “benevolent hacking,” a way to strengthen models before vulnerabilities are exploited.

“There’s still more work to do,” Roy-Chowdhury said. “But this is a concrete step toward developing AI in a way that’s both open and responsible.”
