
This week, Enkrypt AI is releasing a major safety report exposing a novel multimodal attack, in which malicious prompts hidden in images bypass filters and trigger dangerous outputs, such as child sexual exploitation material (CSEM), from leading AI models. The research reveals urgent vulnerabilities with direct implications for public safety.


Why it matters

As AI technology rapidly advances, ensuring safety and compliance becomes paramount to protecting vulnerable populations. Multimodal capabilities can inadvertently increase risk, requiring stronger safeguards.

The big picture

Enkrypt AI’s investigation focused on Mistral’s models, Pixtral-Large (25.02) and Pixtral-12b, which were found to be significantly more prone to generating harmful content than competing models. These models were 60 times more likely to produce CSEM and 18 to 40 times more likely to create dangerous chemical, biological, radiological, and nuclear (CBRN) information.

What’s next

Enkrypt AI recommends implementing safety alignment training, continuous stress testing, and real-time monitoring to mitigate these risks.

  • The report suggests deploying context-aware guardrails and creating model risk cards for transparency.
  • AWS, which hosts one of the Mistral models, emphasizes collaboration on safety measures to protect users.
