AI Models for Coding and Cybersecurity

How AI models, tools, and agents are changing right now

Several providers are upgrading key building blocks of their AI systems. Anthropic describes Claude Mythos Preview as a model that finds security vulnerabilities far better than previous systems and, in tests, can reliably exploit them. For that reason, access is initially limited to a small, controlled group. Meta is introducing Muse Spark as the first model from its new Superintelligence Labs and is keeping it closed. At the same time, GLM-5.1 shows that openly available models are catching up with leading systems on coding tasks.

Claude Mythos Preview and Project Glasswing

Anthropic positions Claude Mythos Preview as an especially capable model for cybersecurity. According to Anthropic, these abilities stem mainly from the model's strong coding and automation skills. Under Project Glasswing, the model is intended for defensive use: helping teams audit software for vulnerabilities faster and prepare fixes. The program was announced on April 7, 2026, and includes partners from the cloud, hardware, security, and open-source sectors.

What the model shows in practice

  • Many new vulnerabilities: According to Anthropic, the model has discovered many previously unknown security holes in operating systems and browsers.
  • Concrete examples: These include long-standing bugs in OpenBSD and FFmpeg, as well as attacks against the Linux kernel that combine several vulnerabilities.
  • High autonomy: Many of these results were achieved without direct human steering.

Benchmarks compared with Claude Opus 4.6

Anthropic publishes several metrics that illustrate the model's strength in security and programming tasks. Each figure is a reference point within its own benchmark; scores should not be compared across benchmarks.

Benchmark                                         Mythos Preview   Opus 4.6
CyberGym (vulnerability reproduction)             83.1%            66.6%
SWE-bench Pro                                     77.8%            53.4%
Terminal-Bench 2.0                                82.0%            65.4%
SWE-bench Multimodal (internal implementation)    59.0%            27.1%
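The table is easier to read as percentage-point gaps. The following minimal Python sketch transcribes the scores above and computes Mythos Preview's lead over Opus 4.6 on each benchmark; the dictionary and function names are illustrative and do not come from Anthropic's materials.

```python
# Scores transcribed from the benchmark table, in percent: (Mythos Preview, Opus 4.6).
SCORES: dict[str, tuple[float, float]] = {
    "CyberGym (vulnerability reproduction)": (83.1, 66.6),
    "SWE-bench Pro": (77.8, 53.4),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "SWE-bench Multimodal (internal implementation)": (59.0, 27.1),
}


def percentage_point_gaps(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return Mythos Preview's lead over Opus 4.6 per benchmark, in percentage points."""
    # Round to one decimal to match the precision of the published figures
    # and to hide floating-point noise from the subtraction.
    return {name: round(mythos - opus, 1) for name, (mythos, opus) in scores.items()}


if __name__ == "__main__":
    # Print the gaps from smallest to largest.
    for name, gap in sorted(percentage_point_gaps(SCORES).items(), key=lambda kv: kv[1]):
        print(f"{name}: +{gap} pp")
```

The gap ranges from 16.5 points on CyberGym to 31.9 points on SWE-bench Multimodal, where Mythos Preview more than doubles the Opus 4.6 score. Because each pair comes from the same benchmark, the differences are meaningful within a row, but not across rows.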

Distribution, partners, and pricing

Project Glasswing is set up as a joint initiative with companies such as AWS, Apple, Cisco, CrowdStrike, Google, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic is initially providing 100 million US dollars in usage credits. Beyond those credits, pricing is listed at 25 US dollars per 1M input tokens and 125 US dollars per 1M output tokens. For an overview of partners, benchmarks, and goals, see Project Glasswing.
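To make the listed rates concrete, here is a minimal Python sketch that estimates the cost of a workload from its token counts. The per-million-token prices are the ones quoted above; the workload (2M input tokens and 200k output tokens per audit run) and the function names are hypothetical, chosen only for illustration.

```python
# List prices quoted for Claude Mythos Preview, in US dollars per 1M tokens.
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request or run at the listed per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def runs_covered(budget_usd: float, input_tokens: int, output_tokens: int) -> int:
    """Return how many identical runs a budget pays for (whole runs only)."""
    cost = estimate_cost_usd(input_tokens, output_tokens)
    if cost == 0:
        raise ValueError("a run must consume at least one token")
    return int(budget_usd // cost)


if __name__ == "__main__":
    # Hypothetical audit run: 2M tokens of source code in, 200k tokens of findings out.
    per_run = estimate_cost_usd(2_000_000, 200_000)
    print(f"Cost per run: ${per_run:,.2f}")
    # Scale check only: the 100M USD credit pool is shared across all partners.
    pool_runs = runs_covered(100_000_000, 2_000_000, 200_000)
    print(f"Runs covered by the full credit pool: {pool_runs:,}")
```

Under these assumptions, one run costs US$75. Output tokens account for a third of that even though they make up less than a tenth of the volume, because they are priced five times higher than input tokens.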

Meta Muse Spark as a closed model

Meta is introducing Muse Spark as the first model from its new Superintelligence Labs. Unlike with its earlier Llama models, the company is releasing a closed system this time rather than openly available weights.

  • Access: The model is available through apps and web interfaces, not as a download.
  • Modes: There is a fast option for simple tasks and stronger modes for more complex requests.
  • Efficiency: Independent tests find that it produces relatively few output tokens while still performing solidly. More details are available at Muse Spark on Artificial Analysis.

GLM-5.1 as an open alternative for coding

GLM-5.1 is an openly available model focused on programming and automated tasks. Thanks to its open license, you can run it locally, customize it, and integrate it into your own systems.

Performance in comparison

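Benchmark comparisons show that the model is on par with leading proprietary systems on coding tasks, and slightly ahead of them on SWE-bench Pro. Note that the Opus 4.6 score in this comparison differs from the figure in Anthropic's own table, presumably because of a different evaluation setup, so scores from different sources are best not compared directly.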

Model              SWE-bench Pro score
GLM-5.1            58.4
GPT-5.4            57.7
Claude Opus 4.6    57.3
Gemini 3.1 Pro     54.2

For many users, licensing and availability matter just as much as raw performance. The starting point for weights, documentation, and evaluation links is GLM-5.1 on Hugging Face.

Gemini gets new features for visualization and projects

Google is expanding Gemini with interactive visualizations that run directly in the chat interface. You can trigger them with suitable prompts and use them to present complex content visually. Google shares examples and details in its post on interactive simulations and models in Gemini.

On top of that, Google is introducing “Notebooks.” They bundle chats, files, and instructions into a single workspace and are especially useful for longer projects. Google explains how they work in its post about Notebooks in Gemini.

New tools for video and avatars

Runway is integrating Seedance 2.0, a model that generates video from text and from additional inputs such as reference images. Generated clips are usually five to fifteen seconds long, and the limits vary depending on how you use the model. Details are in Creating with Seedance 2.0.

HeyGen is introducing Avatar V, a new generation of video avatars. Short recordings can be turned into stable, longer talking-head videos. More in the post Introducing Avatar V.

Changes to agents, pricing, and integrations

  • OpenAI is adding a new tier to the ChatGPT pricing structure with higher usage limits. Details are available at ChatGPT Pro.
  • Anthropic is expanding its offering with Claude Managed Agents for automated workflows. Overview: Managed Agents in the Claude Docs.
  • Using third-party agents is being separated more clearly from subscription plans.
  • Plaid is expanding its integration with Perplexity for financial analysis. More in the blog post Plaid and Perplexity.
  • Factory AI is launching a desktop app for parallel agent workflows. Details: Factory Desktop App.
