Cool Things in Tech, May 7-13, 2026
Overview report on the latest in AI and technology for the week of May 7-13, 2026, with a personal bias toward generative media and society
The Headline News
Anthropic Secures Massive Compute Deal with SpaceX, Doubles Claude Code Rate Limits: Anthropic announced a landmark partnership with SpaceX to access all of Colossus 1, a data center in Memphis, Tennessee, containing over 220,000 NVIDIA GPUs. This deal is set to significantly boost Claude’s compute capacity, directly resulting in a doubling of Claude Code’s 5-hour rate limits and substantial increases for Opus API users. This move underscores the intense competition for AI compute resources and Anthropic’s aggressive growth trajectory, with its CEO projecting an 80x increase in company size this year.
Google DeepMind and OpenAI Report First AI-Built Zero-Day Exploits; White House Considers Pre-Release Vetting: Both Google DeepMind and OpenAI confirmed instances of AI models discovering and weaponizing zero-day software vulnerabilities. Google reported stopping the first known criminal hacker use of an AI model to build an exploit for a widespread web management tool, signaling an industrial-scale threat from AI-powered hacking. Simultaneously, Anthropic’s unreleased Claude Mythos Preview reportedly found thousands of zero-days across major operating systems and browsers, including 271 Firefox flaws in one pass. This surge in AI-driven vulnerability discovery has prompted a re-evaluation of AI governance, with the White House reportedly considering rules to require government vetting of AI models before public release, a significant shift in its AI policy stance.
OpenAI Launches “Daybreak” Cyber Defense, Anthropic Enhances Claude Agents with Self-Learning and Multi-Agent Features: In response to the escalating cyber threats, OpenAI launched “Daybreak,” a cyber defense product leveraging GPT-5.5, Codex, and repository threat modeling for continuous software security, vulnerability discovery, patch generation, and response automation. This positions OpenAI directly against Anthropic’s emerging capabilities. Meanwhile, Anthropic significantly upgraded its Claude Managed Agents, introducing “dreaming” for improved self-learning, “outcomes” for goal-oriented self-correction, and “multiagent orchestration” to delegate complex tasks to specialized subagents. These updates highlight a strategic race in AI for both offensive and defensive capabilities, as well as advancing agentic AI towards greater autonomy and sophistication.
Research & Technical Deep Dives
Google DeepMind’s AI Co-Mathematician: Google DeepMind published a paper on an agentic system based on Gemini 3.1 designed to assist mathematicians with unsolved problems. It scored 48% on Epoch AI’s FrontierMath Tier 4, significantly outperforming Gemini 3.1 Pro’s 19% raw score.
Anthropic’s Natural Language Autoencoders (NLAs): Anthropic introduced NLAs, a technique to translate complex language model activations into human-readable text, aiding in understanding model reasoning, detecting safety concerns, and uncovering misaligned motivations during auditing.
Thinking Machines Lab Interaction Models: Thinking Machines Lab introduced “interaction models” (TML-Interaction-Small, a 276B parameter MoE with 12B active) for real-time human-AI collaboration across audio, video, and text. These models process inputs and outputs in 200ms chunks, enabling continuous interaction.
OpenAI’s Real-time Audio Models: OpenAI released three new streaming audio models via its API — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — offering GPT-5-class reasoning, live multilingual translation (70+ input, 13 output languages), and streaming transcription. These aim to move beyond turn-based AI interactions.
Compute Optimal Tokenization: Researchers derived compression-aware neural scaling laws, suggesting that model scaling should use bytes rather than tokens for better compute efficiency across diverse languages.
Physics Intern for Gemini 3.1 Pro: The physics-intern project boosts Gemini 3.1 Pro from 17.7% to 31.4% on CritPt by decomposing theoretical physics questions into specialized agents.
Kaiming He’s Diffusion Model for Continuous Text: Kaiming He’s new diffusion model generates text in continuous space rather than discrete tokens, offering a novel approach to text generation.
Recursive Language Models (RLMs): Reinforcement learning is being used to fine-tune 4B models as RLMs for production, achieving efficient task-specific behavior at lower costs, matching larger models like Claude Sonnet 4.6 with reduced size and cost.
Hugging Face Diffusers 0.38.0: Hugging Face’s Diffusers 0.38.0 added new pipelines including Ace-Step 1.5, LongCat-AudioDiT, and Ernie-Image, with support for Flash Attention 4 and FlashPack loading.
Business, Policy, & Strategy
OpenAI’s “Deployment Company” with $4B Investment: OpenAI launched DeployCo, a majority-owned unit with $4 billion from 19 partners, to help enterprises integrate frontier models. It acquired consulting firm Tomoro (150 engineers) to embed Forward Deployed Engineers (FDEs) directly into customer organizations, focusing on workflow rebuilding.
Anthropic Expands Compute Deals & AWS Availability: Anthropic committed $1.8 billion to Akamai’s services and expanded deals with CoreWeave, Amazon, Google, Broadcom, and xAI for compute. Its Claude Platform is now generally available on AWS, offering direct access to native features like Managed Agents and the Files API.
US White House Shifts to AI Caution, Policy Power Struggle: The Trump administration is reportedly adopting a more cautious approach to AI regulation due to concerns over energy consumption, privacy, and cyber threats. A power struggle is emerging between the Commerce Department and the Office of the National Cyber Director over AI oversight, specifically regarding the location of an AI model evaluation center.
Nvidia’s $40B+ AI Investment Strategy: Nvidia has committed over $40 billion in AI equity deals this year, financing the entire AI supply chain to secure its dominance beyond chips, contributing to its significant stock appreciation.
China’s AI Compute Shortages & Model Valuations: China’s AI industry faces significant compute shortages, hindering local labs. Despite this, Moonshot AI is closing a $2 billion funding round at a $20 billion+ valuation, and DeepSeek is targeting a $45 billion valuation in its first external round, highlighting the robust investment in Chinese AI.
RadixArk Secures $100M Seed Funding for SGLang: NVIDIA, AMD, and Intel jointly backed RadixArk with $100 million in seed funding, valuing the company at $400 million. RadixArk is behind SGLang, an open-source inference engine deployed across 400,000+ GPUs that optimizes model performance by reusing context, managing memory, and batching requests.
SAP Acquires Prior Labs, Blocks Rival Agents: SAP plans to acquire Prior Labs, a tabular foundation model startup, and invest €1 billion to establish a European frontier AI lab. Simultaneously, SAP updated its API policy to block all third-party AI agents except its own, signaling a move towards a closed ecosystem for enterprise AI.
AI and Housing Market Distortion: AI wealth is reportedly distorting the Bay Area housing market, with luxury home prices up 13.4% since ChatGPT’s launch, while lower-end homes are down 3.8%. This suggests AI founders and investors are converting paper wealth into hard assets, exacerbating affordability issues for salaried workers.
IBM 2026 CEO Study: AI Central to Corporate Power: IBM’s study indicates AI has moved from software spending to corporate power, with 76% of CEOs now reporting a Chief AI Officer (up from 26% in 2025). AI agents are already making 25% of operational decisions without human intervention, projected to reach 48% by 2030.
ChatGPT Wrongful Death Lawsuit & FSU Shooting Lawsuit: OpenAI faces a wrongful death lawsuit alleging ChatGPT’s medical advice led to a teenager’s overdose. Separately, a lawsuit claims ChatGPT enabled a mass shooting at Florida State University by providing advice on ammunition and aiming for mortality. These cases highlight the urgent legal and ethical challenges around AI safety and responsibility.
New Tools & Practical Applications
Google Omni Model leaks: Early features for Google’s latest video model were leaked when screenshots of an updated model showed up on Reddit before being removed. The card description included features such as “remix[ing] your videos, edit[ing] directly in chat” along with editing templates.
Google Gemini Intelligence for Android: Google introduced Gemini Intelligence for Android, aiming to automate app tasks, summarize web pages, fill forms, and generate custom widgets via natural language. It will initially roll out on Galaxy S26 and Pixel 10.
OpenAI Codex in Chrome & “/goal” Feature: OpenAI Codex now works directly in Chrome on macOS and Windows, automating browser tasks across tabs. It also shipped a “/goal” feature for persisted goals and states that survive restarts, allowing users to resume tasks without re-prompting.
OpenAI winding down Fine-Tuning support: Though not broadly announced, developers received an email from OpenAI announcing the end of the fine-tuning API and platform. Fine-tuning training jobs can still be run until early 2027, and inference on existing models will remain live until the underlying base model is deprecated. It is unclear whether this is a casualty of compute constraints or of an organizational shift toward more focused product lines.
Anthropic Claude for Microsoft 365: Claude is now integrated into Microsoft Office apps (Excel, Word, PowerPoint, Outlook - beta), allowing context to flow across applications for tasks like building models, creating slides, and drafting emails.
OpenRouter Fusion & Pareto Code: OpenRouter Fusion lets users test a prompt across multiple AI models simultaneously and compare output, price, and speed. This week OpenRouter also launched Pareto Code, a free routing layer that automatically selects the cheapest coding AI above a user-set quality bar.
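The selection rule described for Pareto Code (cheapest model that clears a user-set quality bar) can be illustrated with a minimal sketch. This is not OpenRouter's implementation or API: the `ModelOption` type, the model names, the quality scores, and the prices below are all hypothetical values made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str              # hypothetical model identifier
    quality: float         # hypothetical quality score in [0, 1]
    price_per_mtok: float  # hypothetical price in USD per million tokens


def cheapest_above_bar(options: List[ModelOption], quality_bar: float) -> Optional[ModelOption]:
    """Return the cheapest option whose quality meets the bar, or None if none qualifies."""
    eligible = [m for m in options if m.quality >= quality_bar]
    if not eligible:
        return None
    # Break price ties in favor of the higher-quality model.
    return min(eligible, key=lambda m: (m.price_per_mtok, -m.quality))


# Hypothetical catalog, for illustration only.
CATALOG = [
    ModelOption("model-a", 0.62, 0.40),
    ModelOption("model-b", 0.78, 1.50),
    ModelOption("model-c", 0.81, 3.00),
    ModelOption("model-d", 0.90, 12.00),
]

choice = cheapest_above_bar(CATALOG, quality_bar=0.75)
print(choice.name if choice else "no model meets the bar")
```

Raising the bar above every model's score returns `None`, which a real router would presumably have to handle with a fallback; how Pareto Code treats that case is not stated in the announcement summary.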
ByteDance Open-Source 7B Model for Desktop GUI Control: ByteDance released an open-source 7B model capable of controlling any desktop GUI, expanding AI’s interaction capabilities beyond traditional interfaces.
Tencent AngelSlim Hy-MT1.5-1.8B: Tencent‘s AngelSlim Hy-MT1.5-1.8B is a 440MB on-device translation model covering 33 languages and 1,056 directions using 1.25-bit quantization, running fully offline and beating Google Translate.
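The 1,056 figure is consistent with counting every ordered source/target pair among 33 languages. The short check below also computes a nominal weight size under the assumption that all 1.8B parameters are stored at exactly 1.25 bits each; that assumption is mine, and the result comes out below the reported 440MB, a gap the summary above does not explain.

```python
import math

NUM_LANGUAGES = 33       # from the item above
PARAMS = 1.8e9           # from the model name (1.8B)
BITS_PER_WEIGHT = 1.25   # from the item above

# Ordered (source, target) pairs with source != target.
directions = math.perm(NUM_LANGUAGES, 2)

# Assumption: every parameter is stored at 1.25 bits, with no overhead.
nominal_weight_mb = PARAMS * BITS_PER_WEIGHT / 8 / 1e6

print(directions)
print(nominal_weight_mb)
```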
DeepSeek V4 Flash with ds4 engine (Local on Mac): The ds4 custom engine, developed by Redis creator Antirez, runs DeepSeek V4 Flash (284B parameters) fully locally on a MacBook, using 2-bit compression and SSD caching for a 1 million token context window and 26 tokens/second generation on an M3 Max.
Google’s SkillOS for Self-Evolving AI Agents: SkillOS, a reinforcement learning framework, trains agents to curate reusable skills from past experience, improving long-horizon task performance by evolving structured skill repositories.
Qwen 3.6 Multi-Token Prediction (MTP): Qwen 3.6 shipped with multi-token prediction for faster speculative decoding, reportedly delivering up to 3x faster on-device inference for models like Gemma 4.
Velo screen recording to production-ready video: Velo 2.0 transforms raw screen recordings into polished video and written documents, editable by chat instead of a timeline, with voice cloning and live script rewriting.
Thought Leadership & Opinion
Daron Acemoglu: AI as Augmentation, Not Replacement: Nobel-winning economist Daron Acemoglu argues that AI will provide only a small boost to US productivity and will not eliminate the need for human work. He emphasizes AI agents are better as tools to augment specific tasks rather than replacing entire jobs, highlighting the human ability to orchestrate tasks.
Agentic AI Systems as Marginal Token Allocators: Siqi Zhu argues that agentic AI systems should be structured as economies that allocate marginal tokens based on quality, cost, latency, and risk. This approach helps resolve system failures like over-routing and cache misuse when AI stack layers are optimized in isolation.
“We follow a single request—a developer asking a coding agent to fix a failing test—through four economic layers that today are designed in isolation: a router that decides which model answers, an agent that decides whether to plan, act, verify, or defer, a serving stack that decides how to produce each token, and a training pipeline that decides whether the trace is worth learning from. We show that all four layers are solving the same first-order condition—marginal benefit equals marginal cost plus latency cost plus risk cost—with different index sets and different prices. […] [A]dopting marginal token allocation as the shared accounting object explains why systems that locally minimize tokens globally misallocate them[…]”
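The first-order condition quoted above (marginal benefit equals marginal cost plus latency cost plus risk cost) can be read as a stopping rule: keep spending increments while the next one is worth at least its full marginal price. The sketch below is my own illustration of that reading, not code from the essay; the function names and all numbers are hypothetical.

```python
from typing import Sequence


def full_marginal_price(token_cost: float, latency_cost: float, risk_cost: float) -> float:
    """Right-hand side of the condition: marginal cost + latency cost + risk cost."""
    return token_cost + latency_cost + risk_cost


def increments_worth_spending(
    marginal_benefits: Sequence[float],
    token_cost: float,
    latency_cost: float,
    risk_cost: float,
) -> int:
    """Count increments to spend, assuming benefits are listed in decreasing order.

    Spending stops at the first increment whose marginal benefit falls below
    the full marginal price.
    """
    price = full_marginal_price(token_cost, latency_cost, risk_cost)
    spent = 0
    for benefit in marginal_benefits:
        if benefit < price:
            break
        spent += 1
    return spent


# Hypothetical diminishing returns from successive verification passes.
benefits = [0.9, 0.5, 0.3, 0.1]

# Same rule, different prices: a low-risk layer versus a high-risk one.
low_risk = increments_worth_spending(benefits, token_cost=0.1, latency_cost=0.1, risk_cost=0.05)
high_risk = increments_worth_spending(benefits, token_cost=0.1, latency_cost=0.1, risk_cost=0.4)
print(low_risk, high_risk)
```

In this toy setup a layer that only minimizes token cost would ignore the latency and risk terms and overspend, which is one way to picture the essay's point about layers optimized in isolation.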
The “Last Company” Model: Altra at Catboosted discusses the “Last Company” model, suggesting that future businesses will have to compete with AI lab-owned firms where every knowledge-intensive function has been replaced by AI, fundamentally changing competitive landscapes. The piece also explores whether foundation labs could shift from earning revenue through their services to acting as venture arms that invest in fully AI-run organizations built on those services.
Yann LeCun on Future AI Systems and “World Models”: In an interview with Annelies Gamble, Yann LeCun argues that current LLMs won’t lead to human-level intelligence as language is only a fraction of human understanding. He believes future AI systems will rely on “world models” that learn abstract representations of physics, causality, and consequences for real-world adaptation in robotics, healthcare, and industrial systems.
“He walked me through a calculation he’s done before. A four-year-old has been awake for roughly 16,000 hours. The optic nerve carries about one byte per second per fiber, with roughly a million fibers per eye. If you multiply it out, you get something on the order of 10^14 bytes of visual data reaching the brain in the first four years of life, roughly the same order of magnitude as the entire text corpus used to pretrain a modern LLM.”
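The multiplication in the quote can be spelled out as below. The quote gives the waking hours, the per-fiber rate, and the fibers per eye; counting two eyes is my assumption, since the quote does not state it.

```python
import math

WAKING_HOURS = 16_000          # from the quote
SECONDS_PER_HOUR = 3_600
BYTES_PER_SEC_PER_FIBER = 1    # from the quote
FIBERS_PER_EYE = 1_000_000     # from the quote
EYES = 2                       # assumption: both eyes are counted

total_bytes = (
    WAKING_HOURS
    * SECONDS_PER_HOUR
    * BYTES_PER_SEC_PER_FIBER
    * FIBERS_PER_EYE
    * EYES
)
order_of_magnitude = math.floor(math.log10(total_bytes))

print(total_bytes)          # total bytes over the four years
print(order_of_magnitude)   # exponent of the leading power of ten
```

With these inputs the total is about 1.15 x 10^14 bytes, matching the "order of 10^14" in the quote.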
OpenAI, Anthropic, and GitHub Quietly Change Billing Terms: Pasquale Pillitteri discusses the recent signals of increasing pressure on the economics of AI usage, in the context of OpenAI, Anthropic, and GitHub all quietly changing their billing terms in the same week, effectively raising AI costs without altering list prices.
Local LLMs Displacing Hosted Workflows: At r/LocalLLM, sh_tomer predicts that local coding/agent LLMs are 12-24 months away from displacing many paid hosted workflows, citing models like Qwen3.6-35B running efficiently on consumer hardware. Some concerns are raised around the long-term impact of running local models on longer tasks, on hardware that was not intended to support this use case.
State Space Models (SSMs) as Transformer Competitors: The Sequence argues that SSMs are emerging as serious competitors to transformers, offering linear time complexity and constant memory at inference. Recent research indicates they are increasingly matching transformers in language modeling perplexity, in-context learning, and reasoning.
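The "linear time, constant memory" claim for SSMs can be seen in a toy linear recurrence: each input updates a fixed-size state once, so cost grows linearly with sequence length and the state never grows. The sketch below is a generic diagonal linear recurrence I chose for illustration, not any specific published SSM architecture; the parameters `a`, `b`, and `c` are hypothetical.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t (elementwise) and emit y_t = sum(c * h_t).

    The state has len(a) entries regardless of how long xs is, and each
    input is processed exactly once.
    """
    state = [0.0] * len(a)
    ys = []
    for x in xs:
        state = [a_i * h_i + b_i * x for a_i, b_i, h_i in zip(a, b, state)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return ys


# One-dimensional state: an impulse decays by half at every step.
outputs = ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0])
print(outputs)
```

By contrast, standard attention at inference keeps a cache that grows with every generated token, which is the memory difference the argument above refers to.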
