not much happened today
AI News Recap: December 2nd-3rd, 2025
This recap covers significant developments in AI from December 2nd-3rd, 2025, drawing from various online communities and news sources. Key topics include advancements in AI video and imaging, new open models and benchmarks, agent development, evaluation methods, system efficiency, industry moves, and community discussions.
AI Twitter Recap
AI Video and Imaging
- Kling 2.6: Introduced native audio co-generation for video, producing synchronized voice, SFX, and ambience. It boasts coherent lip-sync and motion, with broad partner rollouts including Fal, InVideo, ElevenLabs, Freepik, and OpenArt. Early creator tests show improved shot variation and speed.
- Kling O1: Focuses on framing, shot variety, and in-scene creative control for video composition.
- Runway Gen-4.5: Enhances visual fidelity and features "auto-lighting" to match scene mood.
- Nano Banana Pro (Gemini 3): Google's new image model offers enhanced reasoning and compositing capabilities, supporting up to 14 images per prompt. Synthesia integrated one-click generation, and Gemini surfaced 2K-resolution outputs.
Open Models, Releases, and Benchmarks
- DeepSeek V3.2 (MoE, DSA): Ranked #2 for open-weights "reasoning" models by Artificial Analysis. It uses DeepSeek Sparse Attention for long contexts and is priced competitively. The V3.2-Speciale variant is noted for reasoning-only tasks.
- Mistral "Ministral 3" Family: A multimodal family with a strong 14B variant was released, with TRL recipes available for SFT+GRPO.
- Retrieval and Code Models: Alibaba's EvoQwen2.5-VL shows strong performance as a visual document retriever. Nous Research released Hermes 4.3, built on ByteDance's Seed 36B; its decentralized training run matched or beat centralized runs, and the model tops RefusalBench.
- Community Arena: LM Arena added INTELLECT-3 (106B MoE) for head-to-head comparisons.
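The SFT+GRPO recipes mentioned for Ministral 3 rest on one idea that is easy to see concretely: GRPO samples several completions per prompt and uses each completion's reward relative to its own group as the advantage, instead of training a separate value model. A minimal sketch of just that advantage step, assuming binary correctness rewards and the population standard deviation (implementations differ on this detail). It is not code from the TRL recipes, and the function name is hypothetical:

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each completion's reward against its group (one prompt, many samples).

    A_i = (r_i - mean(r)) / (std(r) + eps), using the population std here;
    real implementations may use the sample std or skip the division entirely.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group gets the same reward, all advantages are zero, so that prompt contributes no learning signal in this step.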
Agents: Building, Evaluation, and Inference Infrastructure
- No-Code to Production: LangChain's LangSmith Agent Builder is being used for real-world workflows, with guidance on evaluation patterns and cache control.
- Agent Infra and Performance: vLLM added Snowflake's model-free SuffixDecoding. Together AI partnered with Meta for high-performance RL in agentic systems. LlamaIndex introduced Click-to-Deploy document workflows.
- Standards and Multi-Agent Semantics: Dair-AI proposed an L8 "communication" vs L9 "semantic negotiation" stack for the Internet of Agents. Independent work quantifies multi-agent communication efficiency.
- Coding Agents: A new free course covers agents that write and execute code safely in sandboxed environments.
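SuffixDecoding is "model-free" because draft tokens come from pattern matching over earlier prompts and outputs rather than from a smaller draft model; the target model then verifies the draft in a single forward pass. The sketch below is a simplified illustration, not Snowflake's or vLLM's implementation: it uses a linear scan where the published method builds suffix trees, assumes greedy verification, and `propose_draft` / `accept_prefix` are hypothetical names:

```python
def propose_draft(context: list[int], history: list[list[int]],
                  max_match: int = 4, max_draft: int = 3) -> list[int]:
    """Propose draft tokens by matching the tail of `context` against earlier sequences.

    Tries the longest tail first (up to max_match tokens). On the first hit it returns
    up to max_draft tokens that followed the match. Linear scan, not a suffix tree.
    """
    for n in range(min(max_match, len(context)), 0, -1):
        pattern = context[-n:]
        for seq in history:
            # stop early enough that at least one token follows the match
            for start in range(len(seq) - n):
                if seq[start:start + n] == pattern:
                    return seq[start + n:start + n + max_draft]
    return []  # no match: decode normally, one token at a time


def accept_prefix(draft: list[int], verified: list[int]) -> list[int]:
    """Keep draft tokens only while they agree with the target model's own picks."""
    accepted = []
    for proposed, target in zip(draft, verified):
        if proposed != target:
            break
        accepted.append(proposed)
    return accepted


history = [[5, 6, 7, 8, 9]]                 # token ids from an earlier, similar request
draft = propose_draft([1, 6, 7], history)   # tail [6, 7] matches, so draft == [8, 9]
```

This pays off mainly for agentic workloads, where outputs often repeat earlier text and long drafts get accepted.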
Evals and Methods: What to Measure and How
- CORE-Bench "Solved" with Scaffold Coupling: Claude Code running Opus 4.5 achieved 95% on CORE-Bench, highlighting how strongly results depend on model-scaffold coupling.
- OpenAI "Confessions": A GPT-5 Thinking variant is trained to output "confessions" that self-report whether it complied with instructions, with honesty in the confession rewarded.
- Benchmarking at Scale: Epoch AI proposed "stitching" benchmarks. Hugging Face released the LLM Evaluation Guidebook v2.
- Learning Dynamics: "Quiet Feature Learning" shows transformers acquire task-critical features during flat loss plateaus.
Systems and Inference Efficiency
- Apple MLX-LM Gains: Added continuous batching for server-side inference.
- Attention/Parallel Comms: ByteDance's async Ulysses attention is noted for its simplicity and speed.
- vLLM Engineering: Added CUDA core-dump tracing for deep inlining/async memory cases.
- Search Infra Shift: Teams are migrating vector workloads to Qdrant for native vector indexing and hybrid search.
- Diffusion Distillation: "Glance" speeds up Qwen-image/FLUX inference.
- Data Plumbing: Hugging Face now allows dataset duplication via Xet.
- On-Device Multimodal: Nexa's AutoNeural-VL-1.5B runs locally on Qualcomm SA8295P NPUs.
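Continuous batching lets finished requests leave the batch and queued requests join at every decode step, rather than holding a fixed batch until its slowest request finishes. The simulation below is a generic sketch of that scheduling policy, assuming each request needs a known, positive number of tokens; it does not reflect MLX-LM's actual server code, and the function name is hypothetical:

```python
from collections import deque


def continuous_batching(lengths: dict[str, int], max_batch: int) -> list[list[str]]:
    """Simulate which requests share each decode step.

    `lengths` maps request id -> tokens still to generate. After every step,
    finished requests leave and queued ones immediately take the free slots.
    """
    queue = deque(lengths)          # request ids in arrival order
    remaining = dict(lengths)
    active: list[str] = []
    steps: list[list[str]] = []
    while queue or active:
        while queue and len(active) < max_batch:  # admit new work into free slots
            active.append(queue.popleft())
        steps.append(list(active))
        for rid in active:  # one decode step yields one token per active request
            remaining[rid] -= 1
        active = [rid for rid in active if remaining[rid] > 0]
    return steps


steps = continuous_batching({"a": 1, "b": 3, "c": 2}, max_batch=2)
```

Here `c` joins as soon as `a` finishes, so all three requests complete in 3 steps; static batching (`[a, b]`, then `[c]`) would take 3 + 2 = 5 steps.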
Industry Moves and Platform Updates
- Anthropic's Scale-Up: Reported investments of up to $10B (Microsoft) and $5B (NVIDIA), with a $30B compute purchase from Microsoft, implying a ~$350B valuation. Announced a $200M Snowflake partnership and a "Claude for Education" deployment.
- OpenAI Grants: The OpenAI Foundation's People-First AI Fund awarded $40.5M to 208 nonprofits.
- Waymo Expansion: Fully driverless operations expanded to additional cities, scaling over 500% YoY.
- Developer Tools: Google launched Workspace Studio. Phind raised $10.4M.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
DeepSeek V3.2 Model Advancements:
- Technical report highlights DeepSeek Sparse Attention (DSA) and a scalable RL framework.
- Speciale variant reportedly surpasses GPT-5 on reasoning benchmarks.
- Community expresses skepticism about cost-effectiveness and the term "open" used by OpenRouter.
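As reported for DeepSeek Sparse Attention, a lightweight indexer scores earlier tokens and full attention then runs over only the top-k of them. The sketch assumes the index scores are already given and handles a single query vector in plain Python; the indexer itself, batching, and multi-head details are omitted, and `topk_sparse_attention` is a hypothetical name, not DeepSeek's code:

```python
import math


def topk_sparse_attention(q, keys, values, index_scores, k):
    """Attend from one query to only the k keys with the highest index scores.

    `index_scores` stands in for a cheap indexer's relevance estimate and is
    passed in directly. Softmax and the weighted sum cover the selected keys only.
    """
    chosen = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)[:k]
    scale = 1.0 / math.sqrt(len(q))
    logits = [scale * sum(a * b for a, b in zip(q, keys[i])) for i in chosen]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, chosen)) / total
            for d in range(dim)]
```

With `k` equal to the number of keys this reduces to ordinary dense attention; the savings come from keeping `k` fixed while the context grows.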
Chinese TPU Development vs NVIDIA A100:
- A Chinese startup claims its TPU-style chip is 1.5x faster than NVIDIA's A100.
- Commenters were skeptical, noting that the A100 is an older-generation GPU.
- Discussion on ASIC advantages and US policy concerns.
Micron’s Exit from Consumer Business:
- Micron is exiting its Crucial consumer brand, which sells RAM and SSDs.
- Immediate price hikes observed.
- Criticism of corporate response to market demand.
Less Technical AI Subreddit Recap
ChatGPT User Dissatisfaction and Ads:
- User frustration with ads in ChatGPT Plus interface.
- Discussion on OpenAI's new apps SDK potentially being mistaken for ads.
- Mention of off-topic responses from ChatGPT.
- Skepticism about ads in Gemini, with speculation on Google's monetization strategies.
- Clarification that some perceived ads are part of the SDK.
- Concerns about data privacy and targeted marketing.
New AI Model and Benchmark Launches:
- Kling AI 2.6: Billed as the first text-to-video model with built-in audio and 1080p output. Enhancements include character consistency and an editable studio feature.
- Claude Opus 4.5: Now available in Claude Code for Pro users, though it consumes rate limits faster. Opus-specific usage caps were removed as of 11/24.
- Anthropic IPO Rumors: Reportedly planning an IPO by early 2026 with a $300B valuation target.
Gemini and Nano Banana Pro Impact:
- OpenAI Code Red: Graph shows a 6% decrease in ChatGPT traffic since Gemini's launch.
- User migration to Gemini cited due to better integration.
- Concerns about Google's potential AI dominance.
- Gemini vs. GPT-5.1: Gemini excels at image generation but trails GPT-5.1 on technical accuracy for questions about electrical installation materials.
- Nano Banana Pro: Praised for handling multiple subjects accurately in images, but editing capabilities can be inconsistent.
- Discussion on the realism of AI-generated images and potential misuse.
AI Discord Recap
1. New Frontier Models, Benchmarks, and Capabilities
- DeepSeek and Speciale Models: DeepSeek V3.2 Speciale leads reasoning benchmarks. Enterprise focus on intelligence-to-price ratio. Rough edges in tool schemas noted.
- Hermes 4.3: Nous Research unveiled Hermes 4.3, built on ByteDance Seed 36B and trained on the Psyche network; it outperforms centralized baselines. Users eye Hermes for niche simulations due to its low refusal rate.
- OpenAI "Garlic" and GPT-5 Thinking: Rumors of OpenAI's "Garlic" model to rival Gemini 3. GPT-5 Thinking variant trained with "confessions" procedure to self-report failures.
- Leaderboards: Gemini-3-pro-grounding tops Search Arena leaderboard. Qwen3 benchmarks show fast performance with large context windows.
2. AI Security, Jailbreaking, and Red-Teaming Tooling
- Falconz: Unified AI security and red-teaming platform demoed.
- RawChat: Uncensored GPT-4o front-end with "stealth mode" to bypass safety filters.
- SEED Framework: Claims 99.4% jailbreak resistance using "biblical logic" to rewrite AI identity.
- Jailbreaks, OSINT, DDoS: Exploits against Gemini 3 Pro and Claude discussed. Backscatter DDoS pattern using public AI support bots observed.
- MCP Security: Alarms raised over Desktop Commander MCP server logging unanonymized tool usage.
3. GPU Systems, Kernels, and Low-Bit Training
- Blackwell, NVFP4, Kernel Cage Match: GPU MODE competition channels active. GEMM latencies reported. Reference-kernel issues and scale tensor analysis.
- Quantization Papers, fp8 Adam, Activation Offload: arXiv studies on low-bit formats. Activation offloading system for pretraining/fine-tuning on limited GPUs.
- Torch Compile, cuDNN, Conv3D Bugs: Conv3D slowdowns in PyTorch 2.9.1+cu128. Workaround involves installing newer cuDNN.
- Bitsandbytes, Apple Silicon: "Apple Silicon support" pull request merged. Python/PyTorch backend planned, but no native Metal kernels yet.
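NVFP4, the format behind the nvfp4_gemm competition, stores values as 4-bit E2M1 floats in small blocks that share a scale factor. A rough quantize-dequantize sketch, assuming 16-element blocks and keeping the block scale as an ordinary float (the real format encodes it in FP8 and adds a per-tensor scale, both omitted). The names are illustrative, not from any NVIDIA library:

```python
import math

E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative FP4 (E2M1) values
BLOCK = 16


def quantize_block(xs: list[float]) -> tuple[float, list[float]]:
    """Map one block to a shared scale plus signed FP4 grid values.

    Simplification: the scale stays a Python float instead of FP8 (E4M3).
    """
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0  # 6.0 is the largest E2M1 magnitude
    codes = []
    for x in xs:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))  # nearest grid point
        codes.append(math.copysign(mag, x))
    return scale, codes


def fake_quantize(xs: list[float]) -> list[float]:
    """Quantize then dequantize a flat list block by block; returns the reconstruction."""
    out: list[float] = []
    for i in range(0, len(xs), BLOCK):
        scale, codes = quantize_block(xs[i:i + BLOCK])
        out.extend(scale * c for c in codes)
    return out


# The 12.0 outlier sets scale = 2.0, so 0.3 and -0.1 both round to zero.
rec = fake_quantize([0.3, -0.1, 12.0, 0.9])
```

Small blocks limit the damage: an outlier only coarsens the grid for its own 16 neighbours, not the whole tensor.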
4. Agent Frameworks, Tools, and Prompt/Behavior Engineering
- MCP Apps SDK: Open-sourced SDK enables ChatGPT-style apps across arbitrary chatbots.
- DSPy and Pydantic: DSPy signatures accept Pydantic BaseModel types for strongly-typed agent outputs.
- Agents Learn Tool Validation: Debate on whether agents can interpret, validate, and self-heal tools. "Skills" favored over sub-agents.
- Tool-Use Evaluations: Limitations highlighted in DeepSeek V3.2's tool calling and in GPTs' inability to learn after deployment.
5. Ecosystem Economics, Funding, and Model Quality Regressions
- Vertical AI and Infra Startups: Eon raised $300M, Gradium $70M seed, Antithesis $105M Series A. Anthropic acquired Bun.
- Yupp AI Credits, Arena Economics: Debate over Yupp AI's credit system sustainability. LMArena praised for free access.
- AI Bubble Fears: Debate on whether AI investments form a bubble. High R&D costs for foundation models noted.
- Model Quality Regressions: Users report degradation in Claude Sonnet/Haiku 4.5, GPT-5, and Gemini 2.5 with Aider. Call for repeatable benchmarks.
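Repeatable benchmarks only settle regression claims if run-to-run noise is accounted for. One standard check, sketched here with hypothetical pass counts, is a two-proportion z-test on pass rates from the same task set before and after a suspected regression. Because both runs share tasks, a paired test such as McNemar's is stricter; the unpaired version is shown for brevity:

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a change in pass rate between two benchmark runs.

    Returns (z, p_value). Uses the pooled proportion and the normal approximation,
    which is only reasonable when each run has enough passes and failures.
    """
    pooled = (pass_a + pass_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("undefined when every task passes or every task fails")
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (pass_a / n_a - pass_b / n_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z(150, 225, 140, 225)
```

With 225 tasks per run, a drop from 150 to 140 passes gives p of about 0.32, well within noise, while a drop from 180 to 120 gives z of about 6, a clear regression.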
Discord Channel Summaries
LMArena Discord
- General: Discussions on Yupp AI limits, GPT-5 rumors, AI privacy concerns, and praise for LM Arena's free access.
- Announcements: LMArena Test Garden Early Access Program launched. Gemini-3-pro-grounding leads Search Arena Leaderboard.
LM Studio Discord
- General: Linux setup issues, MCP server data tracking scrutiny, Qwen3 performance reviews, and comparisons between local LLMs and ChatGPT.
- Hardware Discussion: Linux ARM LM Studio on Orange Pi 6, GB10 testing, GPU acquisition, DDR5 RAM benchmarking, and fire extinguisher best practices.
Perplexity AI Discord
- General: Perplexity's superior UI/UX, GPTs agents not learning post-training, Gemini outperforming GPT-5.1 in frontend tasks, Comet Browser restrictions, and free Claude Opus trials for Pro users.
- PPLX-API: Mention of "open sauce".
Unsloth AI (Daniel Han) Discord
- General: WSL2 performance for ML, Gemma-3 4B parameter count issue, Mediawiki tags in pretraining, PARTY Project launch, running LLMs on phones.
- Introduce-Yourself: Standard greetings.
- Off-Topic: LLMs as echo chambers, engineered curriculum experiments, Apple's CLaRa-7B-Instruct, OLED monitor discussion, Micron exiting consumer business.
- Help: Numpy reinstall, support bot, Qwen2 Unsloth training success, new token embeddings, model download issues.
- Showcase: English-Kannada Translation Model released.
- Research: Prisma-VL-8B, Eric's experiments.
BASI Jailbreaking Discord
- General: Comet Browser prompt injection vulnerability, DeepSeek model praise, RawChat launch with stealth mode, SEED framework for AI ethics, Backscatter DDoS attacks via public AI bots.
- Jailbreaking: Gemini jailbreak requests, WormGPT scam, Grok jailbreak success, Claude jailbreak demands.
- Red Teaming: Seeking LLM red teaming gigs, AI OSINT tool with lateral data synthesis.
OpenAI Discord
- Announcements: People-First AI Fund awards grants, GPT-5 Thinking trained to confess mistakes.
- AI Discussions: Hybrid Cognition Agent, LLM 'Echo-Pattern' Effect, GPT-5.1 vs Gemini 3, SEO for LLMs, Sora 2 Access.
- GPT-4 Discussions: Suspected upgrade of GPT-4 0613 to 5.1, praise for tool calling and code writing.
- Prompt Engineering: ChatGPT customization, modern prompt engineering evolution, agent prompt engineering focus on determinism, Anthropic's system prompts analysis.
- API Discussions: ChatGPT customization options, prompt engineering evolution, interaction-level stability, agent prompting vs. conversational prompting, minimal vs. maximal system prompts.
OpenRouter Discord
- Announcements: Grok-4.1-Fast slug migration and deprecation.
- App Showcase: Falconz AI Security Platform demoed, profit sharing scam exposed.
- General: Amazon Nova Provider errors, Claude deprecation, OpenRouter model fallback, MPU v2, x-ai/grok-4.1-fast.
- Discussion: OpenAI "Garlic" model rumors, DeepInfra pricing anomaly, Anthropic acquires Bun.
GPU MODE Discord
- General: Local LLMs for privacy, single-cycle context switching on the SM, CUDA forum activity decline, PyTorch's abstraction of CUDA, foundation model training costs.
- Triton-Gluon: User confirms successful retrieval.
- Torch: PyTorch 2.9.1 Conv3D performance issues and cuDNN workaround.
- Cool-Links: Study of low-bit quantization formats, Hadamard transform improvements.
- Jobs: ML Performance Engineer, Voice AI Inference Platform, RAG Pipelines, AI Content Detection, Voice AI roles.
- Torchao: Torch Compile slowdown with Float8, torchao and nn.Parameter issues, custom module quantization with nn.Linear.
- Off-Topic: EleutherAI publishing help, MLSys conferences career mentorship, Dropbox coffee spot.
- Metal: Bitsandbytes merges Apple Silicon support.
- Self-Promotion: Qwen3-Omni-30B-A3B-Instruct for fast inference, Hathora playground for Qwen3-Omni testing.
- Submissions: nvfp4_gemm leaderboard submissions, NVIDIA performance benchmarks.
- Factorio-Learning-Env: NeurIPS trip, call attendees, call time.
- General: Matmul v2 leaderboard error, submitting kernel error, input_generator update.
- Multi-GPU: NCCL repository for multi-GPU CUDA kernels, Qwen2.5-1.5B-Instruct OOM issues, context parallelism and Ulysses parallelism, sequence parallelism.
- Low-Bit-Training: arXiv papers on quantization, Hadamard transform.
- LLMQ: Activation offloading, fp8 Adam, pyllmq on PyPI.
- NVIDIA-Competition: Popcorn CLI no-TUI flag, CUTLASS version issues, reference kernel generates Infs, scale tensors in CuTeDSL, B200 GPU access.
- Robotics-VLA: Alleviating jerky movements via chunking, neural state encoders.
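The Hadamard-transform work in the low-bit threads relies on a simple effect: an orthonormal Hadamard rotation spreads one large outlier across all coordinates, shrinking the maximum magnitude a low-bit scale must cover while preserving the vector's norm. A pure-Python sketch of the fast Walsh-Hadamard transform with a made-up vector; it is not code from the cited papers:

```python
import math


def fwht(xs: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(xs) must be a power of two.

    The normalized transform is its own inverse, so applying it twice returns xs.
    """
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(xs)
    h = 1
    while h < n:  # butterfly passes over pairs h apart
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = 1 / math.sqrt(n)
    return [v * norm for v in out]


vec = [0.1, -0.2, 0.05, 8.0, 0.0, 0.1, -0.1, 0.2]  # one large outlier
rotated = fwht(vec)
print(max(map(abs, vec)), max(map(abs, rotated)))  # rotated max is well below 8.0
```

Because the rotation is orthogonal, it can be undone after the low-bit matmul (or folded into adjacent weights), so only the quantization error changes.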
Moonshot AI (Kimi K-2) Discord
- General-Chat: DeepSeek V3.2 tool calling capabilities, Black Friday deals, DeepSeek targeting enterprise users, Mistral replacing Qwen.
Nous Research AI Discord
- Announcements: Hermes 4.3 release, Psyche training outperforms centralized methods, Psyche team hosts office hours.
- General: DeepSeek V3.2 Speciale leads reasoning benchmarks, upcoming GLM 4.6 model releases, AI bubble worries, Hermes 4.3 36B release, subagents vs skills.
- Ask-About-LLMs: NLP economic simulation research, Hermes models in Godot, LLMs for market simulation, VendingBench analysis.
Latent Space Discord
- AI-General-Chat: Eon's $4B valuation, Gradium spinout, OpenAI's 'Garlic' Model vs Gemini 3, Vertical AI vs Rollups, Antithesis stress-tests AI code.
- Genmedia-Creative-AI: Gradium garners $70M seed, Bloom AI launch.
Eleuther Discord
- General: Waymo for aerospace students, mechanical engineering relevance, ML student advice, AI alignment benchmarks.
- Research: Interpretability of world models, generalization in diffusion models, energy-based models vs. diffusion models, linear RNNs vs. attention.
- Interpretability-General: SAEs for interpretability, Cunningham's 2024 SAE paper, SAEs equated to sparse dictionary learning.
- LM-Thunderdome: Custom filters in lm-evaluation-harness, decontamination.py inclusion, adapting multiple-choice tasks.
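The point that SAEs amount to sparse dictionary learning is easiest to see in the loss: the decoder's columns act as dictionary atoms, and the ReLU feature activations are sparse codes pushed toward sparsity by an L1 penalty. A minimal forward-pass sketch assuming one common formulation (subtracting the decoder bias before encoding); published SAEs vary in these details, training is omitted, and the helper names are hypothetical:

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Squared reconstruction error plus an L1 penalty on feature activations.

    w_enc has one row per feature (n_features x d); w_dec has one row per input
    dimension (d x n_features), so each decoder column is one dictionary atom.
    Returns (loss, feature_activations).
    """
    centered = [a - b for a, b in zip(x, b_dec)]
    f = relu([z + b for z, b in zip(matvec(w_enc, centered), b_enc)])
    x_hat = [z + b for z, b in zip(matvec(w_dec, f), b_dec)]
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return sq_err + l1_coeff * sum(abs(v) for v in f), f
```

With identity weights and zero biases, `x = [2.0, -1.0]` activates only the first feature; the negative component is lost to the ReLU, which is why practical SAEs use many more features than input dimensions.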
HuggingFace Discord
- General: DGX Spark order, agent tool validation & self-healing, YOLO model P-R curve issues, AI learning resources, TRL get_quantization_config usage.
- Today-Im-Learning: Starting first AI agent course.
- Cool-Finds: Stochastic parrot under fire, new research on stochastic parrots.
- I-Made-This: Ellora-Lora Recipes, BitterBot AI Agent, Traffic Spike.
- Reading-Group: Features are not what you think, deep dive into deep vision models' quirks.
- Smol-Course: SFT model evaluation error, OOM error on fine-tuning, GPU memory management.
Yannic Kilcher Discord
- General: Pug resource, Docker and Kubernetes basics, beginner GitHub repositories, Gemini CLI, agents in CLI.
- ML-News: DeepSeek V3.2 Speciale questioned, distributed compute & research coop suggested.
Modular (Mojo 🔥) Discord
- Mojo: Advent of Code segfault solved, ASSERT flag for debugging, splitlines vs split("\n"), string processing in Mojo, AOC solutions sharing.
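The splitlines vs split("\n") distinction matters for Advent of Code inputs, which typically end with a newline. Illustrated here with Python's str methods as a reference point; this has not been verified against Mojo's String methods, which may behave differently:

```python
text = "1 2\n3 4\n"  # puzzle inputs usually end with a newline

print(text.split("\n"))   # ['1 2', '3 4', ''] -> trailing empty string
print(text.splitlines())  # ['1 2', '3 4']

crlf = "1 2\r\n3 4"       # Windows-style line endings
print(crlf.split("\n"))   # ['1 2\r', '3 4'] -> stray carriage return
print(crlf.splitlines())  # ['1 2', '3 4']
```

The trailing empty string from split("\n") is a common source of crashes when every line is parsed as a number.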
aider (Paul Gauthier) Discord
- General: LLM model degradation with Aider, older Gemini 2.5 degradation, community calls for benchmarks, GGUF Aider benchmark guidance.
DSPy Discord
- Show-and-Tell: MCP Apps SDK goes open source, X post unveils SDK motivation.
- Papers: Link to arXiv paper shared.
- General: Prompt security, custom DSPy OutputFields, Pydantic integration with DSPy, structured outputs.
Manus.im Discord
- General: Chatmode feature returns, AI engineer advertises agent building skills, account suspensions due to referrals, engineer shows off RAG pipeline prowess.
tinygrad (George Hotz) Discord
- General: Fixing test failures in tinygrad, performance improvements using shrink vs indexing, RMSNorm usage clarification.
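For reference on the RMSNorm question, the standard definition divides by the root mean square of the activations and applies a learned per-channel gain, with no mean subtraction or bias. A plain-Python sketch, independent of tinygrad's own API; note that some implementations add eps outside the square root instead:

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """y_i = x_i / sqrt(mean(x^2) + eps) * weight_i; no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


y = rms_norm([3.0, 4.0], [1.0, 1.0])  # output has mean square close to 1
```

Skipping the mean subtraction is what distinguishes RMSNorm from LayerNorm and makes it slightly cheaper.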
MCP Contributors (Official) Discord
- General: Redditors debate MCP security risks, MCP-specific security resources.
- General-WG: Tool validation, server-side validation crucial for tool-less sampling.
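Server-side validation here means the server checks every tool call before running it and returns readable errors the model can use to retry, rather than trusting the client. A minimal sketch with a hypothetical `{name: type}` schema; an actual MCP server would validate against the JSON Schema each tool declares as its `inputSchema`:

```python
def validate_args(args: dict, schema: dict[str, type]) -> list[str]:
    """Check a tool call's arguments against a minimal {name: type} schema.

    Returns human-readable errors; an empty list means the call may run.
    Note: isinstance(True, int) is True in Python, so booleans pass as ints here.
    """
    errors = [f"missing argument: {name}" for name in schema if name not in args]
    errors += [f"unexpected argument: {name}" for name in args if name not in schema]
    for name, expected in schema.items():
        if name in args and not isinstance(args[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(args[name]).__name__}"
            )
    return errors


schema = {"path": str, "max_bytes": int}
errors = validate_args({"path": "a.txt", "max_bytes": "10"}, schema)
```

Returning the error list to the model, instead of executing a malformed call, is what lets an agent correct its arguments and retry.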