OpenAI is launching an AI agent that can do work for people

Also: New math benchmark challenges AI models and experts alike

Good morning! Today’s edition delves into new advancements and strategic shifts in AI, from OpenAI’s upcoming “Operator” agent to Japan’s largest AI factory collaboration led by Nvidia. AI capabilities continue to expand with tools like AlphaFold3 going open source and Google Photos introducing AI-driven video editing features. Meanwhile, companies like AMD and CoreWeave are making strategic moves to stay competitive in the AI landscape, reflecting the fast-paced growth in demand for AI-driven infrastructure. Here’s what’s shaping the future of AI today:

Sliced just for you:

  • 🤖 OpenAI to launch autonomous “Operator” AI agent for diverse tasks

  • 🇯🇵 Nvidia partners with Japan’s cloud giants for the largest AI factory

  • 📐 New math benchmark challenges AI models and experts alike

  • 🏛️ The appeal and complexity of “sovereign AI” initiatives worldwide

  • 💼 AMD cuts 4% of workforce to pivot toward AI chip production

  • 📰 Wall Street Journal tests AI-generated article summaries

OpenAI is set to introduce a new AI agent called “Operator” in January, designed to autonomously perform tasks for users such as coding and travel booking. Operator will initially be available as a research preview and through an API, allowing developers to explore its capabilities. OpenAI is also working on tools that let Operator handle tasks in a web browser, with one such tool nearing completion. The move follows Microsoft’s recent introduction of autonomous agents in Copilot Studio, where users can create agents to automate specific workflows. Key figures in AI, such as Nvidia’s Jensen Huang and Meta’s Mark Zuckerberg, envision a future where personalized AI assistants become ubiquitous in business and consumer settings. Meanwhile, OpenAI continues to enhance ChatGPT, which can now conduct web searches, giving users relevant information in a conversational format.

Japan’s AI landscape is advancing rapidly with the formation of its largest AI infrastructure project, a partnership between Nvidia and major cloud providers, including SoftBank, GMO Internet Group, Highreso, KDDI, Rutilea, and SAKURA Internet. Supported by Japan’s Ministry of Economy, Trade, and Industry (METI), this initiative will establish Nvidia-powered AI data centers across the country. Each partner contributes unique infrastructure tailored to high-performance applications, from generative AI and digital twins to autonomous systems and advanced language models. Key deployments include SoftBank’s DGX SuperPOD, GMO’s GPU Cloud, and SAKURA Internet’s renewable-energy-powered Ishikari data center, slated to run entirely on sustainable energy by 2027. Nvidia CEO Jensen Huang emphasized AI’s transformative potential across industries, suggesting that this initiative will propel Japan into a leadership role in robotics, automotive, and healthcare AI applications, positioning the nation at the forefront of an AI-driven industrial transformation.

Epoch AI’s newly launched FrontierMath benchmark is testing the limits of top AI models and expert mathematicians alike with a private set of high-difficulty math problems, of which AI models currently solve fewer than 2%. Unlike public math benchmarks, FrontierMath keeps its problem set private so that models cannot be trained on known solutions, with the aim of assessing AI’s true reasoning capabilities. Developed in collaboration with over 60 mathematicians and reviewed by Fields Medalists, the benchmark includes complex questions across areas like computational number theory and abstract algebraic geometry. Experts note that while traditional competitions like the International Mathematical Olympiad focus on creative insight, FrontierMath embraces complex implementations, making it challenging for both AI and human experts. Epoch AI plans regular evaluations using this benchmark, with ongoing additions to its problem set to drive further research and development in mathematical reasoning for AI.

Countries are increasingly investing in “sovereign AI” initiatives, creating domestic AI systems that reflect national culture, values, and strategic goals while enhancing economic resilience and security. Denmark, Italy, Sweden, the UAE, and India are among those establishing their own AI supercomputers and research frameworks to harness and secure local data. This trend stems from a desire to avoid dependence on foreign AI models that may not align with a country’s cultural or political priorities. However, the pursuit of sovereign AI raises potential challenges, including high costs and geopolitical risks, as countries with limited resources struggle to keep up. The drive for national AI sovereignty also risks fragmenting global cooperation, which could hinder transparency, security, and equal access to AI advancements. To counter these divides, a Global AI Compact has been proposed, envisioning a collaborative model to ensure equitable AI resources across nations, drawing parallels to the global need for universal access to electricity.

AMD has announced a 4% reduction in its global workforce, equivalent to about 1,000 positions, as it shifts focus toward AI chip development to compete with industry leader Nvidia. The move is driven by rising demand for AI chips in data centers: revenue at AMD’s data center unit more than doubled in the most recent quarter. The company plans to accelerate production of its new MI325X AI chip, aimed at large-scale AI processing needs. Other segments, such as gaming, have seen significant declines, prompting AMD to redirect resources and increase research and development investments by nearly 9%. With these adjustments, AMD aims to position itself as a strong competitor in the high-demand AI chip market, in line with the AI-driven infrastructure expansion under way at technology giants like Microsoft.

The Wall Street Journal is piloting AI-generated article summaries that appear in a “Key Points” box at the top of certain stories, providing a quick, editor-verified overview for readers. This feature, designed to enhance value for subscribers, is part of an A/B test to gauge reader interest and engagement with AI-driven content. The summaries, clearly labeled as AI-generated, underscore the Journal’s transparency about AI usage. This experiment aligns with broader trends, as other news platforms like USA Today and apps like Particle have implemented similar AI summarization features to streamline news consumption.

🛠️ AI tools updates

AlphaFold3, DeepMind’s AI tool for predicting protein structures, is now open-source for non-commercial use, allowing academics to download its code. The release marks a shift from the initial launch, when access was limited to a web server whose restrictions prevented researchers from exploring drug interaction predictions. AlphaFold3 expands on previous models by simulating protein interactions with DNA and other molecules, and academic researchers can now request access to its training weights. The release addresses criticism from scientists who argued that withholding the code hindered reproducibility. Similar models from Baidu, ByteDance, and others have emerged in the meantime, though they too remain limited to non-commercial use. Future developments, such as Columbia University’s planned OpenFold3, aim to provide fully open-source versions that can be retrained with proprietary data for broader applications, including drug discovery.

Google Photos has introduced AI-enhanced video editing tools on Android, with features like improved trimming, auto-enhancement, and speed control, enabling users to edit videos directly on their devices with minimal effort. The auto-enhance tool adjusts color and stabilizes video, while the speed tool lets users modify playback speed for select sections. AI-powered video presets, which will soon be available on both Android and iOS, offer quick editing effects tailored to video content, like slow motion and zoom, with Google’s AI automatically selecting optimal moments for each effect. These updates aim to simplify video editing, providing easy access to polished video results.

💵 Venture Capital updates

CoreWeave, an AI-focused cloud infrastructure startup, has completed a $650 million secondary sale, valuing the company at $23 billion. The sale, led by Jane Street, Magnetar, Fidelity Management, and Macquarie Capital, reflects growing investor interest in AI-driven cloud services. The new valuation is up from the $19 billion set in May by a $1.1 billion Series C round led by Coatue. CoreWeave’s rapid growth has been propelled by heightened demand for specialized AI cloud services fueled by tools like OpenAI’s ChatGPT. Offering infrastructure based on high-performance AI chips, mainly supplied by Nvidia, CoreWeave competes with cloud platforms from tech giants, such as Microsoft’s Azure and Amazon’s AWS. The funding surge in AI and cloud services marks a significant rebound, with investments in the sector across the U.S., Europe, and Israel expected to reach $79.2 billion this year.
