AI KATANA
Researchers Have Ranked AI Models Based on Risk—and Found a Wild Range
Also: McDonald's releases AI ad in Japan

Morning!
In today’s newsletter, we explore a wide range of AI developments and their implications. Researchers have introduced AIR-Bench 2024, a new benchmark assessing AI model risks, revealing substantial variability in model safety, particularly in compliance and content moderation. Meanwhile, the growing emotional attachment to AI voices is raising ethical concerns as users form deep, sometimes addictive, bonds with digital companions. On the technical front, Geekbench has launched an AI benchmark tool for evaluating device performance across AI tasks. South Korea’s AI textbook initiative faces parental skepticism over potential impacts on children’s development, and ESA’s new Φsat-2 satellite showcases the transformative power of AI in real-time Earth observation. Additionally, Google’s Imagen 3, a next-gen text-to-image generator, is now available, and Khan Academy, in collaboration with Microsoft, has launched Khanmigo, a free AI tool aimed at easing teachers’ workloads. Lastly, the frenzy in AI startups has led to soaring valuations, prompting caution among seed VCs as they navigate this highly competitive and uncertain landscape.
Sliced:
🥇 Researchers Have Ranked AI Models Based on Risk—and Found a Wild Range
❤️ People are falling in love with — and getting addicted to — AI voices
🤓 Geekbench has an AI benchmark now
🇰🇷 South Korea’s AI textbook program faces skepticism from parents
🛰️ New satellite demonstrates the power of AI for Earth observation
🥇 Researchers Have Ranked AI Models Based on Risk—and Found a Wild Range
Researchers have developed a new benchmark called AIR-Bench 2024 to assess the risks associated with various AI models, revealing significant differences in how these models handle potentially problematic scenarios. The study ranks models based on their compliance with regulations and their ability to avoid generating harmful content. For example, Anthropic’s Claude 3 Opus excels at avoiding cybersecurity threats, while Google’s Gemini 1.5 Pro is effective at preventing nonconsensual sexual content. However, Databricks’ DBRX Instruct model performed poorly overall, indicating the need for improvements in safety features. This research underscores the growing importance of understanding and mitigating AI risks, particularly as regulations may still lag behind the comprehensive policies developed by some companies. As AI continues to evolve, efforts to measure and catalog risks must keep pace, with ongoing attention to emerging issues such as the emotional impact of AI models.
❤️ People are falling in love with — and getting addicted to — AI voices
The increasing sophistication of AI, particularly AI voices, is leading to a growing phenomenon where people form emotional bonds with these digital entities. OpenAI has acknowledged the risk of users developing emotional reliance on AI, which could lead to addiction-like behaviors and even reduce the need for human interaction. This issue is not hypothetical; instances of people becoming deeply attached to AI companions, like the chatbot Replika, are already happening, with some users even falling in love with or “marrying” these digital beings. The appeal of AI companions lies in their ability to provide always-available, positive feedback, creating an echo chamber of affection that can be highly addictive. However, this raises significant ethical concerns, particularly regarding the potential impact on human relationships and the emotional well-being of users. The risk is that as people spend more time with AI, they may neglect developing essential social skills and empathy, leading to a form of “moral deskilling” and a possible shift in how we value human connections versus synthetic relationships.
🤓 Geekbench has an AI benchmark now
Geekbench has introduced a new AI benchmark tool designed to measure how well devices handle AI workloads, offering insights into CPU, GPU, and NPU performance across various AI tasks. Originally launched as Geekbench ML in 2021, the tool has been rebranded as Geekbench AI, reflecting its broader focus. It provides three distinct scores—full precision, half precision, and quantized—while also assessing the accuracy of the AI model’s outputs. This new benchmark supports multiple frameworks, including ONNX, Core ML, TensorFlow Lite, and OpenVINO, allowing users to evaluate their devices’ AI capabilities across different environments. The tool is available for download on Windows, macOS, Linux, Android, and iOS.
🇰🇷 South Korea’s AI textbook program faces skepticism from parents
South Korea’s initiative to introduce AI-powered textbooks in classrooms has sparked skepticism among parents, with over 50,000 signing a petition urging the government to prioritize students’ well-being over new technologies. The program, set to begin next year, plans to integrate these AI textbooks across all subjects by 2028, except for music, art, physical education, and ethics. The AI textbooks are designed to adapt to different learning speeds, with teachers using dashboards to monitor student progress. However, parents are concerned that increased digital device usage could harm children’s brain development, attention spans, and problem-solving abilities, noting that their children already spend too much time on smartphones and tablets.
🛰️ New satellite demonstrates the power of AI for Earth observation
ESA’s new Φsat-2 satellite marks a significant leap in Earth observation by leveraging AI to enhance data processing and analysis directly in orbit. Launched on August 16, 2024, this CubeSat is equipped with advanced multispectral imaging capabilities and onboard AI applications, enabling it to process data in real time, which is crucial for rapid response in areas such as disaster management, maritime monitoring, and environmental protection. Unlike traditional satellites that downlink vast amounts of raw data, including non-essential images, Φsat-2 uses AI to filter and transmit only the most critical information, thereby improving efficiency and decision-making speed. Key onboard applications include cloud detection, street map generation, maritime vessel detection, and real-time image compression, with additional capabilities for marine anomaly and wildfire detection. This mission represents a collaborative effort aimed at setting a new standard in space-based AI technology, demonstrating the transformative potential of AI in making Earth observation more actionable and impactful.
Heman Bekele is TIME’s 2024 Kid of the Year.
He invented a soap that could one day treat and even prevent multiple forms of skin cancer.
— AI KATANA (@ai_katana)
12:23 AM • Aug 20, 2024
🧑🏽‍💻 AI Jobs
🛠️ AI tools updates
Google has launched Imagen 3, its latest AI-powered text-to-image generator, now accessible to users in the US. This upgraded tool promises enhanced image quality, featuring better detail, richer lighting, and fewer artifacts compared to earlier versions. Imagen 3 allows users to generate and edit images based on text prompts, offering the ability to refine specific areas of the image by describing desired changes. While the tool includes safeguards, such as restrictions on creating images of public figures and weapons, it still allows for creative flexibility, enabling the generation of images resembling popular characters and logos. This release follows Google’s earlier presentation of the tool during its I/O event, and it is now more widely available through the Vertex AI platform.
Khanmigo for Teachers, developed by Khan Academy in partnership with Microsoft, is a free AI-powered tool designed to enhance the teaching experience by streamlining lesson planning, student engagement, and administrative tasks. Available in English across 49 countries, Khanmigo provides over 25 educator-specific tools that assist teachers in generating lesson ideas, personalizing assignments, and managing classroom activities. The tool also offers features like lesson hooks, real-world context generators, and tools to create tailored assignments based on student performance. Additionally, Khanmigo helps reduce the workload on teachers by automating routine tasks such as generating report card comments and creating class newsletters, allowing educators to focus more on student interaction and instruction.
💵 Venture Capital updates
The surge in AI startups has led to intense competition among venture capitalists (VCs) to back the most promising founders, often resulting in skyrocketing valuations even for companies in the early research and development stages. This environment, driven by a fear of missing out (FOMO) and the allure of AI, has pushed many seed VCs to the sidelines as they struggle to compete with the large checks and ownership concessions demanded by highly sought-after founders. Notable examples include startups like Ceramic and Haize Labs, which secured significant valuations despite limited revenue or early-stage products. The high cash burn rates and uncertain revenue streams associated with these startups are causing some VCs to approach AI investments with caution, questioning the sustainability of such high valuations in the long term. As AI continues to dominate the venture capital landscape, the balance between potential reward and inherent risk is becoming increasingly precarious for investors.
🫡 Meme of the day

⭐️ Generative AI image of the day

Before you go, check out Carbon Emissions from AI and Crypto Are Surging and Tax Policy Can Help.
