Google’s Threat Intelligence Group (GTIG) reports that several government-backed hacking groups are actively using its Gemini generative AI platform to support reconnaissance, phishing development, vulnerability research, and malware creation.
Among them is North Korea-linked UNC2970, associated with Lazarus Group, Diamond Sleet, and Hidden Cobra, which used Gemini to synthesize open-source intelligence and build detailed profiles of defense and cybersecurity professionals in support of its long-running Operation Dream Job campaign.
Chinese advanced persistent threat actors have integrated the model into technical workflows, applying it to code generation, exploit research, and vulnerability analysis.
Iran’s APT42 has taken a different tack, using Gemini for translation, persona development, phishing content drafting, and post-compromise research, weaving the model into multiple stages of its operations.
At the same time, Google said it disrupted large-scale model extraction attempts aimed at Gemini itself, suggesting that advanced AI systems are increasingly viewed both as operational tools and as assets worth replicating.
Why It Matters: When AI tools are embedded across the intrusion lifecycle, capability scales without equivalent increases in manpower. Teams can generate technical outputs and targeted content on demand, enabling sustained campaign development with limited personnel. The result is greater operational density per actor and a tighter feedback loop between testing and deployment. Because these activities resemble common enterprise AI usage, identifying malicious intent becomes a governance challenge as much as a technical one.
- Reconnaissance That Blends With Legitimate Activity: UNC2970 used Gemini to collect detailed information on cybersecurity and defense firms, including technical roles and salary data, to refine phishing targeting. Google noted that this type of research can resemble ordinary professional inquiry. Other groups, including UNC6418 and China-linked Mustang Panda, used the model to compile dossiers on individuals in Pakistan and map organizational structures of separatist groups, condensing large volumes of open-source data into actionable targeting insight.
- Embedded Support for Vulnerability and Exploit Development: Chinese actors such as APT31, APT41, and UNC795 applied Gemini to analyze known vulnerabilities, interpret open-source documentation, troubleshoot exploit code, and build web shells and scanners for PHP servers. Iran’s APT42 examined exploitation paths for the WinRAR flaw CVE-2025-8088 and used the model to assist in developing tools including a Python-based Google Maps scraper and a SIM card management system in Rust. In these instances, Gemini functioned as a development assistant inside offensive pipelines.
- Persona Development and Information Operations: APT42 generated phishing drafts, translated content, located official email addresses, and constructed engagement scenarios grounded in detailed biographies. The group has previously targeted journalists, academics, and cybersecurity professionals. Google also observed actors from China, Iran, Russia, and Saudi Arabia using Gemini to produce political satire and propaganda, extending its application beyond intrusion support.
- AI-Integrated Malware and Phishing Infrastructure: Google identified a downloader framework called HONESTCUE that queries Gemini’s API to obtain C# source code for second-stage payloads, compiling and executing the returned code in memory via .NET’s CSharpCodeProvider class to avoid leaving artifacts on disk. Researchers suspect development by a single actor or small team. Separately, an AI-generated phishing kit known as COINBAIT, built using Lovable AI and partly attributed to cluster UNC5356, impersonated a cryptocurrency exchange to harvest credentials. Google also cited ClickFix campaigns that abused public AI-sharing features to host fake troubleshooting guides delivering information-stealing malware.
- Model Extraction as a Direct Threat to AI Systems: GTIG disrupted model extraction activity involving more than 100,000 prompts designed to reproduce Gemini’s reasoning across non-English tasks. A proof of concept by Praetorian showed that a replica model achieved 80.1% accuracy using 1,000 API queries and 20 training epochs. The findings indicate that API outputs can be harvested as training data, making model behavior an asset that requires protection alongside model weights; the sketch below illustrates the basic pattern.
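To make the extraction pattern concrete, here is a minimal, self-contained toy in Python: treat a deployed model as a black box reachable only through its prediction API, harvest its outputs as labels, and train a local replica on them. The dataset, model types, and query counts below are illustrative stand-ins, not the setup used in the Praetorian proof of concept or the activity GTIG observed.

```python
# Toy illustration of model extraction: a black-box "teacher" is queried
# through its prediction interface, and the harvested (input, output) pairs
# become training data for a local "student" that mimics its behavior.
# All names and numbers are illustrative, not drawn from the GTIG report.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in for a proprietary model behind an API: the attacker can only
# call predict(), never inspect its weights or original training data.
X_private, y_private = make_classification(
    n_samples=5000, n_features=20, random_state=0
)
teacher = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
teacher.fit(X_private, y_private)

# "Extraction": issue N queries and harvest the API's answers as labels.
n_queries = 1000
X_queries = rng.normal(size=(n_queries, 20))
y_harvested = teacher.predict(X_queries)  # the only access the attacker has

# Train a local replica purely on the harvested outputs.
student = LogisticRegression(max_iter=1000)
student.fit(X_queries, y_harvested)

# Fidelity: how often the replica agrees with the teacher on fresh inputs.
X_test = rng.normal(size=(2000, 20))
agreement = (student.predict(X_test) == teacher.predict(X_test)).mean()
print(f"Student/teacher agreement on unseen inputs: {agreement:.1%}")
```

Even a replica this crude typically agrees with its teacher on a large share of unseen inputs, which is why GTIG’s findings frame observable model behavior, and not just model weights, as an asset requiring protection.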