Analysts Say Pentagon Must Add Guardrails to Musk’s Grok in Military Systems

A push by U.S. Defense Secretary Pete Hegseth to integrate Elon Musk’s Grok artificial intelligence model into classified and unclassified military systems does not acknowledge that the large language model fails to meet key federal AI risk and security framework requirements, cybersecurity analysts say.
Should the Pentagon proceed with the integration, the military will likely have to rely on additional guardrails and testing to prevent the same failures that have plagued the model’s public deployments, sources told Information Security Media Group. Hegseth announced Monday that Grok will soon be integrated into military systems alongside other commercial AI tools, describing the move as part of a wider “AI acceleration strategy” that aims to “unleash experimentation” and “eliminate bureaucratic barriers” across the force.
The decision follows weeks of heightened scrutiny of Grok, which is developed by Musk’s xAI, after users generated graphic sexual imagery using the model – prompting regulatory action and investigations by foreign authorities (see: UK Probes X Over AI Deepfake Porn). Separate incidents involving antisemitic and extremist outputs also raised concerns about the model’s safety controls, with analysts warning that the series of controversies exposes potential weaknesses in how the model behaves under permissive or adversarial conditions.
A former senior defense cybersecurity official who requested anonymity to discuss the military’s AI considerations said the core issue is whether Grok will have appropriate guardrails once connected to some of the military’s most sensitive networks.
“The real question is what additional guardrails and testing will be applied to ensure it doesn’t reproduce the same behaviors once it’s inside military systems,” the former official said.
The unpredictability of LLM behavior and outputs complicates the Pentagon’s cybersecurity authorization process, which relies on clearly defined system behavior, logging and failure modes to certify software for operational use. Many AI tools already fielded across the military are narrowly scoped systems built for specific analytic, maintenance or logistics tasks, analysts said, with tightly controlled data flows and well-defined update mechanisms.
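By way of illustration, the kind of clearly defined behavior, logging and failure handling that an authorization process rewards can be sketched as a thin wrapper around a model call. The Python sketch below is hypothetical – the `client.complete()` interface, the record fields and the fallback string are assumptions chosen for illustration, not any actual Pentagon or xAI API.

```python
import json
import logging
import uuid
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_audit")


def audited_completion(client, prompt: str, user_id: str) -> str:
    """Call a generic LLM client and record an auditable trace of the exchange.

    `client` is a hypothetical object exposing `complete(prompt) -> str`;
    it stands in for whatever model interface is actually fielded.
    """
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "prompt": prompt,
    }
    try:
        response = client.complete(prompt)
        record["status"] = "ok"
        record["response"] = response
        return response
    except Exception as exc:
        # Defined failure mode: log the error and return a safe fallback
        # instead of surfacing raw model or transport errors to the caller.
        record["status"] = "error"
        record["error"] = repr(exc)
        return "MODEL_UNAVAILABLE"
    finally:
        # Every request, success or failure, leaves a structured audit entry.
        log.info(json.dumps(record))
```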
Analysts said deploying Grok safely will likely require significant hardening, including sandboxed testing environments that mirror operational data, extensive red-team exercises designed to probe for failure modes and strict limits on what systems and datasets the model can access. Absent those controls, they warned, Grok could introduce new attack surfaces into military networks, including exposure to prompt injection attacks, adversarial manipulation of outputs or unintended disclosure of sensitive context through model responses.
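As a rough sketch of what such access limits and prompt-injection screening could look like in code, the snippet below gates retrieval behind a dataset allow list and applies a simple heuristic check before any context reaches the model. The dataset names, regex patterns and `fetch()` callable are hypothetical; real hardening would layer model-based classifiers, sandboxed evaluation and continuous red-team testing on top of anything this simple.

```python
import re

# Illustrative allow list: only these (hypothetical) datasets may be
# exposed to the model as retrieval context.
ALLOWED_DATASETS = {"logistics_summaries", "maintenance_manuals"}

# Simple heuristic patterns associated with prompt-injection attempts.
INJECTION_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"disregard (the|your) system prompt",
    r"reveal (the|your) (system prompt|instructions|credentials)",
]


def is_suspicious(text: str) -> bool:
    """Flag text that matches known prompt-injection phrasings."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)


def build_context(dataset: str, query: str, fetch) -> str:
    """Fetch retrieval context only from allow-listed datasets,
    rejecting queries that look like injection attempts.

    `fetch` is a hypothetical callable: fetch(dataset, query) -> str.
    """
    if dataset not in ALLOWED_DATASETS:
        raise PermissionError(f"dataset '{dataset}' is not approved for model access")
    if is_suspicious(query):
        raise ValueError("query rejected by prompt-injection screen")
    return fetch(dataset, query)
```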
Sean Applegate, chief technology officer at Swish and a former U.S. Marine Corps intelligence analyst and C4I systems administrator, said Grok’s appeal to the Department of Defense may rest largely on supply chain considerations, including where and how the model is developed and trained.
Applegate said Grok does not natively meet the requirements of key federal AI risk frameworks, including guidance from the National Institute of Standards and Technology, and it doesn’t appear to address security concerns outlined in widely used industry threat models for large language systems. Those frameworks and threat models are designed to identify and mitigate risks like model misuse, data leakage, unreliable outputs and adversarial exploitation – all of which carry heightened consequences in military and intelligence environments.
“Grok isn’t immune to LLM or Generative AI vulnerabilities,” Applegate said, adding that safe deployment in defense settings would require layered guardrails and continuous adversarial testing.
The military has ramped up efforts in recent months to operationalize generative AI tools faster than its typical acquisition and cybersecurity processes, expanding the use of LLMs like Google’s Gemini through the Defense Department’s GenAI.mil platform while continuing to evaluate other commercial models for logistics, intelligence and decision-support use cases.
