
Microsoft, Northwestern, And Witness Launch Deepfake Detection Benchmark To Address Real-World AI Risks

The dataset introduces continuously updated, real-world conditions to improve the detection of AI-generated media across platforms.


Microsoft, Northwestern University, and Witness released a new benchmark dataset designed to improve the detection of AI-generated media, addressing growing concerns over the reliability of existing deepfake detection systems.

The researchers said current detection tools often fail outside controlled environments, particularly when media is altered through common processes such as compression, cropping, or reposting across platforms.

The details of the dataset, called the Microsoft-Northwestern-Witness (MNW) benchmark, were published in an IEEE Intelligent Systems paper. It uses a wide range of AI-generated images and videos, including content produced by multiple generative models and modified through typical distribution channels. The dataset evolves over time, with periodic updates intended to account for advances in generative AI systems.

The paper states that current detection models are trained on static datasets that fail to adapt as deepfake techniques evolve. As new deepfakes emerge, they can evade these static detection methods, even when the models previously reported high detection accuracy.

The MNW benchmark tests detection tools in conditions that reflect how content actually spreads online. It measures performance after images and videos have been resized, compressed, filtered, or reposted – changes that often cause current detection systems to fail. Most existing models test on clean, unaltered files, which do not reflect real-world use.
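This evaluation protocol can be illustrated with a toy sketch (not the MNW benchmark's actual API): a detector's accuracy is scored separately under each distribution condition, where each perturbation weakens the artifact signal the detector relies on. The sample data, detector threshold, and attenuation factors below are all invented for illustration.

```python
# Toy samples: (artifact_signal, is_fake). Higher signal -> more
# detectable generation artifacts. Values are made up for illustration.
samples = [(0.9, True), (0.8, True), (0.75, True),
           (0.2, False), (0.1, False), (0.3, False)]

def detector(signal: float) -> bool:
    """Toy detector: flags anything above a fixed threshold as fake."""
    return signal > 0.5

# Hypothetical perturbations mimic distribution-channel degradation by
# attenuating the artifact signal; reposting compounds the loss.
conditions = {
    "clean": lambda s: s,
    "compressed": lambda s: s * 0.8,
    "reposted": lambda s: s * 0.8 * 0.8,
}

for name, perturb in conditions.items():
    correct = sum(detector(perturb(s)) == label for s, label in samples)
    print(f"{name}: {correct}/{len(samples)} correct")
```

Run as written, the detector is perfect on clean and lightly compressed inputs but misses the weakest fake once reposting compounds the degradation, which is exactly the failure mode benchmarks built on unaltered files cannot surface.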

The researchers also found that many detection systems are trained on narrow datasets, making them effective only on certain types of AI-generated content. The MNW benchmark addresses this by using a wider mix of synthetic media and updating the dataset over time. This creates a more realistic and consistent way to measure how well detection tools perform as new AI models emerge.

Researchers and developers can access the MNW benchmark through a public repository, though only on an "evaluation-only" basis. The benchmark is not for commercial use and is limited to testing detection systems.
