Release: Oculus (7.85 GB)

Via the Belarus Cyberpartisans, the source code of the neural network or AI used by Russia's censorship office to detect objectionable material online.

Aug 11, 2023

This release includes the source code of the neural network (AI) scanner used by Russia's censorship office to detect "objectionable" material online, such as so-called propaganda relating to gender and sexuality, instructions on weapons or drugs, signs of extremism or terrorism, and misinformation or disrespect that discredits official state sources.

Because the neural network was trained on a wide variety of source material, this release may include some objectionable material, such as Nazi imagery.

In 2022, Kommersant got access to the contract terms for the development of Oculus, and reported that the neural network was due to be completed by December 12, 2022, at a cost of 57.7 million rubles ($965,000).

Bleeping Computer reports on the intended functionality of Oculus:

The automatic scanner will analyze URLs, images, videos, and chats on websites, forums, social media, and even chat/messenger channels to locate material that should be redacted or taken down. Examples of information targeted by Oculus include homosexuality “propaganda,” instructions on manufacturing weapons or drugs, and misinformation that discredits official state and army sources.

Tech Radar has reported on the Belarus Cyberpartisans hacktivists’ interest in the Russian censorship machine:

Cyberpartisans (a Belarusian hacker group) gave a glimpse into Russia censorship tactics. The group managed to hack an internal network used by the Center and download over two terabytes of sensitive documents.
Among other revelations on how censors operate, the leak showed evidence of Russian authorities training Oculus to find undesirable depictions of Putin across the web.

Please torrent and seed.

Disclaimer

This dataset was released in the buildup to, in the midst of, or in the aftermath of a cyberwar or hybrid war. Therefore, there is an increased chance of malware, ulterior motives and altered or implanted data, or false flags/fake personas. As a result, we encourage readers, researchers and journalists to take additional care with the data.

This is a standard disclaimer that will be added to all datasets in the Cyberwar category, even absent specific suspicions.

Distributed Email of Secrets