PhotoDNA is a proprietary image-identification and content-filtering technology [1] widely used by online service providers. [2] [3]
PhotoDNA was developed by Microsoft Research and Hany Farid, professor at Dartmouth College, beginning in 2009. It creates a unique hash for each item in a database of known images and video files; these hashes can then be used to identify other instances of those images. [4]
The hashing method initially relied on converting images to a black-and-white format, dividing them into squares, and quantifying the shading of those squares. [5] It did not employ facial recognition technology, nor could it identify a person or object in the image.[citation needed] The method was designed to be resistant to alterations of the image, including resizing and minor color alterations. [4] Since 2015, [6] similar methods have been applied to individual video frames in video files. [7]
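The exact PhotoDNA algorithm is proprietary and has not been published. The following is a hypothetical sketch of the general approach described above (grayscale conversion, a fixed grid of squares, per-square shading values, and comparison by distance rather than exact equality); all function names and the grid size are illustrative assumptions, not Microsoft's implementation.

```python
# Hypothetical sketch of a PhotoDNA-style perceptual hash. The real
# algorithm is proprietary; this only illustrates the published outline:
# grayscale image -> grid of squares -> per-square shading -> distance match.

def grid_hash(pixels, grid=6):
    """Compute a perceptual hash: the mean shading of each grid cell.

    `pixels` is a 2D list of grayscale values (0-255). The image is
    divided into grid x grid cells, and each cell's mean brightness
    becomes one component of the hash vector.
    """
    h, w = len(pixels), len(pixels[0])
    hash_vec = []
    for gy in range(grid):
        for gx in range(grid):
            # Pixel bounds of this cell; integer division handles
            # images whose dimensions are not multiples of the grid.
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            hash_vec.append(sum(cell) / len(cell))
    return hash_vec

def distance(a, b):
    """Sum of absolute differences; small values indicate near-duplicates."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because matching uses a distance threshold rather than bit-for-bit equality, a resized or slightly recolored copy of a known image still falls close to the stored hash, while unrelated images do not.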
Microsoft donated[failed verification] the PhotoDNA technology to Project VIC, which is managed and supported by the International Centre for Missing & Exploited Children (ICMEC) and used as part of digital forensics operations, [8] [9] storing "fingerprints" that can uniquely identify an individual photo. [9] [10] The database includes hashes for millions of items. [11]
In December 2014, Microsoft made PhotoDNA available to qualified organizations free of charge, in a software-as-a-service model, through the Azure Marketplace. [12]
In the 2010s and 2020s, PhotoDNA was put forward in connection with policy proposals relating to content moderation and internet censorship. [13] These included US Senate hearings (in 2019 on "digital responsibility" [2] and in 2022 on the EARN IT Act [14]) and various European Commission proposals dubbed "upload filters" by civil society, [15] [16] such as so-called voluntary codes (in 2016 [17] on hate speech, [18] after events in 2015, and in 2018 [19] and 2022 [20] on disinformation), copyright legislation (chiefly the 2019 copyright directive, debated between 2014 [21] and 2021 [22]), terrorism-related regulations (TERREG) [23] and internet wiretapping regulations (the 2021 "chat control" proposal). [24]
In 2016, Hany Farid proposed extending the technology's use to terrorism-related content. [25] In December 2016, Facebook, Twitter, Google and Microsoft announced plans to use PhotoDNA to remove extremist content such as terrorist recruitment videos or violent terrorist imagery. [26] In 2018, Facebook stated that PhotoDNA was used to automatically remove al-Qaeda videos. [13]
By 2019, big tech companies including Microsoft, Facebook and Google had publicly announced that, since 2017, they had been running the Global Internet Forum to Counter Terrorism (GIFCT) as a shared database of content to be automatically censored. [2] As of 2021, Apple was thought to be using NeuralHash for similar purposes. [27]
In 2022, The New York Times covered the story of two fathers whose Google accounts were closed after photos they had taken of their children for medical purposes were automatically uploaded to Google's servers. [28] The article contrasts PhotoDNA, which requires a database of known hashes, with Google's AI-based technology, which can recognize previously unseen exploitative images. [29] [30]
Microsoft originally used PhotoDNA on its own services including Bing and OneDrive. [31] As of 2022, PhotoDNA was widely used by online service providers for their content moderation efforts [10] [32] [33] including Google's Gmail, Twitter, [34] Facebook, [35] Adobe Systems, [36] Reddit, [37] and Discord. [38]
The UK Internet Watch Foundation, which has been compiling a reference database of PhotoDNA signatures, reportedly held over 300,000 hashes of known child sexual exploitation material.[citation needed] Another source of hashes for the database was the National Center for Missing & Exploited Children (NCMEC). [39] [40]
PhotoDNA is widely used to remove content, [2] disable accounts, and report people. [7]
In 2021, Anish Athalye was able to partially invert PhotoDNA hashes using a neural network, raising concerns about the reversibility of PhotoDNA hashes. [41]