Problem caused by profanity filters on the Internet
The Scunthorpe problem is the unintentional blocking of online content by a spam filter or search engine because its text contains a string (or substring) of letters that appears to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.
The problem arises because computers can easily identify strings of text within a document, but deciding whether such a string is actually offensive requires interpreting a wide range of contexts, possibly across many cultures, which is an extremely difficult task. As a result, broad blocking rules may produce false positives affecting many innocent phrases.
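The mechanism can be illustrated with a minimal sketch. The blocklist and the function name `naive_filter` are hypothetical and chosen only for illustration; the two blocked strings are taken from incidents listed later in this article. The sketch does not reproduce any real vendor's filter.

```python
# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"sex", "ass"}


def naive_filter(text: str) -> list[str]:
    """Return every blocklisted string found anywhere inside the text.

    The check is a plain case-insensitive substring search, so it cannot
    tell an offensive word from an innocent word that happens to contain it.
    """
    lowered = text.lower()
    return sorted(term for term in BLOCKLIST if term in lowered)


for phrase in ["RomansInSussex.co.uk", "classic cars", "garden birds"]:
    hits = naive_filter(phrase)
    print(phrase, "->", f"blocked {hits}" if hits else "allowed")
```

Under these assumptions the first two phrases are blocked although neither is offensive, which is the false positive the article describes.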
Etymology and origin
The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England, from creating accounts with AOL, because the town's name contains the substring "cunt".[1] In the early 2000s, Google's opt-in SafeSearch filters made the same error, with local services and businesses that included Scunthorpe in their names or URLs among those mistakenly excluded from appearing in search results.[2]
Workarounds
The Scunthorpe problem is challenging to completely solve due to the difficulty of creating a filter capable of understanding words in context.[3][4]
One solution involves creating a whitelist of known false positives. Any word appearing on the whitelist can be ignored by the filter, even though it contains text that would otherwise not be allowed.[5]
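A whitelist check of this kind might look like the following sketch. The two word lists and the function name `filter_with_whitelist` are assumptions made for the example, not taken from any real filter.

```python
import re

# Hypothetical lists, for illustration only.
BLOCKLIST = {"sex", "ass"}
WHITELIST = {"sussex", "essex", "classic", "assignment"}


def filter_with_whitelist(text: str) -> list[str]:
    """Return words that contain a blocked substring and are not whitelisted."""
    flagged = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in WHITELIST:
            continue  # known false positive: skip the substring check
        if any(term in word for term in BLOCKLIST):
            flagged.append(word)
    return flagged


print(filter_with_whitelist("A classic Sussex assignment"))
print(filter_with_whitelist("Passport to Middlesex"))
```

The first call flags nothing, but the second still flags "passport" and "middlesex" because nobody has added them to the list. A whitelist only covers the false positives that have already been noticed.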
Other examples
Mistaken decisions by obscenity filters include:
Refused web domain names and account registrations
In April 1998, Jeff Gold attempted to register the domain name shitakemushrooms.com, but due to the substring shit he was blocked by an InterNIC filter prohibiting the "seven dirty words".[6] (Shiitake, also commonly spelled shitake, is the Japanese name for the edible fungus Lentinula edodes.)
In 2000, a Canadian television news story on web filtering software found that the website for the Montreal Urban Community (Communauté Urbaine de Montréal, in French) was entirely blocked because its domain name was its French acronym CUM (www.cum.qc.ca);[7] "cum" (among other meanings) is an English-language vulgar slang term for semen.
In February 2004 in Scotland, Craig Cockburn reported that he was unable to use his surname (pronounced "Coburn", IPA: /ˈkoʊbərn/) with Hotmail because it contains the substring cock, a slang word for the penis. Separately, he had problems with his workplace email because his job title, software specialist, contained the substring Cialis, an erectile dysfunction medication commonly mentioned in spam e-mails. Hotmail initially told him to spell his name C0ckburn (with a zero instead of the letter "o") but later reversed the ban.[8] In 2010, he had a similar problem registering on the BBC website, where again the first four characters of his surname caused a problem for the content filter.[9]
In February 2006, Linda Callahan was initially prevented from registering her name with Yahoo! as an e-mail address as it contained the substring Allah. Yahoo! later reversed the ban.[10]
In July 2008, Dr. Herman I. Libshitz could not register an e-mail address containing his name with Verizon because his surname contained the substring shit, and Verizon initially rejected his request for an exception. In a subsequent statement, a Verizon spokeswoman apologized for not approving his desired e-mail address.[11]
Blocked web searches
In the months leading up to January 1996, some web searches for Super Bowl XXX were being filtered, because the Roman numeral for the game and the site (XXX) is also used to identify pornography.[12]
Gareth Roelofse, the web designer for RomansInSussex.co.uk, noted in 2004: "We found many library Net stations, school networks and Internet cafes block sites with the word 'sex' in the domain name. This was a challenge for RomansInSussex.co.uk because its target audience is school children."[2]
In 2008, the filter of the free wireless service of the town of Whakatane in New Zealand blocked searches involving the town's own name because the filter's phonetic analysis deemed the "whak" to sound like fuck; the town name is in Māori, and in the Māori language "wh" is most commonly pronounced /f/. The town subsequently put the town name on the filter's whitelist.[13]
In July 2011, web searches in China on the name Jiang were blocked following claims on the Sina Weibo microblogging site that former Chinese Communist Party (CCP) general secretary Jiang Zemin had died. Since the word "Jiang" meaning "river" is written with the same Chinese character (江), searches related to rivers including the Yangtze (Cháng Jiāng) produced the message: "According to the relevant laws, regulations and policies, the results of this search cannot be displayed."[14]
In February 2018, web searches on Google's shopping platform were blocked for items such as glue guns, Guns N' Roses, and Burgundy wine after Google hastily patched its search system that was displaying results for weapons and accessories that violated Google's stated policies.[15]
Blocked emails
In 2001, Yahoo! Mail introduced an email filter which automatically replaced JavaScript-related strings with alternative versions, to prevent the possibility of cross-site scripting in HTML email. The filter hyphenated the terms "JavaScript", "JScript", "VBScript" and "LiveScript", and replaced "eval", "mocha" and "expression" with the similar but not quite synonymous terms "review", "espresso" and "statement", respectively. The filter made no attempt to limit these string replacements to script sections and attributes, or to respect word boundaries, in case doing so left loopholes open. This resulted in such errors as medireview in place of medieval.[16][17][18]
In February 2003, Members of Parliament at the British House of Commons found that a new spam filter was blocking emails containing references to the Sexual Offences Bill then under debate, as well as some messages relating to a Liberal Democrat consultation paper on censorship.[19] It also blocked emails sent in Welsh because it did not recognise the language.[20]
In October 2004, it was reported that the Horniman Museum in London was failing to receive some of its email because filters mistakenly treated its name as a version of the words horny man.[21]
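The Yahoo! Mail substitutions described in this section can be sketched as an unbounded string replacement. The replacement pairs come from the account above; the function name `replace_anywhere` and the code itself are assumptions for illustration, since the real filter's implementation is not documented here.

```python
# Replacement pairs as reported above; everything else is a hypothetical sketch.
REPLACEMENTS = {"eval": "review", "mocha": "espresso", "expression": "statement"}


def replace_anywhere(text: str) -> str:
    """Replace each term wherever it occurs, even inside a longer word."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


for sample in ["medieval history", "please evaluate this", "expressionism"]:
    print(sample, "->", replace_anywhere(sample))
```

Because the replacement ignores word boundaries, "medieval" becomes "medireview", and other ordinary words that contain one of the terms are rewritten in the same way.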
Blocked for words with multiple meanings
In October 2004, e-mails advertising the pantomime Dick Whittington sent to schools in the UK were blocked by school computers because of the use of the name Dick, sometimes used as slang for penis.[22]
In May 2006, a man in Manchester in the UK found that e-mails he wrote to his local council to complain about a planning application had been blocked as they contained the word erection when referring to a structure.[23]
Blocked e-mails and web searches relating to The Beaver, a magazine based in Winnipeg, caused the publisher to change its name to Canada's History in 2010, after 89 years of publication.[24][25] Publisher Deborah Morrison commented: "Back in 1920, The Beaver was a perfectly appropriate name. And while its other meaning [vulva] is nothing new, its ambiguity began to pose a whole new challenge with the advance of the Internet. The name became an impediment to our growth".[26]
In June 2010, Twitter blocked a user from Luxembourg 29 minutes after he had opened his account and posted his first tweet. The tweet read: "Finally! A pair of great tits (Parus major) has moved into my birdhouse!" Although he had included the Latin name to make clear that the tweet was about birds, all attempts to unblock the account were in vain.[27]
Residents of Penistone in South Yorkshire have had e-mails blocked because the town's name includes the substring penis.[29]
Residents of Clitheroe (Lancashire, England) have been repeatedly inconvenienced because their town's name includes the substring clit, which is short for "clitoris".[30]
Résumés containing references to graduating with Latin honors such as cum laude, magna cum laude, and summa cum laude have been blocked by spam filters because of inclusion of the word cum, which is Latin for with (in this usage), but is sometimes used as slang for semen or ejaculation in English usage.[31]
News articles
In June 2008, a news site run by the anti-LGBT lobby group American Family Association filtered an Associated Press article on sprinter Tyson Gay, replacing instances of "gay" with "homosexual", thus rendering his name as "Tyson Homosexual".[32][33] This same function had previously changed the name of basketball player Rudy Gay to "Rudy Homosexual".[34]
The word or string "ass" may be replaced by "butt", resulting in "clbuttic" for "classic", "buttignment" for "assignment", and "buttbuttinate" for "assassinate".[35]
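The difference between replacing a substring and replacing a whole word can be sketched as follows. Both function names are hypothetical, and the word-boundary variant is shown only as one possible mitigation, not as a method described in the sources; it still cannot tell apart different meanings of the same whole word.

```python
import re


def censor_substring(text: str) -> str:
    """Replace the string wherever it occurs, including inside other words."""
    return text.replace("ass", "butt")


def censor_word(text: str) -> str:
    """Replace the string only where it stands alone between word boundaries."""
    return re.sub(r"\bass\b", "butt", text)


for word in ["classic", "assignment", "assassinate"]:
    print(word, "->", censor_substring(word), "|", censor_word(word))
```

The substring version reproduces the garbled words quoted above, while the word-boundary version leaves all three unchanged.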
Other
In 2008, Microsoft confirmed that its policy to prevent the use of words relating to sexual orientation had meant that Richard Gaywood's name was deemed offensive and could not be used in his "gamertag" or in the "Real Name" field of his bio.[36]
In 2011, the release of Pokémon Black and White introduced Cofagrigus, which could not be traded online to other players without a nickname because its species name contained the substring fag. The system has since been updated to allow players to trade it without nicknames. The same problem occurred with Nosepass, Probopass and Froslass due to their inclusion of the substring ass.[37]
In 2013, transfers of files named for the Swedish city of Falun caused web connection outages at Diakrit, a firm based in China. Diakrit resolved the issue by renaming the files. Fredrik Bergman of Diakrit believed that the file names had triggered the Great Firewall's censors used to block discussion of Falun Gong, a banned religious movement founded in China.[38]
In January 2014, files used in the online game League of Legends were reportedly blocked by some UK ISP filters due to the names "VarusExpirationTimer.luaobj" and "XerathMageChainsExtended.luaobj", which contain the substring sex. This was later corrected.[40]
In May 2018, the website of the grocery store Publix would not allow a cake to be ordered containing the Latin phrase summa cum laude. The customer attempted to rectify the problem by including special instructions but still ended up with a cake reading "Summa --- Laude".[41][42]
In May 2020, despite extensive media scrutiny, some hashtags directly referring to British political advisor Dominic Cummings were unable to trend on Twitter because the substring cum triggered an anti-porn filter.[43]
In October 2020, a paleontology conference's virtual meeting platform blocked various words including "bone", "pubic", and "stream".[44]
In January 2021, Facebook apologized for muting and banning users after it had erroneously flagged the Devon landmark Plymouth Hoe as misogynistic.[45]
In April 2021, the official Facebook page for the French commune of Bitche was taken down. In response, commune officials created a new page that referred to the postal code instead, Mairie 57230. Facebook later apologized and restored the original page. As a precaution, the officials of Rohrbach-lès-Bitche renamed their Facebook page Ville de Rohrbach.[46][47]
Mozur, Paul; Tejada, Carlos (13 February 2013). "China's 'Wall' Hits Business". The Wall Street Journal. Archived from the original on 10 September 2013. Retrieved 25 May 2013.