This article is rated C-class on Wikipedia's content assessment scale.
On 21 December 2023, it was proposed that this article be moved to URL encoding. The result of the discussion was not moved.
The article doesn't mention it, so I assume both upper and lower-case hex digits are valid? So %FF == %ff == %fF == %Ff — Preceding unsigned comment added by 75.161.222.199 ( talk) 19:21, 7 August 2020 (UTC)
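For what it's worth, RFC 3986 section 2.1 treats uppercase and lowercase hex digits as equivalent and recommends uppercase when producing URIs. A minimal sketch for checking this against one decoder, assuming Python's standard `urllib.parse` as a stand-in (other implementations may differ):

```python
from urllib.parse import unquote_to_bytes

# The four spellings from the question above.
variants = ["%FF", "%ff", "%fF", "%Ff"]

# Decode each one to raw bytes; no character encoding is involved here.
decoded = [unquote_to_bytes(v) for v in variants]

for spelling, raw in zip(variants, decoded):
    print(spelling, "->", raw)
```

All four spellings should decode to the same single byte, 0xFF.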
I note that backslash, "\", 0x5C, is now a reserved character for at least some browsers -- specifically Firefox. It will automatically be changed in the path to forward slash, both on the URI line and when processing AJAX calls, if not escaped as %5C. Please add that to the list of reserved characters. Hansonrstolaf ( talk) —Preceding undated comment added 16:45, 29 August 2016 (UTC)
If I have an encoded sequence %7e%7e, how does it know whether it's ~~ or a Unicode char with hex 7e7e? Xah Lee 03:42, September 10, 2005 (UTC)
%7e7e is not a complete encoding of two bytes. Only %7e would be decoded; the second 7e would be normal text. So %7e7e == ~7e
Even in UTF-8 the bytes get encoded one by one. -- JonnyJD 11:54, 30 July 2007 (UTC)
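The point that each escape covers exactly one byte (a percent sign plus two hex digits) can be checked with a short sketch. It assumes Python's standard `urllib.parse.unquote` as the decoder; the variable names are only illustrative:

```python
from urllib.parse import unquote

# Two separate escapes: each "%7e" is one byte, the tilde.
two_escapes = unquote("%7e%7e")

# One escape followed by the literal characters "7e".
one_escape = unquote("%7e7e")

print(two_escapes)
print(one_escape)
```

The decoder never reads more than two hex digits after a percent sign, so there is no way for "%7e7e" to be taken as a single 16-bit value.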
URIs are not generally UTF-8 encoded. Quote from the text: The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. IMHO this is wrong. RFC 3986 mandates this only for the host part. The encoding of the URI is generally transparent: each application that generates a URI can interpret its own URIs correctly (apply encoding and decoding correctly). The only interesting point is the behaviour of externally generated URIs, e.g. browser forms and external applications (e.g. the Google parameter ie=UTF-8). The behaviour may differ between browsers. -- Jenswilke ( talk) 09:31, 28 February 2008 (UTC)
Hey all, I just came to this article looking for how to encode a % sign in a URL string... noticed that the article couldn't tell me, just how to find out (which is arguably more encyclopedic) so I went along and added a table. Possibly needs a bit of rewording if it's decided to keep the table, else if you don't like it feel free to remove it =) Themania 15:12, 1 March 2007 (UTC)
Shouldn't the whitespace character %20 be encoded also? It already is encoded and is mentioned in examples several times in the RFC3986, just read all the paragraphs containing "%20".
I also think the line "No other characters are allowed in an URI." deserves a citation. Daveoh 12:05, 29 July 2007 (UTC)
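On the space question, a small sketch of how one library treats it, assuming Python's standard `urllib.parse` (its `quote` does generic percent-encoding, while `quote_plus` follows the HTML form convention of writing a space as "+"):

```python
from urllib.parse import quote, quote_plus, unquote

text = "a b"

# Generic percent-encoding: the space becomes %20.
path_style = quote(text)

# Form-style encoding (application/x-www-form-urlencoded): the space becomes "+".
form_style = quote_plus(text)

# Decoding the %20 form gives back the original string.
round_trip = unquote(path_style)

print(path_style, form_style, round_trip)
```

This only illustrates what one implementation does with a space; it is not a statement about which list in the RFC the character belongs to.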
From a standards perspective (HTML, JavaScript and ECMAScript) and from the standpoint of programming languages that assist encoding strings for use as a URL, the proper term (one that is "standard") is URL encoding. Since verifiability is a requirement for entry on Wikipedia, this standard term should be the preferred term as an article entry. The term "Percent-encoding" is not nearly as likely to appear in a search result.
Accepting as argument to this opinion, search results that link to a location associating these two terms seems irrelevant unless there is also evidence the association is made in usage. Kernel.package ( talk) 20:58, 29 July 2010 (UTC)
Why is there a hyphen in the article title? -- Kvng ( talk) 15:23, 30 September 2010 (UTC)
The corresponding German article about URL encoding mentions the percent sign as a reserved character as well. But here I don't see the percent sign in the list. If one reads RFC 3986, the percent sign is indeed not reserved. So it's more a problem of the German article.
Janburse ( talk) 15:33, 24 July 2011 (UTC)
I think, in the character data section, there should be a more thorough explanation of the significance of UTF-8 percent encoding; for instance there are no examples of "higher" characters; how do you encode, say, a diacritic or a kanji, etc.? At this point, I don't know, and the article as written will not give me that knowledge either... A.R. — Preceding unsigned comment added by 205.151.118.180 ( talk) 20:34, 1 December 2011 (UTC)
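As a rough illustration of the usual approach (convert the character to UTF-8 bytes, then percent-encode each byte), here is a sketch assuming Python's standard `urllib.parse.quote`, which uses UTF-8 by default. The two sample characters are my own picks, not taken from the article:

```python
from urllib.parse import quote

# A Latin letter with a diacritic and a kanji, chosen only as examples.
samples = ["é", "漢"]

for ch in samples:
    utf8_bytes = ch.encode("utf-8")   # step 1: character -> UTF-8 bytes
    encoded = quote(ch)               # step 2: each byte -> %XX
    print(ch, utf8_bytes.hex(" "), encoded)

encoded_diacritic = quote("é")
encoded_kanji = quote("漢")
```

Here "é" (U+00E9) is two UTF-8 bytes and so becomes two escapes, while "漢" (U+6F22) is three bytes and becomes three escapes.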
The text in the article repeats an ambiguous statement from RFC3986:
"Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI."
The grammar of the sentence makes it unclear whether "that octet" refers back to "%25", or to an octet in "percent-encoded octets", or to "it" or furthest back of all to '("%")'.
I'm pretty certain that the sentence means:
"For a percent ("%") character to appear as stand-alone data (octet value 25) in a URI, it must avoid being interpreted as the first character of a percent encoding. Thus a stand-alone "%" must be encoded, as "%25"."
If someone has a reference which backs this up, perhaps we could use this clearer sentence, or some better one? Gwideman ( talk) 10:23, 9 February 2012 (UTC)
It's unambiguous in the RFC, because the word "data" is specifically defined there, and octet means byte. But, it's hard to understand on WP, so I'm changing it.
martnik ( talk) 11:05, 5 September 2015 (UTC)
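The reading discussed in this thread (a literal "%" must itself be written as "%25") can be checked with a short sketch, assuming Python's standard `urllib.parse` as the encoder and decoder:

```python
from urllib.parse import quote, unquote

# A literal percent sign in the data is escaped as %25.
encoded = quote("100%")

# Decoding restores the literal percent sign.
decoded = unquote(encoded)

# Decoding is a single pass: "%25" becomes "%", and the "25" that
# follows is left alone rather than being decoded a second time.
single_pass = unquote("%2525")

print(encoded, decoded, single_pass)
```

The last line shows why the escape is needed: without it, a "%" followed by two hex digits in the data would be indistinguishable from an escape.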
It encodes bytes. A code unit like “%C2” represents a byte, not a character. Revert (or do not make) silly edits like [1] [2], please. Incnis Mrsi ( talk) 16:10, 27 May 2013 (UTC)
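A small sketch of the byte-versus-character distinction, assuming Python's standard `urllib.parse`; the copyright sign is just my own example of a character whose UTF-8 form starts with the byte C2:

```python
from urllib.parse import unquote, unquote_to_bytes

# "%C2%A9" is two escapes, i.e. two bytes.
raw = unquote_to_bytes("%C2%A9")

# Only when both bytes are interpreted as UTF-8 do they form one character.
as_text = raw.decode("utf-8")

# "%C2" alone is a single byte that is not valid UTF-8 on its own;
# unquote() substitutes U+FFFD because its default error mode is "replace".
lone_byte = unquote("%C2")

print(raw, as_text, repr(lone_byte))
```

So "%C2" by itself carries a byte value, and whether that byte is part of a character depends on the encoding applied afterwards.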
The most prominent implementation is the set of built-in JavaScript (ECMAScript) functions: encodeURI, encodeURIComponent, decodeURI, decodeURIComponent. These deserve a more prominent mention in the implementation section. Currently they are only mentioned in passing in the Non-standard implementations section. — Preceding unsigned comment added by Erikn2 ( talk • contribs) 01:49, 13 September 2016 (UTC)
The result of the move request was: moved. Uncontested move request. ( non-admin closure) Colonestarrice ( talk) 10:42, 19 January 2023 (UTC)
Percent-encoding → URL encoding – URL encoding is the WP:COMMONNAME, see Google Ngram for example. URL-encoding might be better, I don't have a strong opinion on that matter. PhotographyEdits ( talk) 14:23, 11 January 2023 (UTC)
shouldn't the space character be in the list titled: 'RFC 3986 section 2.2 Reserved Characters (January 2005)'? 82.71.43.56 ( talk) 12:17, 14 February 2023 (UTC)
The result of the move request was: not moved. ( non-admin closure) Mattdaviesfsic ( talk) 23:04, 28 December 2023 (UTC)
Percent-encoding → URL encoding – After going back and forth with Svnpenn, I see what they mean by consensus not established with the previous move request here but also it seems this was done as an uncontroversial page move through WP:RM?
I do think WP:COMMONNAME applies in this case.
Courtesy ping PhotographyEdits and DanShearer for insight. – The Grid ( talk) 15:43, 21 December 2023 (UTC)