This article is rated C-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
This article was nominated for merging with Formal grammar on February 2012. The result of the discussion was no consensus. |
This article was nominated for merging with Terminal and non-terminal functions on August 2014. The result of the discussion was do not merge. |
This page identifies what the terms are, but are missing why they are called that. jptdrake 17:44, 4 February 2007 (UTC)
Lyle 16:21, 8 August 2007 (UTC)
This page explains formal grammars, for which there is a page already. Why not just redirect there, adding the additional material provided here? Rp ( talk) 17:08, 22 February 2012 (UTC)
I'd agree with Rp's above point of view, but this issue may be postponed for now. However, the article Terminal and non-terminal functions (apparently written from a practical point of view, and linked only from Parse tree and some redirect and some user pages) should be merged in any case into this article; both articles deal with the very same subject. - Jochen Burghardt ( talk) 10:53, 11 August 2014 (UTC)
That's right,
Terminal and non-terminal functions only handles CFL (without saying so, as many wikipedia articles do). It is mainly about parse trees and uses some unusual notation (viz. "function", which occurs neither in the only source
[1], nor in the
parse tree article). Maybe a sentence that nonterminal symbols and terminal symbols correspond to interior and leaf nodes in a CFG parse tree, respectively, could be used. And the example from the picture seems more understandable (everyone who learned English grammar might have seen such a tree) than the <integer>
example (which doesn't fit the previous grammar definition but uses some unexplained extensions for optionality and repetition).
It could look like this (wording subject to improvements):
As an example, consider the following context-free grammar for simple English sentences. N = { S, NP, VP, Det, N, V, ... }, Σ = { John, ball, hit, the, ... }, start symbol is S, P is as follows:
S ::= NP VP | ...
VP ::= V NP | ...
NP ::= Det N | ... | John | ...
N ::= ball | ...
V ::= hit | ...
Det ::= the | ...
In this example, nonterminal symbols denote grammatical categories, like " sentence" (abbreviated by "S"), " noun phrase" ("NP"), etc., while terminal symbols are English words, like "John", "ball", etc.
- Jochen Burghardt ( talk) 18:56, 11 August 2014 (UTC)
"Characters" may not be correct, but "lexemes" is clearly wrong. Perhaps we should replace most of the reference to "characters" by "symbols"; (a character is usually a symbol, but a symbol need not be a character). — Arthur Rubin (talk) 02:13, 17 October 2012 (UTC)
For example, if there was a rule , we know that a or b is nonterminal, but we don't know which. It's not clear the terminals and nonterminals are determined by the production rules, unless we make the additional assumption that the only "words" we are interested in have only terminals, and an intermediate string which can not lead to a string with only terminals can be disregarded.
This is not solely a question related to the topic: The statement in formal grammar that "we" are only interested in strings which do not contain nonterminals should also be reflected here, assuming it to be correct. — Arthur Rubin (talk) 04:04, 26 November 2012 (UTC)
Why the example on the end of the page - representing an integer, expressed in a variant of Backus–Naur form::
<integer> ::= ['-'] <digit> {<digit>}
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
has finite number of production rules (which is a rule for CFG) and not infinitely many since the digit may be placed varied number of times to infinite. And yes it can be rewritten as
<integer> ::= ['-'] <digit>
<digit> ::= '0' | '1'<prdigit> | '2'<prdigit> |
'3'<prdigit>| '4'<prdigit> | '5'<prdigit> | '6'<prdigit> |
'7'<prdigit> | '8'<prdigit> | '9'<prdigit>
<prdigit> ::= '0'<prdigit> | '1'<prdigit> | '2'<prdigit> |
'3'<prdigit> | '4'<prdigit> | '5'<prdigit> | '6'<prdigit> |
'7'<prdigit> | '8'<prdigit> | '9'<prdigit> | empty string
This also removes the leading zeroes in the original example. So a string like "00000000" won't be in the language. — Preceding unsigned comment added by 95.111.115.209 ( talk) 12:02, 7 January 2014 (UTC)
<integer> :: = '0' | ['-'] <digitstring>
<nzdigit> :: = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<digit> ::= '0' | <nzdigit>
<digitstring> ::= <nzdigit> | <digitstring> <digit>
I think it would be insightful if there was an example with english letters and words to compliment the one given in Cyrillic. Because if using Cyrillic gives us a different point of view, than using english would give context to our understanding.
Using symbols that don't belong to the English alphabet, you will forget the concept of "word" reading the sentences and you get focus in pictoric marks interacting each other: this sentence is awkward and horrible — Preceding unsigned comment added by 132.161.247.75 ( talk) 07:10, 1 March 2016 (UTC)
The "functions" article is a subtopic, and essentially a content fork, of the "symbols" article. Enterprisey ( talk!) 05:32, 12 September 2018 (UTC)
"Undid revision 1164690965 by Epachamo (talk): misleading: Literal (computer programming) refers to (multi-character-)identifier for values, while terminal symbols are almost always considered to have length one)" This is not quite correct. Terminal symbols are often multi-character. Think of a lexical analyzer in a compiler. Terminal symbols include things like "if", "then", "for", "function", etc. As the article states, terminal symbols are Lexical items, which are more often than not more than one character. Epachamo ( talk) 05:20, 12 July 2023 (UTC) - Copied from User_talk:Jochen_Burghardt#reverted_edit by Jochen Burghardt ( talk) 09:12, 12 July 2023 (UTC)
symbols ... like "if", "then", "for", "function". For this reason, formal grammars of programming languages have to include a lexical part, i.e. rules like
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>
<number literal> ::= <digit> | <number literal><digit>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<identifier>
). The output of lexical analysis is usually fed into a
parser which transforms it into a
syntax tree. Lexical analysis and parsing usually handles the
regular and the properly
context-free part of the language, respectively. I guess you already know these things.This article is rated C-class on Wikipedia's
content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||
|
This article was nominated for merging with Formal grammar on February 2012. The result of the discussion was no consensus. |
This article was nominated for merging with Terminal and non-terminal functions on August 2014. The result of the discussion was do not merge. |
This page identifies what the terms are, but are missing why they are called that. jptdrake 17:44, 4 February 2007 (UTC)
Lyle 16:21, 8 August 2007 (UTC)
This page explains formal grammars, for which there is a page already. Why not just redirect there, adding the additional material provided here? Rp ( talk) 17:08, 22 February 2012 (UTC)
I'd agree with Rp's above point of view, but this issue may be postponed for now. However, the article Terminal and non-terminal functions (apparently written from a practical point of view, and linked only from Parse tree and some redirect and some user pages) should be merged in any case into this article; both articles deal with the very same subject. - Jochen Burghardt ( talk) 10:53, 11 August 2014 (UTC)
That's right,
Terminal and non-terminal functions only handles CFL (without saying so, as many wikipedia articles do). It is mainly about parse trees and uses some unusual notation (viz. "function", which occurs neither in the only source
[1], nor in the
parse tree article). Maybe a sentence that nonterminal symbols and terminal symbols correspond to interior and leaf nodes in a CFG parse tree, respectively, could be used. And the example from the picture seems more understandable (everyone who learned English grammar might have seen such a tree) than the <integer>
example (which doesn't fit the previous grammar definition but uses some unexplained extensions for optionality and repetition).
It could look like this (wording subject to improvements):
As an example, consider the following context-free grammar for simple English sentences. N = { S, NP, VP, Det, N, V, ... }, Σ = { John, ball, hit, the, ... }, start symbol is S, P is as follows:
S ::= NP VP | ...
VP ::= V NP | ...
NP ::= Det N | ... | John | ...
N ::= ball | ...
V ::= hit | ...
Det ::= the | ...
In this example, nonterminal symbols denote grammatical categories, like " sentence" (abbreviated by "S"), " noun phrase" ("NP"), etc., while terminal symbols are English words, like "John", "ball", etc.
- Jochen Burghardt ( talk) 18:56, 11 August 2014 (UTC)
"Characters" may not be correct, but "lexemes" is clearly wrong. Perhaps we should replace most of the reference to "characters" by "symbols"; (a character is usually a symbol, but a symbol need not be a character). — Arthur Rubin (talk) 02:13, 17 October 2012 (UTC)
For example, if there was a rule , we know that a or b is nonterminal, but we don't know which. It's not clear the terminals and nonterminals are determined by the production rules, unless we make the additional assumption that the only "words" we are interested in have only terminals, and an intermediate string which can not lead to a string with only terminals can be disregarded.
This is not solely a question related to the topic: The statement in formal grammar that "we" are only interested in strings which do not contain nonterminals should also be reflected here, assuming it to be correct. — Arthur Rubin (talk) 04:04, 26 November 2012 (UTC)
Why the example on the end of the page - representing an integer, expressed in a variant of Backus–Naur form::
<integer> ::= ['-'] <digit> {<digit>}
<digit> ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
has finite number of production rules (which is a rule for CFG) and not infinitely many since the digit may be placed varied number of times to infinite. And yes it can be rewritten as
<integer> ::= ['-'] <digit>
<digit> ::= '0' | '1'<prdigit> | '2'<prdigit> |
'3'<prdigit>| '4'<prdigit> | '5'<prdigit> | '6'<prdigit> |
'7'<prdigit> | '8'<prdigit> | '9'<prdigit>
<prdigit> ::= '0'<prdigit> | '1'<prdigit> | '2'<prdigit> |
'3'<prdigit> | '4'<prdigit> | '5'<prdigit> | '6'<prdigit> |
'7'<prdigit> | '8'<prdigit> | '9'<prdigit> | empty string
This also removes the leading zeroes in the original example. So a string like "00000000" won't be in the language. — Preceding unsigned comment added by 95.111.115.209 ( talk) 12:02, 7 January 2014 (UTC)
<integer> :: = '0' | ['-'] <digitstring>
<nzdigit> :: = '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
<digit> ::= '0' | <nzdigit>
<digitstring> ::= <nzdigit> | <digitstring> <digit>
I think it would be insightful if there was an example with english letters and words to compliment the one given in Cyrillic. Because if using Cyrillic gives us a different point of view, than using english would give context to our understanding.
Using symbols that don't belong to the English alphabet, you will forget the concept of "word" reading the sentences and you get focus in pictoric marks interacting each other: this sentence is awkward and horrible — Preceding unsigned comment added by 132.161.247.75 ( talk) 07:10, 1 March 2016 (UTC)
The "functions" article is a subtopic, and essentially a content fork, of the "symbols" article. Enterprisey ( talk!) 05:32, 12 September 2018 (UTC)
"Undid revision 1164690965 by Epachamo (talk): misleading: Literal (computer programming) refers to (multi-character-)identifier for values, while terminal symbols are almost always considered to have length one)" This is not quite correct. Terminal symbols are often multi-character. Think of a lexical analyzer in a compiler. Terminal symbols include things like "if", "then", "for", "function", etc. As the article states, terminal symbols are Lexical items, which are more often than not more than one character. Epachamo ( talk) 05:20, 12 July 2023 (UTC) - Copied from User_talk:Jochen_Burghardt#reverted_edit by Jochen Burghardt ( talk) 09:12, 12 July 2023 (UTC)
symbols ... like "if", "then", "for", "function". For this reason, formal grammars of programming languages have to include a lexical part, i.e. rules like
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>
<number literal> ::= <digit> | <number literal><digit>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<identifier>
). The output of lexical analysis is usually fed into a
parser which transforms it into a
syntax tree. Lexical analysis and parsing usually handles the
regular and the properly
context-free part of the language, respectively. I guess you already know these things.