This article is rated B-class on Wikipedia's content assessment scale.
FWIW, I asked GPT-3.5 the following question and was answered as follows:
"Do you think GPT is a generic term for a kind of AI?
Yes, 'GPT' (Generative Pre-trained Transformer) is a specific type of language model developed by OpenAI, but the term is often used as a general descriptor for a type of AI that can generate human-like language output. The GPT architecture has been highly influential in the field of natural language processing and has been used to power a wide range of language-based applications, including chatbots, language translation tools, and content creation tools. Other companies and researchers have also developed similar language models based on the GPT architecture or other approaches, so the term "GPT" has come to represent a broader category of AI language models that share similar features and capabilities." <emphasis added> Estadje ( talk) 12:31, 2 April 2023 (UTC)
"FWIW, I asked GPT-3.5..." It's really worth nothing at all.
The current article gives a lot of space to discussing whether GPT is a trademarked term or the term of a generic concept.
The United States Patent and Trademark Office issued a final ruling on 6 February 2024 that the term is generic and not a trademark. For wiki purposes, it is not certain which source to cite to establish this, but if we did share that conclusion, then the entire section could be shortened as mostly moot now. The ruling is on their site, but as a primary source, we typically do not cite such things directly.
Sites which do cover these issues are
Anyone have thoughts on updating this? Bluerasberry (talk) 18:11, 15 February 2024 (UTC)
I think we need to have a good discussion about what this article should look like because there are some pretty big problems with it now, and it's starting to attract a lot of attention. The article used to be specifically about OpenAI's family of GPT models, but as Estadje has pointed out, the term GPT is now in common use even outside of OpenAI-developed models.
I think it would make some sense to pivot the article's focus to GPTs in general, as Estadje has begun to do, but the problem there is that we will begin to have a very heavy overlap with large language model. With the exception of BERT, every single LLM in the list on that article can be classified as a GPT. And as it stands, a lot of the content in large language model is relevant to generative pre-trained transformer.
So we need to make the choice of having this article be either about OpenAI's GPT family in particular, or about all models that are 'generative', 'pre-trained' and 'transformers'. If we go for the general option, how do we reconcile it with large language model given that most LLMs are GPTs? At the very least, I think only one of them should contain a list section.
Pinging potentially interested editors to the discussion ( Colin M— Artem.G— InfiniteNexus— Gmelli— DFlhb) PopoDameron talk 01:03, 3 April 2023 (UTC)
"1) that its title should reflect that particularization (which would avoid actively perpetuating OpenAI's view that it's a trademark)" I think this is an interesting idea. Did you have any ideas for an alternative title? OpenAI language models? If we did this, there would still be the question of what should happen to the current title. I would think it should still redirect to this article (maybe with a dab hatnote linking to Large language model?). Colin M ( talk) 19:34, 4 April 2023 (UTC)
hey, a few weeks ago we had a conversation about the huge Background section in the GPT-2 article. I believe it belongs here, not there. Please see Talk:GPT-2#Background_section. Also pinging JPxG. Artem.G ( talk) 16:15, 12 April 2023 (UTC)
This section should be removed from GPT-2, yes. The main reason it exists there is that, at the time it was written, this article did not exist, nor did others in the series. The reason it exists at all is that, frankly, there are a large number of articles ( History of artificial intelligence, Timeline of artificial intelligence, Progress in artificial intelligence, History of artificial neural networks, and probably others), of which many are poorly written or incomplete, and none of which serve to explain to the reader "how GPT works". If there is a more appropriate article for this to go in, it should go there, but I do not think it is a good idea to plop this into an article like History of artificial intelligence; for example, said article has sections titled "1.1.2 Alchemical means of artificial intelligence" and "5.1 The rise of expert systems". jp×g 01:56, 14 April 2023 (UTC)
Proposal to merge with Generative Pre-trained Transformer
OpenAI has filed a trademark application for GPT. There will probably be no contenders, as they invented the term. [1]
GPT is more akin to a commercial name than a genuine taxon. Large language model seems to be an accepted taxon in Wikipedia, the GPT page, and the scientific community.
As it stands, the current taxonomy (as defined in the leading paragraphs and their Platonic definitions) proposes that ChatGPT is an instance of GPT, and that GPT in turn is an instance of a large language model. My proposal is that we cut out the middleman and just classify ChatGPT as an LLM, making GPT and ChatGPT synonymous. You will find that the only instances of GPT algorithms are GPT-1, GPT-2, GPT-3, and GPT-4, and that these were all called ChatGPT.
This proposal would provide a clearer understanding for both students and laymen by reducing the number of clicks they have to make to understand what ChatGPT really is. Or, seen from another perspective, given the same investment of effort, readers will reach one level deeper than they previously would before either becoming satisfied with their current level of knowledge or becoming frustrated at not understanding.
For a concrete example, a layman with the patience to click on two links may read the first paragraphs of ChatGPT, then GPT, then LLM, and encounter the term Neural Network before becoming frustrated and giving up. Under the new structure, they would read the first paragraph of ChatGPT, then that of LLM, then click on Neural Network, where they would find an understandable first paragraph with no further topical terms.
Some may argue that this constitutes original research, but this field does not have any consensus or even a proposed taxonomy. And the taxonomy is expressed not as a separate statement that can be removed for lack of consensus, but as a necessary element of the article: its definition. Since there is no consensus or proposed taxonomy that could source the definition, there is no way to avoid this. I do not recommend obsessing over finding sources for this statement; rather, I would recommend interpreting the existing sources, which will probably have enough information to define the concept. I always find it annoying when too many sources are used, creating encyclopedic noise.
Thank you for coming to my TED Talk. At this point, you probably don't need to do anything, really; this is one of those "speak now or be silent forever" scenarios at weddings, but more bureaucratic.
That said, if you see someone opposing this merge, a brief comment expressing your support would be appreciated. Please keep discussion brief to avoid bikeshedding and filibustering.
Similarly, if you are interested in helping with the merge once consensus is established, just let me know briefly. We can coordinate the specific details of the merger later (what bits from each article to use).
Regards, Tom TZubiri ( talk) 03:16, 25 April 2023 (UTC)
GPT-1: There is absolutely no information about what they meant by "1 month on 8 GPUs". I suspect it's the V100 GPU, but that has ~100 TFLOP/s, which would give about 2e21 FLOP, which is way too high.
GPT-2: The only mention from an official source is "tens of petaflop/s-day", so about 1e21 FLOP...
GPT-3: They had to use vague words like "several thousand petaflop/s-days of compute". I actually took a screenshot of the damn Figure 2.2 and measured the histogram bar precisely. It came out to 0.89 of the full figure height, which on the log scale corresponds to 10000^0.89 ≈ 3630 PF/s-days, almost exactly the 3.1e23 FLOP from my other reference (see the sketch below).
GPT-4: > Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
Thanks "Open"AI. pony in a strange land ( talk) 00:52, 2 May 2023 (UTC)
Are all GPTs large language models, as implied by the lead sentence? Surely a smaller language model could also be trained as a GPT, right? – Gluonz talk contribs 23:27, 21 February 2024 (UTC)