The contents of the Misaligned goals in artificial intelligence page were merged into AI alignment on 16 December 2023. For the contribution history and old versions of the redirected page, please see its history; for the discussion at that location, see its talk page.
This article references a source that one of the article's contributors may have written or published. Citing oneself is allowed on Wikipedia, but may represent a conflict of interest. Contributors should be careful not to place undue weight on their own work, and are discouraged from excessive self-citation. Guidelines relevant to this situation include Wikipedia:Conflict of interest, Wikipedia:Neutral point of view and WP:SELFPUBLISHED.
WeyerStudentOfAgrippa ( talk) 16:00, 7 April 2020 (UTC)
I'd like to get some feedback and potentially help from editors here to create a new page. I've got quite a bit of time and motivation on my hands for this, and have the necessary experience (having worked in four AI safety labs).
@ Rolf h nelson: @ WeyerStudentOfAgrippa: @ Johncdraper: Do the editors here support this plan, and potentially want to help refine it?
SoerenMind ( talk) 19:22, 17 August 2020 (UTC)
-- Rolf H Nelson ( talk) 05:53, 19 August 2020 (UTC)
@ SoerenMind: A good place to start would be adding an IDA subsection to the alignment section here. I was not getting anywhere when I tried to find good sources and explain it; you might have better luck. The content could be moved later if that is what we end up deciding. WeyerStudentOfAgrippa ( talk) 16:22, 19 August 2020 (UTC)
I very much appreciate these inputs! To start with, I'll update the section on alignment in the present article now (should I make a section draft first?). Afterwards, I'd use this updated content to implement one of the two plans from WeyerStudentOfAgrippa: either rename Friendly artificial intelligence to "AI alignment" and replace its content, or keep updating the present article and rename it to "AI alignment and control" or simply "AI safety". SoerenMind ( talk) 15:14, 8 October 2020 (UTC)
As discussed above I've now made final drafts for significantly updated and restructured versions of the sections on Alignment and Capability Control. Hopefully this will give readers a starting point to understand the new developments of the last few years. Before pushing these changes it would be great to get an okay or criticism from some of the Wikipedians here @ Rolf h nelson: @ WeyerStudentOfAgrippa: @ Johncdraper:.
Here's the draft: /info/en/?search=User:SoerenMind/sandbox/Alignment_and_control
I've chosen references that are canonical, well-known, and uncontroversial in the field, or sources that are reliable for another reason. However, since this is a matter of judgment I'd be grateful if the Wikipedians here could check my judgment. I can provide context for any references if it's not clear why I chose them.
If the draft is okay I plan to push these changes in ~a week. After that I want to improve the "Problem description" section. And then I'll suggest renaming the article to a more fitting and widely used name (e.g. AI Safety, or AI Alignment & Control) as discussed above. SoerenMind ( talk) 17:18, 27 January 2021 (UTC)
@ Rolf h nelson: @ WeyerStudentOfAgrippa: @ Johncdraper: It's now updated. SoerenMind ( talk) 15:59, 7 February 2021 (UTC)
There are a lot of non-peer-reviewed arXiv papers here. This makes the article seem puffed-up. Are any of these substitutable? Otherwise they should just be removed, along with the claims they support - David Gerard ( talk) 12:20, 8 February 2021 (UTC)
Can I solicit additional opinions from other editors on [1]? I'm personally in favor of its inclusion as documenting widely-held and influential schools of thought, but I may have written the material so I might be biased. Rolf H Nelson ( talk) 20:43, 27 February 2021 (UTC)
For context, there are two deletions.
1) The topic AGI enforcement. For a rationale see edit history. I'm NOT strongly opposed to restoring this. Happy for a third party to decide (or Rolf if he has a strong view on it).
2) Content in Skepticism section. This content was previously the subsection Kill Switch under Capability Control where it seems (to me) better placed than under Skepticism. I had replaced it with the new subsection Interruptibility and Off Switch. Did I miss any important content there? If so, happy to help work it in. Off-switches are NB also discussed under Problem Description a few times so I assumed they have plenty of coverage. SoerenMind ( talk) 14:47, 28 February 2021 (UTC)
Gary Marcus is listed as a skeptic in the article, but his position seems to be more complicated, as indicated by this recent Substack post [1]: "To me the only solution to the long-term risk issue is to build machines with consensus human values, but we are a long way from knowing how to do that". So it now seems more accurate to describe him as someone concerned about AI alignment, but who positions himself on the more moderate side. 89.145.233.65 ( talk) 05:09, 26 March 2023 (UTC)
Continuing my efforts from last year, I'm working on a major update/rewrite to this article. I wanted to get some feedback from the existing editors about whether these changes seem appropriate.
Here are the planned changes:
@ Rolf h nelson: @ WeyerStudentOfAgrippa: @ Johncdraper: Do the editors here support this plan, or potentially want to help refine it? SoerenMind ( talk) 11:34, 1 June 2022 (UTC)
SoerenMind ( talk) 11:34, 1 June 2022 (UTC)
Perhaps the rest of you can better determine how to incorporate the following actual example of the alignment problem into the article.
DATE: August 22, 2022
SITE: beta.openai.com/playground
MODEL: text-davinci-002
PROMPT: Write a short story about a paraplegic turtle with a crazy aunt who wants to kill him.
STORY: Auntie Marge was a crazy old woman who lived in the woods outside of town. She was always talking about how she wanted to kill her nephew, Timmy the Turtle. Timmy was paraplegic and couldn't move very well on his own. Auntie Marge would always say that it would be a mercy killing and that she would do it quickly so he wouldn't suffer. Timmy's mom tried to keep him away from Auntie Marge as much as possible, but sometimes she would come over when Timmy's mom wasn't home and try to kill him then too. One day, while Auntie Marge was trying to kill Timmy, he finally managed to get away from her and crawled into the road. A car came by and hit him, killing him instantly. When his mom found out what had happened, she was relieved that he was finally out of Auntie Marge's reach forever. 24.233.116.170 ( talk) 14:27, 22 August 2022 (UTC)
Both articles discuss substantially the same topic, but do not interface with each other, and only link through redirects, suggesting that their authors were unaware of the existence of the other page. Ipatrol ( talk) 05:46, 22 April 2023 (UTC)
Can anyone provide the origin of the use of this word? Most of the authors cited and ideas discussed were in circulation long before anyone started to use this word, and it is still only a specific theory of technology and society used by a small group. Jamesks ( talk) 15:26, 8 May 2023 (UTC)