From Wikipedia, the free encyclopedia

CLAWS

The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language (UCREL). [1] It has an overall accuracy rate of 96-97%, and the latest version (CLAWS4) was used to tag around 100 million words of the British National Corpus. [1]

History

A part-of-speech tagger (POS tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, such as 'noun-plural'. [2] CLAWS was developed in the early 1980s [1] to meet the changing requirements of part-of-speech tagging. Since its inception, CLAWS has been hailed for its functionality and adaptability. It is not without flaws, however: although its error rate is only 1.5% when judged in major categories, c.3.3% of ambiguities remain unresolved. [1] Ambiguity arises with words such as flies, which can be classified as either a noun or a verb. [3] These ambiguities motivated the successive upgrades and tagsets of CLAWS.

Rules and Processing

CLAWS uses a hidden Markov model to determine the likelihood of sequences of words and their parts of speech. [4]
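As a rough illustration of how a hidden Markov model can pick between competing tags for an ambiguous word such as flies, the following Python sketch runs the Viterbi algorithm over a toy model. It is not the CLAWS implementation: the three-tag tagset, the function name viterbi, and every probability in START, TRANS and EMIT are invented for this example.

```python
import math

# Toy model. All tags and probabilities are invented for illustration;
# they are NOT taken from CLAWS or from any real corpus.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}   # P(first tag)
TRANS = {                                         # P(next tag | previous tag)
    "DET":  {"DET": 0.05, "NOUN": 0.90, "VERB": 0.05},
    "NOUN": {"DET": 0.10, "NOUN": 0.20, "VERB": 0.70},
    "VERB": {"DET": 0.60, "NOUN": 0.30, "VERB": 0.10},
}
EMIT = {                                          # P(word | tag)
    "DET":  {"the": 0.9},
    "NOUN": {"bird": 0.5, "flies": 0.3},
    "VERB": {"flies": 0.6},
}
UNSEEN = 1e-6  # small probability for word/tag pairs missing from EMIT


def viterbi(words):
    """Return the most likely tag sequence for words under the toy model."""
    if not words:
        return []
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in TAGS:
        p = START[tag] * EMIT[tag].get(words[0], UNSEEN)
        best[0][tag] = (math.log(p), None)
    for i in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            emit = math.log(EMIT[tag].get(words[i], UNSEEN))
            best[i][tag] = max(
                (best[i - 1][prev][0] + math.log(TRANS[prev][tag]) + emit, prev)
                for prev in TAGS
            )
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "bird", "flies"]))
print(viterbi(["the", "flies"]))
```

Under these made-up numbers, flies is tagged as a verb after a noun and as a noun directly after a determiner, which is the kind of context-dependent choice a sequence model makes.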

Sample Output

Sample Outputs of CLAWS
C5 -----_PUN "_PUQ Welcome_VVB to_PRP my_DPS house_NN1 !_SENT -----_PUN Enter_VVB freely_AV0 and_CJC of_PRF your_DPS own_DT0 will_NN1 !_PUN "_SENT -----_PUN He_PNP made_VVD no_AT0 motion_NN1 of_PRF stepping_VVG to_TO0 meet_VVI me_PNP ,_PUN but_CJC stood_VVD like_PRP a_AT0 statue_NN1 ,_PUN as_CJS though_CJS his_DPS gesture_NN1 of_PRF welcome_NN1 had_VHD fixed_VVN him_PNP into_PRP stone_SENT ._PUN
C7 "_" Welcome_VV0 to_II my_APPGE house_NN1 !_! Enter_VV0 freely_RR and_CC of_IO your_APPGE own_DA will_NN1 !_! "_" He_PPHS1 made_VVD no_AT motion_NN1 of_IO stepping_VVG to_TO meet_VVI me_PPIO1 ,_, but_CCB stood_VVD like_II a_AT1 statue_NN1 ,_, as_CS21 though_CS22 his_APPGE gesture_NN1 of_IO welcome_NN1 had_VHD fixed_VVN him_PPHO1 into_II stone_NN1 ._.

This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the CLAWS C5 and C7 tagsets. This is what a CLAWS output will generally look like, with the most likely part-of-speech tag following each word.
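Output in this word_TAG form is straightforward to process further. The following Python sketch splits a line of tagged text into (word, tag) pairs; the function name parse_claws_line is hypothetical, and the sketch assumes only that tokens are separated by whitespace and that the tag follows the last underscore, as in the sample above.

```python
def parse_claws_line(line):
    """Split horizontal word_TAG output into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underscore so that the tag is always the final
        # part, even if the word itself contains an underscore.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # not of the form word_TAG; skip it
        pairs.append((word, tag))
    return pairs


# A fragment of the C7 sample above.
sample = "Welcome_VV0 to_II my_APPGE house_NN1 !_!"
pairs = parse_claws_line(sample)
for word, tag in pairs:
    print(word, tag)
```

Punctuation tokens such as !_! come through as an ordinary pair, here with the word ! and the tag !.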

Tagsets

CLAWS1 tagset

CLAWS1, the first tagset developed for CLAWS, has 132 word tags. In form and application, the C1 tagset is similar to the Brown Corpus tags [5]. For the table of tags in the C1 tagset, see [6].

CLAWS2 tagset

The CLAWS2 tagset, with 166 word tags, was developed at Lancaster in 1983-1986 [7]. For the table of tags in the C2 tagset, see [8].

CLAWS4 tagset

CLAWS4 is a general-purpose grammatical tagger. It was used to tag the 100-million-word British National Corpus (BNC). It is a successor of the CLAWS1 tagger [9]. The latest version of CLAWS4 is offered by UCREL, a research center of Lancaster University [10] [11].

CLAWS5 tagset

The CLAWS5 tagset, which was used for the BNC, has over 60 tags [12]. For the table of tags in the C5 tagset, see [13].

CLAWS6 tagset

The CLAWS6 tagset, which was used for the BNC sampler corpus, has over 160 tags [14]. For the table of tags in the C6 tagset, see [15].

CLAWS7 tagset

The standard CLAWS7 tagset is the one currently in use. It differs from the CLAWS6 tagset only in the punctuation tags [16]. For the table of tags in the C7 tagset, see [17].

CLAWS8 tagset

The CLAWS8 tagset extends the C7 tagset with further distinctions in the determiner and pronoun categories, as well as for auxiliary verbs [18]. A table of tags in the C8 tagset is also available.

Quotes and reference links (don't delete until you use the reference, otherwise it goes away from the reference section)

tagsets used in Sketch Engine

about Sketch Engine

what it was created for

what it is used for now "The latest version of the tagger, CLAWS4, was used to POS tag c.100 million words of the British National Corpus" [19]

who uses it now

CLAWS Tag Sets - what in each one, what it means: present in table/chart form



http://ucrel.lancs.ac.uk/claws/ [1]

https://www.researchgate.net/publication/2618590_The_CLAWS_Web_Tagger [20]

https://www.sketchengine.eu/english-claws7-part-of-speech-tagset/: claws 7 Tag sets [21]

https://books.google.com/books?id=67OSqA_3hykC&pg=PA164&lpg=PA164&dq=CLAWS+(linguistics)&source=bl&ots=bfXmJDElEe&sig=ACfU3U17rX6G14e_ulF3sGIkTwesNCfGHw&hl=en&sa=X&ved=2ahUKEwiRlZbQs-vnAhUJR6wKHeomC9sQ6AEwBnoECAsQAQ#v=onepage&q=CLAWS%20(linguistics)&f=false [22]

http://www.natcorp.ox.ac.uk/docs/claws7.html [23]

other taggers: http://martinweisser.org/corpora_site/taggers.html [24]

benefits of POS taggers: http://www.inf.ed.ac.uk/teaching/courses/inf2a/slides/2007_inf2a_L13_slides.pdf

in depth explanation of taggers: http://www.cs.uccs.edu/~jkalita/work/cs589/2010/5POSTags.pdf

https://e-space.mmu.ac.uk/619647/1/Tagging_the_Bard_Evaluating_the_accuracy_of_a_mode.pdf how it can be used/works

http://ucrel-api.lancaster.ac.uk/claws/free.html [25] Link to actual free tagger

current system: http://ucrel.lancs.ac.uk/papers/coling.html [26]

References

  1. ^ a b c d e "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-01.
  2. ^ "Stanford Log-linear Part-Of-Speech Tagger". The Stanford Natural Language Processing Group.
  3. ^ McCoy, Kathy. "Part of Speech Tagging (Chapter 5)" (PDF).
  4. ^ Jurafsky, Dan; Martin, James H. (2009). Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Upper Saddle River, N.J.: Pearson Prentice Hall. ISBN 978-0-13-187321-6. OCLC 213375806.
  5. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  6. ^ "UCREL CLAWS1 (LOB) Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  7. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  8. ^ "UCREL CLAWS2 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  9. ^ "CLAWS4: THE TAGGING OF THE BRITISH NATIONAL CORPUS". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  10. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  11. ^ "UCREL home page, Lancaster UK". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  12. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  13. ^ "UCREL CLAWS5 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-20.
  14. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  15. ^ "UCREL CLAWS6 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  16. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  17. ^ "UCREL CLAWS7 Tagset". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  18. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-12.
  19. ^ "CLAWS part-of-speech tagger". ucrel.lancs.ac.uk. Retrieved 2020-04-01.
  20. ^ Rayson, Paul; Garside, Roger (1998-06-12). "The CLAWS Web Tagger".
  21. ^ "English CLAWS 7 part-of-speech tagset | Sketch Engine". 2017-02-08. Retrieved 2020-04-01.
  22. ^ International Computer Archive of Modern English Conference (1993). Corpus-based Computational Linguistics. Rodopi. ISBN 978-90-5183-485-7.
  23. ^ "CLAWS7 Manual". www.natcorp.ox.ac.uk. Retrieved 2020-04-01.
  24. ^ "Taggers". martinweisser.org. Retrieved 2020-04-01.
  25. ^ "Free CLAWS web tagger". ucrel-api.lancaster.ac.uk. Retrieved 2020-04-01.
  26. ^ "CLAWS4: THE TAGGING OF THE BRITISH NATIONAL CORPUS". ucrel.lancs.ac.uk. Retrieved 2020-04-01.
