Developer(s) | Chilin (HK) Ltd. |
---|---|
Initial release | July 1995 |
Stable release | V3.1 / Feb 2024 |
Operating system | Cross-platform |
Available in | English, Traditional and Simplified Chinese |
Type | Corpus |
Website | www |
LIVAC is an uncommon language corpus that has been dynamically maintained since 1995. Unlike other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach in processing and filtering massive media texts from representative Chinese speech communities such as Beijing, Hong Kong, Macau, Taipei, Singapore, Shanghai, Guangzhou, and Shenzhen. [1] The contents are thus deliberately repetitive in most cases, represented by textual samples drawn from editorials, local and international news, cross-Taiwan Strait news, as well as news on finance, sports and entertainment. [2] By 2023, more than 3 billion characters of news media texts had been filtered, of which 700 million characters had been processed and analyzed, yielding an expanding Pan-Chinese dictionary of 2.5 million words from the Pan-Chinese printed media. Through rigorous analysis based on computational linguistic methodology, LIVAC has at the same time accumulated a large amount of accurate and meaningful statistical data on the Chinese language and on its diverse speech communities in the Pan-Chinese context, and the results show considerable long-standing as well as evolving variations. [3] [4]
The "Windows" approach is the most innovative feature of LIVAC and has enabled Pan-Chinese media texts to be quantitatively analyzed according to attributes such as location, time and subject domain. This has made possible various types of comparative studies, applications in information technology, and the development of related innovative applications. [5] [6] Moreover, LIVAC allows longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) search and comprehensive study of target words, their underlying concepts and linguistic structures over the past 25 years, based on the above-mentioned variables of location, time and subject. Results from the extensive and cumulative data analysis in LIVAC have enabled the compilation of textual databases of proper names, place names, organization names, new words, and bi-weekly and annual rosters of media figures. Related applications have included the establishment of verb and adjective databases, the formulation of sentiment indices and related opinion mining to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed the Pan-Chinese Newsmaker Rosters), [7] [8] [9] [10] [11] and the compilation of new word databases (LIVAC Annual Pan-Chinese New Word Rosters). [12] [13] [14] [15] [16] On this basis, the analysis of the emergence, diffusion and transformation of new words and the publication of dictionaries of neologisms have been made possible. [17] [18]
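As an illustration only, a KWIC search restricted by location, time and subject domain might be sketched as follows. This is not LIVAC's software or interface: the `Sample` record, the `kwic` function, its parameters and the English sample texts are all hypothetical and invented for this example. Context is measured in characters rather than words, on the assumption that this suits unsegmented Chinese text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical text sample tagged with location, year and domain."""
    location: str
    year: int
    domain: str
    text: str


def kwic(samples, keyword, width=5, location=None, year=None, domain=None):
    """Return (location, year, left, keyword, right) for every keyword hit.

    Samples are first filtered by any attribute that is not None; `width`
    is the number of context characters kept on each side of the keyword.
    """
    hits = []
    for s in samples:
        if location is not None and s.location != location:
            continue
        if year is not None and s.year != year:
            continue
        if domain is not None and s.domain != domain:
            continue
        start = s.text.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = s.text[max(0, start - width):start]
            right = s.text[end:end + width]
            hits.append((s.location, s.year, left, keyword, right))
            start = s.text.find(keyword, end)
    return hits


# Invented sample data, used only to exercise the function.
SAMPLES = [
    Sample("Hong Kong", 2020, "finance", "the market rose as the market opened"),
    Sample("Taipei", 2020, "sports", "the team won the final"),
    Sample("Hong Kong", 2021, "finance", "a quiet market today"),
]

for hit in kwic(SAMPLES, "market", width=4, location="Hong Kong", year=2020):
    print(hit)
```

Leaving an attribute as `None` searches across all of its values, which is how the same function can serve both a single-community query and a comparison across communities or years.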
Recent work has focused on the relative balance between disyllabic words and the growing number of trisyllabic words in the Chinese language, [19] the comparative study of light verbs in three Chinese speech communities, [20] and language use as a reflection of epochal change in China. [21] LIVAC version 3.1 was launched in February 2024.
The above applications are provided by the following functions: