mandarin files

 

Character / bigram frequency lists 

 

When studying a new language, one always wonders which words, bigrams and phrases are the most common. Knowing this keeps you from spending time on rarely used items and makes your studies more effective.

In the case of Mandarin, we are fortunate. We have Prof. Jun Da to thank, at the time of writing Associate Professor of Linguistics and Director of the Media Center for Language Acquisition, Department of Foreign Languages and Literatures, Middle Tennessee State University, Murfreesboro, USA.

He selected a large corpus (millions) of Mandarin texts, mainly "news of the day" items, and computed the frequency with which each character appeared in it. He published his findings as ranked lists, with the most common character ( 的 ) as number 1, the second most common ( 一 ) as number 2, and so on.
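For readers who like to see the idea in code, the counting step can be sketched as follows. This is only a minimal illustration in Python, not Prof. Da's actual method: the function names, the two sample sentences, and the choice to count only characters in the basic CJK block are all my own assumptions.

```python
from collections import Counter

def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block (assumption:
    punctuation, digits and Latin letters are not counted)."""
    return "\u4e00" <= ch <= "\u9fff"

def character_frequencies(texts):
    """Count CJK characters across all texts and return (rank, character, count) rows."""
    counts = Counter(ch for text in texts for ch in text if is_cjk(ch))
    # most_common() orders by descending count.
    return [(rank, ch, n)
            for rank, (ch, n) in enumerate(counts.most_common(), start=1)]

# Two made-up sentences; a real corpus would hold millions of characters.
sample = ["我的书是新的。", "这是我的一本书。"]
rows = character_frequencies(sample)
for rank, ch, n in rows[:3]:
    print(rank, ch, n)
```

Even in this tiny sample 的 comes out as number 1, which matches its position at the top of the published list.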

His character frequency lists (汉字单字频率列表) and bigram frequency lists (汉字双字组频率列表) can be found and downloaded from websites such as:

  http://lingua.mtsu.edu/chinese-computing/statistics/ 

Using Prof. Da's lists, I have compiled an MS Excel file that contains each character, its pinyin form, and its most common English meanings. I use it to review individual characters and as a quick reference (no need to page through dictionaries). I chose a spreadsheet so that I could use the find and sort functions when needed.

 

Notes:

1. The H column to the right indicates that the character is also found in the HSK test.

2. I have given the characters a status of  1  or  0  in the total column on the far right.

      1  indicates that I have come across the character before.

      0  indicates either an alternative meaning of the same character (these rows are shown), or that I have never come across the character before (these rows are hidden).

3. I use this file on a daily basis, so it will constantly change.

    If you find it useful, you are welcome to download it from the link below and alter it to suit your personal study program.

4. Should you, however, want to use it commercially, I would appreciate it if you would let me know.

WORDS 1 - 6000.xls
Size : 1.051 Kb
Type : xls

 

HSK

 

China's Hanyu Shuiping Kaoshi (simplified Chinese: 汉语水平考试), known as HSK or the Chinese Proficiency Test, is a standardized state-level test designed and developed by the HSK Center of Beijing Language and Culture University to assess the Chinese proficiency of non-native speakers (foreigners, overseas Chinese and students from Chinese national minorities). Read more about the test on their website at:

http://www.hsk.org.cn/index_e.aspx 

or get some study materials at:

 http://hskflashcards.com/download.php

 

Mandarin does not make use of "words" as we know them in English to convey a particular meaning. Characters carry the meaning. In addition, Mandarin is a tonal language: the meaning of a "word" can be altered by pronouncing it in a different tone. Furthermore, even the same "word", pronounced in the same tone, can carry many different meanings. Only by referring to the particular character can one determine the exact meaning.

For example, the New Age Chinese-English Dictionary (The Commercial Press, 2004, Beijing) lists 75 characters under the word "shi", each with a number of meanings.
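The same point can be shown with a small lookup table. The sketch below is in Python and uses only six hand-picked characters that are all read "shi". It is my own illustrative sample, not the dictionary's 75 entries, and the tone-number spelling (shi1 to shi4) is simply a convenient way to type pinyin.

```python
from collections import defaultdict

# (character, pinyin with tone number, one common English meaning)
# Hand-picked sample for illustration only.
entries = [
    ("是", "shi4", "to be"),
    ("十", "shi2", "ten"),
    ("时", "shi2", "time"),
    ("事", "shi4", "matter, affair"),
    ("师", "shi1", "teacher"),
    ("史", "shi3", "history"),
]

# Group by the full reading (syllable plus tone).
by_reading = defaultdict(list)
for char, pinyin, meaning in entries:
    by_reading[pinyin].append((char, meaning))

# Group again with the tone number stripped off.
by_syllable = defaultdict(list)
for char, pinyin, meaning in entries:
    by_syllable[pinyin.rstrip("1234")].append(char)

for reading in sorted(by_reading):
    print(reading, by_reading[reading])
print("shi:", "".join(by_syllable["shi"]))
```

Even with the tone fixed, "shi2" and "shi4" each still point to more than one character here, so only the character itself settles the meaning.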

It is interesting in this regard to note the work of Eugene A. Nida. His most notable contribution to translation theory is called Dynamic Equivalence. In the book The Theory and Practice of Translation (E.J. Brill, 1982, Leiden), he and his co-author, Charles R. Taber, say the following:

 

Each language has its own genius. p. 3.

Each language has its own system of symbolizing meaning. p. 20. (As shown above, Mandarin uses characters).

The best translation does not sound like a translation. p. 12.

 

We should forget that "words have meaning", and rather think in terms of "meanings have symbols/words/characters".

However, in Mandarin, most meanings by far are expressed using bigrams. Wikipedia defines bigrams as "groups of two written letters, two syllables, or two words, and are very commonly used".

 

Because of all the above, I wanted a database that would give me:

a) the most commonly used characters according to their frequency

b) the most commonly used bigrams that use these characters as their basis

c) all the characters/bigrams that I have come across

d) all in one file, and on a spreadsheet.

 

 I set out and combined the Jun Da frequency list with the bigrams found in the HSK test. Now I had a study tool which I could use as a reference and, most importantly, with which I could do my revision in a structured manner. I constantly add new bigrams that I come across and deem useful.
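I did this by hand in the spreadsheet, but the combining step could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the function name, the rule that a bigram is filed under its first character, and the sample data (ranks 1 and 2 are as stated above, while the third character and the five bigrams are simply examples I picked, not extracts from either list).

```python
def attach_bigrams(ranked_chars, bigrams):
    """Group bigrams under the frequency-ranked character they start with.

    ranked_chars: list of (rank, character) pairs, most frequent first.
    bigrams: iterable of two-character strings.
    Bigrams whose first character is not in the ranked list are skipped.
    """
    grouped = {ch: [] for _, ch in ranked_chars}
    for bigram in bigrams:
        first = bigram[0]
        if first in grouped:
            grouped[first].append(bigram)
    return [(rank, ch, grouped[ch]) for rank, ch in ranked_chars]

# Illustrative sample data only.
ranked = [(1, "的"), (2, "一"), (3, "是")]
sample_bigrams = ["一个", "的确", "一定", "我们", "是的"]

combined = attach_bigrams(ranked, sample_bigrams)
for rank, ch, items in combined:
    print(rank, ch, " ".join(items))
```

The result keeps the frequency order of the characters, with each character followed by its bigrams, which is the layout I wanted for structured revision.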

 

Notes:

Again, I have given the bigrams a status of  1  or  0  in the total column on the far right.

1  indicates that I have come across the character/bigram before.

0  indicates that I have never come across the bigram before.

I use this file on a daily basis, so it will constantly change.

If you find it useful, you are welcome to download it and alter it to suit your personal study program.

Should you, however, want to use it commercially, I would appreciate it if you would let me know.

NW HSK WEB.xls
Size : 1.734 Kb
Type : xls

 

My last step was to use the MS Excel spreadsheet to create another file in which I sorted the bigrams alphabetically. In this way I can find a bigram easily.
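Outside a spreadsheet, the same sort could be sketched in Python as below. I am assuming here that "alphabetically" means sorting on the pinyin spelling, and the four rows are an illustrative sample with tone numbers (5 for the neutral tone), not an extract from the file.

```python
# (bigram, pinyin with tone numbers, English meaning) - illustrative sample only.
rows = [
    ("我们", "wo3men5", "we, us"),
    ("一个", "yi1ge4", "one (of something)"),
    ("朋友", "peng2you5", "friend"),
    ("学习", "xue2xi2", "to study"),
]

# Sort on the pinyin column (index 1), the equivalent of sorting that
# column A to Z in the spreadsheet.
by_pinyin = sorted(rows, key=lambda row: row[1])

for bigram, pinyin, meaning in by_pinyin:
    print(pinyin, bigram, meaning)
```

Sorting on the pinyin text keeps bigrams that start with the same syllable next to each other, which is what makes a lookup quick.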

NW HSK NUM WEB.xls
Size : 1.729 Kb
Type : xls

 

 

If you would like to use these files and cannot download them directly, feel free to send me an e-mail, and I will mail them to you.

 
