How to Expose False Etymologies of Chinese Characters? An Introduction to the Study of (Early) Chinese Writing
Original article (in Russian): https://vk.com/@randomkj-etymol
Preface
Today, Chinese characters are the only logographic writing system still in use globally. Because of this, sooner or later everyone encounters people trying desperately to find 'pictures' in them. It is one thing when people try to come up with a ‘picture’ just to memorise the character more easily (this is called a mnemonic technique). However, it is a different matter entirely when people who don’t understand anything about the ancient characters make up gibberish, peddling it as the real origin of the character, or simply spread information from outdated sources, without bothering to check contemporary scholarship, ending up misleading and confusing people who are interested in etymology.
Most books dedicated to the history/structure/etymology of Chinese logograms that you can buy at regular bookstores contain plain nonsense. I’m not even talking about the obvious cases, stuff titled like ‘The Cosmic Truth behind Chinese Characters’. You can find a staggering amount of badly researched or simply extremely outdated information even in regular dictionaries. This is all due to the incompetence of their authors in the area of the history of early Chinese writing.
Etymology is a science and it must utilise scientific methods, eschewing any guesswork, fantasies and other ‘temptations’. The origin of every character must be supported by evidence based on academic methodology.
This article will be useful to the people who are interested in learning the real etymologies of Chinese characters but don’t know where and how to look, what sources to use or even what to believe.
I decided to write this article as a comprehensive and succinct piece of reference material that can be easily shared with other people. Since the academic study of etymology cannot be introduced without any background, a large portion of this article will be dedicated to just that. I’ll introduce the history of the study of Chinese writing, the characters themselves, and important theory that is relevant to contemporary scholarship (you can read more on the theory and the history of both early and modern Chinese writing in my other article). Only after these basics are laid down will I answer the question from the title: what you should look for when studying the etymology of specific characters and how to prove etymologies.
An Introduction to the Study of Early Chinese Writing
What is the Study of Early Chinese Writing?
The history of Chinese script is typically divided into two periods: early writing and modern writing. You can see it in the picture below.
The study of early Chinese writing, without which any talk of correct etymology is unthinkable, is dedicated to the study of the characters of the ancient period.
To study these characters one must use ancient texts. For example, the inscriptions on bones and turtle plastrons, on bronze vessels, the writing on bamboo slips, etc.
The main goals of the study of early Chinese writing are understanding the origin, nature and structure of the ancient characters, and deciphering and reading the texts written using them.
The Nature of the Study of Ancient Chinese Characters and its Connection with Other Areas of Inquiry
The study of ancient Chinese characters is one of many subfields of linguistics. It’s also tightly connected to Chinese linguistics. Aside from that, it is intertwined with some other academic fields and might require the understanding of archaeology (to understand the properties of the materials being researched), history (to understand the historical connections in the researched texts and to find mistakes in the ancient chronicles), culture (to understand the nature of ritual texts), philosophy (to understand philosophical treatises), philology (to be able to compare different texts) etc.
The study of ancient Chinese characters is divided into four main areas:
the study of inscriptions on turtle plastrons and bones (mostly concerned with the texts, writing and culture of the Shang period);
the study of inscriptions on bronze vessels (in a broad sense, all inscriptions from Shang to Han, in the narrow sense, only Western Zhou inscriptions);
the study of Warring States inscriptions (texts from the Spring and Autumn and Warring States periods on seals, money, ceramics, etc);
the study of bamboo and silk inscriptions (texts on bamboo/wooden slips and silk from the Warring States period to the Han period).
The History of the Study of Ancient Chinese Characters
The Death of Ancient Chinese Characters and the Emergence of their Study
Before the Qin's wars of unification in the late 3rd century BC there were many very diverse regional variants of different characters that were developed in separate kingdoms (more on that in a bit).
After unification, all characters began to be written only as they were written in the Qin kingdom. At the same time, book burning (mostly of Confucian books) was carried out, which destroyed a vast amount of pre-Qin-era literature.
With the fall of the Qin Empire and the transition to the Han era, the pressure on Confucianism lessened and the texts, which had been transmitted mostly orally until then, were again written down, this time using the clerical/chancery script that was popular at the time.
Circa 1st century BC the old home of Confucius was demolished to make way for the construction of a new palace. During the demolition, many canonical texts were found in the walls of the house, hidden there to save them from the book burnings. They were written in the Warring States scripts. At first, they were mostly ignored. It was only at the very end of the Western Han era (beginning of the 1st century) that they were edited and organised by Liu Xiang (劉向).
It turned out that those canonical books were different from those transmitted orally, and as a result there was a division between the ‘old canon’ (written in the ancient script of the Warring States period) and the ‘current canon’ (written in the clerical script, which was the modern script at the time). Not all scholars accepted the old canon, believing it to be a forgery, but those who were interested in the texts could be considered the first scholars of ancient characters. Around the same time a tradition of textual criticism also emerged.
Xu Shen (許慎) disliked that, when interpreting the canon, people tended to rely on the character structures and word definitions of their own time, so at the start of the 2nd century he wrote the ‘Shuowen Jiezi’ (說文解字). Unfortunately, we don’t have the original dictionary, but it was transmitted through handwritten copies for centuries.
It contains more than 10,000 characters, each of which has a Qin dynasty seal form that is used to explain the construction, origin and meaning of the character. It also presents some ‘ancient script’ (古文) forms, based on the forms of the Warring States period, as well as ‘Zhou script’ (籀文) forms, based on the transmitted forms of the ‘Shizhoupian’ (史籀篇) dictionary from the Western Zhou era.
Xu Shen’s dictionary is very important to this day, both as the pinnacle of Han dynasty scholarship and as the standard source of shapes for contemporary seals.
The Study of Character Forms of the Warring States Period
After the fall of the Han dynasty, wars broke out again and the script was in disarray once more. At the same time, however, a calligraphic style that we today call ‘regular script’ emerged and quickly caught on. Shuowen Jiezi started becoming popular and acquiring its authoritative reputation, while its seal forms for a time were even used as ‘prototypical forms’.
At the end of the 3rd century a new source for ancient characters was discovered, the ‘Jizhong books’ (汲冢書). They were bamboo slips plundered from the tomb of a ruler of the Wei kingdom (魏) from the Warring States period. Aside from the already known canonical texts, the tomb contained a lot of literature that was considered lost at the time. The scholars of the time organised and edited these texts for 20 years, studying the script of a bygone epoch.
By the end of the 10th century Guo Zhongshu (郭忠恕) compiled all the Warring States characters that were known at the time and released the dictionary ‘Hanjian’ (汗簡).
In the 11th century Xia Song (夏竦) compiled even more characters and released them in the dictionary ‘Guwen Si Shengyun’ (古文四聲韻).
After that, the study of the Warring States scripts came to a long standstill, during which all of the original texts were lost, complete with the contents, editorial work, etc. Today we don’t have any of the material the scholars of the time had. Since the characters of other kingdoms were not directly related to the Qin seal forms, nobody was particularly interested in them.
The Boom in the Study of Bronze and Stone Inscriptions
During the Song dynasty the study of the texts written on bronze vessels and stones became very fashionable. Many specialised books and dictionaries on the topic were written at the time, for example the ‘Kaogutu’ (考古圖) by Lu Dalin (呂大臨).
During Yuan and Ming this kind of research stopped, but it was revived during the Qing as an additional tool for the exegesis of the canonical texts. Dictionaries were released that contained characters from all sorts of seals, coins, ceramics, etc. Many of them were meant as supplements to the Shuowen Jiezi, for example the ‘Shuowen Guzhou Bu’ (說文古籀補) written by Wu Dacheng, which includes many auxiliary forms from bronze inscriptions.
The Discovery of Oracle Bone Script and the Contemporary State of Affairs
At the end of the 19th century the inscriptions on turtle shells and bones from the time of the Shang dynasty were discovered. In 1903 Liu E (劉鶚) released the compilation ‘Tieyun Canggui’ (鐵雲藏龜) which widely popularised the find.
These texts were studied by such famous scholars as Luo Zhenyu (羅振玉), Wang Guowei (王國維), Guo Moruo (郭沫若) and many others.
From 1928 to 1937, the first archaeological excavation of the Yin capital was carried out, and some 25,000 different inscriptions were discovered. Before that all finds either came from grave robbers or were of unknown provenance, but this dig made further study possible. The history of the study of the oracle bone script is pretty extensive so I won’t dwell on it too much.
Research on the inscriptions of the Warring States period also began to gain momentum; in the 1940s a Chu text written on silk was published, and in 1951 more Chu inscriptions were found in Wulipai (五里牌). Thanks to that, scholars have once again recognised the importance of the Hanjian and Guwen Si Shengyun compilations published several centuries before that.
Since the 20th century, many more archaeological discoveries have been made, and research is rapidly continuing to this day.
How Chinese Characters Work
What are Chinese Characters?
A Chinese character is a symbol that records a Chinese word in a way accessible to the eye. Characters are a kind of agreement on how to write down words, which would not otherwise be preserved once spoken.
Often, people say that a Chinese character has pronunciation, meaning and form. In reality, characters have only form, while sound and meaning are inherent to the words that they record.
Now, let’s break down how Chinese characters make it possible to visualise words (here we speak of how exactly they record words, not of the origin of the characters).
Shuowen Jiezi gives us an exhaustive description of a system of ‘six categories’ (六書) that divides the characters into pictograms (象形), simple ideographs (指事), compound ideographs (會意), phono-semantic compounds (形聲), transfer characters (轉注) and loan characters (假借). This method of dividing all characters into categories has become so ingrained that it is used to this day in popular literature.
That said, these ‘six categories’ are not a result of any kind of study that showed that all characters can be divided into these six types. It is the result of Han-era scholars interpreting each word from the classical texts in such a manner. From the very beginning this system forced scholars into its rigid framework and had many drawbacks to boot. Contemporary scholarship eschews it completely.
In reality, Chinese characters are way more diverse than these six categories imply and the divisions between the various types are often not very clear at all. But three main categories could be singled out: semantograms, rebuses (sound loans) and phono-semantic compounds. They can also be divided into smaller subtypes.
Semantograms
A character is considered a semantogram when it visualises only the meaning of the word it transcribes, ignoring its pronunciation.
When a character directly depicts the object denoted by the word the character represents, then it is a pictogram (not the same thing as the ‘pictograms’ from the ‘six categories’). For example, because the word {人} means ‘person’ it is written using a picture of a human. (Curly brackets are used to denote the word itself. For example, ‘{人}’ means ‘the word pronounced rén that means ‘person’, which today is usually written down as 人’).
There are quite a few pictograms. They usually depict people, body parts, plants, celestial bodies, natural phenomena, animals, buildings, tools, features of landscape, etc. It is one of the oldest types of characters, and almost no new ones have been created since Western Zhou.
There are also abstract pictograms, this is when a character depicts some abstract concept. They also tend to belong to the oldest stratum of the system, but some have been coined at every stage of its evolution (See for example the comparatively late 凸 ‘convex’ or 丫 ‘bifurcation’).
Emphasising pictograms, well, emphasise something to denote a different meaning. This category mostly contains body parts and actions connected with them. They are very old too, and pretty much no new ones have been coined since the times of early writing.
The types that we’ve described so far contain characters that are considered basic. This means that they cannot be broken down into smaller parts. But there are also complex characters.
For example the word {獲} ‘to catch (prey)’ is written with the character made up of the bird (隹) and the hand (又).
But it still cannot be broken down into two elements because it depicts the act of catching a bird with a hand. The hand is depicted as an instrument with which the bird is being caught. You can’t just describe it as ‘bird + hand’, because it is composed of graphical elements rather than semantic ones (more on that later). This is why it can be called a complex pictogram. Pretty much no new ones have been coined since the end of the early writing period, except for a few (閂 ‘latch’ or 灭 ‘to extinguish’).
From pictograms we move on to a different type of semantograms. Indicators contain highlighting elements like circles, dots or lines that point to some specific part denoting what the character means. These are also pretty old.
Finally we move on to compound semantograms. These contain two or more semantic elements (two or more semantics) that mean something by themselves, with the compound characters combining their meanings. These are quite rare in early writing but become more and more frequent later on (especially in Japan and Korea).
Of course, there are some characters that occupy an unclear niche. They can either be complex pictograms or compound semantograms. This problem won’t be discussed here.
Rebuses
Say, we have the word {錡} that means ‘spiked axe’ and is written with the corresponding pictogram. We also have the word {我} that means ‘I’ but for which it’s a bit difficult to come up with any specific character. Since the words {錡} and {我} sound similar we can simply take the character from {錡} and use it to phonetically write {我}. (An important thing to note is that we’re talking about the ancient readings of these words. Today, they’re really quite different. We won’t discuss the reconstruction of Old Chinese in this article however).
Cases like this one, when the character for one word is used to write down a different, etymologically unrelated word, are called rebuses or sound loans. This method is used to this day, see for example the Internet’s ‘martian language’ or even puns.
One of the reasons for the emergence of this practice is plain difficulty in visualising certain words, for example the ones denoting abstract concepts. Perhaps another reason is the fact that some people can mistakenly connect two etymologically unrelated words based on their pronunciation. In contemporary context, an example could be the superstitions based on the similar pronunciation of {四} ‘four’ and {死} ‘death’.
Phono-Semantic Compounds
Phono-semantic compounds are compound characters that contain a phonetic element (a phonetic).
For example, you could write the word {道} ‘road’ using the characters composed of the semantic 行 ‘road’ and the phonetics 首 or 舀. The characters 首 and 舀 transcribe the words {首} ‘head’ and {搯} ‘to pull out’. This means that their definitions have nothing to do with the word {道} ‘road’ (unlike the semantic 行) but their pronunciation is similar.
Phono-semantic compounds are a very convenient way of coining new characters from the semantograms and rebuses you already have. When they appeared the number of Chinese characters simply exploded. Almost all characters currently in use (more than 90% of them) belong to this category.
Many words have synonyms with similar meanings that are nevertheless minutely different. For example there are a bunch of words that mean ‘road’: {行}, {道}, {路}, {途}, {徑}, etc. Transcribing them only using semantograms might be a bit of a bother. On the other hand, the readers might have trouble understanding which of these words was being meant if they’re all transcribed using one character.
Phono-semantic compounds solve this problem, letting us easily transcribe all of these words by adding a phonetic element to the semantic one.
An additional phonetic is often added to a semantogram to make the pronunciation clearer. For example, the word {鷄} ‘rooster’ started out simply as a picture of a rooster. Later on it acquired the phonetic 奚 that sounded like the word {鷄}. Even later, the picture of a rooster was replaced with a generic bird 鳥 because the phonetic made it redundant.
A phonetic can be appended to any type of character.
With sound loans, a single character can convey a variety of similar-sounding words; for example, the character for {錡} ‘spiked axe’ can mean {我} ‘I’, {義} ‘morality’, {儀} ‘etiquette’, {宜} ‘should’, etc. Which particular word is written using the character must be determined by the context. This can easily lead to a problem where it is not clear exactly what meaning of what word is used, because due to semantic shifts and extensions, the same word can have a huge range of very different meanings.
Phono-semantic compounds help us deal with this problem. By adding different semantics to a single phonetic we can separate words that sound too similar. For example, by adding the semantics 行 'road', 艸 'grass' and 糸 'thread' to the phonetic 巠 we can get the characters 徑 'road', 莖 'stem' and 經 'thread running lengthwise' respectively.
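The disambiguating role of the semantic can be sketched as a lookup table keyed by a (semantic, phonetic) pair. This is only an illustrative sketch in Python: the table holds just the three characters from the example above, and the names COMPOUNDS, compose and share_phonetic are made up for this illustration rather than taken from any real library.

```python
# Illustrative only: a tiny table of phono-semantic compounds built on the
# phonetic 巠, taken from the example in the text. The names COMPOUNDS,
# compose and share_phonetic are hypothetical.
COMPOUNDS = {
    ("行", "巠"): ("徑", "road"),
    ("艸", "巠"): ("莖", "stem"),
    ("糸", "巠"): ("經", "thread running lengthwise"),
}


def compose(semantic, phonetic):
    """Return (character, gloss) for a semantic + phonetic pair, or None."""
    return COMPOUNDS.get((semantic, phonetic))


def share_phonetic(phonetic):
    """List every character in the table that is built on the given phonetic."""
    return [char for (_, p), (char, _) in COMPOUNDS.items() if p == phonetic]


if __name__ == "__main__":
    print(compose("艸", "巠"))
    print(share_phonetic("巠"))
```

The phonetic alone only tells the reader roughly how the word sounds. The semantic is what narrows the choice down to a single entry.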
For the same reason, there are composite characters where a semantic has been appended to an already existing character. The characters for the word {道} ‘road’ have already been shown above, but they also contain the additional element 止 ‘foot’.
Actually, the semantics 行 (referring to the road) and 止 (referring to movement) were used together very frequently. This is why they ended up becoming the element 辵(辶) in the modern system.
To write the word {來} ‘to come’ a depiction of ears of wheat was used because the words {來} ‘to come’ and {麥} ‘wheat’ sounded similar. To specify the meaning ‘to come’ a semantic 夊 (‘foot’, 止 flipped upside down) could be used. At first, these characters were used interchangeably but later split in their usage. 來 remained the usual way to write {來} ‘to come’, while the character with the additional 夊 started being used to write the word {麥} ‘wheat’. Differentiation happened.
There exists another type of differentiation, when a semantic is added to the character to delineate a borrowed or extended meaning of the character. For example the word {溢} ‘to overflow’ was written down as an image of a vessel full of water. Later on the word {溢} acquired the additional meaning of {益} ‘to expand’. The semantic 水 ‘water’ was added to specify the original meaning, while the original character took on the extended one.
There are quite a few characters of this kind.
There are especially many characters that had a semantic added to denote a borrowed or extended definition. Almost all characters created after block script became widely used belong to this type.
This is how phono-semantic compounds can be formed not just by combining a semantic and a phonetic, but also by appending a semantic or a phonetic to a full-fledged character.
Just like in the example with {道} ‘road’, different elements can be used as phonetics. The same is true of semantics, where various synonyms can be used interchangeably. Because of this, a single word can be written using a large variety of phono-semantic compounds.
Ornamental Elements
Ornamental elements (including strokes) are the elements of a character that are added only to decorate/complicate it or sometimes to differentiate it. They don’t carry any semantic or phonetic information. A character with an ornamental stroke and the same character that doesn’t have one are usually used interchangeably. That said, there are often cases where ornamental elements cause differentiation to happen, like in the case with 來 ‘to come’ and 麥 ‘wheat’ that we’ve already discussed (more on that in a bit).
Often the ornamental strokes/dots are added either above the character (if it has a horizontal stroke at the top), around vertical characters (kinda like 八), go across straight strokes or fill up empty spaces. Characters with ornamental strokes can be found to this day, for example in some regions they can write 人 ‘person’ or 文 ‘pattern’ with the three strokes 彡 going across the right ‘leg’.
The most common ornamental element is 口, which should not be confused with the semantic 口 ‘mouth’. The decorative 口 is usually placed below or inside the character (in the latter case there can be several of them). Aside from 口 there are also elements like 寸 or 工.
The characters that contain ornamental elements aren’t classified any differently from the characters that don’t contain them.
More on Differentiation
Differentiated forms can emerge not only from added semantics or phonetics, but also from added ornamental elements. In that case the differentiation can be based either on pronunciation (if the original and differentiated characters sound similar) or on meaning (if they’re close semantically).
The former case is pretty simple. The character with the ornamental element denotes the pronunciation of the original word but records a different one. These characters should be considered phono-semantic compounds.
The earliest stages of writing had some characters that were probably ideographs (that is, that recorded several distinct but semantically similar words). This principle was used in some very early characters which used the ornamental elements to indicate that the character denotes a different, semantically related word. These should be considered semantograms.
Another important kind of differentiation is the corruption of the character’s graphical form or its simplification. Usually this happens to extremely frequent (particularly grammatical) words. It is based only on the sound similarity of words. Characters like this can be considered both a special class of semantograms and a special class of rebus characters. They can be found at every stage of Chinese writing, including the contemporary one (see 刁 ‘deceitful’ from 刀 ‘knife’, 乒乓 ‘table tennis’ from 兵 ‘soldier’, 茶 ‘tea’ from 荼 ‘bitter’, 着 ‘to wear’ from 著 ‘book’ from 箸 ‘chopsticks’, etc).
All of these types of differentiation are well known to scholars and are extraordinarily important.
Ligatures
Early writing frequently utilised ligatures (this is when two or more characters are combined into one). They are supposed to be read as several words, usually numbers, quantities of items, names, toponyms, etc. You can find them even in the earliest texts, and starting with the Spring and Autumn period they are usually marked with =. Ever since Western Zhou, that mark has been used to denote the repetition of a character. Modern characters like 浬 ‘nautical mile’ (< 海里) are also ligatures.
Rare Categories of Characters
Sometimes you can find a character made up of two identical phonetic elements. You could say they’re phono-semantic compounds, except they lack a semantic. These kinds of characters shouldn’t be confused with the likes of 在, which have a phonetic added to a sound loan but use it as a semantic too. This category of characters is peculiar to early writing. Some notable ones are 友 ‘friend’ (two 又), 比 ‘to compare’ (two 匕), 兩 (two 丙). These characters use the repetition of the phonetic to convey the idea of a pair.
Some compounds, in which the semantic has been replaced, can be considered a separate category. For example, you can say that 賑 ‘to donate’ is derived from 振 ‘to help’ with the semantic replaced with 貝 ‘sea shell (cowry)’. Of course, you can also say that these are regular phono-semantic compounds.
A more modern and unique category are the fanqie type derivations. This is when the initial consonant of the syllable is taken from one element, while the rest of the syllable is taken from the other element. These characters were created for the more precise transcription of foreign sounds that Chinese didn’t have. Some examples are the characters 𡅖 (the initial 名 + the final 養) and 𠆙 (the initial 亭 + the final 夜) from Buddhist texts.
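The splicing principle described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the syllables are given in modern Mandarin pinyin (without tones), split by hand into an initial and a final, which is not how these characters sounded when the fanqie-type derivations were coined. The names SYLLABLES and fanqie_splice are invented for this illustration, and the results are not reconstructions of the readings intended in the Buddhist texts.

```python
# Toy model of fanqie-style splicing. Assumption: syllables are supplied as
# hand-split (initial, final) pairs in modern Mandarin pinyin without tones;
# these are NOT the historical readings the characters were designed for.
SYLLABLES = {
    "名": ("m", "ing"),
    "養": ("", "iang"),  # pinyin "yang": zero initial, final "iang"
    "亭": ("t", "ing"),
    "夜": ("", "ie"),    # pinyin "ye": zero initial, final "ie"
}


def fanqie_splice(upper, lower):
    """Combine the initial of `upper` with the final of `lower`."""
    initial, _ = SYLLABLES[upper]
    _, final = SYLLABLES[lower]
    return initial + final


if __name__ == "__main__":
    print(fanqie_splice("名", "養"))
    print(fanqie_splice("亭", "夜"))
```

The point of the sketch is only the mechanism: the first element contributes nothing but its initial consonant, and the second nothing but the rest of the syllable.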
Another type characteristic of modern writing can often be found in mediaeval texts, ‘element assimilation’. It is used in polysyllabic words (especially colloquial ones). There are quite a few examples of this kind of construction, take for instance {敦煌} ‘Dunhuang’, which you could write like 燉煌, that is, the character 敦 took on the left element from 煌. Usually the assimilated element was the semantic. Even though 燉 already exists as a standalone character (usually used to record words like ‘warm’ or ‘to boil’), it’s not the same as the 燉 from 燉煌 because the former is made up of the semantic 火 ‘fire’ and the phonetic 敦, while the latter is composed of the phonetic 敦 and the assimilated element 火. There are also some examples of the phonetic being assimilated. For instance, the word {鳳凰} ‘phoenix’ was usually written down as 鳳皇, but later on the second character turned into 凰 because of the assimilation with 鳳. In the character 鳳 the element 凡 is a phonetic but in 凰 it’s completely meaningless.
Sometimes these kinds of characters merge semantics, that is, fuse both the meanings and the sound of different characters. They’re not uncommon in dialects, take for example 甭 ‘needn’t’ from 用 ‘need’ and 不 ‘not’, or the dialectal character 㬟 ‘never’ from 勿 ‘not’ and 曾 ‘[refers to the past]’. This category also includes some characters denoting chemical elements. See for instance 羥 ‘hydroxyl’ (instead of 氫氧基) or 羰 ‘carbonyl’ (instead of 碳氧基). Another unique example is the characters for isotopes of hydrogen: 氕 ‘protium’, 氘 ‘deuterium’, 氚 ‘tritium’, where the phonetics and simultaneously the abstract pictograms are 丿(= 撇 ‘a stroke that falls downwards towards the left’), 刂 (a variant of 刀 ‘knife’) and 川 ‘river’ respectively.
Some characters are stylised borrowings from other writing systems. For example the character 卍 ‘swastika’ was brought from India together with Buddhism. The character 歹(𣦶) ‘bad’ is most likely a borrowed Tibetan character ཏ /ta/ (not to be confused with the homographic 歹, which is a simplified 死 ‘to die’). Prior to the 20th century these kinds of borrowings could pretty much only be found in dictionaries and you could count them on one hand.
Japan has been using characters that contain elements taken from the Japanese syllabaries for quite a long time now. These can either be entirely composed of kana or use them as phonetics. Similarly, you can find characters containing symbols from the Korean writing systems (either Hangul or Gugyeol) as additional elements. In recent centuries characters containing Latin letters have emerged in various regions. There are examples from even more writing systems but they will be covered in a different article.
Allographs
Allographs are characters that have the same readings and definitions but are written differently. In the strictest sense allographs are only those characters that are completely interchangeable, with no difference in meaning or pronunciation at all. Allographs can emerge from:
Additional elements (匧:篋 ‘box’)
Different approaches to the construction of the character (涙:泪 ‘tears’ or 韤:韈:𥀯:襪:𤿗:帓:𥿉:袜 ‘stockings’)
Different arrangement of elements (蠏:蟹 ‘crab’)
Simplification of elements (灋:法 ‘law’)
Cursive simplification (頭:头 ‘head’)
Corruption (珍:珎 ‘treasure’)
Tabooisation (玄:𤣥 ‘black’)
In a looser sense allographs can be characters that only share some of the semantics (quasi-allographs). For example the quasi-allographs of the character 雕 are 鵰, 彫, 琱 and 凋. 雕 has three primary definitions:
A kind of a bird of prey (original sense)
To engrave (sound loan)
To wither (sound loan)
The character 鵰 is only an allograph of 雕 in the first sense, 彫 and 琱 only in the second and 凋 only in the third.
This example belongs to the category of inclusive quasi-allographs, that is, when the usage of one character is completely included in the usage of a different one. There is also the non-inclusive type, when they have one common use but each of them also has its own different set of meanings. For example, the characters 記 and 紀 are interchangeable when used in the sense ‘to write down’ (in constructions like {記錄} ‘to record’, {記念} ‘commemoration’, {記要} ‘summary’, {大事記} ‘chronicle’, etc), but when {記} is used as a standalone word or in {記憶} ‘memory’, {記號} ‘symbol’ or {記者} ‘journalist’ then the character 紀 cannot be used. Quite often the relations between non-inclusive quasi-allographs are convoluted and it’s impossible to find patterns in their usage, so the link between 記 and 紀 described above can have a myriad of exceptions.
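The difference between the inclusive and non-inclusive types can be expressed as plain set relations. Below is a minimal sketch in Python that models each character as the set of senses it can write. The sense labels for 雕 and its quasi-allographs follow the text above; the entries for 記 and 紀 are heavily simplified, and the sense ‘era’ given to 紀 is an assumption added purely so that 紀 has a use of its own. The names USAGE and relation are hypothetical.

```python
# Sketch: model each character as the set of senses it can write.
# Sense labels for 雕 and its quasi-allographs come from the text; the
# entries for 記 and 紀 are simplified, and the 紀-only sense is an
# assumption added purely for illustration.
USAGE = {
    "雕": {"bird of prey", "to engrave", "to wither"},
    "鵰": {"bird of prey"},
    "彫": {"to engrave"},
    "琱": {"to engrave"},
    "凋": {"to wither"},
    "記": {"to write down", "memory", "symbol", "journalist"},
    "紀": {"to write down", "era"},  # "era" is an illustrative assumption
}


def relation(a, b):
    """Classify the relation between two characters by their sense sets."""
    sa, sb = USAGE[a], USAGE[b]
    if sa == sb:
        return "allographs"
    if not sa & sb:
        return "unrelated"
    if sa < sb or sb < sa:
        return "inclusive quasi-allographs"
    return "non-inclusive quasi-allographs"


if __name__ == "__main__":
    print(relation("鵰", "雕"))
    print(relation("記", "紀"))
```

In this simplified model two characters with identical sense sets count as allographs in the strict sense, a proper subset gives the inclusive type, and a partial overlap gives the non-inclusive type. Real usage is messier, as noted above.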
Homographs
Homographs are characters that are completely different but happen to look exactly the same (the opposite of allographs). In the narrow sense homographs are characters that were independently created for different words but accidentally turned out very similar. For instance, in ancient times 鉈 recorded a word referring to a kind of spear and was also an allograph of 砣 ‘stone weight’; these days it can be used to write ‘thallium’. Homography can be a result of:
different approaches to character construction (the phono-semantic compound 体 ‘pallbearer’ and the compound semantogram 体 ‘body’);
same construction being used to convey different ideas (椅 ‘a kind of tree’, 椅 ‘chair’, 椅 ‘fragile (about a tree)’);
the development of the graphical form or its corruption (歹 ‘bad’ and 歹 (a simplified form of 死 ‘to die’)).
More often than not these don’t coexist during the same general era.
In a wider sense you can call all characters that look the same but record different words homographs. In that case even rebuses are homographs, like 花 ‘flower’ and 花 ‘to spend’. Same thing for extended meanings, such as 行 ‘road’ and 行 ‘row’. I don’t see any point in using the term ‘homograph’ for these, so these cases won’t be described here.
The History of Chinese Writing
Prehistory
Due to a lack of materials regarding proto-writing it’s not possible to reconstruct the processes that led to the emergence of the Chinese writing system, so this won’t be discussed here.
However, we do have quite a few Neolithic finds that may be connected with Chinese writing in some way.
The precise relationship between these symbols and Chinese writing is unknown, so we won’t discuss it here. It’s not even clear if they’re proto-writing; they could be something akin to the modern huaya (花押) monographic signatures.
Something else to keep in mind is that these symbols never disappeared after Chinese writing emerged. They kept being used in some regions for a long time. They can be found on pottery from various periods, up until Western Han.
Shang Era
The earliest known artefacts containing Chinese writing belong to the Shang period. By that time the writing had already been fully formed and developed, so it’s natural to assume that it had existed for a long time before that.
The main writing surface at the time was bamboo slips, but all of them have decayed in the soil and are unavailable to us.
Most of the texts from that period that we have access to are inscriptions on plastrons and bones. Even though these weren’t the main materials at the time and were only used for very specialised purposes (usually divination) they give us a good idea of what writing during that era was like.
The divinatory records are typically divided into ‘royal’ and ‘non-royal’. The royal inscriptions are further divided into northern and southern.
The main groups of the northern branch:
Shi group (𠂤組)
Bin group (賓組)
Chu group (出組)
He group (何組)
Huang group (黃組)
Southern branch:
Li group (歷組)
Nameless group (無名組)
The primary non-royal groups are:
Zi group (子組)
Wu group (午組)
Funu divinations (婦女卜辭)
Round-form type (圓體類)
Cacoform type (劣體類)
Hounanzi type (侯南子類)
Tunxizi type (屯西子類)
Huadongzi group (花東子組)
Something to note is that each of these groups has many subgroups, and these subgroups in turn have subcategories of their own. I won’t be listing them because there’s still no complete consensus and they aren’t compiled anywhere yet. Below is a periodisation of the groups and the main subgroups based on rulers. The northern branch is in blue, the southern branch is in orange and the non-royal inscriptions are in green.
The division is based on the contents of the inscriptions, the use of specific characters, style, etc. I won’t muse on that here.
Bronze artefacts rank second in terms of quantity of inscriptions. They can be grouped into four periods:
Middle of Shang ~ Wu Ding
Wu Ding ~ Zu Jia
Lin Xin ~ Wen Ding
Di Yi ~ Di Xin
Among the bronzes of the first period are around 20 texts that are older than the first oracle bone records we have. Usually they just have clan emblems on them.
The second period sees the character 亞 get added to the clan names. There are differing opinions but it’s most likely the name of a position.
During the third period the names of clans started being written inside the 亞 (it became a simple frame). The first sentences started to appear.
Artefacts from the fourth period contain long texts up to 40 characters on all kinds of topics: the creation of the item, royal provenance, etc.
Aside from these two writing surfaces there are also some rarer ones, like writing on clay pots (sometimes even with a brush) as well as inscriptions on gemstones. According to some, even some seals have been discovered.
Something to keep in mind is that the characters inscribed on bones and turtle shells are simplified compared to the ‘official’ forms of the time, which were probably used on the bamboo slips and can be found on the bronze items. In the more official texts the characters kept their fuller, unsimplified shapes.
Nevertheless, a more ‘pictographic’ appearance (like on the early bronzes) doesn’t necessarily indicate an earlier form of the character. This is something that has to be taken into consideration when researching etymology.
Western Zhou Era
Here bronze takes centre stage. Many inscriptions started being produced, recording various state affairs of the time.
There are still some bone records from the divinations by the various Zhou rulers. They are mostly concentrated in the early Western Zhou period, and some are from before the conquest of Shang.
Aside from that, there are also clay pot inscriptions.
This period also sees the emergence of the ‘Zhou forms’ that can be found in the ‘Shuowen Jiezi’. Since the forms in the dictionary are transmitted, they are considerably distorted.
Because the state was involved in the production of most vessels during Western Zhou the variation of the characters here is comparatively minimal. Nevertheless, the simplifications by the end of the Western Zhou period become obvious: the straightening of lines, loss of dots, loss of variation in line thickness, etc.
Eastern Zhou (Spring and Autumn and Warring States Periods)
These periods are combined here for one simple reason: China was divided into a whole bunch of kingdoms, resulting in the script splitting into several branches. But first let’s look at the artefacts.
From the Warring States period onwards, texts on a huge variety of different materials become available to us. First of all, the bronzes become much more diverse. Items that hardly ever appeared in the Shang and Western Zhou periods (musical instruments, weapons, measuring tools, accessories for horse carts, etc) are found in huge quantities from now on.
Inscriptions on jade also become common. For example we could mention the Houma covenant texts (侯馬盟書) from the Zhao kingdom (趙), as well as the Wenxian covenant texts (溫縣盟書) from the Han kingdom (韓), both from around the end of the Spring and Autumn period. They’re written with a brush, but there are also plenty of carved texts, like the Xingqi Yuming (行氣玉銘), a meditative text found on the twelve-sided jade knob of a magic wand or shaman’s staff from around the beginning of the Warring States era.
Inscriptions on regular stone also emerged, such as the stone-drum inscriptions (石鼓文) from the Qin kingdom (秦) from around the end of the Spring and Autumn period, the Qin Gong lithophones (景公石磬), also from Qin, from around the middle of the Spring and Autumn period, and the ‘Curses against the State of Chu’ (詛楚文), again from Qin, from the middle of the Warring States period (unfortunately we don’t have the original text, only later copies).
There is also quite a lot of writing on coinage. Coins appear from the Spring and Autumn period onwards, but during the Warring States period most kingdoms started producing them in large quantities. The coins themselves come in four shapes:
spades (generally of Jin (晉) provenance, also in its successor states Han, Zhao and Wei (魏));
knives (generally from Qi (齊), Yan (燕) and Zhao);
cowry shells (in Chu (楚));
round (generally Qin but also Jin, Chu and Yan).
Seals are extraordinarily rare in the earlier periods but became widespread during the Warring States. By their contents they can be divided into official seals, personal seals and letter seals (the last kind was used to stamp a phrase with good wishes at the end of a letter).
Writing on ceramics can be found during the Warring States period too. Usually it’s just seal imprints, but from time to time something gets carved. These usually come from Qi or from neighbouring Yan.
During the Warring States era many items of lacquerware were produced too and in very few cases they had characters etched into them as well.
The earliest known silk manuscript also belongs to the Warring States era. It was made in the Chu kingdom and contains a creation myth, reports of extraordinary natural phenomena and disasters and what one should be cautious of during each of the 12 months.
Prior to the invention of paper, people in China mainly wrote on bamboo or wood. Bamboo slips were already widely used during the Shang and Western Zhou periods, but everything before the Warring States rotted away, as the material was not durable. All inscriptions from this period can be divided into Chu and Qin inscriptions (however, Chu inscriptions also contain characters from other kingdoms, see below). Mostly they contained literature, learning materials, documents, lists of funerary objects, divinations, etc. We have access to a huge number of different texts recorded on this material; almost all of them come from the kingdom of Chu.
Literacy in China grew starting with the Spring and Autumn. Consequently, this meant more regional variation among the characters. The diversity reached its peak during the Warring States era, but after the unification of China by Qin, the characters converged. Pretty much all contemporary graphs are descendants of the Qin forms. The Qin kingdom managed to preserve the Western Zhou characters quite well, so the non-Qin forms are usually called ‘characters of the six states’. All characters can be divided into five branches:
Qin branch (秦系)
Chu branch (楚系)
Qi branch (齊系)
Jin branch (晉系)
Yan branch (燕系)
The characters of the different kingdoms were quite divergent. Just compare the variants of 者:
They could also construct the characters in completely different ways. For example Chu wrote the word {來} ‘to come’ with an additional 辶, Yan added a 彳 instead, while Jin added a 止.
Which characters were used for which words also varied a lot. For example, Chu tended to use 砫 or sometimes 䝬 or 鉒 to write {重} ‘heavy’, Qi and Jin used 冢 and 塚, while Qin and Yan used 重. Chu and Yan wrote the word {縣} ‘prefecture’ as 睘 (in Yan sometimes also 還), Jin as 𬪗 and Qin and Qi used 縣.
Let’s have a closer look at how the branches are divided. Aside from the Chu kingdom, the Chu branch also includes the characters from the states of Zeng (曾), Xu (徐), Cai (蔡), Song (宋), Wu (吳), Yue (越) and some others. The main defining trait of the bronze inscriptions from these kingdoms is their vertically elongated characters with thin, smooth lines. (These forms also found their way to some eastern kingdoms like Qi and Lu (魯).) They became widespread around the middle or the end of the Spring and Autumn period. By the end of the Warring States they start looking closer to regular brushwork.
This style also serves as the origin of a decorative style that uses bird-like shapes in its characters. It’s often called by the generic term ‘bird-worm script’ (鳥蟲書) and was mostly popular in Wu and Yue around the end of the Spring and Autumn era.
Also relevant here is the text ‘Goulou-bei’ (岣嶁碑) that contains characters of unknown origin. As the legend goes, they were created by Yu (禹) the founder of the Xia dynasty (夏). We don’t have any original inscriptions, but it’s generally believed that they are related to distorted inscriptions written in the Yue decorative script from the Warring States period.
The Qi branch can be divided into two sub-branches: a northern one (the states of Qi (齊), Ju (莒), Zhu (鑄), Feng (夆), Qi (杞), etc) and a southern one (Lu (魯), Zhu (邾), Teng (滕), Xue (薛), Cao (曹), etc).
Lu was Confucius’ homeland. Even though we don’t have any bamboo slips from these kingdoms, it’s generally believed that books were written in large quantities there, which is why forms preserving features peculiar to the Qi branch can be found among the ‘ancient script’ forms in Shuowen Jiezi, as well as in the Hanjian. The bamboo books found in the walls of Confucius’ house were also written with the characters of this branch.
You can also find characters characteristic of the Qi branch in some Chu writings. It is believed that this is because the people from Chu were copying some things written in Qi. Generally these can be found in the texts 郭店簡《唐虞之道》《忠信之道》《語叢一~三》、上博簡《緇衣》、清華簡《保訓》.
The Jin branch contains the characters from Zhao (趙), Wei (魏) and Han (韓) (and Jin (晉) of course), as well as the nearby Wey (衛), Zheng (鄭), Zhongshan (中山) and some others. The strokes in the characters of this branch are thin and straight, while the characters themselves are relatively simple.
Even though we don’t have any bamboo writing from this period it’s generally accepted that the texts found by the grave robbers were written by people from Zheng who moved to Chu. Some characters have forms characteristic of the Jin branch. Generally you can find them in the texts 清華簡《繫年》《良臣》《祝辭》《筮法》《鄭武夫人規孺子》《鄭文公問太伯》《子産》etc.
The kingdoms of Zhao, Wei and Han could be said to be sub-branches of their own. For example the bottom part of the character 襄 got corrupted into 羊, a phonetic, in Zhao. In Wei and Han this character has forms that are completely different. Wei used a character with the phonetic 庚 to write the word {容}, while Han used a character with the phonetic 凶.
Transition to Modern Writing
It’s often believed that during Qin a writing reform was conducted that led to the emergence of modern writing, but the transition was actually quite gradual. One of the people supporting the idea of a reform was the author of Shuowen Jiezi, but the forms presented in that dictionary as the ‘standard Qin seal forms’ are often quite divergent from the real forms of the period that were in official use. In this picture you can see a character from Shuowen Jiezi next to some real forms from the official Qin writings, which, as you can tell, are quite variable.
The clerical script was formed around the end of the Warring States period. Its origin lies in the character forms used by the Qin commoners. Of course, the style found on the Qin bamboo slips still hadn’t reached its final form and was still constantly changing.
During the Han dynasty the clerical script reached its final form and became the main style of writing.
Cursive, meanwhile, served as the auxiliary style.
Around the middle of the Eastern Han era the so-called neo-clerical script emerged from the clerical script. It was simpler and more convenient for everyday use.
Around the end of the Eastern Han period the semi-cursive emerged, combining the neo-clerical and cursive forms.
Between the Han and the Wei periods ‘regular script’ emerged, basing its forms on the semi-cursive.
Of course, it didn’t kill off the clerical and the neo-clerical scripts overnight. They coexisted until around the Wei-Jin period when regular script became dominant.
At the same time, under the influence from the semi-cursive and the regular script, the Han era cursive gradually transitioned to its modern form.
From that point on, Chinese writing was more or less how we know it today. Of course, some characters still kept changing throughout the modern period.
How Character Etymology Should be Done
Now that we’ve got the bare basics down, let’s move on to the question set up in the title of the article. Here we’ll cover the key aspects that you’ll need to understand and internalise to be able to tell whether the character etymology presented to you is trustworthy.
1. More than 90% of Characters are Phono-Semantic Compounds
This is the most basic rule that you have to keep in mind. Since most characters are phono-semantic compounds the first thing that you should do is look for the phonetic. Let’s take this (wrong) etymology for instance:
鯨 jīng ‘whale’ is made up of 魚 ‘fish’ and 京 ‘big’.
Of course, 京 jīng can mean ‘big’. But before we jump to conclusions let’s look at some other characters containing this element: 倞 jìng ‘strong’, 黥 qíng ‘to brand’, 䁁 liàng ‘crossed eyes’, 諒 liàng ‘to presume’, 惊 jīng ‘to get scared’, 景 jǐng ‘sunlight’, 晾 liàng ‘to dry’, 鍄 liàng ‘liang (musical instrument)’, 琼 qióng ‘red jade’, 就 jiù ‘to advance’.
You can tell that even in contemporary Mandarin the connection between the pronunciations of these words is transparent and the pattern is quite clear (all of these words can be traced back to the averaged *(K)RAŊ), while almost none have any connection to the meaning ‘strong’. The only word that seems out of place is 就, and that’s because it is the one word where 京 actually isn’t a phonetic. Even there it still doesn’t mean ‘big’, but is actually a simplification of the original element 𫢁 (which is made up of 亯 and 京 and the etymology of which is unknown).
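The check described above (line up the readings of every character that contains the element and see which ones break the pattern) can be sketched roughly as follows. This is only a toy illustration: the characters and pinyin come from the list above, but the test itself (an initial j/q/l plus a final ending in -ng) is my own crude stand-in for the averaged *(K)RAŊ, and the names strip_tones and fits_series are hypothetical. Serious work compares Old Chinese reconstructions, not Mandarin spellings.

```python
import unicodedata

# Characters containing 京, with the Mandarin readings given in the text.
SERIES = {
    "鯨": "jīng", "倞": "jìng", "黥": "qíng", "䁁": "liàng", "諒": "liàng",
    "惊": "jīng", "景": "jǐng", "晾": "liàng", "鍄": "liàng", "琼": "qióng",
    "就": "jiù",
}


def strip_tones(pinyin):
    """Remove tone diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fits_series(pinyin):
    """Assumed rough pattern: initial j/q/l and a final ending in -ng."""
    plain = strip_tones(pinyin)
    return plain[0] in "jql" and plain.endswith("ng")


outliers = [char for char, reading in SERIES.items() if not fits_series(reading)]
print(outliers)  # ['就']
```

The single outlier is exactly the character where 京 is not a phonetic, which is the point of looking at the whole series before accepting a semantic explanation.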
What’s more, characters are pretty much never used as semantics in their extended sense or as rebuses (unless you’re dealing with the non-Chinese recensions of the Chinese script, especially the Zhuang-Vietnamese branch). For example 大 ‘big’ pretty much never actually means ‘big’ when used to construct other characters. In the majority of cases it just depicts a person.
Only if you couldn’t find the phonetic should you start trying to divide the character into semantic elements. If that doesn’t work either then most likely you’re dealing with graphical corruption, which, as with the other cases, should be found and proven.
The importance of phonetic writing in the ancient Chinese script can’t be stressed enough. Always remember that it was the cornerstone of the entire system. You can see this in the example of the variants of the word {腹} ‘abdomen’ that you can find in the Houma covenant texts. Again, these are all from the same location and general era, so all of them were clear to the intended readers.
That makes 22 variants in total. In the image they’re arranged by frequency; the numbers in brackets show how many instances of each form can be found in the corpus. Under the forms you see the elements that make them up. Something to keep in mind though is that even these forms are generalised and can be divided into subforms that differ only graphically. For example, here are some subforms derived from form number 1:
But let’s talk about the regular forms. The core element in every single one of them is the phonetic 复; the rest are semantic elements. By frequency: 肉 ‘flesh’, 彳 ‘road’, 止 (a depiction of a foot), 夊 (an upside-down depiction of a foot), 口 ‘mouth’, 冖 (a depiction of a building or a room), 心 ‘heart’. Looking at these, you might have noticed that most of them aren’t obviously connected with ‘abdomen’ in any way. The most obvious is the element 肉, which is used as a semantic in a myriad of characters for body part words. 心 could probably also refer to the body, or perhaps it appears under the influence of {腹心} ‘heart and body’, which appears quite often in these texts. Perhaps 冖 also conveys the idea of ‘tranquillity’, as in some other characters. The other elements are due to the influence of the identical-sounding word {復} ‘to return/to be restored’, which can also frequently be found in this text. 彳, 止 and 夊 are pretty obvious, but 口 might seem a bit out of place. A possible explanation for the presence of this component is that the restoration of Zhao Ni’s clan was viewed as a political act which involved the notion of ‘declaring’. Another possibility is that 口 was added to {復} not as a separate component but as a component linked to 止, together forming the component 足. In the forms that do not include the component 止, the component 口 could simply be an abbreviation of 足.
Of course, some forms can be interpreted in more than one manner. For instance, while form 1 is unambiguously a combination of the semantic 肉 and the phonetic 复, form 2 can be interpreted as:
a combination of the semantic 肉 (referring to ‘abdomen’), the semantic 彳 (referring to ‘to return’) and the phonetic 复;
a combination of the semantic 肉 and the phonetic 復;
a combination of the semantic 彳 and the phonetic 腹.
In the first case the semantics augment each other, referring to different but identical-sounding words, while the phonetic combines them, i.e. ‘this word has something to do with either flesh or the road and it sounds like 复’. In that case we can choose either {腹} or {復}. In the second case we’ve only got one choice, the word {腹}, since it’s just a phono-semantic compound. In the third case we’ve also got only one option, the word {復}, but we can still reach the word {腹} that we need by treating this character as a rebus. (The previous two cases can also theoretically record any similar-sounding words.)
Regardless of our interpretation, the main thing we’re left with is the phonetic 复 (or one derived from it). It is the skeleton of the graph that takes on whatever else gets attached (or doesn’t get attached, like in form 11).
The majority of words in early writing (in any single community and at any single point in time) could be written in different ways which were used with various frequencies. In theory, you could come up with any combination for pretty much any word, and that’s one of the key features of Chinese writing that has to be kept in mind. The scribes always tried to make it so that the readers could unambiguously pronounce the text they were reading. Not just reading, but reading aloud, which is why we can even find phonetic assimilation, like in 日居月諸 *nik-ka ŋot-ta, which is how they wrote the assimilated 日乎月乎 *nik-wa ŋot-wa ‘Oh, Sun! Oh, Moon!’.
2. Phonetics are Meaningless
If you encounter an etymology where the phonetic is presented as having semantic value, then most likely you’re dealing with folk etymology. Let’s have a look at an etymology for the same character but taken from a different source:
鯨 ‘whale’ is made up of 魚 ‘fish’ and the phonetic 京 (big, strong).
The dictionary purports an etymological connection based on the word {京} ‘huge’.
But in every single character that uses 京 as a phonetic (same as any other character that uses any other phonetic) this happened coincidentally. If several graphs have the same phonetic this means nothing except that the words these characters record are all phonetically similar to each other which led to the characters used to record them having the same phonetic.
You could also write {鯨} using the character 䲔 with the phonetic 畺. There is nothing special about 京, it just so happened that this is the phonetic that ended up being predominant. Again, most phono-semantic compounds have variants that use different phonetics, just remember the characters for {道} ‘road’ that we’ve already discussed. This is not to mention the rebuses, which were used non-stop, so pretty much every word could be written in a ton of different ways, based simply on sound similarity.
Nevertheless, there are some rare examples where the phonetic is meaningful. In all of these cases however, both words must (and you should be able to prove it) be etymologically related. Among the obvious examples are 方 *paŋ ‘square’ > 鈁 *paŋ ‘a kind of square vessel’, or 四 *ɬis ‘four’ > 駟 *ɬis ‘a vehicle drawn by four horses’ and 牭 *ɬis ‘four-year-old cow’.
Another case when the phonetics can be considered to bear semantic load is when a semantic gets attached to a character without a change in meaning. We’ve discussed this device extensively in the part dedicated to phono-semantic compounds.
3. The Earliest Available Forms and Sources Must be Used
Since the characters have been distorted many times throughout their history (even in the earliest monuments), it is the earliest forms that need to be looked at in order to find out the true etymology. Even though they are not always known, scholarship is still being actively done, so to find the earliest forms it is necessary to use cutting-edge knowledge (see below). Consider, for instance, the characters 丁, 家 and 安.
The scholars of the 19th-20th centuries tended to explain the character 丁 as a depiction of a nail, with the original sense being {釘} ‘nail’. This was based on its form from Shuowen Jiezi. But when the writings from the Shang era were discovered, it became strikingly obvious that this isn’t the case. 丁 used to be written as a simple circle or a square. Nowadays, scholars tend to say that it was a depiction of a head, with the original sense {頂} ‘head’. Compare the development of 丁 and 天 (depiction of a person with the focus on the head):
The character 家 ‘house’ is made up of 宀 ‘house’ and 豕 ‘pig’ because of which it is often interpreted as something like ‘pig in house → bountiful home’.
The earliest texts draw a clear distinction between a female pig (豕) and a male pig (𢑓 = {豭}). The latter is always drawn with a penis. Look at the picture and compare the forms 豕, 𢑓 and 家. 家 clearly has 𢑓, except in the later forms, where 𢑓 got contaminated with 豕.
Consequently, 家 is a phono-semantic compound with the semantic 宀 ‘house’ and the phonetic 𢑓.
The character 安 ‘calm’ is made up of 宀 ‘house’ and 女 ‘woman’. This leads to many people interpreting it in the spirit of ‘woman in house → calm’.
Shang era texts have two kinds of characters with a woman inside a house: one with a stroke inside and one without.
Many dictionaries list these as the ancient forms of 安, although we now know that the characters from the top row actually have nothing to do with 安 and instead record the word {賓} ‘guest’.
You can find a different form of 安 in the bone inscriptions excavated in 2003 in the village Daxin (Shandong). They show us a pictographic form of 安, one that didn’t acquire the ‘roof’ yet. If you look closely, there’s still a stroke under the woman’s arse. This stroke is preserved up until the Han period.
From that we can deduce that the original meaning of 安 was probably ‘to sit’ (see Erya: 安坐也 ‘安 means “to sit”’). The element 宀 was appended later (compare the similar progression in 坐 → 座 ‘to sit’).
4. The Most Minute Differences Must be Noted
Sometimes the differences between the characters are so insignificant that they’re practically impossible to notice. A non-specialist can easily confuse several different characters. A good example is in the table below, which compares the characters 人 ‘person’, 匕 ‘spoon’, 从 ‘to follow’ and 比 ‘to compare’ during the different periods of the Shang era.
It really is difficult to find any difference between these, but the differences are there and they’re positively crucial.
As an additional example let’s take the character 般 bān ‘kind, class’. Nowadays it’s composed of 舟 zhōu ‘boat’ and 殳 shū ‘bamboo pike’. Neither has much to contribute, either semantically or phonetically, even if you look at Old Chinese: just compare 般 *p(r)ˤan with 舟 *tu and 殳 *do (for convenience, the rest of the article will use only Old Chinese readings). If we take a look at the earliest texts, we’ll see that 般 is actually constructed out of elements that have nothing to do with 舟 or 殳.
One of the elements of this character is 皿 ‘plate’ (usually turned by 90°) which, when turned, does bear resemblance to 舟. You can find examples of rotated 皿 in other characters too. For example the early character for the word {溫} ‘warm’, which depicts a person pouring warm water onto themselves out of a big plate (they were used for washing).
The other element didn’t survive to our days but probably depicts a {鞭} *pen ‘whip’. It can also be found in the character 更 (更 *ben, not to be confused with the homographic 更 *kˤraŋ ‘to change’), composed of the semantic {鞭} ‘whip’ and the phonetic 丙, that is, it’s one of the times when a phonetic is added to an already existing character.
Since it makes the most sense to assume that 般 *p(r)ˤan is made up of the semantic 皿 ‘plate’ and the phonetic {鞭} *pen, it most likely originally denoted {盤} *bˤan ‘plate’ (the plate 盤 was used for washing too). 般 is also used in the name of the Shang ruler Pan Geng (later on his name would be spelled as 盤庚). The modern sense ‘kind, class’ is just a rebus.
5. Academic Methodology Must be Used
Methodology is part and parcel of any area of inquiry. It demands objectivity and rules out impressionistic interpretation and taking things on faith. Objectivity is reached by means of research, statistical tendencies, correction of past knowledge and reasoning based on empirical knowledge.
Every etymology must be proven. To prove them, an entire arsenal of materials must be utilised:
Ancient texts proper
Transmitted texts
Ancient dictionaries and commentaries
Linguistics (phonology, word etymology, etc)
History
Culture
Archaeology
Writing universals
And much more.
Let’s see some examples based on the etymologies of some characters.
Let’s start with the character 年 ‘year’. It contains the elements 禾 ‘grain’ and 人 ‘person’, the original meaning was ‘harvest’. It is often explained as ‘a man is carrying his harvest’, but there is no evidence in favour of this etymology. Every single character where a man is carrying something is constructed horizontally, not vertically.
There are two things to note here. First, in some of the forms the elements aren’t connected, and pretty much all characters where the elements aren’t connected are phono-semantic compounds. Second, 人 *niŋ and 年 *nˤiŋ sound similar. Naturally, this implies that 人 is a phonetic here. This is how you can use general tendencies of early writing as evidence.
The character 母 ‘mother’ is often explained as ‘a depiction of a woman with dots denoting breasts’. The evidence for these being breasts is nonexistent. We would need a series of characters where the breasts are clearly depicted, and in the same manner.
At first, the word {母} was written using the character 女 ‘woman’ (one of the rare cases when one character denotes two phonetically unrelated words that are related semantically). This is already grounds for considering 母 a differentiated form of 女 (by analogy with 大→夫, etc), but we need more than that.
These kinds of dots filling up empty space can be found in many characters. The ones with dots and the ones without them mean exactly the same thing. This is no different from what we see in 母.
Based on this the most sensible conclusion is that the dots inside 母 are a simple ornament to differentiate it from 女, rather than a depiction of breasts.
Let’s move on to more complicated cases. The character 明 ‘bright’ is often explained as a semantogram composed of 日 ‘sun’ and 月 ‘moon’. Its ancient form is 朙 and it is also usually mistakenly interpreted as something like ‘月 “the moon” is shining through 囧 “the window”’. In reality, 囧 is just a simple phonetic. Let’s prove this.
During the Shang period, the character 朙 always contained the element 月 ‘moon’, albeit the second element varied: 囧, 日, 田, 口.
Can we consider them corrupted forms of 囧? Sure, but more evidence for that is in order. For that let’s look at the character 𥁰.
As you can see, the pattern repeats, so at this point there’s no reason not to consider the different forms in the variants of 朙 simplified forms of 囧.
In the Shang era inscriptions, 朙 could be used to write the same words as the character 囧. From the Western Zhou onwards, 𥁰 could be used to write the words {明} ‘bright’ and {盟} ‘alliance’ (and even in post-Shang texts there were variants written with 日 and 田 instead of 囧). This evidence shows that 囧 is a phonetic in 朙.
However, the Shuowen Jiezi interprets the character 囧 as having the initial *k- (it says 讀若獷 ‘pronounced like 獷’), whereas we need *m-. This is why many people don’t consider 囧 a phonetic in 朙. They missed another important comment made by Jia Kui (賈逵): 讀與明同 ‘pronounced the same as 明’. Moreover, apart from this comment from the Shuowen Jiezi, there are no examples in the ancient texts indicating that 囧 had the initial *k-. So where did it come from in the first place?
Shujing has the section 囧命, written as 伯囧 (both of these words can be spelled with 冏 instead of 囧 but that’s a mistake by later transcribers). Shuowen Jiezi also has a form of the last name written as 伯臩 (the character 臩 is interpreted as a form of 囧). The character 臩 is described as having the phonetic 臦, the reading of which is notated as 讀若誑, that is, it’s only different from the aforementioned 獷 by the syllable final. That said, there are also the characters 䀠/𥉁 which are very similar to 臦/臩 and 䀠 is a phonetic in many characters, for example in 攫 that has a different final consonant but is pretty similar. Seems like this has all been a huge mistake, but again, we need proof.
Among the many Warring States period copies of the Shujing one can find a character corresponding with the 臩 from the 伯臩. It looks very similar to 臩 and is composed of 聑 and 大 (𫯺). This character can also be found in the Book of Rites compiled around the same time. In the transmitted versions this character is written as 攝.
All of this is to say that 臦/臩 phonetically corresponds not with 獷/誑/攫, but rather 聶/攝 and 𫯺. But we still need more evidence to connect 囧 and 臦/臩.
Thankfully, we have it. Among the (other) excavated copies of the Book of Rites, the same word in the same part of the text is written as 㘝, quite similarly to 囧.
Shuowen Jiezi has a comment next to it that states 讀若聶 ‘pronounced like 聶’. To conclude, that implies that the 囧 from 伯囧 is a variant of 㘝 and has nothing to do with the element from 朙.
What we end up with: the 囧 with the initial *m- is a phonetic in the characters 朙 and 𥁰, while the 囧 with the initial *k- from the Shuowen Jiezi is a corruption of 㘝, which, because of the corruption 聑 > 臦 > 䀠, was recorded with an incorrect reading. There is nothing to substantiate the meaning ‘window’, and the 囧 with the initial *m- has an unknown etymology. The modern form 明 is a corruption of 眀 (the most common way to write this word during Qin), which in turn is a corruption of 朙.
Here we’ve used linguistic data, ancient dictionaries, ancient and received texts and the general tendencies of early writing (sound loans, corruption, the frequency of phono-semantic compounds, etc) to prove our thesis.
The character 閒 (間) ‘gap’ is frequently interpreted as a compound semantogram composed of 門 ‘gates’ and 月 ‘moon’, something like ‘the moonlight shining through the gates’. This etymology is however highly unconvincing and has no substantiation.
These days it’s generally accepted that the 月 in 閒 is a phonetic (originally transcribing the word {闌} ‘screen’). But if we look at the reconstructions of Old Chinese readings of both characters (as well as 闌), we’ll see that they’re actually not very similar at all:
月 */[ŋ]ʷat/ (Baxter-Sagart), */ŋod/ (Zhengzhang)
閒 */kˤre[n]/ (Baxter-Sagart), */kreen/ (Zhengzhang)
闌 */[r]ˤan/ (Baxter-Sagart), */ɡ·raːn/ (Zhengzhang)
That said, in the case of this character the meaningful evidence comes not from the pronunciations, but rather from the writing:
the connection between 閒 and 外;
the connection between 閒 and 柬;
the connection between 閒 and the characters similar in reading to 月.
Let’s discuss these in order.
In the writings from Chu and Qi from the Warring States period the word {閒} was usually spelled using the character 𨳿 with the phonetic 外 (which in turn obviously uses the phonetic 月).
The character [⿰夕刀] (has a similar pronunciation to 外) could be used to write the word {閒}.
A form of the character 蕑 exists, written as 𫟌 (with the phonetic 外 instead of 閒).
During Shang and Western Zhou the character 闌 *[r]ˤan is sometimes written with the additional phonetic 月.
In the texts of the Chu kingdom from the Warring States you could often see 閒 being used to write the word {諫} (the character 諫 has the same 柬 phonetic as 闌).
The character 閒 was frequently used to write the words that were written with the phonetic 干 *KAR, for instance {干} and {奸}. Examples also exist of the characters 汗 and 迀 being used to write the word {閒}.
There is at least one place where the word {宣} *s-qʷar was written using 閒.
The toponym 伊闕 could be written as 伊閒, that is, with 閒 for {闕} *kʷʰat.
All of this indicates that 閒 has some kind of close phonetic connection with 月. The current reconstructions don’t show this because this is a problem yet to be solved.
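The links listed above can be pictured as a small graph in which every attested substitution is an edge. The following is only a minimal sketch in Python, under a loud assumption: every link from the list is treated as an undirected edge of equal weight, which is a simplification made purely for illustration. The names LINKS and find_chain are hypothetical and do not come from any library or from the scholarship cited here.

```python
from collections import deque

# Each pair is one link reported in the text above (a sound loan, or a shared
# or swapped phonetic). Treating all links as equal, undirected edges is an
# assumption made only for illustration.
LINKS = [
    ("閒", "𨳿"),  # {閒} written as 𨳿 in Chu and Qi
    ("𨳿", "外"),  # 外 is the phonetic in 𨳿
    ("外", "月"),  # 月 is the phonetic in 外
    ("閒", "外"),  # 蕑 has a variant with 外 in place of 閒
    ("闌", "月"),  # 闌 sometimes written with an additional phonetic 月
    ("閒", "諫"),  # 閒 used to write {諫}
    ("諫", "柬"),  # 柬 is the phonetic in 諫
    ("闌", "柬"),  # 柬 is the phonetic in 闌
    ("閒", "干"),  # 閒 used for words written with the phonetic 干
    ("閒", "宣"),  # {宣} written using 閒
    ("閒", "闕"),  # {闕} written using 閒
]


def find_chain(start, goal, links=LINKS):
    """Breadth-first search for a shortest chain of attested links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of links connects the two characters


print(find_chain("閒", "月"))  # ['閒', '外', '月']
```

A graph like this only records that a connection is attested in writing. It says nothing about what the Old Chinese readings actually were, which is exactly the part that remains unsolved.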
The character 開 ‘to open’ is frequently interpreted as ‘a depiction of hands removing a latch from the gates’ based on the ‘ancient form’ from the Shuowen Jiezi.
Since the ‘ancient forms’ from this dictionary tend to be copies of the forms of the ‘six states’, we should expect to find such a form there. And it is there, one can find it on a Qi seal from the Warring States. A similar form can also be found among the forms of the Zhongshan kingdom (中山).
I won’t bore you with the details, but it most likely descends from 𨴔 (the proto-form of 闢 ‘to open the door’); its interpretation as 開 is a mistake of the ancient scholars.
Our modern form 開 *kʰˤəiʔ however consists of just 門 ‘gates’ and 幵 (a depiction of two hairpins, transcribes the word {笄}).
The element 幵 *kˤen is a phonetic here, but just like in the previous example, the readings don’t align. Yet again, there is textual evidence that 幵 is a phonetic after all.
First of all, in Huangmen (皇門) we can find the word ‘to open’ being written using the character [⿱幵見], so, again, with the phonetic 幵 ([⿱幵見] itself was probably originally used to transcribe the word {䀘} ‘to obstruct the line of sight’). Here it is most likely being used as a sound loan.
Secondly, the book Xinian (繫年) has the same word written using the character 建 *kans. This is also a sound loan.
Then why does the phonetic not align with the reading? The answer is actually pretty simple: the reading changed under the influence of a different word ‘to open’, specifically {闓} *kʰˤəiʔ. This probably happened back in around the 3rd century BC. This kind of sleight of hand is important to keep in mind. The problem with 閒 most likely also lies in a substitution of this kind that we don’t know about yet.
6. The Most Cutting-Edge Scholarship must be Used
Throw the Shuowen Jiezi in the bin! Even though it was written in the 2nd century, people still often use it as an ‘authoritative’ source for etymologies. This is kinda like using the Greek philosophers to study physics. Is that what you want? Of course, I’m not telling you to abandon it entirely, it is excellent reference material. Just not for etymology.
The mistakes in the Shuowen have been dissected countless times by contemporary (and not just contemporary) scholars, so I see no reason to discuss them in detail here. Its author couldn’t have known of the existence of the oracle bone script (and as we’ve shown, using the earliest available sources is of utmost importance) and had access to very few bamboo slips. Many etymologies in the dictionary are simply wrong or impossible to prove.
Pretty much all of my criticism of the Shuowen Jiezi also applies to any etymological works predating the 20th century, when the bone inscriptions were discovered, along with a huge number of bamboo slips and many other important materials.
Some contemporary incorrect etymologies go back to Song era scholarship. For example, sometimes you may see this etymology:
烏 ‘raven’ depicts a bird (鳥) with no eye, since you can’t see a black eye on a black bird.
Most likely it first appeared in the lost 10th century book ‘Some Notes on the Origins of Characters and their Elements’ (字源偏傍小說) by Lin Han (林罕). Despite the text being lost, you can frequently find quotes from it in the works by various Song scholars. For example, this exact explanation of the character 烏 ‘raven’ is given in the 11th century dictionary Piya (埤雅) written by Lu Dian (陸佃):
林罕以爲全象鳥形但不注其目睛萬類目睛皆黒烏體全黒遠而不分別其睛也
Lin Han believed this is a depiction of a bird but without the eye. All birds have black eyes, but ravens also have black bodies, so you can’t see its eye from afar.
This is certainly an interesting theory, but since the scholars at the time had no access to any ancient characters aside from those in the Shuowen Jiezi (as well as some bronze, stone and bamboo slip inscriptions, but all of those also had to be properly interpreted), they had no chance of reaching the correct etymologies. Today we have access to many much more ancient inscriptions, allowing us to observe closely the evolution of these two characters.
The earliest known versions of 鳥 ‘bird’ and 烏 ‘raven’ are quite distinct from each other. The former was a depiction of a normal bird looking straight ahead, while the latter had the bird looking upwards with an open beak. Either character could be drawn either with or without eyes, which is the moment when Lin Han’s theory crumbles.
Later on they converged, but 烏 was kept distinct from 鳥 for quite a long time, even during the modern period. It only reached its current appearance during the Tang dynasty.
When talking about character etymology, some people keep using terms that have nothing to do with it. One example is the ‘radical’ of a character, which is sometimes confused with the semantic, even though in characters like 信 ‘to believe’ the radical 人 ‘person’ is the phonetic, while in the characters that have abstract elements like 一, 二 or 亠 as their radicals they are neither semantics nor phonetics.
The radicals are simply dictionary indexing components that have nothing to do with the historical composition of the characters or their origin. They are used to make dictionary lookup easier. Radicals first appeared in the Shuowen Jiezi, which had around 540 of them (these are still used in contemporary compilations of ancient forms of characters). After that, the quantity remained roughly the same in the Yupian dictionary (玉篇) written in 543 AD, which had 542 radicals, and the Leipian (類篇) from 1066, which had 544. The Zitong dictionary (字通) from 1254 drastically reduced the number of radicals to 89. The 1615 Zihui (字彙) dictionary included 214 radicals. These formed the basis for the 1716 Kangxi dictionary (康熙字典), on which the contemporary standard is partially based. In 2009, the Chinese Ministry of Education issued the ‘Table of Indexing Chinese Character Components’ (漢字部首表) consisting of 201 radicals, which is the current standard for simplified Chinese under the specification code GF 0011-2009.
Another commonly used term is ‘large seal script’ (大篆), which is applied to a form of Chinese characters that was used in a certain period. At first, it was used to refer to any ‘Zhou script’ forms that were similar to the Qin seal forms but predated them. How this term is used today is pretty confusing. Some people use it to refer to all pre-Qin characters (early scholars tended to use it in this sense). Others use it to refer to the bronze inscriptions from later Western Zhou, as well as to the stone-drum inscriptions (this is also a relatively old sense of the term). And others use it to refer to the Qin script used during the Spring and Autumn and Warring States periods. As you can tell, there’s nothing even close to a common definition here. I think that this term should be completely avoided when talking about the history of writing. It should be considered a kind of calligraphic style used by contemporary calligraphers when making, for example, seals. (The same goes for ‘small seal’ (小篆), which is generally used to refer to the forms from the Shuowen Jiezi.)
I would also like to address the ancient forms of characters found in various online dictionaries and compilations. I would advise using them only when absolutely necessary. For one, they usually contain outdated information (for example, when an ancient character is not really the character you are looking for, but was used to write a completely different word). The number of characters they contain is really small and they’re usually taken from very few sources (for example, very few bamboo slip forms are usually listed, if any at all). You almost never find forms from the seals of different kingdoms and periods, Warring States period coinage, late Spring and Autumn period contracts, etc. They are also usually sorted according to calligraphic styles rather than actual historical periods or geographical locations. Often, the forms are unsourced. Knowing the source is crucial to be able to check the text for yourself to see whether the form really exists and what it means in the context. As far as online dictionaries go, it is best to use only specialised ones that are linked to actual scans of the texts. I will link several of these in the list at the end of the article. You can forget about general sites like the famous hanziyuan.net or xiaoxue.iis.sinica.edu.tw, to say nothing of regular dictionaries like zdic.net.
After the archaeological discoveries of the 20th century, the study of early Chinese writing began to develop at such a rapid pace that many modern mass-produced books on characters are still stuck in the distant past. Etymology books and dictionaries use long outdated data from scholars of the last century. Although this is not nearly as bad as citing Shuowen, we should still stick to the latest trends.
Sometimes even 20th century scholarship doesn't help. Some people keep using even older sources. This is to say nothing about the research being done in the 21st century: a lot of the things that were considered relevant ten years ago are now outdated. Everything depends very much on the newest archaeological discoveries and the output of leading researchers. A recent example is the character 丸 ‘ball’, long thought to be a differentiated form of 夗, until in mid-2021 it was proven to be a depiction of a bent man. Being able to follow the most cutting-edge trends in etymology requires serious study of a huge number of books and articles, the ability to read the original ancient texts, etc.
Of course, the average person just wanting to learn some etymology shouldn’t need to do all that. For such people, etymology should be presented in a more or less straightforward way and in accordance with contemporary scholarship, not as is done in the vast majority of modern dictionaries. One of my projects is aimed at amending this situation.
However, if you want to study Chinese character etymology independently, you should be prepared to read the vast amount of material published over the last century. I don’t think I have to mention that 99% of it is in Chinese.
You should pay attention to the names Qiu Xigui (裘錫圭), Li Xueqin (李學勤), He Linyi (何琳儀), Liu Zhao (劉釗), Chen Jian (陳劍), Lin Yun (林澐), Cheng Yan (程燕), Wu Keqing (鄔可晶), Ji Xusheng (季旭昇), Xie Mingwen (謝明文), Bai Yulan (白於藍), Huang Dekuan (黃德寬) and many others. I could continue the list of good scholars for a long time, but the easiest way for you to find other great authors is to see who these people are citing.
You should be wary of some authors, like Wang Ning (王寧), who is not a palaeographer. Her etymologies are frequently criticised by leading scholars. What’s more, she has an unhealthy reliance on the Shuowen. Another example of a scholar you should avoid is Shizuka Shirakawa. They worship him over in Japan, show him on TV and constantly cite him, but his etymologies are either painfully outdated or based on thin air.
Another thing to know is that you probably won’t find papers with titles like ‘An Explanation of the Character X’. You can expect to find the newest character etymologies in big theory books or something like a three-page paper commenting on some ancient text. This means that you must read everything. The more the better.
I hope the article was helpful to all beginner etymologists and interesting to the people who already know some of this stuff. Now I’ll list some resources.
Recommended Literature and Resources
Etymology Quick Lookup
季旭昇《說文新證》(2014)
林志強《〈文源〉評注》(2017)
劉志基《中國漢字文物大系》(2013)
李學勤《字源》(2013)
Keep in mind that these books, despite being written by good scholars in the last decade, are already severely outdated (especially the latter two, they’re only good for composite or more modern characters).
Must Read
裘錫圭《文字學概要(修訂本)》(2013) (a general history and theory of Chinese writing)
裘錫圭《裘錫圭學術文集》(2012) (a collection of works by Qiu Xigui (裘錫圭), whose works are nothing short of revolutionary for palaeography and etymology, and are pretty much the gold standard on contemporary etymological methodology)
劉釗《古文字構形學(修訂本)》(2011) (theory of construction of ancient characters)
張涌泉《漢語俗字研究(增訂本)》(2010) (theory of common characters and forms)
Imre Galambos, ‘Orthography of Early Chinese Writing: Evidence from Newly Excavated Manuscripts’ (2006) (a history of Chinese orthography and why early writing shouldn’t be treated the same way as contemporary)
Crispin L. Williams, ‘Interpreting the Wenxian Covenant Texts. Methodological Procedure and Selected Analysis’ (2004) (a methodology of decipherment and reading of ancient texts)
An Introduction to the Study of Ancient Characters
李學勤《古文字學初階》(2013) (first edition is from 1985, new ones don’t get updated)
陳煒湛、唐鈺明《古文字學綱要》(2009) (around 200 real texts with breakdown)
黃德寬《古文字學》(2015) (talks about the study of ancient characters but also about the characters themselves)
馮時《中國古文字學概論》(2016) (also explains the phonology and interpretation of ancient texts)
陳世輝、湯餘惠《古文字學概要(修訂本)》(2017) (lots of pictures, the book is based on reading real texts)
何琳儀《戰國文字通論(訂補)》(2017) (an introduction to the theory of the Warring States characters)
Shang Era (+ Oracle Bone)
黃德寬《商代文字字形表》(2017) (excellent review book on Shang forms)
劉釗《新甲骨文編》(2014) (as far as identifying forms from bone inscription goes, this guy is much more trustworthy than the previous one)
畢秀潔《商代金文全編》(2012) (not a bad compilation of characters from the Shang bronzes)
松丸道雄《甲骨文字字釋綜覽》(1994) (a huge overview of all interpretations of characters on turtle shells and bones from the last century)
于省吾《甲骨文字詁林》(1996) (a classic summary of forms and meanings, but obviously outdated)
何景成《甲骨文字詁林補編》(2017) (many corrections and additions to the previous one based on new research)
崔恒昇《簡明甲骨文詞典》(2001) (words and meanings)
趙誠《甲骨文簡明詞典》(2009) (words and meanings)
孟世凱《甲骨學辭典》(2009) (words and meanings)
朱歧祥《甲骨文詞譜》(2013) (words and meanings)
落合淳思《甲骨文字辞典》(2016) (words and meanings)
毛祖志《現有甲骨文字典詞典及其存在的問題概述》(2019) (a review of dictionaries)
Western Zhou (+ Bronzes)
張俊成《西周金文字編》(2018) (best dictionary of W. Zhou bronzes so far)
江學旺《西周文字字形表》(2017) (excellent review book on W. Zhou forms)
董蓮池《新金文編》(2011) (review book on bronzes of all periods)
王文耀《簡明金文詞典》(1998) (words and meanings)
吳鎮烽《金文人名匯編》(2006) (personal names from bronzes of all periods)
Eastern Zhou + Qin (+ Bamboo and Silk Script)
黃德寬《春秋文字字形表》(2017) (so far the best dictionary on Spring and Autumn forms)
徐在國《戰國文字字形表》(2017) (an excellent review of the forms of the Warring States)
呉良寶《先秦貨幣文字編》(2006) (a major compilation of characters from coins)
曹錦炎《東周鳥篆文字編》(1998) (decorative style characters)
單曉偉《秦文字字形表》(2017) (a good review of Qin forms)
王輝《秦文字編》(2015) (best dictionary of Qin forms)
李守奎《楚文字編》(2003) (excellent dictionary of Chu forms)
滕壬生《楚系簡帛文字編》(2008) (an excellent dictionary of Chu forms from the bamboo and silk inscriptions)
張守中《中山王厝器文字編》(2011) (characters from the Zhongshan Kingdom)
張守中《侯馬盟書字表新編》(2017) (characters from the Houma treaties)
湯志彪《三晉文字編》(2013) (best dictionary on Jin forms)
黃聖松《東周齊國文字研究・文字編》(2002) (an excellent dictionary of Qi forms)
孫剛《齊文字編》(2010) (excellent dictionary of Qi forms)
何琳儀《戰國古文字典》(2004) (an excellent work on the meanings and readings of the characters + a huge compilation of forms)
徐在國《上博楚簡文字聲系》(2013) (comments and corrections to all characters from the previous work + new information)
白於藍《簡帛古書通假字大系》(2017) (giant work on sound loans in bamboo and silk inscriptions)
徐俊剛《非簡帛類戰國文字通假材料的整理與研究》(2018) (sound loans in inscriptions other than bamboo and silk)
Han
于淼《漢代隸書異體字表與相關問題研究》(2015) (a huge compilation of Han forms)
李鵬輝《漢印文字資料整理與相關問題研究》(2017) (huge compilation of Han seals)
趙平安《秦漢印章封泥文字編》(2019) (the best dictionary of characters from Qin and Han seals so far)
徐正考《漢代銅器銘文文字編》(2005) (Han bronzes)
General and Miscellaneous
黃德寬《古文字譜系疏證》(2007) (a huge dictionary of ancient script characters with meanings in different periods and etymologies)
殷蓀《中國磚銘文字徵》(1996) (a huge compilation of brick inscriptions)
王恩田《陶文字典》(2007) (ceramics characters)
高明、涂白奎 《古陶字錄》(2014) (more characters from pottery)
黃征《敦煌俗字典》(2020) (common forms from Dunhuang)
徐中舒《秦漢魏晉篆隸字形表》(1985) (a handy but severely outdated overview of transitional forms from ancient to modern writing)
徐在國《傳抄古文字編》(2006) (compilation of transitional forms in early writing)
徐在國《隸定古文疏證》(2011) (a compilation of transmitted forms in early writing)
曾良、陳敏《明清小說俗字典》(2017) (common forms from Ming and Qing texts)
There are also many collections of specific texts from all eras with detailed commentaries, analyses of the characters, words, misspellings, etc, but there are way too many of them to list here. You can find these yourself by referencing the list of reputable authors earlier in the article.
Websites with Scholarship
cnki.net (the obvious and best candidate, but requires a university subscription if you don’t want to pay to download articles)
jgw.aynu.edu.cn (a huge number of articles, dissertations, etc, periodically uploading new stuff, but the site does not work from 3PM to 8PM UTC)
resource.hzlib.cn (a huge collection of articles from various journals, not limited to just palaeography)
fdgwz.org.cn (some articles)
xianqin.org (some notes and thingies)
bsm.org.cn (for bamboo and silk writing)
Useful Online Compilations and Dictionaries
kaom.net (a Chinese studies student’s bible, very handy for reconstructions, sound loans, huge dialect databases, etc)
zi.tools (an excellent site on character meanings and ancient forms (which you should always check against the latest scholarship), which is constantly updated with various features and other things, which you can follow on their Telegram channel)
jgw.aynu.edu.cn (already featured above, but it's also a huge corpus of bone inscriptions)
bsm.org.cn (also featured above, and can also be used as a bamboo inscription corpus)
coe21.zinbun.kyoto-u.ac.jp (huge corpus of inscriptions on stelae and forms that can be found on them)
mokkanko.nabunken.go.jp (corpus of Japanese mokkans)
wcd-ihp.ascdc.sinica.edu.tw/woodslip (corpus of Han zhujian inscriptions)
mojiportal.nabunken.go.jp (here you can quickly look up the forms of the character in Japanese mokkan and zhujian inscriptions + some other stuff, but it’s generally better to use corpora for that)
idp.bl.uk (Dunhuang collections)
codh.rois.ac.jp/tensho (collections of seal forms, including many transmitted ancient forms and forms from the Shuowen)
base1.nijl.ac.jp (a compilation of Japanese seals)
dict.variants.moe.edu.tw (a dictionary of variants of characters with scans of their sources)
database-of-medieval-chinese-texts.be (huge compilation of forms from Dunhuang manuscripts)