Buginese

The Buginese script, also known as Lontara, is used to write the Buginese (Bugis) language on the island of Sulawesi in Indonesia, primarily in southwest Sulawesi, but the language is also spoken in other areas. The script is a descendant of Brahmi, and may be related to Javanese. It shows some affinity to the Tagalog script. Buginese has been in use since the 14C. It has also been used to write the Makasar, Bima, and Mandar languages.

Unicode blocks Buginese
Alternate names
Timeframe 14C to present
Regions Southeast Asia
Type abugida
Direction left to right
Status living
Number of speakers 4 million
Languages
Main sources Matthes, B. F. 1875. Boeginesche Spraakkunst. Den Haag: Martinus Nijhoff.
Secondary sources Sirk, Ü. 1983. The Buginese language. Moscow: Nauka. (Languages of Asia and Africa.)
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2633r.pdf

Buhid

Buhid is a living minority script used on Mindoro in the Philippines to write the Buhid language. Buhid is a Brahmi-derived script, distantly related to the South Indian scripts. Buhid is closely related to the Hanunóo and Tagbanwa scripts of the Philippines. All three scripts are related to Tagalog, but may not be directly descended from it. The ancestor of these Philippine scripts (including Tagalog) may have been transported to the Philippines via palaeographic scripts of western Java between the 10C and 14C CE.

Unicode blocks Buhid
Alternate names
Timeframe pre-19C to present
Regions Southeast Asia
Type abugida
Direction left to right
Status living
Number of speakers 8000
Languages
Main sources Kuipers, J., and R. McDermott. 1996. "Insular Southeast Asian Scripts" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 474-484.
Secondary sources Santos, Hector. 1994. The Living Scripts. Los Angeles: Sushi Dog Graphics. (Ancient Philippine scripts series, 2).
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1933.pdf

Byzantine Musical Symbols

Byzantine Musical Symbols are musical symbols used to write the religious music and hymns of the Christian Orthodox Church and some folk music manuscripts. These symbols first appeared in the 7C to 8C CE. In 1881, the Orthodox Patriarchy Musical Committee established the New Analytical Byzantine Musical Notation System, which is used today. Most of the manuscripts are in Greek, although a few are in Russian, Bulgarian, Romanian, and Arabic.

Unicode blocks Byzantine Musical Symbols
Alternate names
Timeframe 7C or 8C to present
Regions European
Type symbols
Direction left to right
Status historical
Number of speakers 0
Languages
Main sources Hellenic Organization for Standardization (ELOT). 1997. The Greek Byzantine Musical Notation System. Athens. (=ELOT 1373)
Secondary sources
Proposal

CJK

"CJK" refers to the unified Han characters used to write the Chinese, Japanese, and Korean languages. Technically, CJK can also be used for the Vietnamese language, since early Vietnamese writing systems were based on Han. There are several blocks of CJK characters, including multiple blocks of unified ideographs, and blocks of compatibility ideographs, symbols and punctuation marks, strokes, and radicals. A description of the CJK blocks and the unification principles appears in section 12.1 of The Unicode Standard, and a history of the encoding appears in Appendix E of the Standard.

Unicode blocks CJK Compatibility Forms, CJK Compatibility Ideographs, CJK Compatibility Ideographs Supplement, CJK Radicals Supplement, CJK Strokes, CJK Symbols and Punctuation, CJK Unified Ideographs, CJK Unified Ideographs Extension A, CJK Unified Ideographs Extension B, CJK Unified Ideographs Extension C, CJK Unified Ideographs Extension D
Alternate names
Timeframe
Regions East Asia
Type logosyllabary
Direction variable
Status living
Number of speakers 1.3 billion
Languages
Main sources Mair, V. 1996. "Modern Chinese Writing" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 200-208; Smith, J. 1996. "Japanese Writing" in Daniels & Bright, pp. 209-217; King, R. 1996. "Korean Writing" in Daniels & Bright, pp. 218-227.
Secondary sources Lunde, Ken. 2009. CJKV Information Processing. 2nd ed. Beijing, Cambridge, MA: O’Reilly.
Proposal

Carian

Carian is a partly undeciphered script that has some relationship to the Greek alphabet. It was used to write the Carian language. Carian dates to the first millennium BCE. A few inscriptions have been found in Caria, in the western part of present-day Turkey, and a fragmentary bilingual has also been found in Athens. However, the bulk of extant texts have been found in Egypt, where they were left by Carian mercenaries. In 1996 an extensive Carian-Greek bilingual was discovered at Kaunos in southwestern Turkey. The bilingual clearly demonstrated that Carian was a member of the Anatolian branch of Indo-European, though details of the language still remain unclear.

Unicode blocks Carian
Alternate names
Timeframe 7C BCE to 3C BCE
Regions European
Type alphabet
Direction variable
Status historical
Number of speakers 0
Languages
Main sources Melchert, H. C. 2004. "Carian" in The Cambridge Encyclopedia of the World’s Ancient Languages, ed. Roger Woodard. Cambridge: Cambridge University Press, pp. 609-613.
Secondary sources Swiggers, P., and W. Jenniges. 1996. “The Anatolian Alphabets” in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 281-287.
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3020.pdf

Cham

The Cham script, also known as Akhar Thrah, is a Brahmi-derived script used to write the Cham language, an Austronesian language. There are two main varieties of the Cham language: Western Cham, which is spoken in Cambodia (and to a lesser extent in Vietnam and Thailand), and Eastern Cham, spoken in Vietnam. Speakers of the former tend to use the Arabic script, while some speakers of the latter still use the Cham script.

Unicode blocks Cham
Alternate names
Timeframe 1000C to present
Regions Southeast Asia
Type abugida
Direction left to right
Status living
Number of speakers 290000
Languages
Main sources Aymonier, Étienne, and Antoine Cabaton. 1906. Dictionnaire cam-Français. Paris.
Secondary sources Kono Rokuro, Chino Eiichi, and Nishida Tatsuo. 2001. The Sanseido Encyclopaedia of Linguistics. Volume 7: Scripts and Writing Systems of the World (Gengogaku dai ziten (bekkan) sekai mozi ziten). Tokyo: Sanseido Press.
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3120.pdf

Cherokee

The Cherokee script is used to write the indigenous Cherokee language. Cherokee is the native tongue of about 20,000 people, though most speakers today use it as a second language. The script was invented by Sequoyah, a Cherokee silversmith, in the early 19C, between 1815 and 1821. Sequoyah devised a system of numbers for Cherokee, but today Latin numbers are used instead. The script is still taught today.

Unicode blocks Cherokee
Alternate names
Timeframe early 19C to present
Regions Americas
Type syllabary
Direction left to right
Status living
Number of speakers 20000
Languages
Main sources Scancarelli, J. 1996. “Cherokee Writing” in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 587-592.
Secondary sources
Proposal

Combining Diacritical Marks

Diacritical marks are ancillary marks that are added to a base character, and can be used to indicate how the character is to be pronounced or stressed. The Combining Diacritical Marks block includes characters intended for general use with any script. Diacritical marks that are specific to a particular script are encoded with that script. The Combining Diacritical Marks Supplement comprises a set of lesser-used combining diacritical marks.
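The relationship between a base character and a general-use combining mark can be sketched in Python. This is a minimal illustration, not part of the source description: the block range U+0300..U+036F and the helper name `is_general_combining_mark` are assumptions introduced here, and the example relies only on the standard `unicodedata` module.

```python
import unicodedata

# Base letter "e" followed by U+0301 COMBINING ACUTE ACCENT, a mark from
# the Combining Diacritical Marks block (assumed range U+0300..U+036F).
decomposed = "e\u0301"

# NFC normalization composes the base letter and the mark into the
# single precomposed character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def is_general_combining_mark(ch: str) -> bool:
    """Hypothetical helper: True if ch lies in the assumed block range."""
    return 0x0300 <= ord(ch) <= 0x036F


# Collect the general-use marks attached to the base character.
marks = [ch for ch in decomposed if is_general_combining_mark(ch)]

print(len(decomposed), len(composed))
print([unicodedata.name(m) for m in marks])
```

A script-specific mark would fall outside this range, because such marks are encoded with their own script.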

Unicode blocks Combining Diacritical Marks, Combining Diacritical Marks Supplement
Alternate names
Timeframe
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, p. 234 (Section 7.9).
Secondary sources
Proposal

Combining Diacritical Marks for Symbols

The Combining Diacritical Marks for Symbols are, in general, applied to mathematical or technical symbols, and serve to extend the set of such symbols. A number of compatibility enclosing marks are also included in this block, which can enclose the base character in different ways.

Unicode blocks Combining Diacritical Marks for Symbols
Alternate names
Timeframe
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, pp. 234-235 (Section 7.9).
Secondary sources
Proposal

Combining Half Marks

The Combining Half Marks set consists of a number of combining mark pieces that can be used to visually encode certain combining marks that extend over multiple base letterforms. They are included to facilitate the support of such marks in legacy implementations. However, double diacritics, such as U+0360 and U+0361, are to be preferred. The block also includes macron marks that are recommended for representing a style of supralineation in Coptic.
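The difference between the preferred double diacritic and the legacy half-mark encoding can be sketched as follows. This is an illustrative assumption about how the two sequences are typically built: the choice of U+FE20 and U+FE21 as the matching half marks and the helper name `describe` are introduced here and do not come from the source text.

```python
import unicodedata

# Preferred form: a single double diacritic placed after the first base
# letter, spanning both letters (U+0361 is named in the text above).
preferred = "t\u0361s"

# Legacy-style form (assumed pairing): a left half mark on the first
# letter and a right half mark on the second letter.
legacy = "t\ufe20s\ufe21"


def describe(text: str) -> list[str]:
    """Hypothetical helper: names of the combining marks in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]


print(describe(preferred))
print(describe(legacy))
```

The preferred form needs one mark where the legacy form needs two, which is one reason the half marks are kept mainly for compatibility with older implementations.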

Unicode blocks Combining Half Marks
Alternate names
Timeframe
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, p. 235 (Section 7.9).
Secondary sources
Proposal

Common Indic Number Forms

The Common Indic Number Forms are characters that are used to represent fractional values in various scripts of North India, Pakistan and Nepal. These signs were used to write currency, weight, measure, time, and other units. They have been used since the 16C and are still employed today in a limited capacity.

Unicode blocks Common Indic Number Forms
Alternate names
Timeframe 16C to present
Regions South Asia
Type numeric
Direction left to right
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, pp. 486-487 (Section 15.3).
Secondary sources
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3367.pdf

Control Pictures

Control pictures are conventional representations of nongraphic characters for use when it is necessary to show the position of a control code within a data stream. Three characters are included in this block to visibly represent ASCII space.
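One way to make control codes visible in a data stream is sketched below. It assumes the usual layout of the block, in which the picture for an ASCII control code sits at U+2400 plus the code value, U+2421 stands for DELETE, and U+2420, U+2422, and U+2423 are the three space representations; the function name `to_control_pictures` is hypothetical.

```python
def to_control_pictures(data: str, space: str = "\u2420") -> str:
    """Replace control codes and spaces with visible control pictures.

    Assumes pictures for U+0000..U+001F start at U+2400. The `space`
    argument selects one of the space pictures (default U+2420
    SYMBOL FOR SPACE; U+2423 OPEN BOX is a common alternative).
    """
    out = []
    for ch in data:
        cp = ord(ch)
        if cp < 0x20:
            out.append(chr(0x2400 + cp))   # C0 control code
        elif cp == 0x20:
            out.append(space)              # visible space
        elif cp == 0x7F:
            out.append("\u2421")           # SYMBOL FOR DELETE
        else:
            out.append(ch)                 # ordinary graphic character
    return "".join(out)


print(to_control_pictures("name\tvalue\r\n"))
print(to_control_pictures("a b", space="\u2423"))
```

The output is meant for display only; the pictures are ordinary graphic characters and carry none of the control function they depict.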

Unicode blocks Control Pictures
Alternate names
Timeframe various
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, p. 495 (Section 15.6).
Secondary sources
Proposal

Coptic

The Coptic script represents the final stage in the development of the Egyptian writing system and was used for writing the Coptic language. Coptic was based on the Greek uncial alphabets but several letters were added that were unique to Coptic. Although the language died out in the 14C, it is still maintained as a liturgical language by Coptic Christians. Before Unicode 4.1, Coptic was considered a stylistic variant of Greek, so 14 Coptic characters appear in the "Greek and Coptic" block. Coptic is now considered to be disunified from Greek, so one can use the 14 Coptic characters in the "Greek and Coptic" block as well as additional letters that appear in the new "Coptic" block (which also includes characters for Old Coptic and Nubian).
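Because Coptic letters live in two blocks, code that tests for Coptic text has to check both. The sketch below assumes the 14 Coptic characters in "Greek and Coptic" occupy U+03E2..U+03EF and that the "Coptic" block spans U+2C80..U+2CFF; the constant names and the function `coptic_source` are hypothetical.

```python
# Assumed code point ranges (not stated in the description above).
COPTIC_IN_GREEK_BLOCK = range(0x03E2, 0x03F0)  # 14 letters
COPTIC_BLOCK = range(0x2C80, 0x2D00)


def coptic_source(ch: str):
    """Return the block name a Coptic character comes from, else None."""
    cp = ord(ch)
    if cp in COPTIC_IN_GREEK_BLOCK:
        return "Greek and Coptic"
    if cp in COPTIC_BLOCK:
        return "Coptic"
    return None


for sample in ("\u03e2", "\u2c80", "\u03b1"):
    print(f"U+{ord(sample):04X}", coptic_source(sample))
```

A Greek letter such as U+03B1 returns None here, reflecting the disunification: it shares a block with the 14 Coptic letters but is not itself Coptic.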

Unicode blocks Coptic, Greek and Coptic
Alternate names
Timeframe 4C to present
Regions Africa
Type alphabet
Direction left to right
Status liturgical
Number of speakers 0
Languages
Main sources Ritner R. 1996. "The Coptic Alphabet" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 287-290.
Secondary sources
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2744.pdf

Cuneiform

Sumero-Akkadian Cuneiform is a logosyllabary that was used from the end of the third millennium BCE until the 1C CE. It spread beyond Mesopotamia to Elam, Assyria, eastern Syria, southern Anatolia, and Egypt, and was used for many languages besides Sumerian and Akkadian. The script developed from Proto-Cuneiform and Early Dynastic cuneiform. Cuneiform is one of the world's oldest writing systems.

Unicode blocks Cuneiform, Cuneiform Numbers and Punctuation
Alternate names
Timeframe 2350 BCE to 1C CE
Regions Middle East
Type logosyllabary
Direction left to right
Status historical
Number of speakers 0
Languages
Main sources Cooper, J. 1996. “Sumerian and Akkadian” in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 37-57.
Secondary sources Gragg, G. 1996. “Other languages” in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 58-72.
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2786.pdf

Currency Symbols

The Currency Symbols block includes customary symbols used to indicate certain currencies in general text. The signs may vary in shape and are often used for more than one currency. Not all currencies are represented by a currency symbol; some use sequences of multiple letters, and the abbreviations for currencies can vary by language. Some contemporary or historic currency symbols that are not in the Currency Symbols block may be found in other blocks.

Unicode blocks Currency Symbols
Alternate names
Timeframe various
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, pp. 478-480 (Section 15.1).
Secondary sources
Proposal

Cypriot Syllabary

The Cypriot syllabary is a historical script used to write the Cypriot dialect of Greek and a non-Indo-European language, "Eteo-Cypriot." The script was used from the middle of the 11C to the 3C BCE and appears to be descended from one of the Cypro-Minoan scripts of Cyprus. The Cypriot syllabary shares some orthographic conventions with Linear B, but the script is written right to left.

Unicode blocks Cypriot Syllabary
Alternate names
Timeframe 11C BCE to 3C BCE
Regions European
Type syllabary
Direction right to left
Status historical
Number of speakers 0
Languages
Main sources Woodard, Roger. 2004. "Greek dialects" in The Cambridge Encyclopedia of the World’s Ancient Languages, ed. Roger Woodard. Cambridge: Cambridge University Press, pp. 650-672.
Secondary sources Bennett, E. 1996. "Aegean Scripts" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 125-133.
Proposal http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2378.pdf

Cyrillic

The Cyrillic script has traditionally been used for writing various Slavic languages, including Russian. The script dates to the 9C or 10C CE, and is named after St. Cyril, a Byzantine missionary. It is one of several scripts that were ultimately derived from the Greek script. Cyrillic has been extended to write non-Slavic languages, particularly the minority languages of Russia and surrounding countries. The Cyrillic Extended-A block is made up of Old Church Slavonic superscripted combining letters, while Cyrillic Extended-B comprises various historic characters, such as those used in Old Cyrillic and Old Abkhasian. The Cyrillic Supplement block is made up of additional letters needed to write various non-Slavic languages.

Unicode blocks Cyrillic, Cyrillic Extended-A, Cyrillic Extended-B, Cyrillic Supplement
Alternate names
Timeframe 9C or 10C to present
Regions European
Type alphabet
Direction left to right
Status living
Number of speakers 629 million
Languages
Main sources Cubberly, P. 1996. "The Slavic Alphabets" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 346-355.
Secondary sources Comrie, B. 1996. "Adaptations of the Cyrillic Alphabet" in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 700-726.
Proposal

Deseret

Deseret is a phonemic alphabet that was developed in the 1850s by the regents of the University of Deseret, now the University of Utah. Deseret was used to write the English language, and was promoted by The Church of Jesus Christ of Latter-day Saints (also known as the Mormon or LDS Church), under Church President Brigham Young (1801–1877). George Watt, who was a secretary to Brigham Young, contributed to the script's development. Though the Church published four books and other materials were written in the script, it did not gain wide acceptance and was not actively promoted after 1869.

Unicode blocks Deseret
Alternate names
Timeframe 1850s to 1860s
Regions Americas
Type alphabet
Direction left to right
Status historical
Number of speakers 0
Languages
Main sources Monson, Samuel C. 1992. "Deseret Alphabet" in Encyclopedia of Mormonism, ed. Daniel H. Ludlow. New York: Macmillan.
Secondary sources
Proposal

Devanagari

The Devanagari script is used to write classical Sanskrit and its modern variant, Hindi. The script is also used to write many other languages. Devanagari is a Brahmi-derived script, like the Sinhala script of Sri Lanka, the Tibetan script, and many South and Southeast Asian scripts. The Devanagari Extended block includes cantillation marks for the Samaveda, marks of nasalization, and a few editorial marks.

Unicode blocks Devanagari, Devanagari Extended
Alternate names
Timeframe 11C to present
Regions South Asia
Type abugida
Direction left to right
Status living
Number of speakers 499 million
Languages
Main sources Bright, W. 1996. “The Devanagari Script” in The World’s Writing Systems, ed. Peter T. Daniels & William Bright. New York; Oxford: Oxford University Press, pp. 384-390.
Secondary sources
Proposal

Dingbats

The characters in the Dingbats block are derived from the ITC Zapf Dingbats series 100, a set of glyphs which make up the “Zapf Dingbat” font currently available on most laser printers. The Zapf Dingbats are considered an industry standard. The font was created in 1977 and 1978 by the German type designer Hermann Zapf. Other series of dingbat glyphs, which are not found in the Zapf Dingbat font, exist, but they are not encoded in the Unicode Standard because they have not been widely implemented in hardware and software as character-encoded fonts. A number of the characters from the ITC Zapf Dingbats series may appear in other blocks if the symbol was deemed to be a generic symbol used in other contexts.

Unicode blocks Dingbats
Alternate names
Timeframe 1977 and 1978 to present
Regions
Type symbols
Direction
Status living
Number of speakers
Languages
Main sources The Unicode Consortium. 2011. The Unicode Standard, Version 6.0. Mountain View, CA: The Unicode Consortium, p. 504 (Section 15.8).
Secondary sources
Proposal