The Private Use Area consists of 6400 codepoints located from U+E000 to U+F8FF inclusive. Its formal title is the primary Private Use Area. It is often known simply as the Private Use Area (PUA).
All PUA codepoints have left-to-right directionality.
The Unicode Standard presently also contains two other Private Use Areas. They are located in planes fifteen and sixteen, where each occupies all but the last two codepoints of a whole plane (U+F0000 to U+FFFFD and U+100000 to U+10FFFD). They are known as Supplementary Private Use Area-A and Supplementary Private Use Area-B respectively.
The following text refers to the Private Use Area which is located in the Basic Multilingual Plane, that is, in plane zero, from U+E000 to U+F8FF.
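The three areas described above can be sketched as a small lookup table. This is only an illustrative sketch in Python: the names `PRIVATE_USE_AREAS`, `private_use_area_of`, `area_size` and `is_private_use` are my own and do not come from the Unicode Standard. The last function relies on the standard library reporting the general category `Co` for private-use codepoints.

```python
import unicodedata

# The three Private Use Areas as inclusive (first, last) codepoint ranges.
PRIVATE_USE_AREAS = {
    "Private Use Area": (0xE000, 0xF8FF),
    "Supplementary Private Use Area-A": (0xF0000, 0xFFFFD),
    "Supplementary Private Use Area-B": (0x100000, 0x10FFFD),
}


def private_use_area_of(codepoint):
    """Return the name of the area containing `codepoint`, or None."""
    for name, (first, last) in PRIVATE_USE_AREAS.items():
        if first <= codepoint <= last:
            return name
    return None


def area_size(name):
    """Return the number of codepoints in the named area."""
    first, last = PRIVATE_USE_AREAS[name]
    return last - first + 1


def is_private_use(character):
    """Ask the Unicode database whether `character` is private-use (Co)."""
    return unicodedata.category(character) == "Co"
```

For example, `area_size("Private Use Area")` gives the 6400 codepoints mentioned above, and `private_use_area_of(0xFB00)` returns `None` because U+FB00 is a regular Unicode character.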
The Unicode Standard states the following.
By convention, the primary Private Use Area is divided into a corporate use subarea for platform writers, starting at U+F8FF and extending downward in values, and an end user subarea, starting at U+E000 and extending upward. By following this convention, the likelihood of collision between private-use characters defined by platform writers with private-use characters defined by end users can be reduced. However, it should be noted that this is only a convention, not a normative specification. In principle, any user can define any interpretation of any private-use character.
In practice there are other factors not mentioned in the Unicode Standard which someone defining a character in the Private Use Area may choose to consider.
One factor is that the Microsoft Corporation has a particular use of many characters in the range U+F000 to U+F0FF in relation to symbol fonts on the Microsoft platform for PCs. It is perhaps best to avoid using that range for ordinary Private Use Area character definition, though I cannot state any specific reasons for doing so.
Another factor is that there are various published uses of various Private Use Area codepoints by a number of individuals and organizations. These should possibly be considered before deciding to make a new allocation of one's own. In the basic theory of the Private Use Area there is no need whatsoever to do that, because each person is free to make Private Use Area allocations as he or she chooses. However, suppose that one is defining one or two characters which one would like to have added to an existing font, perhaps a font from an independent fontmaker. A request to add a Private Use Area character to the font at a codepoint which does not clash with an existing Private Use Area codepoint within that font is more likely to be successful than asking the fontmaker to discard a character already in the font. Also, a fontmaker may have tacitly, in his or her own mind, decided to reserve some Private Use Area codepoints in the font for the possible later addition of some characters from an already published Private Use Area allocation. Thus, although each person is free to make his or her own Private Use Area allocations, it can be advantageous to consider the practicalities of getting such an allocation applied in the way that is needed.
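The practical considerations above can be sketched in code. The following is only an illustration, assuming that the codepoints already in use in a font (or in a published allocation) are available as a set of integers. The function name `free_pua_codepoints` and its parameters are hypothetical, and the upward scan from U+E000 simply follows the end user convention quoted earlier.

```python
from itertools import islice

PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF

# The range noted above as being used with symbol fonts on the
# Microsoft platform; avoided by default in this sketch.
SYMBOL_FONT_RANGE = range(0xF000, 0xF100)


def free_pua_codepoints(used, count, avoid=SYMBOL_FONT_RANGE):
    """Return up to `count` PUA codepoints, scanning upward from U+E000,
    that are neither in `used` nor in `avoid`."""
    candidates = (
        cp
        for cp in range(PUA_FIRST, PUA_LAST + 1)
        if cp not in used and cp not in avoid
    )
    return list(islice(candidates, count))
```

For example, if a font already uses U+E000 and U+E001, then `free_pua_codepoints({0xE000, 0xE001}, 2)` suggests U+E002 and U+E003.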
Codepoints for latin small ligatures
One particular published use of the Private Use Area allocates the range U+E707 to U+E7BF to latin small ligatures. The allocation starts at U+E707 rather than at U+E700 so that it continues from the set of seven latin small ligatures at U+FB00 to U+FB06 of regular Unicode.
Unicode no longer encodes further latin small ligatures. This is because ligatures may be expressed by using U+200D ZERO WIDTH JOINER between two characters which it is desired to ligate. Three character ligatures, such as LATIN SMALL LIGATURE LONG S CR, may be expressed using two copies of U+200D ZERO WIDTH JOINER: one between the long s and the c and one between the c and the r.
However, that method needs a computer system with advanced font format capabilities. The system recognizes sequences such as c ZWJ t, or even just ct, or both, as listed in a table within the font, and the font provides a glyph which is substituted for the sequence. A given font would then contain zero, one or more such glyph substitution possibilities. Thus a text which contains ct or c ZWJ t would be displayed using a ct ligature glyph if and only if the font provides a glyph for the ct ligature and a table entry exists for replacing that sequence of codepoints in the source text with the ligature glyph. The substitution of a ligature can be prevented by placing a U+200C ZERO WIDTH NON-JOINER character between a pair of letter characters.
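The sequences described above can be built directly as plain text. This is a minimal sketch in Python. The helper names `request_ligature` and `prevent_ligature` are my own, and whether a ligature is actually displayed still depends entirely on the font and the rendering system.

```python
ZWJ = "\u200D"     # U+200D ZERO WIDTH JOINER
ZWNJ = "\u200C"    # U+200C ZERO WIDTH NON-JOINER
LONG_S = "\u017F"  # U+017F LATIN SMALL LETTER LONG S


def request_ligature(*letters):
    """Join the letters with ZWJ, one joiner between each adjacent pair."""
    return ZWJ.join(letters)


def prevent_ligature(first, second):
    """Place ZWNJ between two letters so that no ligature is substituted."""
    return first + ZWNJ + second


ct = request_ligature("c", "t")                   # c ZWJ t
long_s_cr = request_ligature(LONG_S, "c", "r")    # long s ZWJ c ZWJ r
plain_ct = prevent_ligature("c", "t")             # c ZWNJ t
```

The three letter case contains two joiners, as described above for LATIN SMALL LIGATURE LONG S CR.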
This means that some text with a sequence such as ct could automatically have ligatures used in its display if the font is able to provide the ligature glyph.
However, advanced font technology is not available on all systems, so, as Unicode no longer encodes further latin small ligatures, a collection of Private Use Area codepoints for them has been produced. They could be used in documents for archiving, though notes as to the coding would need to be stored with them, as Private Use Area encodings are not unique and the codepoints could well be used by someone else for some entirely different purpose. Another use is in desktop publishing, so as to produce hardcopy printed texts which use ligatures.
The set of codepoints is also useful with advanced format fonts, because a codepoint gives direct access to the ligature glyph. The ligature in the font may thus also be used by a person whose computer equipment does not support advanced font technology yet does support Unicode in other respects.
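Direct access by codepoint can be sketched as a text conversion for systems without advanced font technology: each joiner sequence is replaced by a single codepoint. In the table below the U+FB00 to U+FB04 entries are the regular Unicode ligatures. The entry for c ZWJ t at U+E707 is only an assumption for illustration, since the text above gives the range of the published list but not its individual assignments. The names `LIGATURE_CODEPOINTS` and `to_direct_codepoints` are hypothetical.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER

# Joiner sequence -> single codepoint giving direct access to the glyph.
LIGATURE_CODEPOINTS = {
    "f" + ZWJ + "f": "\uFB00",              # LATIN SMALL LIGATURE FF
    "f" + ZWJ + "i": "\uFB01",              # LATIN SMALL LIGATURE FI
    "f" + ZWJ + "l": "\uFB02",              # LATIN SMALL LIGATURE FL
    "f" + ZWJ + "f" + ZWJ + "i": "\uFB03",  # LATIN SMALL LIGATURE FFI
    "f" + ZWJ + "f" + ZWJ + "l": "\uFB04",  # LATIN SMALL LIGATURE FFL
    "c" + ZWJ + "t": "\uE707",              # ASSUMPTION: illustrative PUA entry
}


def to_direct_codepoints(text, table=LIGATURE_CODEPOINTS):
    """Replace each joiner sequence in `text` with its single codepoint.

    Longer sequences are replaced first, so that f ZWJ f ZWJ i becomes
    the ffi ligature rather than an ff ligature followed by ZWJ and i.
    """
    for sequence in sorted(table, key=len, reverse=True):
        text = text.replace(sequence, table[sequence])
    return text
```

Text without joiners, such as a plain "fi", is left unchanged by this sketch, so ligatures appear only where the author asked for them.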
There is also another interesting consideration in that having an individual code point in the Private Use Area of the Unicode system for each ligature sort in use in print shops of olden days may be regarded as artistic expression. Please note, however, that not all of the ligatures for which codepoints have been published are ligatures from olden days: some are modern, a few were devised especially. In transcribing old printed books there can be aesthetic satisfaction in using codepoints which correspond one to one with the pieces of metal type used in the print shop.
Yet the rules of using the Private Use Area allow anyone to produce and publish a list of codepoints for ligatures, or simply to use such a list of his or her own allocation privately. However, the existence of a published list does present the possibility that a number of fontmakers may each use the same list, thereby allowing an end user to try the look of a document within a desktop publishing program with a variety of fonts from various fontmakers without changing the encoding of the document.
Gutenberg and ligatures
You might find the following of interest. It is the web information of a television program in a series called Renaissance secrets which was on the BBC in England some time ago.
An interesting question is why Gutenberg had so many ligatures! With metal type in the twentieth century the making of an additional ligature was a lot of extra work, in that an additional metal punch to make metal matrices had to be produced. Today, with electronic fonts, the constructing of an extra ligature glyph also takes extra work.
Yet was the same true for Gutenberg? Did ligatures in fact save Gutenberg work?
My reason for thinking that this is a possibility is as follows. In that television program about Gutenberg a researcher had made images of individual characters printed by Gutenberg and found that each character of a sort had small differences which meant that they could not have been made from the same matrix. It is possible that Gutenberg used matrices of whatever material such that the matrix was destroyed during the casting of a character, thus meaning that as many matrices had to be made as there were pieces of type: the invention of a reusable matrix being a later invention, perhaps by another printer.
So making a ligature character would mean producing at least one less matrix than would have otherwise been the situation! A two-letter ligature would save the production of one matrix, a three-letter ligature would save the production of two matrices. Gutenberg used many ligatures. Perhaps it was a money-saving idea as well as an artistic idea: thus, as one might say, "painting two birds on one canvas" in that one idea served two purposes!
Supplementary note of 1 September 2005
Earlier today I was adding some ligature glyphs into the Private Use Area of one of my fonts, a font named Chronicle Text. It is a black letter font.
I was adding a glyph at U+E70D within the font. It was not a ligature as such, but was two lowercase letters l, side by side. I was thinking about the above idea that Gutenberg may have used many ligatures to save work, and it occurred to me that in that case maybe he cast often-used pairs of letters onto one piece of type. That would minimize the number of pieces of type whilst increasing the number of sorts of pieces of type, yet within the limits of only casting pairs, or even triples or longer sets, of characters for frequently used sequences which could be used in various pieces of text.
I then wondered whether Gutenberg thus might have used far more 'ligatures' than people have thus far detected, simply because those undiscovered ligatures are for items where there is no visible join between the two parts of the ligature.
Is that a hypothesis which could be tested by measuring the relative positions on the printed page of two characters which might have been so used, for each use of the pair of characters, bearing in mind that there might be several notionally identical sorts for the pair of characters?
Additional Code Space
The Private Use Area of Unicode provides great opportunities for people to encode their own collections of characters using their own codepoint allocations.
However, such codepoint allocations are not unique and a character referenced by a codepoint in the Private Use Area could have one of many potential meanings should an archivist be attempting to read a plain text document in which a Private Use Area character is used.
Additional Code Space, from A+110000 to A+FFFFFF, is intended so that people can uniquely encode characters which would otherwise just be Private Use Area characters. This would help in the archiving of documents. Additional Code Space is a new idea and not part of Unicode. Hopefully Additional Code Space will be developed and play a valuable part in typography in the future.
The system also has a portal from the Unicode Private Use Area so that Additional Code Space characters may be stored within a Unicode text file framework, though Additional Code Space is intended as a 24-bit system.
Suppose that a character is A+PQRSTU in Additional Code Space. The representation in the portal is as a sequence of three Unicode Private Use Area characters, namely U+F4PQ U+F5RS U+F6TU.
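The portal mapping above can be sketched as a pair of conversion functions. This is only an illustration of the arithmetic, written in Python: the function names `to_portal` and `from_portal` are hypothetical, and the error handling is my own assumption rather than part of any published definition.

```python
def to_portal(acs_codepoint):
    """Map A+PQRSTU to the three characters U+F4PQ U+F5RS U+F6TU."""
    if not 0x110000 <= acs_codepoint <= 0xFFFFFF:
        raise ValueError("not in Additional Code Space")
    pq = (acs_codepoint >> 16) & 0xFF  # first two hexadecimal digits
    rs = (acs_codepoint >> 8) & 0xFF   # middle two hexadecimal digits
    tu = acs_codepoint & 0xFF          # last two hexadecimal digits
    return chr(0xF400 | pq) + chr(0xF500 | rs) + chr(0xF600 | tu)


def from_portal(triple):
    """Recover A+PQRSTU from a portal sequence of three characters."""
    if len(triple) != 3:
        raise ValueError("a portal sequence has exactly three characters")
    first, second, third = (ord(ch) for ch in triple)
    if (first >> 8, second >> 8, third >> 8) != (0xF4, 0xF5, 0xF6):
        raise ValueError("not a U+F4xx U+F5xx U+F6xx sequence")
    return ((first & 0xFF) << 16) | ((second & 0xFF) << 8) | (third & 0xFF)
```

For example, A+123456 is represented as U+F412 U+F534 U+F656, and `from_portal` turns that sequence back into 0x123456.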
The basic idea for Additional Code Space is suggested in a science fiction story.
That document is downloadable from the following web page.
As of the time of writing this present text (2005-04-23 morning) there are no codepoint allocations in Additional Code Space.
A way to allocate codepoints in Additional Code Space would be for a person to suggest an allocation and then for people to discuss the suggestion. Maybe the suggestion would be modified, yet the idea is that the suggestion could be accepted within a week or so.
This would be a process similar to that which is used to produce the alt.* system of newsgroups. In that system anyone can suggest a new newsgroup by posting in the alt.config newsgroup. Discussion takes place and maybe modifications to the name of the suggested new newsgroup are made. The idea is that progress is made and suggestions are agreed wherever possible. Sometimes a person suggesting a new alt.* newsgroup is informed of an existing newsgroup, of which he or she was not previously aware, which covers the same topic.
However, Additional Code Space need not necessarily be produced using a newsgroup or a mailing list. A forum for discussions and a database for documents would be needed.
Additional Code Space has 239 planes available. Unicode has 17 planes available.
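The plane counts follow from the codepoint ranges, taking a plane as 65536 (hexadecimal 10000) codepoints. A short check of the arithmetic, with variable names of my own choosing:

```python
PLANE_SIZE = 0x10000  # codepoints per plane

# Unicode runs from U+0000 to U+10FFFF.
unicode_planes = (0x10FFFF - 0x0000 + 1) // PLANE_SIZE

# Additional Code Space runs from A+110000 to A+FFFFFF.
acs_planes = (0xFFFFFF - 0x110000 + 1) // PLANE_SIZE

# Together they fill the whole 24-bit space of 256 planes.
total_planes = unicode_planes + acs_planes
```

This gives 17 planes for Unicode and 239 for Additional Code Space, which together account for all 256 planes of a 24-bit system.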
Additional Code Space is intended to be used for encoding a wide variety of information in character format. Thus it can be used for many purposes including letters, symbols, presentation forms such as ligatures, formatting codes, colour codes, a vector graphics system and a portable object code level software system. It could have applications including transcribing of eighteenth-century printed books and archive representations of virtual worlds and knowledgebases.
The process of defining the items to be encoded in Additional Code Space is potentially a fascinating project with long lasting benefit for encoding and archiving information in a character-style format.