Combining Grapheme Joiner


The combining grapheme joiner (CGJ), U+034F COMBINING GRAPHEME JOINER (HTML &#847;), is a Unicode character that has no visible glyph and is "default ignorable" by applications. Its name is a misnomer that does not describe its function: the character does not join graphemes.[1] Its purpose is to separate characters that should not be treated as a digraph.
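The basic properties of the character can be inspected with Python's standard `unicodedata` module. The snippet below is a minimal sketch; the helper name `describe` is an illustrative choice, not part of any library.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def describe(ch):
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),           # general category
        "combining_class": unicodedata.combining(ch),   # canonical combining class
    }


info = describe(CGJ)
print(info)
```

The character is a nonspacing mark (category `Mn`) with canonical combining class 0. The "default ignorable" property is not exposed by `unicodedata`, so it is not shown here.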

For example, in a Hungarian-language context, the adjacent characters c and s would normally be considered equivalent to the cs digraph. If they are separated by a CGJ, they are considered two separate graphemes.
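The effect can be illustrated with a toy segmenter. This is only a sketch under stated assumptions: the function `split_units` and the abbreviated digraph set `HU_DIGRAPHS` are hypothetical, and real software would rely on a language-specific collation tailoring rather than on code like this.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Hypothetical, deliberately incomplete set of Hungarian digraphs.
HU_DIGRAPHS = {"cs", "gy", "sz", "zs"}


def split_units(text):
    """Split text into units, treating a known digraph as one unit
    unless a CGJ stands between its two letters."""
    units = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            i += 1  # the CGJ only separates; it is not a unit of its own
            continue
        pair = text[i:i + 2]
        if pair.lower() in HU_DIGRAPHS:
            units.append(pair)  # two adjacent letters form one digraph
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


print(split_units("cs"))             # ['cs']
print(split_units("c" + CGJ + "s"))  # ['c', 's']
```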

The CGJ is also needed for complex scripts. For example, the Hebrew cantillation accent Metheg usually appears to the left of the vowel point, and by default most display systems render it there even if it is typed before the vowel. In some words in Biblical Hebrew, however, the Metheg appears to the right of the vowel; to make the display engine render it on the right, a CGJ must be typed between the Metheg and the vowel. Compare:

he + pathah + metheg הַֽ
he + metheg + pathah הַֽ
he + metheg + CGJ + pathah הֽ͏ַ

These examples may not display correctly without a font that properly supports Hebrew cantillation marks; Ezra SIL SR is recommended. Rendering may also differ between operating systems, applications and browsers.

In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.[2]
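The blocking of canonical reordering can be observed with Python's standard `unicodedata` module. The sketch below reuses the Hebrew sequence he + metheg + pathah; the variable names and the helper `code_points` are illustrative only. Pathah (U+05B7) has canonical combining class 17 and metheg (U+05BD) has class 22, so normalization moves the pathah in front of the metheg unless a CGJ (class 0) stands between them.

```python
import unicodedata

HE = "\u05d4"     # HEBREW LETTER HE
PATAH = "\u05b7"  # HEBREW POINT PATAH, combining class 17
METEG = "\u05bd"  # HEBREW POINT METEG, combining class 22
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER, combining class 0


def code_points(s):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


without_cgj = HE + METEG + PATAH
with_cgj = HE + METEG + CGJ + PATAH

# Normalization applies canonical reordering to runs of combining marks.
nfd_without = unicodedata.normalize("NFD", without_cgj)
nfd_with = unicodedata.normalize("NFD", with_cgj)

print(code_points(nfd_without))  # ['U+05D4', 'U+05B7', 'U+05BD']
print(code_points(nfd_with))     # ['U+05D4', 'U+05BD', 'U+034F', 'U+05B7']
```

Without the CGJ the two marks are swapped into combining-class order; with it, the typed order survives normalization.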

Compare this with the "zero-width non-joiner" (in effect, a space mark of zero width) at U+200C in the General Punctuation range.

External links