Diacritic marks

A diacritic is a mark used in combination with a base letter for several purposes: extending a basic alphabet to represent more phonemes, and so modifying pronunciation; marking stress that distinguishes otherwise similar words, and hence their meanings; and, in some languages, adding or modifying a vowel in a word.

Many diacritics are separate from the base letter and can be placed above, below, beside, or through it, while others connect to the base.

This guide gives users an introductory overview of diacritics from both a design and a technical perspective, to help them avoid the most common problems when shaping text.

Background reading:
must → Overall font files requirements/Glyphsets

Design considerations

Examples of what to avoid - critical cases

Multiple diacritics inconsistencies
Size and convention issues
Horizontal position fixes comparison

Legacy – Spacing marks

The spacing diacritical marks (for example, U+00B4 ACUTE ACCENT) are required mainly for historical reasons and for backward compatibility. This is why they are also known as “Legacy marks”.

These marks are only used as placeholders while typing a key combination that adds an accent to a base letter, e.g. ´ + a to obtain á.

Legacy marks in action: once the accent + letter sequence is typed, it is replaced by the composite glyph in the font, so the spacing mark no longer appears as a separate glyph.

Requirements:

Name Unicode Category Subcategory
acute 00B4 Mark Spacing
breve 02D8 Mark Spacing
caron 02C7 Mark Spacing
cedilla 00B8 Mark Spacing
circumflex 02C6 Mark Spacing
dieresis 00A8 Mark Spacing
dotaccent 02D9 Mark Spacing
grave 0060 Mark Spacing
hungarumlaut 02DD Mark Spacing
macron 00AF Mark Spacing
ogonek 02DB Mark Spacing
ring 02DA Mark Spacing
tilde 02DC Mark Spacing

Note that any other accent in the font exists only as a combining mark.
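
The difference between the spacing marks in the table and their combining counterparts can be checked against the Unicode character database. The following is a minimal sketch using Python's standard unicodedata module. The pairing of U+00B4 with the combining acute U+0301 is the usual spacing/combining correspondence, and the variable names are only illustrative.

```python
import unicodedata

# Code points of the spacing ("legacy") marks listed in the table above.
SPACING_MARKS = [
    0x00B4, 0x02D8, 0x02C7, 0x00B8, 0x02C6, 0x00A8, 0x02D9,
    0x0060, 0x02DD, 0x00AF, 0x02DB, 0x02DA, 0x02DC,
]

# None of the spacing marks is a nonspacing mark (general category "Mn"),
# and none has a combining class: they occupy space like any other character.
spacing_are_not_combining = all(
    unicodedata.category(chr(cp)) != "Mn" and unicodedata.combining(chr(cp)) == 0
    for cp in SPACING_MARKS
)

# The combining acute accent (U+0301), by contrast, is a nonspacing mark.
combining_acute_category = unicodedata.category("\u0301")

# A base letter followed by a combining mark composes into the precomposed
# character under NFC normalization; a spacing mark before a letter does not.
composed = unicodedata.normalize("NFC", "a\u0301")
not_composed = unicodedata.normalize("NFC", "\u00b4a")
```

This only illustrates the character-level distinction; the replacement of a typed accent + letter sequence described above is done by the keyboard input method, not by normalization.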

Combining diacritical marks - Nonspacing

As the name suggests, the combining diacritics are the marks actually used to construct the accented letters, which would be done either by:

Automatic alignment

When creating the precomposed characters in the source file, automatic alignment should ideally be enabled consistently in the composite glyphs. This way they are updated automatically after any change to any of their components.

Tip: When using ufo2glyphs to convert source files, use this argument to preserve alignments: --enable-automatic-alignment

Anchors

All glyphs involved in generating accented letters use anchors: special points that allow glyphs to attach to one another. Anchors play a key role in identifying the glyph class definition and in generating the “Mark to base positioning” (mark) and “Mark to mark positioning” (mkmk) GPOS features (see below).

Anchors are commonly shown as a red rhombus in the glyph view of the source file and are identified by name. The base glyph and the mark glyph share the same anchor name, except that in the mark glyph the name is preceded by an underscore. For example, a top anchor in the base glyph corresponds to a _top anchor in the mark. This naming scheme is crucial for positioning to work as expected: if the underscore is omitted in the mark glyph, for example, the mark will not attach to the base letter. Anchor names therefore deserve special care.
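
The naming rule can be expressed as a small check. The following is a sketch only: the function name and the anchor lists are hypothetical, and a font editor or compiler applies the same pairing rule internally in its own way.

```python
def attachable_anchors(base_anchors, mark_anchors):
    """Return the anchor names through which a mark can attach to a base.

    A mark anchor named "_top" pairs with a base anchor named "top".
    Mark anchors without the leading underscore are ignored here, because
    they are attachment points for other marks, not for the base.
    """
    base_names = set(base_anchors)
    return sorted(
        name[1:]
        for name in mark_anchors
        if name.startswith("_") and name[1:] in base_names
    )


# Hypothetical anchor names for illustration.
a_anchors = ["top", "bottom", "ogonek"]
acutecomb_anchors = ["_top", "top"]
broken_acute_anchors = ["top"]  # underscore omitted by mistake

ok = attachable_anchors(a_anchors, acutecomb_anchors)
broken = attachable_anchors(a_anchors, broken_acute_anchors)
```

Here ok contains "top", while broken is empty: without the underscore, nothing pairs the mark with the base.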

Requirements for combining marks:

Combining marks would be listed like this in the GDEF table:

  <ClassDef glyph="acutecomb" class="3"/>
  <ClassDef glyph="acutecomb.case" class="3"/>

Stacked diacritics

In some languages, such as Vietnamese, some marks are a combination of two other marks, known as stacked diacritics. In such cases, a combining mark can also act as the ‘base’ glyph of another mark, and therefore needs more than one anchor. For example, in brevecomb_acutecomb, the brevecomb mark has a _top anchor to attach it to a base letter, plus a top anchor to which another mark, in this case acutecomb, can attach.
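
The arithmetic behind stacking can be sketched as follows. The coordinates are hypothetical font units, the attach helper is an illustrative name, and the sketch ignores advance widths and the way GPOS actually encodes the adjustment; it only shows how each mark's offset follows from a pair of matching anchors.

```python
def attach(host_anchor, host_offset, mark_anchor):
    """Offset that moves a mark so its anchor lands on the host's anchor.

    host_anchor: (x, y) of the anchor in the host glyph's own coordinates.
    host_offset: (dx, dy) already applied to the host glyph.
    mark_anchor: (x, y) of the underscore anchor in the mark glyph.
    """
    return (
        host_anchor[0] + host_offset[0] - mark_anchor[0],
        host_anchor[1] + host_offset[1] - mark_anchor[1],
    )


# Hypothetical anchor coordinates in font units.
a = {"top": (250, 480)}
brevecomb = {"_top": (0, 480), "top": (0, 700)}
acutecomb = {"_top": (0, 480)}

# Mark to base: brevecomb's _top lands on the top anchor of "a".
breve_offset = attach(a["top"], (0, 0), brevecomb["_top"])
# Mark to mark: acutecomb's _top lands on brevecomb's (already shifted) top.
acute_offset = attach(brevecomb["top"], breve_offset, acutecomb["_top"])
```

With these values the breve is shifted by (250, 0) and the acute by (250, 220), so the acute sits on top of the breve rather than colliding with it.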

Soft dotted glyphs

Some Latin and Cyrillic glyphs like i or j lose their dot when combined with marks that replace the dot. For example, in Dutch, when stress is marked for emphasis, j can be combined with acutecomb in the digraph ij, spelled with two acute marks (íj́); in Navajo, iogonek can be combined with acutecomb (į́); and in Ukrainian, when stress is marked, i-cy can be combined with acutecomb (і́). In such cases, a glyph substitution should make the dot disappear, for example a ccmp feature in the GSUB table that replaces soft dotted glyphs with their dotless variants when they are combined with at least one top mark.

Incorrect behavior of i-cy with acutecomb (і́), without the appropriate glyph substitution.
Expected behavior of i-cy with acutecomb (і́), with the appropriate glyph substitution.

A top anchor is needed in the dotless variants of the glyphs for correct positioning of the top mark glyphs. A _top anchor is needed in the top mark glyphs.

A `top` anchor in the dotless glyphs allows top marks to be positioned correctly.

In a font with a small Latin set the ccmp feature code can have the following lookup:

lookup ccmp_soft_dotted {
    @CombiningTopAccents = [acutecomb brevecomb caroncomb circumflexcomb dieresiscomb dotaccentcomb gravecomb macroncomb ringcomb tildecomb];
    lookupflag UseMarkFilteringSet @CombiningTopAccents;
    sub [i j]' @CombiningTopAccents by [idotless jdotless];
} ccmp_soft_dotted;

In Glyphs, the automatically generated ccmp feature adds a similar lookup but does not update it with larger glyph sets.

In a font with a larger Latin glyph set and a Cyrillic glyph set, first create the dotless forms of the other soft dotted glyphs, either with glyph construction recipes such as

idotless+dotbelowcomb=idotless_dotbelowcomb
idotless+ogonekcomb=idotless_ogonekcomb
idotless+tildebelowcomb=idotless_tildebelowcomb

or by creating istroke.dotless and jstroke.dotless. The ccmp feature can then have a lookup similar to the following:

lookup ccmp_soft_dotted {
    @CombiningTopAccents = [acutecomb brevecomb caroncomb circumflexcomb dieresiscomb dotaccentcomb gravecomb macroncomb ringcomb tildecomb];
    lookupflag UseMarkFilteringSet @CombiningTopAccents;
    sub [i j idotbelow iogonek itildebelow istroke jstroke i-cy je-cy]' @CombiningTopAccents by [idotless jdotless idotless_dotbelowcomb idotless_ogonekcomb idotless_tildebelowcomb istroke.dotless jstroke.dotless idotless jdotless];
} ccmp_soft_dotted;
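
To see what the mark filtering set does, the lookup's matching logic can be imitated in plain Python. This is a rough simulation under stated assumptions, not how a shaping engine is implemented: the OTHER_MARKS set and the function name are hypothetical, and only the i and j family is covered.

```python
TOP_ACCENTS = {
    "acutecomb", "brevecomb", "caroncomb", "circumflexcomb", "dieresiscomb",
    "dotaccentcomb", "gravecomb", "macroncomb", "ringcomb", "tildecomb",
}
# Marks outside the filtering set (assumed for this sketch); the lookup
# skips over these when looking for the glyph that follows.
OTHER_MARKS = {"dotbelowcomb", "ogonekcomb", "tildebelowcomb"}
DOTLESS = {"i": "idotless", "j": "jdotless", "i-cy": "idotless", "je-cy": "jdotless"}


def apply_soft_dotted(glyphs):
    """Replace soft dotted glyphs by dotless forms when a top accent follows."""
    out = list(glyphs)
    for index, name in enumerate(glyphs):
        if name not in DOTLESS:
            continue
        # Find the next glyph that the lookup does not skip.
        nxt = index + 1
        while nxt < len(glyphs) and glyphs[nxt] in OTHER_MARKS:
            nxt += 1
        if nxt < len(glyphs) and glyphs[nxt] in TOP_ACCENTS:
            out[index] = DOTLESS[name]
    return out
```

For example, apply_soft_dotted(["i", "ogonekcomb", "acutecomb"]) replaces i with idotless even though a below mark sits between the letter and the acute, while ["j", "dotbelowcomb"] is left unchanged because no top accent follows.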

One should ensure these substitutions do not break when combined with other substitutions. For example, the small capitals (smcp) feature should still produce small capitals for soft dotted glyphs combined with top marks.

Without the soft dotted substitution and the `top` anchor, the sample string i̊j́ị́į́ḭ́ɨ́ɉ́і́ј́ is incorrectly displayed.
With the soft dotted substitution and the `top` anchor, the sample string i̊j́ị́į́ḭ́ɨ́ɉ́і́ј́ is correctly displayed.

Special glyphs

Vertical caron

For historical reasons, and hence by convention, in languages like Czech and Slovak the caron takes a vertical form on characters such as Lcaron, lcaron, dcaron and tcaron. Be aware, however, that it must not be built from any lookalike form such as a quote, a comma, or, least of all, an apostrophe. In fact, it should be clearly distinguishable from the apostrophe in particular, to avoid confusing the meaning of some words.

Please refer to the “Useful links” section below for more information.

Dotted circle

The dotted circle character (U+25CC) is inserted by shaping engines before mark glyphs which do not have an associated base, especially in the context of broken syllabic clusters.
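
As a deliberately simplified illustration of this behavior, the sketch below inserts U+25CC before any combining mark that has nothing to attach to. The rule used here (a mark at the start of the text or after whitespace has no base) is an assumption made for the example; real shaping engines decide this with script-specific cluster rules.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"


def insert_dotted_circles(text):
    """Insert a dotted circle before marks that have no preceding base."""
    out = []
    previous = None
    for char in text:
        # General categories Mn, Mc and Me are the Unicode mark categories.
        is_mark = unicodedata.category(char).startswith("M")
        has_base = previous is not None and not previous.isspace()
        if is_mark and not has_base:
            out.append(DOTTED_CIRCLE)
        out.append(char)
        previous = char
    return "".join(out)
```

With this rule, an isolated combining acute becomes a dotted circle followed by the acute, while the sequence a + combining acute is left untouched.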

For fonts containing combining marks, it is recommended that the dotted circle character is included so that these isolated marks can be displayed properly; for fonts supporting complex scripts, this should be considered mandatory.

Because a dotted circle glyph, when present, should be able to display all marks correctly, Google Fonts expects all fonts to include it, regardless of the script they address.

Therefore:

Text shaping process and OpenType Layout

For text to be displayed in a readable way on screens or in desktop apps, it must go through a process called shaping, which consists of translating a string of character codes into an ordered sequence of glyphs. This process is performed by engines like HarfBuzz.

Text shaping depends on four factors: the input string, the presence of the required OpenType Layout tables in the font, the writing system (script), and the language of the text. For shaping to occur at all, the GDEF, GSUB and GPOS tables must be present in the font.
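
Whether a font file carries these three tables can be checked by reading its table directory. The sketch below uses only Python's standard library and assumes a single, uncompressed sfnt file (a .ttf or .otf, not a WOFF file or a font collection); the function names are illustrative.

```python
import struct

REQUIRED_LAYOUT_TABLES = ("GDEF", "GSUB", "GPOS")


def table_tags(font_bytes):
    """Read the table tags from the table directory of a single sfnt font."""
    # Header: sfntVersion (uint32), numTables (uint16), then three uint16
    # binary-search helper fields; 12 bytes in total.
    _version, num_tables = struct.unpack(">IH", font_bytes[:6])
    tags = []
    for index in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * index
        tags.append(font_bytes[start:start + 4].decode("ascii"))
    return tags


def missing_layout_tables(font_bytes):
    """List the required OpenType Layout tables absent from the font."""
    present = set(table_tags(font_bytes))
    return [tag for tag in REQUIRED_LAYOUT_TABLES if tag not in present]
```

Passing the bytes of a font file (read in binary mode) to missing_layout_tables returns an empty list when all three tables are present.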

The Glyph Definition (GDEF) table

The GDEF table provides various glyph properties used in OpenType Layout processing, organized as six types of information in different subtables. One of them is the GlyphClassDef, which classifies the different types of glyphs in the font. This subtable assigns each glyph to one of the following classes:

Class Description
1 Base glyph (single character, spacing glyph)
2 Ligature glyph (multiple character, spacing glyph)
3 Mark glyph (non-spacing combining glyph)
4 Component glyph (part of single character, spacing glyph)

Both the GSUB and GPOS tables rely on this information to identify which glyph classes to adjust with lookups.
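
The class numbers can be mapped back to their meaning when inspecting an XML dump of the table. The sketch below parses a fragment in the same form as the ClassDef lines shown earlier; the enclosing GlyphClassDef element and the "a" and "f_i" entries are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

GDEF_CLASS_NAMES = {1: "base", 2: "ligature", 3: "mark", 4: "component"}

# Hypothetical fragment: only the two acutecomb entries come from the
# example earlier in this guide.
FRAGMENT = """
<GlyphClassDef>
  <ClassDef glyph="a" class="1"/>
  <ClassDef glyph="f_i" class="2"/>
  <ClassDef glyph="acutecomb" class="3"/>
  <ClassDef glyph="acutecomb.case" class="3"/>
</GlyphClassDef>
"""


def glyph_classes(xml_text):
    """Map each glyph name to the name of its GDEF glyph class."""
    root = ET.fromstring(xml_text)
    return {
        node.get("glyph"): GDEF_CLASS_NAMES[int(node.get("class"))]
        for node in root.iter("ClassDef")
    }


classes = glyph_classes(FRAGMENT)
marks = sorted(name for name, kind in classes.items() if kind == "mark")
```

Listing the glyphs in class 3 this way is a quick check that every combining mark, including variants such as acutecomb.case, has been classified as a mark.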

For any glyph to be classified into the right class, the following must be ensured on each one:

This identification is critical for font compilers such as fontmake to process the correct glyph category and export functional fonts. If a glyph is not in the right class, you can correct it in the Glyphs font editor through the “Glyph Info” pane, by setting the Category and Subcategory fields described above. In the FontLab editor, you can inspect the OT Class in the “Glyph Panel”.

The Glyph Positioning (GPOS) table

The GPOS table uses the glyphs’ X and Y position values to precisely control placement operations, conditioned by the script and language the font supports, as well as advanced typographic composition tasks such as kerning or superscripts.

Of the eight types of positioning actions that the table supports, at least two are essential for diacritic marks to function:

Key factors for these GPOS features to work are:

For more context and details, please read the entire GPOS entry in the OT Spec.

The Glyph Substitution (GSUB) Table

Sometimes a mark can only be positioned correctly if the base letter first takes a different glyph shape, that is, if it is substituted by another form on which the mark can be placed properly.

A typical case in Latin script is the need for an i without its dot to receive another mark, such as the macron. The GSUB table makes such substitutions possible through the Glyph Composition/Decomposition (ccmp) feature, which substitutes, for example, the glyph i by idotless when it is combined with a top comb accent. This is the soft dotted glyphs case explained above.

Key factors for the ccmp feature to work are:

For more context and details, please read the entire GSUB entry in the OT Spec.


Further reading:
learn Outline Quality