acute
, grave
and circumflex
should share the design decisions as per the above.breve
should not be reused as a caron
.)horn
) which, once combined with the base letter, will blend into the base and resemble an integral part of it..case
marks should be created. These marks should be designed with adjustments to size, height, or slope and are typically wider, shorter, and flatter than their lowercase counterparts so they use less vertical space, avoiding collisions with above text lines. These adjustments are particularly relevant for languages like Vietnamese that require stacked diacritics.Examples of what to avoid - critical cases
The spacing diacritical marks (for example, U+00B4 ACUTE ACCENT
) are required mainly for historical reasons and for backward compatibility. This is why they are also known as “Legacy marks”.
These marks are only used as placeholder when typing for a combination of keys to add an accent to a base letter e.g. ´
+ a
to obtain a á
.
Requirements:
≠0
Due to their function as displaying the accent as a standalone character, they are expected to have an advance width value with positive sidebearings, that is, they should not be zero width glyphs.combining marks
(see below) for consistency reasons. To ensure this in a practical way, it is suggested to create them by using the combining marks as components in the source file.acute
or grave
.Category=Mark
and Subcategory=Spacing
.
Name | Unicode | Catgory | Subcategory |
---|---|---|---|
acute | 00B4 | Mark | Spacing |
breve | 02D8 | Mark | Spacing |
caron | 02C7 | Mark | Spacing |
cedilla | 00B8 | Mark | Spacing |
circumflex | 02C6 | Mark | Spacing |
dieresis | 00A8 | Mark | Spacing |
dotaccent | 02D9 | Mark | Spacing |
grave | 0060 | Mark | Spacing |
hungarumlaut | 02DD | Mark | Spacing |
macron | 00AF | Mark | Spacing |
ogonek | 02DB | Mark | Spacing |
ring | 02DA | Mark | Spacing |
tilde | 02DC | Mark | Spacing |
Note that any other accent in the font exists only as combining mark
As the name suggests, the combining diacritics are the marks actually used to construct the accented letters, which would be done either by:
00C1 Á LATIN CAPITAL LETTER A WITH ACUTE
Or to allow the character composition of the accented letters by using the mark + base glyphs on the fly as the user types, e.g. 0041 LATIN CAPITAL LETTER A
followed by the combining diacritical mark 0301 COMBINING ACUTE ACCENT
, which would be the decomposition or Unicode equivalence of the above. This makes it possible for the combination of diacritics and display of non-encoded accented glyphs.
Automatic alignment
When creating the precomposed characters in the source file, ideally, automatic alignment should be enabled consistently in the composite glyphs. This way they would get automatically updated after any change on any of the components is performed.
Tip: When using ufo2glyphs
to convert source files, use this argument to preserve alignments: --enable-automatic-alignment
All the glyphs involved in the generation of accented letters use Anchors, which are special points that allow the attachment of glyphs to one another and play a key role in the identification of the glyph definition as well as the generation of the “Mark to base positioning” mark
and the “Mark to mark positioning” mkmk
GPOS features (see below).
Anchors are commonly represented as a red rhombus in the glyph view of the source file and are identified with a name. The name part should be shared among the base glyph and the mark glyph, but in the mark glyph there should be a preceding underscore. For example, there should be a top
anchor in the base glyph and a corresponding _top
anchor in the mark.
This name schema is crucial for the positioning to work as expected - for example if the underscore is omitted in the mark glyph, it would not be attached to the base letter - so you must pay special care and attention to them.
=0
. While working on the source file, it is possible to use a positive width with positive sidebearings to facilitate access to the glyph. Then, if the glyphs have the anchors and the correct name, the tool generating the font will take care of changing the width to 0 to comply with the requirement that they must be zero value width glyphs (hence the name nonspacing).acutecomb
, gravecomb
.top_viet
and _top_viet
, accordingly.Category=Mark
and Subcategory=Nonspacing
.Combining marks would be listed like this in the GDEF table:
<ClassDef glyph="acutecomb" class="3"/>
<ClassDef glyph="acutecomb.case" class="3"/>
In some languages like Vietnamese, marks are made of the combination of two other marks known as stacked diacritics. In such cases, a combining mark could also act as the ‘base’ glyph of another mark, and therefore, it would need more than one anchor. For example, in the brevecomb_acutecomb
, the brevecomb
mark would have one _top
anchor to be attached to a base letter, plus a top
one to attach other marks to it; in this case, the acutecomb
.
mkmk
feature in the GPOS
table.Some Latin and Cyrillic glyphs like i or j lose their dot when combined with marks that replace the dot. For example, in Dutch when stress is marked for emphasis j
can be combined with acutecomb
in the digraph ij spelled with two acute marks (íj́), in Navajo the iogonek
can be combined with acutecomb
(į́), or in Ukrainian when stress is marked i-cy
can be combined with acutecomb
(і́).
In such cases, a glyph substitution should make the dot disappear for example by substituting the soft dotted glyphs when combined with at least one top mark by dotless variants with a ccmp
feature in the GSUB
table.
A top
anchor is needed in the dotless variants of the glyphs for correct positioning of the top mark glyphs. A _top
anchor is needed in the top mark glyphs.
In a font with a small Latin set the ccmp
feature code can have the following lookup:
lookup ccmp_soft_dotted {
@CombiningTopAccents = [acutecomb brevecomb caroncomb circumflexcomb dieresiscomb dotaccentcomb gravecomb macroncomb ringcomb tildecomb];
lookupflag UseMarkFilteringSet @CombiningTopAccents;
sub [i j]' @CombiningTopAccents by [idotless jdotless];
} ccmp_soft_dotted;
In Glyphs, the automatically generated ccmp
feature adds a similar lookup but does not update it with larger glyph sets.
In a font with a larger Latin glyph set and Cyrillic glyph set, after creating the dotless forms of other soft dotted glyphs with glyph construction recipes like idotless+dotbelowcomb=idotless_dotbelowcomb idotless+ogonekcomb=idotless_ogonekcomb idotless+tildebelowcomb=idotless_tildebelowcomb
or after creating istroke.dotless
and jstroke.dotless
, the ccmp
feature can have a lookup similiar to the following:
lookup ccmp_soft_dotted {
@CombiningTopAccents = [acutecomb brevecomb caroncomb circumflexcomb dieresiscomb dotaccentcomb gravecomb macroncomb ringcomb tildecomb];
lookupflag UseMarkFilteringSet @CombiningTopAccents;
sub [i j idotbelow iogonek itildebelow istroke jstroke i-cy je-cy]' @CombiningTopAccents by [idotless jdotless idotless_dotbelowcomb idotless_ogonekcomb idotless_tildebelowcomb istroke.dotless jstroke.dotless idotless jdotless];
} ccmp_soft_dotted;
One should ensure these substitutions do not break when combined with other substitutions, for example the small capitals smcp
feature should produce small capitals for the soft dotted glyph combined with top marks.
Vertical caron
For historical and thus convention reasons, in languages like Czech and Slovak, the caron should have a vertical form when used on characters such as Lcaron
, lcaron
, dcaron
, tcaron
. But, be aware that it must not be composed with any other “lookalike” form like any quote, comma, and let alone apostrophe. In fact, it should distinguish particularly from the latter to avoid possible meaning confusion for some words.
caroncomb.alt
(or caron.alt), and eventually, depending on the design, caroncomb.alt.case
for Lcaron
.Please refer to the “Useful links” section below for more information.
Dotted circle
The dotted circle character (U+25CC) is inserted by shaping engines before mark glyphs which do not have an associated base, especially in the context of broken syllabic clusters.
For fonts containing combining marks, it is recommended that the dotted circle character is included so that these isolated marks can be displayed properly; for fonts supporting complex scripts, this should be considered mandatory.
Since when a dotted circle glyph is present, it should be able to display all marks correctly, Google Fonts expect all the fonts to include it, regardless of the script it is addressing.
Therefore:
For a text to be displayed in a readable way on screens or desktop apps, there is a required process called shaping which consist on translating a string of character codes into an ordered sequence of glyphs, and this process is performed by a engines like Harfbuzz
For text shaping to work, it depends on four factors: the input string given, the inclusion of Open Type Layout required tables in the font, the writing system (script), and the language of the text. For shaping to occur at all, the GDEF
, GSUB
and GPOS
tables must be present in the font.
The GDEF
table provides various glyph properties used in OpenType Layout processing in six types of information provided in different subtables. One of them is the GlyphClassDef that classifies the different types of glyphs in the font. This subtable will identify each glyph in one of the following classes:
Class | Description |
---|---|
1 | Base glyph (single character, spacing glyph) |
2 | Ligature glyph (multiple character, spacing glyph) |
3 | Mark glyph (non-spacing combining glyph) |
4 | Component glyph (part of single character, spacing glyph) |
Both the GSUB
and GPOS
tables rely on this information to identify which glyph classes to adjust with lookups.
For any glyph to be classified into the right class, the following must be ensured on each one:
combining mark
must have anchors, as well as the letters intended to become a base letter
.This identification is critical for the font compilers like Fontmake to process the correct glyph category and export functional fonts. If a glyph is not in the suitable class, you could correct it in the Glyphs font editor by using the “Glyph Info” pane and setting the Category and Subcategory fields described above. In Fontlab editor, you could inspect the “Glyph Panel” in the OT Class.
GPOS table will use all the glyphs’ X and Y position values to precisely control placement operations conditioned by the script and language the font supports, plus advanced typographic composition tasks such as kerning or superscripts.
From the eight type of positioning actions that the table support, at least two are essential for the functioning of diacritic marks:
mark
feature. Positions combining marks with respect to base glyphs, as when positioning vowels, diacritical marks, or tone marks in Arabic, Hebrew, and Vietnamese.mkmk
feature. Positions one mark relative to another, as when positioning tone marks with respect to vowel diacritical marks in Vietnamese.Key factors for these GPOS features to work are:
For more context and details, please read the entire GPOS entry in the OT Spec.
Sometimes the correct positioning of a mark will need first to use a different glyph shape for a given base letter, that is, to substitute it for another form that will allow the mark to be rightly placed.
A typical case in Latin script is the necessity of using an i
without the dot to receive any other mark like the macron
. The GSUB
table makes it possible for such substitutions through the Glyph Composition/Decomposition ccmp
feature that will substitute for example, the glyph i
by idotless
when it is combined with any comb
accent — this is the soft dotted glyphs case explained above.
Key factors for the ccmp feature to work are:
dotlessi
and dotlessj
in Latin script. (it should be named idotless
if you are working on Glyphs font editor.)ccmp
feature is included with all the necessary Lookups.ccmp
feature need to be at the top of features list so that it gets processed prior to any other feature.For more context and details, please read the entire GSUB entry in the OT Spec.