HVF FlatPartCache Inefficiency Causing Chinese Text Rendering Regression on iOS 18+

Summary

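On iOS 18 and later, Chinese text rendering shows a noticeable performance regression in the HVF (Hierarchical Variable Font) pipeline. The part cache (HVF::FlatPartCache) is cleared after every 18 rendered glyphs, so parts are reloaded repeatedly through HVF::LoaderHVGL::loadPartAtIndex.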

Environment

  • iOS Version: iOS 18+
  • Framework: libhvf.dylib (Hierarchical Variable Font)
  • Affected Font: PingFangUI.ttc (private system font, automatically used for Chinese text)
  • Related Frameworks: CoreText, CoreGraphics, FontParser
  • Devices: All iOS devices (more noticeable on older hardware)

Background

iOS 18 Change:

  • PingFang.ttc was removed from /System/Library/Fonts/
  • Private PingFangUI.ttc was added (inaccessible via normal font APIs)
  • System automatically uses PingFangUI.ttc for all Chinese text rendering
  • PingFangUI.ttc contains HVF tables → utilizes libhvf.dylib

HVF Architecture:

  • HVF (Hierarchical Variable Font) organizes glyphs as tree structures
  • Each glyph = Composite → multiple Parts → nested hierarchy
  • Rendering a single character requires traversing this tree
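
To make the traversal cost concrete, here is a minimal sketch of a composite glyph tree. The Part class, its fields, and the toy glyph are hypothetical and chosen only for illustration; the actual HVF table layout is not described in this report.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical node: a leaf shape, or a composite with child parts."""
    index: int
    children: list = field(default_factory=list)


def count_part_loads(part):
    """Count the parts visited when rendering `part` depth-first.

    Assumes one part load per visited node and no caching.
    """
    return 1 + sum(count_part_loads(child) for child in part.children)


# Toy glyph: one composite root, two components, three strokes each.
strokes_a = [Part(i) for i in (3, 4, 5)]
strokes_b = [Part(i) for i in (6, 7, 8)]
glyph = Part(0, [Part(1, strokes_a), Part(2, strokes_b)])

loads = count_part_loads(glyph)
print("part loads for the toy glyph:", loads)
```

Without a cache, every node in the tree costs one load, which is why the number of loads per glyph grows with the depth and fan-out of the hierarchy.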

Key Observation

A single Chinese glyph typically triggers ~20 calls to HVF::LoaderHVGL::loadPartAtIndex.

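Cache invalidation is triggered by IncrementRenderCount each time its counter reaches 18 (0x12), i.e. after every 18 glyphs. Abridged disassembly (the prologue is omitted; x19 presumably holds the original x0):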

__ZNK27THierVariationsDataForkFont20IncrementRenderCountEv:
    ldr    w8, [x0, #0x12c]
    add    w8, w8, #0x1
    str    w8, [x0, #0x12c]
    cmp    w8, #0x12
    b.lo   return
    ldr    x0, [x0, #0x120]
    bl     HVF_clear_part_cache
    str    wzr, [x19, #0x12c]
return:
    ret

This causes the cache to be cleared before a typical sentence finishes rendering.
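
The counter logic in the disassembly can be modelled roughly as follows. This is a sketch based only on the listing above: the class and attribute names are hypothetical, and the dictionary merely stands in for whatever object lives at offset 0x120.

```python
RENDER_COUNT_THRESHOLD = 0x12  # 18, from `cmp w8, #0x12`


class DataForkFontModel:
    """Hypothetical model of the counter logic in IncrementRenderCount."""

    def __init__(self):
        self.render_count = 0  # models the 32-bit field at offset 0x12c
        self.part_cache = {}   # stands in for the object at offset 0x120
        self.clears = 0        # bookkeeping for this sketch only

    def increment_render_count(self):
        self.render_count += 1
        if self.render_count < RENDER_COUNT_THRESHOLD:  # b.lo return
            return
        self.part_cache.clear()  # bl HVF_clear_part_cache
        self.clears += 1
        self.render_count = 0    # str wzr, [..., #0x12c]


font = DataForkFontModel()
for _ in range(40):  # 40 glyphs, roughly two sentences
    font.increment_render_count()
print("cache clears after 40 glyphs:", font.clears)
```

The threshold is a fixed glyph count, so in this model the clear happens regardless of how many parts are cached or how recently they were used.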


Complete Call Stack (Rendering Hot Path)

#0-1  HVF::LoaderHVGL::loadPartAtIndex
#2    HVF::FlatPartCache::partAtIndex
#3    HVF::PartTransformRenderer::renderComposite
#4    HVF::PartTransformRenderer::render
#5    HVF::PartTransformRenderer::renderToContext
#6    _HVF_render_current_part
#7    THierVariationsFontHandler::GetOutlinePath
#8    TFontHandler::CopyGlyphPath
#9    THierVariationsFontHandler::CopyGlyphPath
#10   TFPFont::CopyGlyphPath
#11-12 TFPFont::CopyGlyphPath / _FPFontCopyGlyphPath
#13   _CGFontCreateGlyphPath
#14   _CGGlyphBuilderLockBitmaps
#15   _render_glyphs
#16   _draw_glyph_bitmaps
#17   _ripc_DrawGlyphs
#18   CG::DisplayList::executeEntries
#19   _CGDisplayListDrawInContextDelegate
#20   _CABackingStoreUpdate_
#21-22 CALayer display/layout
#23-24 CA::Transaction::commit
#25-30 UIApplicationMain / RunLoop

HVF::LoaderHVGL::loadPartAtIndex is consistently observed as a hot function in Instruments and in production.


Cache Clear Call Stack

#0 HVF::FlatPartCache::clear
#1 HVF_clear_part_cache
#2 THierVariationsDataForkFont::IncrementRenderCount
#3 THierVariationsFontHandler::GetOutlinePath
#4 TFontHandler::CopyGlyphPath
#5 FPFontCopyGlyphPath
#6 CGFontCreateGlyphPath
#7 _render_glyphs
#8 _draw_glyph_bitmaps
#9 _ripc_DrawGlyphs

This shows that cache clearing occurs within the glyph rendering path.


Impact

For a typical Chinese sentence (~20 characters):

  • Each glyph requires multiple part loads (~20 per glyph)
  • Cache is cleared before rendering completes
  • Previously loaded parts cannot be reused

Observed effects:

  • Increased loadPartAtIndex invocation count
  • Low cache hit rate
  • Increased CPU usage in glyph rendering
  • Main-thread blocking during Core Animation commit
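
The effect of the periodic clear on part reuse can be illustrated with a small simulation. Everything about the workload is an assumption made for illustration: the pool of 60 shared parts, the way neighbouring glyphs overlap, and the placement of the clear after each glyph. Only the figures of 20 parts per glyph and a threshold of 18 come from this report.

```python
PARTS_PER_GLYPH = 20  # from the report: ~20 part loads per glyph
CLEAR_THRESHOLD = 18  # from `cmp w8, #0x12`
POOL_SIZE = 60        # ASSUMPTION: distinct parts shared by the text


def glyph_parts(glyph_index):
    """ASSUMPTION: neighbouring glyphs share most of their parts."""
    start = glyph_index * 3
    return [(start + k) % POOL_SIZE for k in range(PARTS_PER_GLYPH)]


def simulate(num_glyphs, clear_every=None):
    """Return (loads, hits) for a part cache cleared every N glyphs."""
    cache = set()
    loads = hits = rendered = 0
    for glyph_index in range(num_glyphs):
        for part in glyph_parts(glyph_index):
            if part in cache:
                hits += 1
            else:
                loads += 1  # models one cache miss reaching the loader
                cache.add(part)
        rendered += 1
        if clear_every is not None and rendered >= clear_every:
            cache.clear()
            rendered = 0
    return loads, hits


without_clear = simulate(40)
with_clear = simulate(40, clear_every=CLEAR_THRESHOLD)
print("no clearing:    loads=%d hits=%d" % without_clear)
print("clear every 18: loads=%d hits=%d" % with_clear)
```

In this toy workload the periodic clear forces parts that were already loaded to be loaded again. The real ratio depends on how many parts PingFangUI glyphs actually share, which this report does not measure.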

Regression

  • iOS 17 and earlier: Rendering is smooth under similar workloads.
  • iOS 18+: Increased rendering cost and visible frame drops.

The issue is more pronounced on older devices such as iPhone XS and iPhone 11.


Reproduction

Render a Chinese text string longer than 18 characters, for example (26 characters):

刷新测试中文文本用于验证渲染性能问题需要超过十八个字

Observe:

  • Repeated loadPartAtIndex calls
  • Frequent cache clearing

Request

It would be helpful to review the cache eviction strategy for HVF, particularly for complex scripts such as Chinese.

Potential considerations:

  • Adjusting or scaling the cache threshold
  • Avoiding full cache clears during continuous rendering
  • Improving reuse of parts across glyphs within the same rendering batch
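
As one illustration of avoiding full clears, the sketch below uses a capacity-bounded cache with least-recently-used eviction. It is a hypothetical alternative, not a description of how libhvf works or should be changed: the class name, the capacity of 64, and the workload are all assumptions.

```python
from collections import OrderedDict


class LruPartCache:
    """Hypothetical bounded part cache: evicts one stale part at a time."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._parts = OrderedDict()
        self.loads = 0  # counts simulated expensive part loads

    def part_at_index(self, index):
        if index in self._parts:
            self._parts.move_to_end(index)  # mark as most recently used
            return self._parts[index]
        self.loads += 1
        part = "part-%d" % index  # stands in for the expensive load
        self._parts[index] = part
        if len(self._parts) > self.capacity:
            self._parts.popitem(last=False)  # drop least recently used
        return part


# ASSUMED workload: 40 glyphs, 20 parts each, drawn from 60 shared parts.
cache = LruPartCache(capacity=64)
for glyph in range(40):
    for k in range(20):
        cache.part_at_index((glyph * 3 + k) % 60)
print("loads with a bounded LRU cache:", cache.loads)
```

Memory stays bounded by the capacity, and parts shared between consecutive glyphs remain available instead of being discarded together every 18 glyphs.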

Additional Observation: GlyphOutlineDictionaryCache

The GlyphOutlineDictionaryCache appears to use:

  • A CFDictionary for glyph-to-path lookup (constant-time access)
  • A FIFO queue with a fixed capacity (512 entries)
  • A mutex for thread safety

Observed consequences:

  • Entries are evicted strictly in insertion order
  • Frequently used glyphs are still evicted once 512 newer entries have been inserted, regardless of how often they are accessed
  • Cache effectiveness decreases as the working set exceeds capacity
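
The structure described above can be sketched as a dictionary plus a FIFO queue guarded by a lock. This is a minimal model built from the observations in this report, not the actual implementation: the class and method names are hypothetical, and a capacity of 3 is used in the demo instead of 512 to keep it short.

```python
import threading
from collections import deque


class FifoOutlineCache:
    """Hypothetical model: dict lookup, FIFO eviction, mutex."""

    def __init__(self, capacity=512):
        self._capacity = capacity
        self._paths = {}        # glyph id -> path, constant-time lookup
        self._order = deque()   # insertion order, oldest entry first
        self._lock = threading.Lock()

    def get(self, glyph_id):
        with self._lock:
            # A hit does not change the entry's position in the queue.
            return self._paths.get(glyph_id)

    def put(self, glyph_id, path):
        with self._lock:
            if glyph_id in self._paths:
                self._paths[glyph_id] = path
                return
            if len(self._order) >= self._capacity:
                oldest = self._order.popleft()
                del self._paths[oldest]
            self._order.append(glyph_id)
            self._paths[glyph_id] = path


cache = FifoOutlineCache(capacity=3)
for glyph_id in (1, 2, 3):
    cache.put(glyph_id, "path-%d" % glyph_id)
for _ in range(100):
    cache.get(1)        # glyph 1 is hot, but hits do not reorder the queue
cache.put(4, "path-4")  # evicts glyph 1, the oldest insertion
print("glyph 1 still cached:", cache.get(1) is not None)
```

In this model the hot glyph is evicted even though it was just accessed, because eviction follows insertion order only.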

GlyphOutlineDictionaryCache itself does not appear to be the primary bottleneck. Its lookup structure is efficient; the larger opportunity for improvement seems to be in the part cache (HVF::FlatPartCache), especially in how cached parts are retained and reused during continuous Chinese text rendering.
