I think you’re seeing a bug in initializing a CharacterSet from a multi-character string that contains characters outside the Unicode Basic Multilingual Plane (BMP). In this case the emoji lives in the Supplementary Multilingual Plane, so its code point needs 17 bits, but the character set seems to drop the most significant bit. I get this in the Swift REPL:
1> import Foundation
2> print(CharacterSet(charactersIn: "a"))
<CFCharacterSet Items(U+0061)>
3> print(CharacterSet(charactersIn: "ab"))
<CFCharacterSet Items(U+0061 U+0062)>
4> print(CharacterSet(charactersIn: "a\u{1F940}"))
<CFCharacterSet Items(U+0061 U+F940)>
If we assume the printed description is accurate, then the last line is clearly wrong: the string contains U+1F940 WILTED FLOWER but the resulting character set contains U+F940 CJK COMPATIBILITY IDEOGRAPH-F940 (a Chinese character). Given this broken character set, your diagnostic tests produce expected results.
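To make the "lost bit" concrete, here is a small sketch of the arithmetic. It only masks the scalar value to its low 16 bits; that this is what CharacterSet does internally is my assumption, not something I have confirmed.

```swift
// Assumption: the bug behaves like keeping only the low 16 bits of the code point.
let wilted: Unicode.Scalar = "\u{1F940}"   // WILTED FLOWER
let truncated = wilted.value & 0xFFFF      // clear everything above bit 15

print(String(wilted.value, radix: 16, uppercase: true))  // 1F940
print(String(truncated, radix: 16, uppercase: true))     // F940
```

The masked value matches the U+F940 shown in the broken description above.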
Oddly, this doesn’t happen if the string contains exactly one character. The description is formatted differently but appears to be correct, as 129344 == 0x1F940:
5> print(CharacterSet(charactersIn: "\u{1F940}"))
<CFCharacterSet Range(129344, 1)>
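Since the single-character case looks fine, one possible workaround is to build the set one Unicode scalar at a time instead of passing the whole string to the initializer. This is only a sketch: `makeCharacterSet(fromScalarsOf:)` is a hypothetical helper name, and I am assuming that `insert(_:)` is not affected by the same problem.

```swift
import Foundation

// Hypothetical helper: inserts each Unicode scalar individually,
// avoiding CharacterSet(charactersIn:) with a multi-character string.
func makeCharacterSet(fromScalarsOf string: String) -> CharacterSet {
    var result = CharacterSet()
    for scalar in string.unicodeScalars {
        result.insert(scalar)
    }
    return result
}

let workaround = makeCharacterSet(fromScalarsOf: "a\u{1F940}")

// If the assumption holds, the set contains U+1F940 and not U+F940.
print(workaround.contains("\u{1F940}"))
print(workaround.contains("\u{F940}"))
```

If the first check prints true and the second prints false on your system, the set built this way should behave as you originally intended.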
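All this may relate to the known issue that NSString works differently from Swift String for characters outside the BMP: NSString measures a string in UTF-16 code units, where such a character takes two units (a surrogate pair), while Swift String counts extended grapheme clusters. Read the documentation around “extended grapheme clusters” if you want to dive into it.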
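A quick way to see the difference between the two string models is to compare the various lengths for the same emoji. This is just an illustration and does not by itself explain the CharacterSet behavior.

```swift
import Foundation

let flower = "\u{1F940}"

print(flower.count)                 // 1  (extended grapheme clusters)
print(flower.unicodeScalars.count)  // 1  (Unicode scalars)
print(flower.utf16.count)           // 2  (UTF-16 code units, a surrogate pair)
print((flower as NSString).length)  // 2  (NSString counts UTF-16 code units)
```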