# Copyright (C) 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Copyright (c) 2012-2015 International Business Machines
# Corporation and others. All Rights Reserved.
#
# This file should be in UTF-8 with a signature byte sequence ("BOM").
#
# collationtest.txt: Collation test data.
#
# created on: 2012apr13
# created by: Markus W. Scherer

# A line with "** test: description" is used for verbose and error output.

# A collator can be set with "@ root" or "@ locale language-tag",
# for example "@ locale de-u-co-phonebk".
# An old-style locale ID can also be used, for example "@ locale de@collation=phonebook".

# A collator can be built with "@ rules".
# An "@ rules" line is followed by one or more lines with the tailoring rules.

# A collator can be modified with "% attribute=value".

# "* compare" tests the order (= or <) of the following strings.
# The relation can be "=" or "<" (the level of the difference is not specified)
# or "<1", "<2", " 1 CE
&ae=ch=cH=Ch=CH  # 2 chars -> 2 CEs
&rst=yz=yZ=Yz=YZ  # 2 chars -> 3 CEs
% caseFirst=lower
* compare
<1 ae
=  ch
<3 cH
<3 Ch
<3 CH
<1 rst
=  yz
<3 yZ
<3 Yz
<3 YZ
<1 w
<1 x
=  uv
<3 uV
=  Uv  # mixed case on single CE cannot distinguish variations
<3 UV

** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=lower
@ rules
&\u0001<<"a".
# We need to back up before the identical prefix "1" and compare the full numbers.
<1 11b
<1 101a

** test: simple locale data test
@ locale de
* compare
<1 a
<2 ä
<1 ae
<2 æ

@ locale de-u-co-phonebk
* compare
<1 a
<1 ae
<2 ä
<2 æ

# The following test cases were moved here from ICU 52's DataDrivenCollationTest.txt.

** test: DataDrivenCollationTest/TestMorePinyin
# Testing the primary strength.
@ locale zh
% strength=primary
* compare
< lā
= lĀ
= Lā
= LĀ
< lān
= lĀn
< lē
= lĒ
= Lē
= LĒ
< lēn
= lĒn

** test: DataDrivenCollationTest/TestLithuanian
# Lithuanian sort order.
@ locale lt
* compare
< cz
< č
< d
< iz
< j
< sz
< š
< t
< zz
< ž

** test: DataDrivenCollationTest/TestLatvian
# Latvian sort order.
@ locale lv
* compare
< cz
< č
< d
< gz
< ģ
< h
< iz
< j
< kz
< ķ
< l
< lz
< ļ
< m
< nz
< ņ
< o
< rz
< ŗ
< s
< sz
< š
< t
< zz
< ž

** test: DataDrivenCollationTest/TestEstonian
# Estonian sort order.
@ locale et
* compare
< sy
< š
< šy
< z
< zy
< ž
< v
< va
< w
< õ
< õy
< ä
< äy
< ö
< öy
< ü
< üy
< x

** test: DataDrivenCollationTest/TestAlbanian
# Albanian sort order.
@ locale sq
* compare
< cz
< ç
< d
< dz
< dh
< e
< ez
< ë
< f
< gz
< gj
< h
< lz
< ll
< m
< nz
< nj
< o
< rz
< rr
< s
< sz
< sh
< t
< tz
< th
< u
< xz
< xh
< y
< zz
< zh

** test: DataDrivenCollationTest/TestSimplifiedChineseOrder
# Sorted file has different order.
@ root
# normalization=on turned on & off automatically.
* compare
< \u5F20
< \u5F20\u4E00\u8E3F

** test: DataDrivenCollationTest/TestTibetanNormalizedIterativeCrash
# This pretty much crashes.
@ root
* compare
< \u0f71\u0f72\u0f80\u0f71\u0f72
< \u0f80

** test: DataDrivenCollationTest/TestThaiPartialSortKeyProblems
# These are examples of strings that caused trouble in partial sort key testing.
@ locale th-TH
* compare
< \u0E01\u0E01\u0E38\u0E18\u0E20\u0E31\u0E13\u0E11\u0E4C
< \u0E01\u0E01\u0E38\u0E2A\u0E31\u0E19\u0E42\u0E18
* compare
< \u0E01\u0E07\u0E01\u0E32\u0E23
< \u0E01\u0E07\u0E42\u0E01\u0E49
* compare
< \u0E01\u0E23\u0E19\u0E17\u0E32
< \u0E01\u0E23\u0E19\u0E19\u0E40\u0E0A\u0E49\u0E32
* compare
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E22\u0E27
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E4A\u0E22\u0E27
* compare
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E2D
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E49\u0E32

** test: DataDrivenCollationTest/TestJavaStyleRule
# java.text allows rules to start as '<<