Collation

From Wikipedia, the free encyclopedia

Collation is the assembly of written information into a standard order. Many systems of collation are based on numerical order or alphabetical order, or extensions and combinations thereof. Collation is a fundamental element of most office filing systems, library catalogs, and reference books.

Collation differs from classification in that classification does not require the classes themselves to be ordered. However, even if the order of the classes is irrelevant, the identifiers of the classes may be members of an ordered set, allowing a sorting algorithm to arrange the items by class.

Formally speaking, a collation method typically defines a total order on a set of possible identifiers, called sort keys, which consequently produces a total preorder on the set of items of information (items with the same identifier are not placed in any defined order).

A collation algorithm such as the Unicode collation algorithm defines an order through the process of comparing two given character strings and deciding which should come before the other. When an order has been defined in this way, a sorting algorithm can be used to put a list of any number of items into that order.

The main advantage of collation is that it makes it fast and easy for a user to find an element in the list, or to confirm that it is absent from the list. In automatic systems this can be done using a binary search algorithm or interpolation search; manual searching may be performed using a roughly similar procedure, though this will often be done unconsciously. Other advantages are that one can easily find the first or last elements on the list (most likely to be useful in the case of numerically sorted data), or elements in a given range (useful again in the case of numerical data, and also with alphabetically ordered data when one may be sure of only the first few letters of the sought item or items).
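
The lookup advantages described above can be sketched with Python's standard `bisect` module. This is a minimal illustration, not a reference implementation: the helper names `contains` and `with_prefix` are hypothetical, and the prefix search assumes that no stored string contains the code point U+10FFFF.

```python
from bisect import bisect_left, bisect_right

def contains(sorted_items, target):
    """Return True if target occurs in the already-sorted list (binary search)."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

def with_prefix(sorted_items, prefix):
    """Return every item starting with prefix, located by two binary searches."""
    lo = bisect_left(sorted_items, prefix)
    # Assumption: appending the largest code point gives an upper bound
    # for all strings that begin with prefix.
    hi = bisect_right(sorted_items, prefix + "\U0010FFFF")
    return sorted_items[lo:hi]

words = sorted(["carbon", "cart", "carthorse", "dog", "apple", "carp"])
first, last = words[0], words[-1]      # extremes are at the ends of the list
found = contains(words, "carp")        # present
missing = contains(words, "cat")       # absent, confirmed without a full scan
cart_words = with_prefix(words, "cart")
```

Both helpers rely on the list already being in the same order that the comparison uses; searching a list collated under different rules would give wrong answers.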

Ordering

Numerical and chronological

Strings representing numbers may be sorted based on the values of the numbers that they represent. For example, the strings "−4", "2.5", "10", "89", "30,000" are in ascending numerical order. Pure application of this method may provide only a partial ordering on the strings, since different strings can represent the same number (as with "2" and "2.0" or, when scientific notation is used, "2e3" and "2000").
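
As a rough sketch of value-based ordering, the Python snippet below parses each string with `decimal.Decimal` and sorts by the parsed value. The helper name `numeric_key` is hypothetical, and treating the comma as a thousands separator is an assumption that holds only for some locales.

```python
from decimal import Decimal

def numeric_key(s):
    """Parse a numeric string into a Decimal for comparison.

    Assumptions: "," is a thousands separator and U+2212 is a minus sign.
    """
    cleaned = s.replace(",", "").replace("\u2212", "-")
    return Decimal(cleaned)

values = ["89", "2.5", "30,000", "\u22124", "10"]
ordered = sorted(values, key=numeric_key)

# Different strings can denote the same number, so the key alone
# cannot decide their relative order.
same_value = numeric_key("2e3") == numeric_key("2000")
```

Because Python's sort is stable, strings with equal values (such as "2" and "2.0") simply keep their input order.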

A similar approach may be taken with strings representing dates or other items that can be ordered chronologically or in some other natural fashion.

Alphabetical

Alphabetical order is the basis for many systems of collation where items of information are identified by strings consisting principally of letters from an alphabet. The ordering of the strings relies on the existence of a standard ordering for the letters of the alphabet in question. (The system is not limited to alphabets in the strict technical sense; languages that use a syllabary or abugida, for example Cherokee, can use the same ordering principle provided there is a set ordering for the symbols used.)

To decide which of two strings comes first in alphabetical order, initially their first letters are compared. The string whose first letter appears earlier in the alphabet comes first in alphabetical order. If the first letters are the same, then the second letters are compared, and so on, until the order is decided. (If one string runs out of letters to compare, then it is deemed to come first; for example, "cart" comes before "carthorse".) The result of arranging a set of strings in alphabetical order is that words with the same first letter are grouped together, and within such a group words with the same first two letters are grouped together, and so on.
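
The comparison procedure just described can be written out directly. The following Python sketch assumes strings made up only of the 26 basic Latin letters; `ALPHABET`, `RANK`, and `comes_first` are illustrative names, and the lowercasing step anticipates the usual treatment of capital letters.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def comes_first(a, b):
    """Return True if a strictly precedes b in alphabetical order."""
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            # The first differing letter decides the order.
            return RANK[x] < RANK[y]
    # One string ran out of letters: the shorter one comes first.
    return len(a) < len(b)

cart_before_carthorse = comes_first("cart", "carthorse")
carbon_before_carp = comes_first("carbon", "carp")
```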

Capital letters are typically treated as equivalent to their corresponding lowercase letters. (For alternative treatments in computerized systems, see Automated collation, below.)

Certain limitations, complications, and special conventions may apply when alphabetical order is used:

  • When strings contain spaces or other word dividers, the decision must be taken whether to ignore these dividers or to treat them as symbols preceding all other letters of the alphabet. For example, if the first approach is taken then "car park" will come after "carbon" and "carp" (as it would if it were written "carpark"), whereas in the second approach "car park" will come before those two words. The first rule is used in many (but not all) dictionaries, the second in telephone directories (so that Wilson, Jim K appears with other people named Wilson, Jim and not after Wilson, Jimbo).
  • Abbreviations may be treated as if they were spelt out in full. For example, names containing "St." (short for the English word Saint) are often ordered as if they were written out as "Saint". There is also a traditional convention in English that surnames beginning Mc and M' are listed as if those prefixes were written Mac.
  • Strings that represent personal names will often be listed by alphabetical order of surname, even if the given name comes first. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way.
  • Very common initial words, such as The in English, are often ignored for sorting purposes. So The Shining would be sorted as just "Shining" or "Shining, The".
  • When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example 1776 would be sorted as if spelled out "seventeen seventy-six", and 24 heures du Mans as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as in 1337 for leet or Se7en for the movie title Seven, they may be sorted as if they were those letters.
  • Languages have different conventions for treating modified letters and certain letter combinations. For example, in Spanish the letter ñ is treated as a basic letter following n, and the digraphs ch and ll were formerly (until 1994) treated as basic letters following c and l, although they are now alphabetized as two-letter combinations. A list of such conventions for various languages can be found at Alphabetical order § Language-specific conventions.
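
The first convention in the list, the treatment of spaces, can be illustrated with two key functions in Python. This is only a sketch: the function names are hypothetical, and the second one relies on the fact that the space character has a lower code point than any letter.

```python
def letter_by_letter(s):
    """Ignore spaces entirely, as many dictionaries do."""
    return s.lower().replace(" ", "")

def word_by_word(s):
    """Keep spaces; a space (code point 32) sorts before every letter."""
    return s.lower()

entries = ["carp", "car park", "carbon"]
dictionary_order = sorted(entries, key=letter_by_letter)
directory_order = sorted(entries, key=word_by_word)

names = ["Wilson, Jimbo", "Wilson, Jim K"]
names_directory = sorted(names, key=word_by_word)
names_dictionary = sorted(names, key=letter_by_letter)
```

Under the first key "car park" lands after "carbon" and "carp"; under the second it precedes both, and "Wilson, Jim K" stays with the other entries for Wilson, Jim.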

In several languages the rules have changed over time, and so older dictionaries may use a different order than modern ones. Furthermore, collation may depend on use. For example, German dictionaries and telephone directories use different approaches.

Root sorting

Some Arabic dictionaries, such as Hans Wehr's bilingual A Dictionary of Modern Written Arabic, group and sort Arabic words by Semitic root.[1] For example, the words kitāba (كتابة 'writing'), kitāb (كتاب 'book'), kātib (كاتب 'writer'), maktaba (مكتبة 'library'), maktab (مكتب 'office'), maktūb (مكتوب 'fate,' or 'written'), are agglomerated under the triliteral root k-t-b (ك ت ب), which denotes 'writing'.[2]

Radical-and-stroke sorting

Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as the hanzi of Chinese and the kanji of Japanese, whose thousands of symbols defy ordering by convention. In this system, common components of characters are identified; these are called radicals in Chinese and logographic systems derived from Chinese. Characters are then grouped by their primary radical, then ordered by number of pen strokes within radicals. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese character 妈 (meaning "mother") is sorted as a six-stroke character under the three-stroke primary radical 女 (meaning "woman").
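
A toy version of radical-and-stroke ordering is sketched below in Python. The lookup table is hand-entered for five characters purely for illustration (real systems use a full character database), and ordering the radicals by their Kangxi radical number is an assumption made here, not something the text above specifies.

```python
# Illustrative data only: character -> (Kangxi radical number, total strokes).
RADICAL_INFO = {
    "妈": (38, 6),  # radical 女 (3 strokes) plus 马 (3 strokes)
    "奶": (38, 5),  # radical 女 plus 乃 (2 strokes)
    "姐": (38, 8),  # radical 女 plus 且 (5 strokes)
    "林": (75, 8),  # radical 木 (4 strokes) plus 木
    "河": (85, 8),  # radical 水 in its 氵 form (3 strokes) plus 可 (5 strokes)
}

def radical_stroke_key(ch):
    """Group by primary radical, then order by stroke count within the radical."""
    return RADICAL_INFO[ch]

chars = ["河", "姐", "林", "妈", "奶"]
ordered_chars = sorted(chars, key=radical_stroke_key)
```

Characters that share both radical and stroke count would need a further tie-breaking convention, which this sketch leaves out.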

The radical-and-stroke system is cumbersome compared to an alphabetical system in which there are a few characters, all unambiguous. The choice of which components of a logograph comprise separate radicals and which radical is primary is not clear-cut. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word Tōkyō (東京) can be sorted as if it were spelled out in the Japanese characters of the hiragana syllabary as "to-u-ki-yo-u" (とうきょう), using the conventional sorting order for these characters.[citation needed]

In addition, Chinese characters can also be sorted by stroke-based sorting. In Greater China, surname stroke ordering is a convention in some official documents where people's names are listed without hierarchy.

Automation

When information is stored in digital systems, collation may become an automated process. It is then necessary to implement an appropriate collation algorithm that allows the information to be sorted in a satisfactory manner for the application in question. Often the aim will be to achieve an alphabetical or numerical ordering that follows the standard criteria as described in the preceding sections. However, not all of these criteria are easy to automate.[3]

The simplest kind of automated collation is based on the numerical codes of the symbols in a character set, such as ASCII coding (or any of its supersets such as Unicode), with the symbols being ordered in increasing numerical order of their codes, and this ordering being extended to strings in accordance with the basic principles of alphabetical ordering (mathematically speaking, lexicographical ordering). So a computer program might treat the characters a, b, C, d, and $ as being ordered $, C, a, b, d (the corresponding ASCII codes are $ = 36, a = 97, b = 98, C = 67, and d = 100). Therefore, strings beginning with C, M, or Z would be sorted before strings with lower-case a, b, etc. This is sometimes called ASCIIbetical order. This deviates from the standard alphabetical order, particularly due to the ordering of capital letters before all lower-case ones (and possibly the treatment of spaces and other non-letter characters). It is therefore often applied with certain alterations, the most obvious being case conversion (often to uppercase, for historical reasons[note 1]) before comparison of ASCII values.
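
Python's built-in string comparison works on code points, so it reproduces this behaviour directly. The short sketch below shows the code-point order and the case-conversion workaround; the sample word list is made up for illustration.

```python
chars = ["a", "b", "C", "d", "$"]
codes = {c: ord(c) for c in chars}          # $ = 36, C = 67, a = 97, ...
code_point_order = sorted(chars)            # plain code-point comparison

words = ["apple", "Zebra", "banana", "Cherry"]
asciibetical = sorted(words)                # capitals before all lowercase
case_insensitive = sorted(words, key=str.upper)  # convert case before comparing
```

Case conversion fixes the capital-letter problem only; spaces, punctuation, and accented letters still follow their numerical codes.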

In many collation algorithms, the comparison is based not on the numerical codes of the characters, but with reference to the collating sequence – a sequence in which the characters are assumed to come for the purpose of collation – as well as other ordering rules appropriate to the given application. This can serve to apply the correct conventions used for alphabetical ordering in the language in question, dealing properly with differently cased letters, modified letters, digraphs, particular abbreviations, and so on, as mentioned above under Alphabetical order, and in detail in the Alphabetical order article. Such algorithms are potentially quite complex, possibly requiring several passes through the text.[3]

Problems are nonetheless still common when the algorithm has to encompass more than one language. For example, in German dictionaries the word ökonomisch comes between offenbar and olfaktorisch, while Turkish dictionaries treat o and ö as different letters, placing oyun before öbür.
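
The conflict between the two conventions can be made concrete with a small Python sketch that builds a sort key from an explicit collating sequence. Everything here is a simplification under stated assumptions: `make_key` is a hypothetical helper, input is assumed to be lowercase, the German rule only folds umlauts onto their base vowels (ß and tie-breaking are ignored), and neither key is a substitute for a real locale-aware library.

```python
def make_key(alphabet, fold=None):
    """Build a sort-key function from a collating sequence.

    alphabet: letters in collation order.
    fold: optional mapping of letters treated as equivalent to another letter.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}
    fold = fold or {}

    def key(word):
        return [rank[fold.get(ch, ch)] for ch in word]

    return key

# German dictionary convention (simplified): umlauts sort as their base vowel.
german_key = make_key("abcdefghijklmnopqrstuvwxyz",
                      fold={"ä": "a", "ö": "o", "ü": "u"})
# Turkish alphabet: ö is a separate letter placed after o.
turkish_key = make_key("abcçdefgğhıijklmnoöprsştuüvyz")

german_words = sorted(["olfaktorisch", "ökonomisch", "offenbar"], key=german_key)
turkish_words = sorted(["öbür", "oyun"], key=turkish_key)
same_words_german_rule = sorted(["öbür", "oyun"], key=german_key)
```

The last line shows why a single rule cannot serve both languages: under the German folding rule the same two Turkish words come out in the opposite order.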

A standard algorithm for collating any collection of strings composed of any standard Unicode symbols is the Unicode Collation Algorithm. This can be adapted to use the appropriate collation sequence for a given language by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.

Sort keys

In some applications, the strings by which items are collated may differ from the identifiers that are displayed. For example, The Shining might be sorted as Shining, The (see Alphabetical order above), but it may still be desired to display it as The Shining. In this case two sets of strings can be stored, one for display purposes, and another for collation purposes. Strings used for collation in this way are called sort keys.
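
A minimal Python sketch of this arrangement stores each title alongside a separately computed sort key. The function name `sort_key` and the sample titles other than The Shining are illustrative, and only the leading article "The" is handled.

```python
def sort_key(title):
    """Derive a collation string: move a leading "The" to the end, fold case."""
    article = "The "
    if title.startswith(article):
        return f"{title[len(article):]}, The".lower()
    return title.lower()

titles = ["The Shining", "Jaws", "Alien", "The Godfather"]

# Store (sort key, display string) pairs; sort on the key, show the original.
records = [(sort_key(t), t) for t in titles]
display_order = [display for _, display in sorted(records)]
```

Keeping the key as stored data, rather than recomputing it on every comparison, is a common design choice when the derivation is expensive or needs manual overrides.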

Issues with numbers

Sometimes, it is desired to order text with embedded numbers using proper numerical order. For example, "Figure 7b" goes before "Figure 11a", even though '7' comes after '1' in Unicode. This can be extended to Roman numerals. This behavior is not particularly difficult to produce as long as only integers are to be sorted, although it can slow down sorting significantly. For example, Microsoft Windows does this when sorting file names.
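
One common way to get this behaviour for integers is to split each string into digit and non-digit runs and compare the digit runs as numbers. The Python sketch below does this with a hypothetical `natural_key` helper; it handles non-negative integers only and does not attempt Roman numerals or decimals.

```python
import re

def natural_key(s):
    """Split into text and digit runs; compare digit runs by integer value."""
    parts = re.split(r"(\d+)", s)
    # With one capture group, re.split alternates: text, digits, text, ...
    # so odd positions always hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

labels = ["Figure 11a", "Figure 7b", "Figure 7a"]
plain_order = sorted(labels)                     # code-point comparison
natural_order = sorted(labels, key=natural_key)  # numbers compared by value
```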

Sorting decimals properly is a bit more difficult, because different locales use different symbols for a decimal point, and sometimes the same character used as a decimal point is also used as a separator, for example "Section 3.2.5". There is no universal answer for how to sort such strings; any rules are application dependent.

Labeling of ordered items

In some contexts, numbers and letters are used not so much as a basis for establishing an ordering, but as a means of labeling items that are already ordered. For example, pages, sections, chapters, and the like, as well as the items of lists, are frequently "numbered" in this way. Labeling series that may be used include ordinary Arabic numerals (1, 2, 3, ...), Roman numerals (I, II, III, ... or i, ii, iii, ...), or letters (A, B, C, ... or a, b, c, ...). (An alternative method for indicating list items, without numbering them, is to use a bulleted list.)
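
For illustration, labels in two of these series can be generated from a position number as in the Python sketch below. The function names are hypothetical, and continuing the letter series past Z as AA, AB, ... is an assumption (a spreadsheet-style convention), not something stated above.

```python
def to_roman(n):
    """Convert a positive integer to an uppercase Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)   # how many times this symbol fits
        out.append(symbol * count)
    return "".join(out)

def to_letters(n):
    """Convert 1, 2, 3, ... to A, B, C, ...; after Z continue with AA (assumed)."""
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        out = chr(ord("A") + r) + out
    return out

roman_labels = [to_roman(i) for i in range(1, 6)]
lower_roman_labels = [to_roman(i).lower() for i in range(1, 4)]
letter_labels = [to_letters(i) for i in range(1, 4)]
```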

When letters of an alphabet are used for this purpose of enumeration, there are certain language-specific conventions as to which letters are used. For example, the Russian letters Ъ and Ь (which in writing are only used for modifying the preceding consonant), and usually also Ы, Й, and Ё, are omitted. Also in many languages that use extended Latin script, the modified letters are often not used in enumeration.

Notes

  1. ^ Historically, computers only handled text in uppercase (this dates back to telegraph conventions).

References

  1. ^ Abu-Haidar, J. A. (1983). "Review of A Dictionary of Modern Written Arabic (Arabic-English)". Bulletin of the School of Oriental and African Studies, University of London. 46 (2): 351–353. doi:10.1017/S0041977X00079040. ISSN 0041-977X. JSTOR 615409.
  2. ^ "Hans Wehr Arabic-English Dictionary". ejtaal.net. Retrieved 2025-08-06.
  3. ^ a b Walters, Richard F. (1997). M Programming: A Comprehensive Guide. Digital Press.