Hash table

Type        Unordered associative array
Invented    1953

Time complexity in big O notation
Operation   Average   Worst case
Search      Θ(1)      O(n)[a]
Insert      Θ(1)      O(n)
Delete      Θ(1)      O(n)

Space complexity
Space       Θ(n)[2]   O(n)

A small phone book as a hash table

In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps keys to values.[3] A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. A map implemented by a hash table is called a hash map.

Most hash table designs employ an imperfect hash function. Hash collisions, where the hash function generates the same index for more than one key, therefore typically must be accommodated in some way.

In a well-dimensioned hash table, the average time complexity for each lookup is independent of the number of elements stored in the table. Many hash table designs also allow arbitrary insertions and deletions of key–value pairs, at amortized constant average cost per operation.[4][5][6]

Hashing is an example of a space-time tradeoff. If memory is infinite, the entire key can be used directly as an index to locate its value with a single memory access. On the other hand, if infinite time is available, values can be stored without regard for their keys, and a binary search or linear search can be used to retrieve the element.[7]:458

In many situations, hash tables turn out to be on average more efficient than search trees or any other table lookup structure. For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

History

The idea of hashing arose independently in different places. In January 1953, Hans Peter Luhn wrote an internal IBM memorandum that used hashing with chaining. The first example of open addressing was proposed by A. D. Linh, building on Luhn's memorandum.[5]:547 Around the same time, Gene Amdahl, Elaine M. McGraw, Nathaniel Rochester, and Arthur Samuel of IBM Research implemented hashing for the IBM 701 assembler.[8]:124 Open addressing with linear probing is credited to Amdahl, although Andrey Ershov independently had the same idea.[8]:124–125 The term "open addressing" was coined by W. Wesley Peterson in his article discussing the problem of search in large files.[9]:15

The first published work on hashing with chaining is credited to Arnold Dumey, who discussed the idea of using remainder modulo a prime as a hash function.[9]:15 The word "hashing" was first published in an article by Robert Morris.[8]:126 A theoretical analysis of linear probing was submitted originally by Konheim and Weiss.[9]:15

Overview

An associative array stores a set of (key, value) pairs and allows insertion, deletion, and lookup (search), with the constraint of unique keys. In the hash table implementation of associative arrays, an array A of length m is partially filled with n elements, where n ≤ m. A key x is hashed using a hash function h to compute an index location A[h(x)] in the hash table, where h(x) ∈ {0, ..., m − 1}. At this index, both the key and its associated value are stored. Storing the key alongside the value ensures that lookups can verify the key at the index to retrieve the correct value, even in the presence of collisions. Under reasonable assumptions, hash tables have better time complexity bounds on search, delete, and insert operations in comparison to self-balancing binary search trees.[9]:1
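To make the hash-then-verify step concrete, here is a minimal Python sketch (names are illustrative; a colliding insertion simply overwrites here, and proper collision handling is covered under "Collision resolution" below):

m = 16
table = [None] * m                           # array A of length m

def put(key, value):
    table[hash(key) % m] = (key, value)      # store the key alongside the value

def get(key):
    slot = table[hash(key) % m]
    if slot is not None and slot[0] == key:  # verify the key at index h(x)
        return slot[1]
    return None                              # absent (or evicted by a collision)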

Hash tables are also commonly used to implement sets, by omitting the stored value for each key and merely tracking whether the key is present.[9]:1

Load factor

A load factor α is a critical statistic of a hash table, and is defined as follows:[2]

  α = n / m

where

  • n is the number of entries occupied in the hash table.
  • m is the number of buckets.

The performance of the hash table deteriorates in relation to the load factor α.[9]:2 In the limit of large n and m, the number of entries in each bucket follows a Poisson distribution with expectation α for an ideally random hash function.

The software typically ensures that the load factor α remains below a certain constant, αmax. This helps maintain good performance. Therefore, a common approach is to resize or "rehash" the hash table whenever the load factor α reaches αmax. Similarly the table may also be resized if the load factor drops below αmax/4.[10]
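A sketch of such a resize policy in Python; the 0.75 threshold and the names are assumptions for illustration:

ALPHA_MAX = 0.75   # assumed maximum load factor

def needs_grow(n_entries: int, n_buckets: int) -> bool:
    """Resize (e.g. double m) once the load factor n/m reaches ALPHA_MAX."""
    return n_entries / n_buckets >= ALPHA_MAX

def needs_shrink(n_entries: int, n_buckets: int) -> bool:
    """Optionally shrink once the load factor falls below ALPHA_MAX / 4."""
    return n_entries / n_buckets < ALPHA_MAX / 4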

Load factor for separate chaining

With separate chaining hash tables, each slot of the bucket array stores a pointer to a list or array of data.[11]

Separate chaining hash tables suffer gradually declining performance as the load factor grows, and there is no fixed point beyond which resizing is absolutely needed.[10]

With separate chaining, the value of α that gives best performance is typically between 1 and 3.[10]

Load factor for open addressing

With open addressing, each slot of the bucket array holds exactly one item. Therefore an open-addressed hash table cannot have a load factor greater than 1.[11]

The performance of open addressing degrades severely as the load factor approaches 1.[10]

Therefore a hash table that uses open addressing must be resized or rehashed if the load factor approaches 1.[10]

With open addressing, acceptable figures for the maximum load factor αmax range around 0.6 to 0.75.[12][13]:110

Hash function

A hash function h maps the universe U of keys to indices or slots within the table, that is, h(x) ∈ {0, ..., m − 1} for x ∈ U. The conventional implementations of hash functions are based on the integer universe assumption that all elements of the table stem from the universe U = {0, ..., u − 1}, where the bit length of u is confined within the word size of a computer architecture.[9]:2

A hash function h is said to be perfect for a given set S ⊆ U if it is injective on S, that is, if each element x ∈ S maps to a different value in {0, ..., m − 1}.[14][15] A perfect hash function can be created if all the keys are known ahead of time.[14]

Integer universe assumption

The schemes of hashing used in the integer universe assumption include hashing by division, hashing by multiplication, universal hashing, dynamic perfect hashing, and static perfect hashing.[9]:2 However, hashing by division is the commonly used scheme.[16]:264[13]:110

Hashing by division

The scheme in hashing by division is as follows:[9]:2

  h(x) = x mod m

where h(x) is the hash value of the key x and m is the size of the table.
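In Python the division scheme is a one-liner; m is often chosen to be a prime number to spread the keys:

def hash_division(x: int, m: int) -> int:
    # h(x) = x mod m
    return x % m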

Hashing by multiplication

The scheme in hashing by multiplication is as follows:[9]:2–3

  h(x) = ⌊m((xA) mod 1)⌋

where A is a non-integer real-valued constant and m is the size of the table. An advantage of hashing by multiplication is that the value of m is not critical.[9]:2–3 Although any value of A produces a hash function, Donald Knuth suggests using the golden ratio, A ≈ (√5 − 1)/2 ≈ 0.618.[9]:3
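A direct Python rendering of the formula, using Knuth's golden-ratio suggestion for A:

import math

A = (math.sqrt(5) - 1) / 2   # golden-ratio constant suggested by Knuth, ~0.618

def hash_multiplication(x: int, m: int) -> int:
    # h(x) = floor(m * ((x * A) mod 1)), i.e. m times the fractional part of x*A
    return int(m * ((x * A) % 1.0))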

Choosing a hash function

Uniform distribution of the hash values is a fundamental requirement of a hash function. A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., a Pearson's chi-squared test for discrete uniform distributions.[17][18]

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size, then the hash function needs to be uniform only when the size is a power of two. Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have the size be a prime number.[19]
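For example, with a power-of-two table size the modulo reduction can be computed as a bit mask (a sketch; the function name is an assumption):

def index_for(hash_value: int, m: int) -> int:
    assert m & (m - 1) == 0, "m must be a power of two"
    return hash_value & (m - 1)   # selects the low-order bits; same as hash_value % m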

For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots. Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash is claimed to have particularly poor clustering behavior.[19][5]

K-independent hashing offers a way to prove that a certain hash function does not have bad keysets for a given type of hash table. A number of K-independence results are known for collision resolution schemes such as linear probing and cuckoo hashing. Since K-independence can prove a hash function works, one can then focus on finding the fastest possible such hash function.[20]

Collision resolution

A search algorithm that uses hashing consists of two parts. The first part is computing a hash function which transforms the search key into an array index. The ideal case is such that no two search keys hash to the same array index. However, this is not always achievable and is impossible to guarantee for unseen data.[21]:515 Hence the second part of the algorithm is collision resolution. The two common methods for collision resolution are separate chaining and open addressing.[7]:458

Separate chaining

Hash collision resolved by separate chaining
Hash collision by separate chaining with head records in the bucket array.

In separate chaining, the process involves building a linked list with key–value pairs for each search array index. The collided items are chained together through a single linked list, which can be traversed to access the item with a unique search key.[7]:464 Collision resolution through chaining with a linked list is a common method of implementation of hash tables. Let T be the hash table and x a node with key x.key; the operations involved are as follows:[16]:258

Chained-Hash-Insert(T, x)
  insert x at the head of linked list T[h(x.key)]

Chained-Hash-Search(T, k)
  search for an element with key k in linked list T[h(k)]

Chained-Hash-Delete(T, x)
  delete x from the linked list T[h(x.key)]

If the elements are comparable either numerically or lexically and are inserted into the list while maintaining the total order, unsuccessful searches terminate faster.[21]:520–521
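A direct Python transliteration of the pseudocode above; the Node type and the hash function h are assumptions:

class Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

def chained_hash_insert(T, x, h):
    i = h(x.key)
    x.next = T[i]          # insert x at the head of linked list T[h(x.key)]
    T[i] = x

def chained_hash_search(T, k, h):
    node = T[h(k)]
    while node is not None and node.key != k:
        node = node.next
    return node            # None signals an unsuccessful search

def chained_hash_delete(T, x, h):
    # Assumes x is present in its list, as in the pseudocode.
    i = h(x.key)
    if T[i] is x:
        T[i] = x.next
    else:
        prev = T[i]
        while prev.next is not x:
            prev = prev.next
        prev.next = x.next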

Other data structures for separate chaining

If the keys are ordered, it could be efficient to use "self-organizing" concepts such as using a self-balancing binary search tree, through which the theoretical worst case could be brought down to O(log n), although it introduces additional complexities.[21]:521

In dynamic perfect hashing, two-level hash tables are used to reduce the look-up complexity to be a guaranteed O(1) in the worst case. In this technique, the buckets of k entries are organized as perfect hash tables with k² slots providing constant worst-case lookup time, and low amortized time for insertion.[22] A study shows array-based separate chaining to be 97% more performant when compared to the standard linked list method under heavy load.[23]:99

Techniques such as using a fusion tree for each bucket also result in constant time for all operations with high probability.[24]

Caching and locality of reference

The linked list of the separate chaining implementation may not be cache-conscious due to poor spatial locality (locality of reference): when the nodes of the linked list are scattered across memory, list traversal during insert and search may entail CPU cache inefficiencies.[23]:91

In cache-conscious variants of collision resolution through separate chaining, a dynamic array found to be more cache-friendly is used in the place where a linked list or self-balancing binary search tree is usually deployed, since the contiguous allocation pattern of the array can be exploited by hardware prefetchers and address-translation hardware such as the translation lookaside buffer, resulting in reduced access time and memory consumption.[25][26][27]

Open addressing

Hash collision resolved by open addressing with linear probing (interval=1). Note that "Ted Baker" has a unique hash, but nevertheless collided with "Sandra Dee", that had previously collided with "John Smith".
This graph compares the average number of CPU cache misses required to look up elements in large hash tables (far exceeding size of the cache) with chaining and linear probing. Linear probing performs better due to better locality of reference, though as the table gets full, its performance degrades drastically.

Open addressing is another collision resolution technique in which every entry record is stored in the bucket array itself, and the hash resolution is performed through probing. When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the buckets are scanned in the same sequence, until either the target record is found, or an unused array slot is found, which indicates an unsuccessful search.[28]

Well-known probe sequences include:

  • Linear probing, in which the interval between probes is fixed (usually 1).[29]
  • Quadratic probing, in which the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the value given by the original hash computation.[30]:272
  • Double hashing, in which the interval between probes is computed by a secondary hash function.[30]:272–273

The performance of open addressing may be slower compared to separate chaining since the probe sequence increases when the load factor approaches 1.[10][23]:93 The probing results in an infinite loop if the load factor reaches 1, in the case of a completely filled table.[7]:471 The average cost of linear probing depends on the hash function's ability to distribute the elements uniformly throughout the table to avoid clustering, since formation of clusters would result in increased search time.[7]:472
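A minimal linear-probing sketch in Python (fixed table size, deletion and tombstones omitted; names are illustrative):

m = 16
table = [None] * m

def probe_insert(key, value):
    i = hash(key) % m
    for _ in range(m):                 # probe sequence: i, i+1, i+2, ... (mod m)
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)
            return
        i = (i + 1) % m
    raise RuntimeError("table full; resize before the load factor reaches 1")

def probe_search(key):
    i = hash(key) % m
    for _ in range(m):
        if table[i] is None:           # unused slot: unsuccessful search
            return None
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % m
    return None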

Caching and locality of reference

Since the slots are located in successive locations, linear probing could lead to better utilization of CPU cache due to locality of reference, resulting in reduced memory latency.[29]

Other collision resolution techniques based on open addressing

Coalesced hashing

Coalesced hashing is a hybrid of both separate chaining and open addressing in which the buckets or nodes link within the table.[31]:6–8 The algorithm is ideally suited for fixed memory allocation.[31]:4 The collision in coalesced hashing is resolved by identifying the largest-indexed empty slot on the hash table, then the colliding value is inserted into that slot. The bucket is also linked to the inserted node's slot which contains its colliding hash address.[31]:8

Cuckoo hashing

Cuckoo hashing is a form of open addressing collision resolution technique which guarantees O(1) worst-case lookup complexity and constant amortized time for insertions. The collision is resolved through maintaining two hash tables, each having its own hashing function; a collided slot gets replaced with the given item, and the preoccupied element of the slot gets displaced into the other hash table. The process continues until every key has its own spot in the empty buckets of the tables; if the procedure enters into an infinite loop—which is identified through maintaining a threshold loop counter—both hash tables get rehashed with newer hash functions and the procedure continues.[32]:124–125
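A sketch of the displacement loop in Python; the two tables, the hash functions and the loop threshold are illustrative assumptions:

M = 16
T1, T2 = [None] * M, [None] * M
h1 = lambda k: hash(("a", k)) % M     # illustrative pair of hash functions
h2 = lambda k: hash(("b", k)) % M
MAX_LOOP = 32                         # threshold loop counter

def cuckoo_insert(key, value):
    item = (key, value)
    for _ in range(MAX_LOOP):
        i = h1(item[0])
        if T1[i] is None:
            T1[i] = item
            return
        item, T1[i] = T1[i], item     # displace the occupant toward table 2
        j = h2(item[0])
        if T2[j] is None:
            T2[j] = item
            return
        item, T2[j] = T2[j], item     # displace back toward table 1
    raise RuntimeError("cycle detected: rehash both tables with new functions")

def cuckoo_lookup(key):
    for T, h in ((T1, h1), (T2, h2)):  # at most two probes: O(1) worst case
        slot = T[h(key)]
        if slot is not None and slot[0] == key:
            return slot[1]
    return None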

Hopscotch hashing

Hopscotch hashing is an open addressing based algorithm which combines the elements of cuckoo hashing, linear probing and chaining through the notion of a neighbourhood of buckets—the subsequent buckets around any given occupied bucket, also called a "virtual" bucket.[33]:351–352 The algorithm is designed to deliver better performance when the load factor of the hash table grows beyond 90%; it also provides high throughput in concurrent settings, thus well suited for implementing a resizable concurrent hash table.[33]:350 The neighbourhood characteristic of hopscotch hashing guarantees a property that the cost of finding the desired item from any given bucket within the neighbourhood is very close to the cost of finding it in the bucket itself; the algorithm attempts to relocate an item into its neighbourhood—with a possible cost involved in displacing other items.[33]:352

Each bucket within the hash table includes an additional "hop-information"—an H-bit bit array indicating the relative distance of the item which was originally hashed into the current virtual bucket within H − 1 entries.[33]:352 Let k and Bk be the key to be inserted and the bucket to which the key hashes, respectively; several cases are involved in the insertion procedure such that the neighbourhood property of the algorithm is preserved:[33]:352–353 if Bk is empty, the element is inserted and the leftmost bit of the bitmap is set to 1; if not empty, linear probing is used for finding an empty slot in the table, and the bitmap of the bucket gets updated followed by the insertion; if the empty slot is not within the range of the neighbourhood, i.e. H − 1, subsequent swap and hop-information bit array manipulation of each bucket is performed in accordance with its neighbourhood invariant properties.[33]:353

Robin Hood hashing

Robin Hood hashing is an open addressing based collision resolution algorithm; the collisions are resolved through favouring the displacement of the element that is farthest—or has the longest probe sequence length (PSL)—from its "home location", i.e. the bucket to which the item was hashed.[34]:12 Although Robin Hood hashing does not change the theoretical search cost, it significantly affects the variance of the distribution of the items on the buckets,[35]:2 i.e. dealing with cluster formation in the hash table.[36] Each node within the hash table that uses Robin Hood hashing should be augmented to store an extra PSL value.[37] Let x be the key to be inserted, x.psl be the (incremental) PSL of x, T be the hash table and j be the index; the insertion procedure is as follows (a sketch follows the list):[34]:12–13[38]:5

  • If x.psl ≤ T[j].psl: the iteration goes into the next bucket without attempting an external probe.
  • If x.psl > T[j].psl: insert the item x into the bucket j; swap x with T[j]—let it be x′; continue the probe from the (j + 1)th bucket with x′; repeat the procedure until every element is inserted.
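A minimal sketch of this procedure, assuming each slot stores its entry's PSL and the table is not full:

M = 16
T = [None] * M   # each occupied slot holds (key, value, psl)

def robin_hood_insert(key, value):
    j = hash(key) % M
    item = (key, value, 0)                 # psl starts at 0 at the home slot
    while True:
        if T[j] is None:
            T[j] = item
            return
        if item[2] > T[j][2]:              # resident is "richer": displace it
            item, T[j] = T[j], item
        j = (j + 1) % M                    # carried item moves one slot further
        item = (item[0], item[1], item[2] + 1)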

Dynamic resizing

Repeated insertions cause the number of entries in a hash table to grow, which consequently increases the load factor; to maintain the amortized performance of the lookup and insertion operations, a hash table is dynamically resized and the items of the table are rehashed into the buckets of the new hash table,[10] since the items cannot simply be copied over, as varying table sizes result in different hash values due to the modulo operation.[39] If a hash table becomes "too empty" after deleting some elements, resizing may be performed to avoid excessive memory usage.[40]

Resizing by moving all entries

Generally, a new hash table with a size double that of the original hash table gets allocated privately and every item in the original hash table gets moved to the newly allocated one by computing the hash values of the items followed by the insertion operation. Rehashing is simple, but computationally expensive.[41]:478–479
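A sketch of all-at-once rehashing over a separate-chaining bucket array (names are illustrative):

def rehash_all(old_buckets):
    new_buckets = [[] for _ in range(2 * len(old_buckets))]   # double the size
    for bucket in old_buckets:
        for key, value in bucket:       # recompute each index under the new size
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets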

Alternatives to all-at-once rehashing

Some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. If one cannot avoid dynamic resizing, a solution is to perform the resizing gradually to avoid a storage blip—typically at 50% of the new table's size—during rehashing and to avoid memory fragmentation that triggers heap compaction due to deallocation of large memory blocks caused by the old hash table.[42]:2–3 In such a case, the rehashing operation is done incrementally through extending the prior memory block allocated for the old hash table such that the buckets of the hash table remain unaltered. A common approach for amortized rehashing involves maintaining two hash functions h_old and h_new. The process of rehashing a bucket's items in accordance with the new hash function is termed cleaning, which is implemented through the command pattern by encapsulating operations such as Add(key), Get(key) and Delete(key) through a Lookup(key, command) wrapper such that each element in the bucket gets rehashed; the procedure involves the following (a sketch follows the list):[42]:3

  • Clean the Table[h_old(key)] bucket.
  • Clean the Table[h_new(key)] bucket.
  • The command gets executed.
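A simplified reading of that wrapper in Python; the table layout, h_old, h_new, clean_bucket and the command objects are all assumptions:

def lookup(table, key, command, h_old, h_new, clean_bucket):
    clean_bucket(table, h_old(key))   # rehash any stale items still under h_old
    clean_bucket(table, h_new(key))   # ensure the destination bucket is clean
    return command(table, key)        # then execute Add/Get/Delete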

Linear hashing

Linear hashing is an implementation of the hash table which enables dynamic growths or shrinks of the table one bucket at a time.[43]

Performance

The performance of a hash table is dependent on the hash function's ability to generate quasi-random numbers (σ) for entries in the hash table, where K, n and h(x) denote the key, the number of buckets and the hash function, such that σ = h(K) % n. If the hash function generates the same σ for distinct keys (K1 ≠ K2, h(K1) = h(K2)), this results in a collision, which is dealt with in a variety of ways. The constant time complexity (O(1)) of the operation in a hash table is presupposed on the condition that the hash function doesn't generate colliding indices; thus, the performance of the hash table is directly proportional to the chosen hash function's ability to disperse the indices.[44]:1 However, construction of such a hash function is practically infeasible; that being so, implementations depend on case-specific collision resolution techniques in achieving higher performance.[44]:2

The best performance is obtained when the hash function distributes the elements of the universe uniformly, and the elements stored in the table are drawn at random from the universe. In this case, in hashing with chaining, the expected cost of a successful search is about 1 + α/2 probes, and that of an unsuccessful search about 1 + α.[45]

Applications

Associative arrays

Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays.[30]

Database indexing

Hash tables may also be used as disk-based data structures and database indices (such as in dbm) although B-trees are more popular in these applications.[46]

Caches

Hash tables can be used to implement caches, auxiliary data tables that are used to speed up the access to data that is primarily stored in slower media. In this application, hash collisions can be handled by discarding one of the two colliding entries—usually erasing the old item that is currently stored in the table and overwriting it with the new item, so every item in the table has a unique hash value.[47][48]
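A sketch of such a cache-style table in Python, where a colliding insertion simply evicts the previous occupant of the slot:

M = 1024
cache = [None] * M

def cache_put(key, value):
    cache[hash(key) % M] = (key, value)    # overwrite whatever occupied the slot

def cache_get(key):
    slot = cache[hash(key) % M]
    return slot[1] if slot is not None and slot[0] == key else None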

Sets

Hash tables can be used in the implementation of set data structure, which can store unique values without any particular order; set is typically used in testing the membership of a value in the collection, rather than element retrieval.[49]

Transposition table

A transposition table is a complex hash table which stores information about each section that has been searched.[50]

Implementations

Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.

  • In JavaScript, an "object" is a mutable collection of key-value pairs (called "properties"), where each key is either a string or a guaranteed-unique "symbol"; any other value, when used as a key, is first coerced to a string. Aside from the seven "primitive" data types, every value in JavaScript is an object.[51] ECMAScript 2015 also added the Map data structure, which accepts arbitrary values as keys.[52]
  • C++11 includes unordered_map in its standard library for storing keys and values of arbitrary types.[53]
  • Go's built-in map implements a hash table in the form of a type.[54]
  • The Java programming language includes the HashSet, HashMap, LinkedHashSet, and LinkedHashMap generic collections.[55]
  • Python's built-in dict implements a hash table in the form of a type.[56]
  • Ruby's built-in Hash uses the open addressing model from Ruby 2.4 onwards.[57]
  • The Rust programming language includes HashMap and HashSet as part of the Rust Standard Library.[58]
  • The .NET standard library includes HashSet and Dictionary,[59][60] so it can be used from languages such as C# and VB.NET.[61]

Notes

  1. ^ There are approaches with a worst-case expected time complexity of O(log²((1 − α)⁻¹)), where α is the load factor.[1]

References

  1. ^ Martin Farach-Colton; Andrew Krapivin; William Kuszmaul. Optimal Bounds for Open Addressing Without Reordering. 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS). arXiv:2501.02305. doi:10.1109/FOCS61266.2024.00045.
  2. ^ a b Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). Massachusetts Institute of Technology. pp. 253–280. ISBN 978-0-262-03384-8.
  3. ^ Mehlhorn, Kurt; Sanders, Peter (2008). "Hash Tables and Associative Arrays" (PDF). Algorithms and Data Structures. Springer. pp. 81–98. doi:10.1007/978-3-540-77978-0_4. ISBN 978-3-540-77977-3.
  4. ^ Leiserson, Charles E. (Fall 2005). "Lecture 13: Amortized Algorithms, Table Doubling, Potential Method". course MIT 6.046J/18.410J Introduction to Algorithms. Archived from the original on August 7, 2009.
  5. ^ a b c Knuth, Donald (1998). The Art of Computer Programming. Vol. 3: Sorting and Searching (2nd ed.). Addison-Wesley. pp. 513–558. ISBN 978-0-201-89685-5.
  6. ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 221–252. ISBN 978-0-262-53196-2.
  7. ^ a b c d e Sedgewick, Robert; Wayne, Kevin (2011). Algorithms. Vol. 1 (4 ed.). Addison-Wesley Professional – via Princeton University, Department of Computer Science.
  8. ^ a b c Konheim, Alan G. (2010). Hashing in Computer Science. doi:10.1002/9780470630617. ISBN 978-0-470-34473-6.
  9. ^ a b c d e f g h i j k l Mehta, Dinesh P.; Sahni, Sartaj, eds. (2004). Handbook of Data Structures and Applications. doi:10.1201/9781420035179. ISBN 978-0-429-14701-2.
  10. ^ a b c d e f g Mayers, Andrew (2008). "CS 312: Hash tables and amortized analysis". Cornell University, Department of Computer Science. Archived from the original on April 26, 2021. Retrieved October 26, 2021 – via cs.cornell.edu.
  11. ^ a b James S. Plank and Brad Vander Zanden. "CS140 Lecture notes -- Hashing".
  12. ^ Maurer, W. D.; Lewis, T. G. (March 1975). "Hash Table Methods". ACM Computing Surveys. 7 (1): 5–19. doi:10.1145/356643.356645. S2CID 17874775.
  13. ^ a b Owolabi, Olumide (February 2003). "Empirical studies of some hashing functions". Information and Software Technology. 45 (2): 109–112. doi:10.1016/S0950-5849(02)00174-X.
  14. ^ a b Lu, Yi; Prabhakar, Balaji; Bonomi, Flavio (2006). Perfect Hashing for Network Applications. 2006 IEEE International Symposium on Information Theory. pp. 2774–2778. doi:10.1109/ISIT.2006.261567. ISBN 1-4244-0505-X. S2CID 1494710.
  15. ^ Belazzougui, Djamal; Botelho, Fabiano C.; Dietzfelbinger, Martin (2009). "Hash, displace, and compress" (PDF). Algorithms—ESA 2009: 17th Annual European Symposium, Copenhagen, Denmark, September 7–9, 2009, Proceedings. Lecture Notes in Computer Science. Vol. 5757. Berlin: Springer. pp. 682–693. CiteSeerX 10.1.1.568.130. doi:10.1007/978-3-642-04128-0_61. MR 2557794.
  16. ^ a b Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). "Chapter 11: Hash Tables". Introduction to Algorithms (2nd ed.). Massachusetts Institute of Technology. ISBN 978-0-262-53196-2.
  17. ^ Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling". Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
  18. ^ Plackett, Robin (1983). "Karl Pearson and the Chi-Squared Test". International Statistical Review. 51 (1): 59–72. doi:10.2307/1402731. JSTOR 1402731.
  19. ^ a b Wang, Thomas (March 1997). "Prime Double Hash Table". Archived from the original on September 3, 1999. Retrieved May 10, 2015.
  20. ^ Wegman, Mark N.; Carter, J.Lawrence (June 1981). "New hash functions and their use in authentication and set equality". Journal of Computer and System Sciences. 22 (3): 265–279. doi:10.1016/0022-0000(81)90033-7.
  21. ^ a b c Donald E. Knuth (April 24, 1998). The Art of Computer Programming: Volume 3: Sorting and Searching. Addison-Wesley Professional. ISBN 978-0-201-89685-5.
  22. ^ Demaine, Erik; Lind, Jeff (Spring 2003). "Lecture 2" (PDF). 6.897: Advanced Data Structures. MIT Computer Science and Artificial Intelligence Laboratory. Archived (PDF) from the original on June 15, 2010. Retrieved June 30, 2008.
  23. ^ a b c Culpepper, J. Shane; Moffat, Alistair (2005). "Enhanced Byte Codes with Restricted Prefix Properties". String Processing and Information Retrieval. Lecture Notes in Computer Science. Vol. 3772. pp. 1–12. doi:10.1007/11575832_1. ISBN 978-3-540-29740-6.
  24. ^ Willard, Dan E. (2000). "Examining computational geometry, van Emde Boas trees, and hashing from the perspective of the fusion tree". SIAM Journal on Computing. 29 (3): 1030–1049. doi:10.1137/S0097539797322425. MR 1740562..
  25. ^ Askitis, Nikolas; Sinha, Ranjan (October 2010). "Engineering scalable, cache and space efficient tries for strings". The VLDB Journal. 19 (5): 633–660. doi:10.1007/s00778-010-0183-9.
  26. ^ Askitis, Nikolas; Zobel, Justin (October 2005). "Cache-conscious Collision Resolution in String Hash Tables". Proceedings of the 12th International Conference, String Processing and Information Retrieval (SPIRE 2005). Vol. 3772/2005. pp. 91–102. doi:10.1007/11575832_11. ISBN 978-3-540-29740-6.
  27. ^ Askitis, Nikolas (2009). "Fast and Compact Hash Tables for Integer Keys" (PDF). Proceedings of the 32nd Australasian Computer Science Conference (ACSC 2009). Vol. 91. pp. 113–122. ISBN 978-1-920682-72-9. Archived from the original (PDF) on February 16, 2011. Retrieved June 13, 2010.
  28. ^ Tenenbaum, Aaron M.; Langsam, Yedidyah; Augenstein, Moshe J. (1990). Data Structures Using C. Prentice Hall. pp. 456–461, p. 472. ISBN 978-0-13-199746-2.
  29. ^ a b Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
  30. ^ a b c Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "11 Hash Tables", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 221–252, ISBN 0-262-03293-7.
  31. ^ a b c Vitter, Jeffery S.; Chen, Wen-Chin (1987). The design and analysis of coalesced hashing. New York, United States: Oxford University Press. ISBN 978-0-19-504182-8 – via Archive.org.
  32. ^ Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo Hashing". Algorithms — ESA 2001. Lecture Notes in Computer Science. Vol. 2161. pp. 121–133. CiteSeerX 10.1.1.25.4189. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
  33. ^ a b c d e f Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). "Hopscotch Hashing". Distributed Computing. Lecture Notes in Computer Science. Vol. 5218. pp. 350–364. doi:10.1007/978-3-540-87779-0_24. ISBN 978-3-540-87778-3.
  34. ^ a b Celis, Pedro (1986). Robin Hood Hashing (PDF). Ontario, Canada: University of Waterloo, Dept. of Computer Science. ISBN 978-0-315-29700-5. OCLC 14083698. Archived (PDF) from the original on November 1, 2021. Retrieved November 2, 2021.
  35. ^ Poblete, P. V.; Viola, A. (July 2019). "Analysis of Robin Hood and Other Hashing Algorithms Under the Random Probing Model, With and Without Deletions". Combinatorics, Probability and Computing. 28 (4): 600–617. doi:10.1017/S0963548318000408. S2CID 125374363.
  36. ^ Clarkson, Michael (2014). "Lecture 13: Hash tables". Cornell University, Department of Computer Science. Archived from the original on October 7, 2021. Retrieved November 1, 2021 – via cs.cornell.edu.
  37. ^ Gries, David (2017). "JavaHyperText and Data Structure: Robin Hood Hashing" (PDF). Cornell University, Department of Computer Science. Archived (PDF) from the original on April 26, 2021. Retrieved November 2, 2021 – via cs.cornell.edu.
  38. ^ Celis, Pedro (March 28, 1988). External Robin Hood Hashing (PDF) (Technical report). Bloomington, Indiana: Indiana University, Department of Computer Science. 246. Archived (PDF) from the original on November 3, 2021. Retrieved November 2, 2021.
  39. ^ Goddard, Wayne (2021). "Chapter C5: Hash Tables" (PDF). Clemson University. pp. 15–16. Retrieved December 4, 2023.
  40. ^ Devadas, Srini; Demaine, Erik (February 25, 2011). "Intro to Algorithms: Resizing Hash Tables" (PDF). Massachusetts Institute of Technology, Department of Computer Science. Archived (PDF) from the original on May 7, 2021. Retrieved November 9, 2021 – via MIT OpenCourseWare.
  41. ^ Thareja, Reema (2014). "Hashing and Collision". Data Structures Using C. Oxford University Press. pp. 464–488. ISBN 978-0-19-809930-7.
  42. ^ a b Friedman, Scott; Krishnan, Anand; Leidefrost, Nicholas (March 18, 2003). "Hash Tables for Embedded and Real-time systems" (PDF). All Computer Science and Engineering Research. Washington University in St. Louis. doi:10.7936/K7WD3XXV. Archived (PDF) from the original on June 9, 2021. Retrieved November 9, 2021 – via Northwestern University, Department of Computer Science.
  43. ^ Litwin, Witold (1980). "Linear hashing: A new tool for file and table addressing" (PDF). Proc. 6th Conference on Very Large Databases. Carnegie Mellon University. pp. 212–223. Archived (PDF) from the original on May 6, 2021. Retrieved November 10, 2021 – via cs.cmu.edu.
  44. ^ a b Dijk, Tom Van (2010). "Analysing and Improving Hash Table Performance" (PDF). Netherlands: University of Twente. Archived (PDF) from the original on November 6, 2021. Retrieved December 31, 2021.
  45. ^ Baeza-Yates, Ricardo; Poblete, Patricio V. (1999). "Chapter 2: Searching". In Atallah (ed.). Algorithms and Theory of Computation Handbook. CRC Press. pp. 2–6. ISBN 0849326494.
  46. ^ Lech Banachowski. "Indexes and external sorting". pl:Polsko-Japońska Akademia Technik Komputerowych. Archived from the original on March 26, 2022. Retrieved March 26, 2022.
  47. ^ Zhong, Liang; Zheng, Xueqian; Liu, Yong; Wang, Mengting; Cao, Yang (February 2020). "Cache hit ratio maximization in device-to-device communications overlaying cellular networks". China Communications. 17 (2): 232–238. doi:10.23919/jcc.2020.02.018. S2CID 212649328.
  48. ^ Bottomley, James (January 1, 2004). "Understanding Caching". Linux Journal. Archived from the original on December 4, 2020. Retrieved April 16, 2022.
  49. ^ Jill Seaman (2014). "Set & Hash Tables" (PDF). Texas State University. Archived from the original on April 1, 2022. Retrieved March 26, 2022.
  50. ^ "Transposition Table - Chessprogramming wiki". chessprogramming.org. Archived from the original on February 14, 2021. Retrieved May 1, 2020.
  51. ^ "JavaScript data types and data structures - JavaScript | MDN". developer.mozilla.org. Retrieved July 24, 2022.
  52. ^ "Map - JavaScript | MDN". developer.mozilla.org. June 20, 2023. Retrieved July 15, 2023.
  53. ^ "Programming language C++ - Technical Specification" (PDF). International Organization for Standardization. pp. 812–813. Archived from the original (PDF) on January 21, 2022. Retrieved February 8, 2022.
  54. ^ "The Go Programming Language Specification". go.dev. Retrieved January 1, 2023.
  55. ^ "Lesson: Implementations (The Java? Tutorials > Collections)". docs.oracle.com. Archived from the original on January 18, 2017. Retrieved April 27, 2018.
  56. ^ Zhang, Juan; Jia, Yunwei (2020). "Redis rehash optimization based on machine learning". Journal of Physics: Conference Series. 1453 (1): 3. Bibcode:2020JPhCS1453a2048Z. doi:10.1088/1742-6596/1453/1/012048. S2CID 215943738.
  57. ^ Jonan Scheffler (December 25, 2016). "Ruby 2.4 Released: Faster Hashes, Unified Integers and Better Rounding". heroku.com. Archived from the original on July 3, 2019. Retrieved July 3, 2019.
  58. ^ "doc.rust-lang.org". Archived from the original on December 8, 2022. Retrieved December 14, 2022.
  59. ^ "HashSet Class (System.Collections.Generic)". learn.microsoft.com. Retrieved July 1, 2023.
  60. ^ dotnet-bot. "Dictionary Class (System.Collections.Generic)". learn.microsoft.com. Retrieved January 16, 2024.
  61. ^ "VB.NET HashSet Example". Dot Net Perls.
