From Wikipedia, the free encyclopedia
Comparison of two revisions of an example file, based on their longest common subsequence (black)

A longest common subsequence (LCS) is the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The problem of computing longest common subsequences is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.

For example, consider the sequences (ABCD) and (ACBAD). They have five length-2 common subsequences: (AB), (AC), (AD), (BD), and (CD); two length-3 common subsequences: (ABD) and (ACD); and no longer common subsequences. So (ABD) and (ACD) are their longest common subsequences.
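The small example above can be checked by brute force. The following Python sketch is only an illustration for tiny inputs (it enumerates every subsequence, so it is exponential); the function names are hypothetical and not part of any library.

```python
from itertools import combinations


def subsequences(seq, length):
    """Return the set of all subsequences of `seq` with the given length."""
    return {"".join(c) for c in combinations(seq, length)}


def common_subsequences(a, b, length):
    """Brute force: subsequences of the given length shared by `a` and `b`."""
    return subsequences(a, length) & subsequences(b, length)


def longest_common_subsequences(a, b):
    """Try lengths from longest to shortest; suitable for tiny inputs only."""
    for length in range(min(len(a), len(b)), 0, -1):
        found = common_subsequences(a, b, length)
        if found:
            return found
    return {""}


if __name__ == "__main__":
    print(sorted(common_subsequences("ABCD", "ACBAD", 2)))
    print(sorted(longest_common_subsequences("ABCD", "ACBAD")))
```

Because `combinations` preserves the original order of the elements, each tuple it yields is a subsequence in the sense used here.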

Complexity

For the general case of an arbitrary number of input sequences, the problem is NP-hard.[1] When the number of sequences is constant, the problem is solvable in polynomial time by dynamic programming.

Given $N$ sequences of lengths $n_1, \dots, n_N$, a naive search would test each of the $2^{n_1}$ subsequences of the first sequence to determine whether they are also subsequences of the remaining sequences; each subsequence may be tested in time linear in the lengths of the remaining sequences, so the time for this algorithm would be

$$O\left(2^{n_1} \sum_{i>1} n_i\right).$$

For the case of two sequences of n and m elements, the running time of the dynamic programming approach is O(n × m).[2] For an arbitrary number of input sequences, the dynamic programming approach gives a solution in

$$O\left(N \prod_{i=1}^{N} n_i\right).$$

There exist methods with lower complexity,[3] which often depend on the length of the LCS, the size of the alphabet, or both.

The LCS is not necessarily unique; in the worst case, the number of longest common subsequences is exponential in the lengths of the inputs, so any algorithm that outputs all of them must take at least exponential time.[4]

Solution for two sequences

The LCS problem has an optimal substructure: the problem can be broken down into smaller, simpler subproblems, which can, in turn, be broken down into simpler subproblems, and so on, until, finally, the solution becomes trivial. LCS in particular has overlapping subproblems: the solutions to high-level subproblems often reuse solutions to lower level subproblems. Problems with these two properties are amenable to dynamic programming approaches, in which subproblem solutions are memoized, that is, the solutions of subproblems are saved for reuse.

Prefixes

The prefix Sn of S is defined as the first n characters of S.[5] For example, the prefixes of S = (AGCA) are

S0 = ()
S1 = (A)
S2 = (AG)
S3 = (AGC)
S4 = (AGCA).

Let LCS(X, Y) be a function that computes a longest subsequence common to X and Y. Such a function has two interesting properties.

First property

LCS(X^A,Y^A) = LCS(X,Y)^A, for all strings X, Y and all symbols A, where ^ denotes string concatenation. This allows one to simplify the LCS computation for two sequences ending in the same symbol. For example, LCS("BANANA","ATANA") = LCS("BANAN","ATAN")^"A". Continuing for the remaining common symbols, LCS("BANANA","ATANA") = LCS("BAN","AT")^"ANA".

Second property

If A and B are distinct symbols (A≠B), then LCS(X^A,Y^B) is one of the maximal-length strings in the set { LCS(X^A,Y), LCS(X,Y^B) }, for all strings X, Y.

For example, LCS("ABCDEFG","BCDGK") is the longest string among LCS("ABCDEFG","BCDG") and LCS("ABCDEF","BCDGK"); if both happened to be of equal length, one of them could be chosen arbitrarily.

To realize the property, distinguish two cases:

  • If LCS("ABCDEFG","BCDGK") ends with a "G", then the final "K" cannot be in the LCS, hence LCS("ABCDEFG","BCDGK") = LCS("ABCDEFG","BCDG").
  • If LCS("ABCDEFG","BCDGK") does not end with a "G", then the final "G" cannot be in the LCS, hence LCS("ABCDEFG","BCDGK") = LCS("ABCDEF","BCDGK").
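Both properties can be checked numerically on the examples above. The sketch below is a minimal Python illustration that uses the two properties directly as a memoized recursion on prefix lengths; `lcs_length` is a name chosen here for illustration, and the recursion is only meant for short strings.

```python
from functools import lru_cache


def lcs_length(x, y):
    """Length of an LCS of x and y, using the two properties as a recursion."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) is the LCS length of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            # First property: a shared final symbol extends the LCS by one.
            return go(i - 1, j - 1) + 1
        # Second property: drop the last symbol of one side, keep the better.
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(x), len(y))


if __name__ == "__main__":
    # First property, applied three times to the shared suffix "ANA".
    print(lcs_length("BANANA", "ATANA"), lcs_length("BAN", "AT") + 3)
    # Second property on the example with distinct final symbols.
    print(lcs_length("ABCDEFG", "BCDGK"),
          max(lcs_length("ABCDEFG", "BCDG"), lcs_length("ABCDEF", "BCDGK")))
```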

LCS function defined

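Let two sequences be defined as follows: $X = (x_1 x_2 \cdots x_m)$ and $Y = (y_1 y_2 \cdots y_n)$. The prefixes of $X$ are $X_0, X_1, X_2, \dots, X_m$; the prefixes of $Y$ are $Y_0, Y_1, Y_2, \dots, Y_n$. Let $\mathit{LCS}(X_i, Y_j)$ represent the set of longest common subsequences of prefixes $X_i$ and $Y_j$. This set of sequences is given by the following.

$$\mathit{LCS}(X_i, Y_j) = \begin{cases} \epsilon & \text{if } i = 0 \text{ or } j = 0 \\ \mathit{LCS}(X_{i-1}, Y_{j-1}) \,\hat{}\, x_i & \text{if } i, j > 0 \text{ and } x_i = y_j \\ \operatorname{max}\{\mathit{LCS}(X_i, Y_{j-1}), \mathit{LCS}(X_{i-1}, Y_j)\} & \text{if } i, j > 0 \text{ and } x_i \neq y_j \end{cases}$$

To find the LCS of $X_i$ and $Y_j$, compare $x_i$ and $y_j$. If they are equal, then the sequence $\mathit{LCS}(X_{i-1}, Y_{j-1})$ is extended by that element, $x_i$. If they are not equal, then the longest among the two sequences, $\mathit{LCS}(X_i, Y_{j-1})$ and $\mathit{LCS}(X_{i-1}, Y_j)$, is retained. (If they are the same length, but not identical, then both are retained.) The base case, when either $X_i$ or $Y_j$ is empty, is the empty string, $\epsilon$.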
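The set-valued definition above translates almost line by line into a memoized recursion. The following Python sketch is one possible reading of it, not a definitive implementation; `all_lcs` and the inner helper `go` are hypothetical names, and because the number of longest common subsequences can grow exponentially, it is intended for short inputs only.

```python
from functools import lru_cache


def all_lcs(x, y):
    """Return the set of all longest common subsequences of strings x and y."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) is the set of LCSs of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return frozenset({""})  # base case: only the empty string
        if x[i - 1] == y[j - 1]:
            # Equal last elements: extend every sequence of the smaller problem.
            return frozenset(z + x[i - 1] for z in go(i - 1, j - 1))
        left, up = go(i, j - 1), go(i - 1, j)
        left_len = len(next(iter(left)))  # all members of a set share one length
        up_len = len(next(iter(up)))
        if left_len > up_len:
            return left
        if up_len > left_len:
            return up
        return left | up  # same length: both sets are retained

    return set(go(len(x), len(y)))


if __name__ == "__main__":
    print(sorted(all_lcs("GAC", "AGCAT")))
```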

Worked example

The longest subsequence common to R = (GAC), and C = (AGCAT) will be found. Because the LCS function uses a "zeroth" element, it is convenient to define zero prefixes that are empty for these sequences: R0 = ε; and C0 = ε. All the prefixes are placed in a table with C in the first row (making it a column header) and R in the first column (making it a row header).

LCS Strings
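
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | ε | ε | ε | ε | ε | ε |
| G | ε |   |   |   |   |   |
| A | ε |   |   |   |   |   |
| C | ε |   |   |   |   |   |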

This table is used to store the LCS sequence for each step of the calculation. The second column and second row have been filled in with ε, because when an empty sequence is compared with a non-empty sequence, the longest common subsequence is always an empty sequence.

LCS(R1, C1) is determined by comparing the first elements in each sequence. G and A are not the same, so this LCS gets (using the "second property") the longest of the two sequences, LCS(R1, C0) and LCS(R0, C1). According to the table, both of these are empty, so LCS(R1, C1) is also empty, as shown in the table below. The arrows indicate that the sequence comes from both the cell above, LCS(R0, C1) and the cell on the left, LCS(R1, C0).

LCS(R1, C2) is determined by comparing G and G. They match, so G is appended to the upper left sequence, LCS(R0, C1), which is (ε), giving (εG), which is (G).

For LCS(R1, C3), G and C do not match. The sequence above is empty; the one to the left contains one element, G. Selecting the longest of these, LCS(R1, C3) is (G). The arrow points to the left, since that is the longest of the two sequences.

LCS(R1, C4), likewise, is (G).

LCS(R1, C5), likewise, is (G).

"G" Row Completed
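
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | ε | ε | ε | ε | ε | ε |
| G | ε | ←↑ ε | ↖ (G) | ← (G) | ← (G) | ← (G) |
| A | ε |   |   |   |   |   |
| C | ε |   |   |   |   |   |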

For LCS(R2, C1), A is compared with A. The two elements match, so A is appended to ε, giving (A).

For LCS(R2, C2), A and G do not match, so the longest of LCS(R1, C2), which is (G), and LCS(R2, C1), which is (A), is used. In this case, they each contain one element, so this LCS is given two subsequences: (A) and (G).

For LCS(R2, C3), A does not match C. LCS(R2, C2) contains sequences (A) and (G); LCS(R1, C3) is (G), which is already contained in LCS(R2, C2). The result is that LCS(R2, C3) also contains the two subsequences, (A) and (G).

For LCS(R2, C4), A matches A, which is appended to the upper left cell, giving (GA).

For LCS(R2, C5), A does not match T. Comparing the two sequences, (GA) and (G), the longest is (GA), so LCS(R2, C5) is (GA).

"G" & "A" Rows Completed
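
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | ε | ε | ε | ε | ε | ε |
| G | ε | ←↑ ε | ↖ (G) | ← (G) | ← (G) | ← (G) |
| A | ε | ↖ (A) | ←↑ (A) & (G) | ←↑ (A) & (G) | ↖ (GA) | ← (GA) |
| C | ε |   |   |   |   |   |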

For LCS(R3, C1), C and A do not match, so LCS(R3, C1) gets the longest of the two sequences, (A).

For LCS(R3, C2), C and G do not match. Both LCS(R3, C1) and LCS(R2, C2) have one element. The result is that LCS(R3, C2) contains the two subsequences, (A) and (G).

For LCS(R3, C3), C and C match, so C is appended to LCS(R2, C2), which contains the two subsequences, (A) and (G), giving (AC) and (GC).

For LCS(R3, C4), C and A do not match. Combining LCS(R3, C3), which contains (AC) and (GC), and LCS(R2, C4), which contains (GA), gives a total of three sequences: (AC), (GC), and (GA).

Finally, for LCS(R3, C5), C and T do not match. The result is that LCS(R3, C5) also contains the three sequences, (AC), (GC), and (GA).

Completed LCS Table
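
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | ε | ε | ε | ε | ε | ε |
| G | ε | ←↑ ε | ↖ (G) | ← (G) | ← (G) | ← (G) |
| A | ε | ↖ (A) | ←↑ (A) & (G) | ←↑ (A) & (G) | ↖ (GA) | ← (GA) |
| C | ε | ↑ (A) | ←↑ (A) & (G) | ↖ (AC) & (GC) | ←↑ (AC) & (GC) & (GA) | ←↑ (AC) & (GC) & (GA) |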

The final result is that the last cell contains all the longest subsequences common to (AGCAT) and (GAC); these are (AC), (GC), and (GA). The table also shows the longest common subsequences for every possible pair of prefixes. For example, for (AGC) and (GA), the longest common subsequences are (A) and (G).

Traceback approach

Calculating the LCS of a row of the LCS table requires only the solutions to the current row and the previous row. Still, for long sequences, these sequences can get numerous and long, requiring a lot of storage space. Storage space can be saved by saving not the actual subsequences, but the length of the subsequence and the direction of the arrows, as in the table below.

Storing length, rather than sequences
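
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | ←↑ 0 | ↖ 1 | ← 1 | ← 1 | ← 1 |
| A | 0 | ↖ 1 | ←↑ 1 | ←↑ 1 | ↖ 2 | ← 2 |
| C | 0 | ↑ 1 | ←↑ 1 | ↖ 2 | ←↑ 2 | ←↑ 2 |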

The actual subsequences are deduced in a "traceback" procedure that follows the arrows backwards, starting from the last cell in the table. When the length decreases, the sequences must have had a common element. Several paths are possible when two arrows are shown in a cell. Below is the table for such an analysis. One path that traces out the sequence (GA) runs from LCS(R3, C5) up to LCS(R2, C5), left to LCS(R2, C4), diagonally to LCS(R1, C3), left to LCS(R1, C2), and diagonally to LCS(R0, C1); the length decreases at the two diagonal steps, which contribute A and then G.[6]

Traceback example
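
|   | ε | A | G | C | A | T |
|---|---|---|---|---|---|---|
| ε | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | ←↑ 0 | ↖ 1 | ← 1 | ← 1 | ← 1 |
| A | 0 | ↖ 1 | ←↑ 1 | ←↑ 1 | ↖ 2 | ← 2 |
| C | 0 | ↑ 1 | ←↑ 1 | ↖ 2 | ←↑ 2 | ←↑ 2 |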
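The traceback idea can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the arrow labels ("diag", "up", "left", "both") and the `prefer` parameter for breaking ties are assumptions made here for readability.

```python
def build_tables(r, c):
    """Return (length, arrow) tables for row sequence r and column sequence c."""
    m, n = len(r), len(c)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    arrow = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if r[i - 1] == c[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
                arrow[i][j] = "diag"
            else:
                up, left = length[i - 1][j], length[i][j - 1]
                length[i][j] = max(up, left)
                arrow[i][j] = "up" if up > left else "left" if left > up else "both"
    return length, arrow


def traceback(r, c, prefer="up"):
    """Follow the arrows back from the last cell; `prefer` breaks ties."""
    _, arrow = build_tables(r, c)
    i, j, out = len(r), len(c), []
    while i > 0 and j > 0:
        step = arrow[i][j]
        if step == "both":
            step = prefer
        if step == "diag":
            out.append(r[i - 1])  # the length decreases here: a common element
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(traceback("GAC", "AGCAT", prefer="up"))
    print(traceback("GAC", "AGCAT", prefer="left"))
```

Only the lengths and one small label per cell are stored, so the table no longer holds the subsequences themselves; different tie-breaking choices recover different longest common subsequences.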

Relation to other problems

For two strings $X_{1 \dots m}$ and $Y_{1 \dots n}$, the length of the shortest common supersequence is related to the length of the LCS by[3]

$$|\mathit{SCS}(X, Y)| = n + m - |\mathit{LCS}(X, Y)|.$$

The edit distance when only insertion and deletion are allowed (no substitution), or when the cost of a substitution is double the cost of an insertion or deletion, is:

$$d'(X, Y) = n + m - 2 \cdot |\mathit{LCS}(X, Y)|.$$
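A minimal Python sketch of both relations, assuming an `lcs_length` helper written here for illustration (it is the usual dynamic program, kept to one row at a time):

```python
def lcs_length(x, y):
    """LCS length by the standard dynamic program, one row at a time."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def scs_length(x, y):
    """Length of a shortest common supersequence: n + m - |LCS|."""
    return len(x) + len(y) - lcs_length(x, y)


def indel_distance(x, y):
    """Edit distance with insertions and deletions only: n + m - 2 * |LCS|."""
    return len(x) + len(y) - 2 * lcs_length(x, y)


if __name__ == "__main__":
    print(scs_length("ABCD", "ACBAD"), indel_distance("ABCD", "ACBAD"))
```

Intuitively, every element outside the LCS must appear once in the supersequence, and must be deleted from one string or inserted from the other when editing.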

Code for the dynamic programming solution

Computing the length of the LCS

The function below takes as input sequences X[1..m] and Y[1..n], computes the length of the LCS of X[1..i] and Y[1..j] for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, and stores it in C[i,j]. C[m,n] will contain the length of the LCS of X and Y.[7]

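function LCSLength(X[1..m], Y[1..n])
    C := array(0..m, 0..n)
    // an empty prefix has an empty LCS
    for i := 0..m
        C[i,0] := 0
    for j := 0..n
        C[0,j] := 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j]
                // matching symbols extend the LCS of the shorter prefixes
                C[i,j] := C[i-1,j-1] + 1
            else
                // otherwise keep the better of dropping Y[j] or dropping X[i]
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]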
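For readers who prefer runnable code, the pseudocode above might be written in Python roughly as follows. This is a sketch, not the cited textbook's code: Python sequences are 0-indexed, so X[i] in the pseudocode corresponds to `x[i - 1]` here, and the names `lcs_table` and `lcs_length` are chosen for illustration.

```python
def lcs_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


def lcs_length(x, y):
    """Length of the LCS of x and y, read from the last cell of the table."""
    return lcs_table(x, y)[len(x)][len(y)]


if __name__ == "__main__":
    print(lcs_length("XMJYAUZ", "MZJAWXU"))
```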

Alternatively, memoization could be used.

Reading out a LCS

The following function backtracks the choices taken when computing the C table. If the last characters in the prefixes are equal, they must be in an LCS. If not, check what gave the largest LCS of keeping $x_i$ and $y_j$, and make the same choice. Just choose one if they were equally long. Call the function with i=m and j=n.

function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    if i = 0 or j = 0
        return ""
    if  X[i] = Y[j]
        return backtrack(C, X, Y, i-1, j-1) + X[i]
    if C[i,j-1] > C[i-1,j]
        return backtrack(C, X, Y, i, j-1)
    return backtrack(C, X, Y, i-1, j)
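A Python sketch of the same backtracking, written iteratively to avoid deep recursion on long inputs. The helper `lcs_table` is repeated so the example is self-contained; both names are illustrative.

```python
def lcs_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


def backtrack(c, x, y):
    """Read out one LCS by walking from the last cell back to the border."""
    i, j, out = len(x), len(y), []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # equal last elements belong to an LCS
            i, j = i - 1, j - 1
        elif c[i][j - 1] > c[i - 1][j]:
            j -= 1  # the cell to the left holds the longer LCS
        else:
            i -= 1  # the cell above is at least as good (ties go up)
    return "".join(reversed(out))


if __name__ == "__main__":
    x, y = "XMJYAUZ", "MZJAWXU"
    print(backtrack(lcs_table(x, y), x, y))
```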

Reading out all LCSs

If choosing $x_i$ and $y_j$ would give an equally long result, read out both resulting subsequences. This is returned as a set by this function. Notice that this function is not polynomial, as it might branch in almost every step if the strings are similar.

function backtrackAll(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    if i = 0 or j = 0
        return {""}
    if X[i] = Y[j]
        return {Z + X[i] for all Z in backtrackAll(C, X, Y, i-1, j-1)}
    R := {}
    if C[i,j-1] ≥ C[i-1,j]
        R := backtrackAll(C, X, Y, i, j-1)
    if C[i-1,j] ≥ C[i,j-1]
        R := R ∪ backtrackAll(C, X, Y, i-1, j)
    return R
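The same function might look like this in Python. It is a direct, unoptimized sketch that keeps the recursive shape of the pseudocode, so it shares its exponential worst case; `lcs_table` is repeated for self-containment and both names are illustrative.

```python
def lcs_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


def backtrack_all(c, x, y, i, j):
    """Return the set of all LCSs of x[:i] and y[:j], given the length table c."""
    if i == 0 or j == 0:
        return {""}
    if x[i - 1] == y[j - 1]:
        return {z + x[i - 1] for z in backtrack_all(c, x, y, i - 1, j - 1)}
    result = set()
    if c[i][j - 1] >= c[i - 1][j]:
        result |= backtrack_all(c, x, y, i, j - 1)
    if c[i - 1][j] >= c[i][j - 1]:
        result |= backtrack_all(c, x, y, i - 1, j)
    return result


if __name__ == "__main__":
    x, y = "GAC", "AGCAT"
    print(sorted(backtrack_all(lcs_table(x, y), x, y, len(x), len(y))))
```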

Print the diff

This function will backtrack through the C matrix, and print the diff between the two sequences. Notice that you will get a different answer if you exchange ≥ and <, with > and ≤ below.

function printDiff(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    // X[i] and Y[j] exist only for i > 0 and j > 0
    if i > 0 and j > 0 and X[i] = Y[j]
        printDiff(C, X, Y, i-1, j-1)
        print "  " + X[i]
    else if j > 0 and (i = 0 or C[i,j-1] ≥ C[i-1,j])
        printDiff(C, X, Y, i, j-1)
        print "+ " + Y[j]
    else if i > 0 and (j = 0 or C[i,j-1] < C[i-1,j])
        printDiff(C, X, Y, i-1, j)
        print "- " + X[i]
    else
        print ""
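As a rough Python counterpart, the sketch below collects the diff lines in a list instead of printing them directly, which makes the result easier to inspect. It walks the table iteratively from the last cell and reverses the collected lines at the end; the names `diff_lines` and `lcs_table` are illustrative, and the elements are assumed to be strings (characters or whole lines).

```python
def lcs_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    c = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


def diff_lines(x, y):
    """Return diff lines: '  ' for kept, '+ ' for added, '- ' for removed items."""
    c = lcs_table(x, y)
    i, j, out = len(x), len(y), []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
            out.append("  " + x[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and (i == 0 or c[i][j - 1] >= c[i - 1][j]):
            out.append("+ " + y[j - 1])
            j -= 1
        else:
            # Reached only when i > 0: either j == 0 or the cell above is larger.
            out.append("- " + x[i - 1])
            i -= 1
    out.reverse()
    return out


if __name__ == "__main__":
    print("\n".join(diff_lines("XMJYAUZ", "MZJAWXU")))
```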

Example

Let $X$ be “XMJYAUZ” and $Y$ be “MZJAWXU”. The longest common subsequence between $X$ and $Y$ is “MJAU”. The table C shown below, which is generated by the function LCSLength, shows the lengths of the longest common subsequences between prefixes of $X$ and $Y$. The $i$th row and $j$th column shows the length of the LCS between $X_{1..i}$ and $Y_{1..j}$.

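|   |   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|
|   |   | ε | M | Z | J | A | W | X | U |
| 0 | ε | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | X | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | M | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | J | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| 4 | Y | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| 5 | A | 0 | 1 | 1 | 2 | 3 | 3 | 3 | 3 |
| 6 | U | 0 | 1 | 1 | 2 | 3 | 3 | 3 | 4 |
| 7 | Z | 0 | 1 | 2 | 2 | 3 | 3 | 3 | 4 |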

The function backtrack follows a path through this table from the bottom right to the top left corner when reading out an LCS. If the current symbols in $X$ and $Y$ are equal, they are part of the LCS, and we go both up and left. If not, we go up or left, depending on which cell has a higher number. This corresponds to either taking the LCS between $X_{1..i-1}$ and $Y_{1..j}$, or $X_{1..i}$ and $Y_{1..j-1}$.

Code optimization

Several optimizations can be made to the algorithm above to speed it up for real-world cases.

Reduce the problem set

The C matrix in the naive algorithm grows quadratically with the lengths of the sequences. For two 100-item sequences, a 10,000-item matrix would be needed, and 10,000 comparisons would need to be done. In most real-world cases, especially source code diffs and patches, the beginnings and ends of files rarely change, and almost certainly not both at the same time. If only a few items have changed in the middle of the sequence, the beginning and end can be eliminated. This reduces not only the memory requirements for the matrix, but also the number of comparisons that must be done.

function LCS(X[1..m], Y[1..n])
    start := 1
    m_end := m
    n_end := n
    // trim off the matching items at the beginning
    while start ≤ m_end and start ≤ n_end and X[start] = Y[start]
        start := start + 1
    // trim off the matching items at the end
    while start ≤ m_end and start ≤ n_end and X[m_end] = Y[n_end]
        m_end := m_end - 1
        n_end := n_end - 1
    C := array(start-1..m_end, start-1..n_end)
    // only loop over the items that have changed
    for i := start..m_end
        for j := start..n_end
            // the algorithm continues as before ...

In the best-case scenario, a sequence with no changes, this optimization would eliminate the need for the C matrix. In the worst-case scenario, a change to the very first and last items in the sequence, only two additional comparisons are performed.
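A minimal Python sketch of this trimming step, assuming only the LCS length is wanted. The function name `lcs_length_trimmed` is hypothetical; the trimmed prefix and suffix are counted directly, and the quadratic table is built only for the middle slices that remain.

```python
def lcs_length_trimmed(x, y):
    """LCS length that runs the quadratic table only on the changed middle part."""
    start, x_end, y_end = 0, len(x), len(y)
    # Trim off the matching items at the beginning.
    while start < x_end and start < y_end and x[start] == y[start]:
        start += 1
    # Trim off the matching items at the end.
    while start < x_end and start < y_end and x[x_end - 1] == y[y_end - 1]:
        x_end -= 1
        y_end -= 1
    trimmed = start + (len(x) - x_end)  # items matched without any table
    core_x, core_y = x[start:x_end], y[start:y_end]
    # Standard dynamic program, restricted to the remaining middle slices.
    prev = [0] * (len(core_y) + 1)
    for a in core_x:
        cur = [0]
        for j, b in enumerate(core_y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return trimmed + prev[-1]


if __name__ == "__main__":
    old = ["a", "b", "c", "d"]
    new = ["a", "b", "x", "d"]
    print(lcs_length_trimmed(old, new))
```

The same function works on lists of lines as well as on strings, since it only indexes, slices, and compares elements.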

Reduce the comparison time

Most of the time taken by the naive algorithm is spent performing comparisons between items in the sequences. For textual sequences such as source code, you want to view lines as the sequence elements instead of single characters. This can mean comparisons of relatively long strings for each step in the algorithm. Two optimizations can be made that can help to reduce the time these comparisons consume.

Reduce strings to hashes

A hash function or checksum can be used to reduce the size of the strings in the sequences. For example, for source code where the average line is 60 or more characters long, the hash or checksum for that line might be only 8 to 40 characters long. Additionally, the randomized nature of hashes and checksums means that comparisons will usually short-circuit sooner: two different hashes tend to differ within their first few characters, whereas lines of source code rarely differ at the beginning.

There are three primary drawbacks to this optimization. First, an amount of time needs to be spent beforehand to precompute the hashes for the two sequences. Second, additional memory needs to be allocated for the new hashed sequences. However, in comparison to the naive algorithm used here, both of these drawbacks are relatively minimal.

The third drawback is that of collisions. Since the checksum or hash is not guaranteed to be unique, there is a small chance that two different items could be reduced to the same hash. This is unlikely in source code, but it is possible. A cryptographic hash would therefore be far better suited for this optimization, as its entropy is going to be significantly greater than that of a simple checksum. However, the benefits may not be worth the setup and computational requirements of a cryptographic hash for small sequence lengths.
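As an illustration of the idea, the following Python sketch reduces each line to a SHA-256 digest from the standard `hashlib` module before running the LCS computation. The choice of SHA-256 and the function names are assumptions made for this example; a real diff tool may use a cheaper hash and verify matches afterwards to guard against collisions.

```python
import hashlib


def line_digests(text):
    """Map each line of `text` to a fixed-size (32-byte) SHA-256 digest."""
    return [hashlib.sha256(line.encode("utf-8")).digest()
            for line in text.splitlines()]


def lcs_length(a, b):
    """LCS length of two sequences by the standard dynamic program."""
    prev = [0] * (len(b) + 1)
    for item_a in a:
        cur = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def common_line_count(old_text, new_text):
    """Number of lines in an LCS of the two texts, compared via their digests."""
    return lcs_length(line_digests(old_text), line_digests(new_text))


if __name__ == "__main__":
    print(common_line_count("a = 1\nb = 2\nc = 3\n", "a = 1\nb = 20\nc = 3\n"))
```

The hashes are computed once per line, so each of the quadratically many comparisons works on short fixed-size values instead of full lines.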

Reduce the required space

If only the length of the LCS is required, the matrix can be reduced to a $2 \times \min(n, m)$ matrix, or to a $\min(m, n) + 1$ vector, as the dynamic programming approach requires only the current and previous columns of the matrix. Hirschberg's algorithm allows the construction of the optimal sequence itself in the same quadratic time and linear space bounds.[8]
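A sketch of the two-row variant in Python, under the assumption that only the length is needed. The name `lcs_length_linear_space` is chosen for illustration; the shorter sequence is placed along the row so that each row has one more entry than the shorter input. This is not Hirschberg's algorithm, which additionally recovers the sequence itself.

```python
def lcs_length_linear_space(x, y):
    """LCS length using only the previous and the current row of the table."""
    if len(y) > len(x):
        x, y = y, x  # keep the shorter sequence along the row
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0] * (len(y) + 1)
        for j, b in enumerate(y, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur  # the older row is no longer needed
    return prev[len(y)]


if __name__ == "__main__":
    print(lcs_length_linear_space("XMJYAUZ", "MZJAWXU"))
```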

Reduce cache misses

Chowdhury and Ramachandran devised a quadratic-time linear-space algorithm[9][10] for finding the LCS length along with an optimal sequence which runs faster than Hirschberg's algorithm in practice due to its superior cache performance.[9] The algorithm has an asymptotically optimal cache complexity under the Ideal cache model.[11] Interestingly, the algorithm itself is cache-oblivious[11] meaning that it does not make any choices based on the cache parameters (e.g., cache size and cache line size) of the machine.

Further optimized algorithms

Several algorithms exist that run faster than the presented dynamic programming approach. One of them is the Hunt–Szymanski algorithm, which typically runs in $O((n + r)\log(n))$ time (for $n > m$), where $r$ is the number of matches between the two sequences.[12] For problems with a bounded alphabet size, the Method of Four Russians can be used to reduce the running time of the dynamic programming algorithm by a logarithmic factor.[13]

Behavior on random strings

Beginning with Chvátal & Sankoff (1975),[14] a number of researchers have investigated the behavior of the longest common subsequence length when the two given strings are drawn randomly from the same alphabet. When the alphabet size is constant, the expected length of the LCS is proportional to the length of the two strings, and the constants of proportionality (depending on alphabet size) are known as the Chvátal–Sankoff constants. Their exact values are not known, but upper and lower bounds on their values have been proven,[15] and it is known that they grow inversely proportionally to the square root of the alphabet size.[16] Simplified mathematical models of the longest common subsequence problem have been shown to be controlled by the Tracy–Widom distribution.[17]

Computing the longest palindromic subsequence of a string

For decades, it had been considered folklore that the longest palindromic subsequence of a string could be computed by finding the longest common subsequence between the string and its reversal, using the classical dynamic programming approach introduced by Wagner and Fischer. However, a formal proof of the correctness of this method was only established in 2024 by Brodal, Fagerberg, and Moldrup Rysgaard.[18]
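The method can be sketched in a few lines of Python. The example computes only the length: an arbitrary LCS of a string and its reversal is not guaranteed to be a palindrome itself, so reading out the palindrome needs more care than shown here. The function names are illustrative.

```python
def lcs_length(x, y):
    """LCS length of two strings by the classical dynamic program."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_palindromic_subsequence_length(s):
    """Length of a longest palindromic subsequence of s, via LCS(s, reversed s)."""
    return lcs_length(s, s[::-1])


if __name__ == "__main__":
    print(longest_palindromic_subsequence_length("BBABCBCAB"))
```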

References

  1. ^ David Maier (1978). "The Complexity of Some Problems on Subsequences and Supersequences". J. ACM. 25 (2). ACM Press: 322–336. doi:10.1145/322063.322075. S2CID 16120634.
  2. ^ Wagner, Robert; Fischer, Michael (January 1974). "The string-to-string correction problem". Journal of the ACM. 21 (1): 168–173. CiteSeerX 10.1.1.367.5281. doi:10.1145/321796.321811. S2CID 13381535.
  3. ^ a b L. Bergroth and H. Hakonen and T. Raita (7–29 September 2000). A survey of longest common subsequence algorithms. Proceedings Seventh International Symposium on String Processing and Information Retrieval. SPIRE 2000. A Curuna, Spain: IEEE Computer Society. pp. 39–48. doi:10.1109/SPIRE.2000.878178. ISBN 0-7695-0746-8. S2CID 10375334.
  4. ^ Ronald I. Greenberg (2003). "Bounds on the Number of Longest Common Subsequences". arXiv:cs.DM/0301030.
  5. ^ Xia, Xuhua (2007). Bioinformatics and the Cell: Modern Computational Approaches in Genomics, Proteomics and Transcriptomics. New York: Springer. p. 24. ISBN 978-0-387-71336-6.
  6. ^ Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein (2001). "15.4". Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill. pp. 350–355. ISBN 0-262-53196-8.
  7. ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. "Dynamic Programming". Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. p. 394. ISBN 0-262-03384-4.
  8. ^ Hirschberg, D. S. (1975). "A linear space algorithm for computing maximal common subsequences". Communications of the ACM. 18 (6): 341–343. doi:10.1145/360825.360861. S2CID 207694727.
  9. ^ a b Chowdhury, Rezaul; Ramachandran, Vijaya (January 2006). "Cache-oblivious dynamic programming". Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm - SODA '06. pp. 591–600. doi:10.1145/1109557.1109622. ISBN 0898716055. S2CID 9650418.
  10. ^ Chowdhury, Rezaul; Le, Hai-Son; Ramachandran, Vijaya (July 2010). "Cache-oblivious dynamic programming for bioinformatics". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 7 (3): 495–510. doi:10.1109/TCBB.2008.94. PMID 20671320. S2CID 2532039.
  11. ^ a b Frigo, Matteo; Leiserson, Charles E.; Prokop, Harald; Ramachandran, Sridhar (January 2012). "Cache-oblivious algorithms". ACM Transactions on Algorithms. 8 (1): 1–22. doi:10.1145/2071379.2071383.
  12. ^ Apostolico, Alberto; Galil, Zvi (1997). Pattern Matching Algorithms. Oxford University Press. ISBN 9780195354348.
  13. ^ Masek, William J.; Paterson, Michael S. (1980), "A faster algorithm computing string edit distances", Journal of Computer and System Sciences, 20 (1): 18–31, doi:10.1016/0022-0000(80)90002-1, hdl:1721.1/148933, MR 0566639.
  14. ^ Chvátal, Václáv; Sankoff, David (1975), "Longest common subsequences of two random sequences", Journal of Applied Probability, 12 (2): 306–315, doi:10.2307/3212444, JSTOR 3212444, MR 0405531, S2CID 250345191.
  15. ^ Lueker, George S. (2009), "Improved bounds on the average length of longest common subsequences", Journal of the ACM, 56 (3), A17, doi:10.1145/1516512.1516519, MR 2536132, S2CID 7232681.
  16. ^ Kiwi, Marcos; Loebl, Martin; Matoušek, Jiří (2005), "Expected length of the longest common subsequence for large alphabets", Advances in Mathematics, 197 (2): 480–498, arXiv:math/0308234, doi:10.1016/j.aim.2004.10.012, MR 2173842.
  17. ^ Majumdar, Satya N.; Nechaev, Sergei (2005), "Exact asymptotic results for the Bernoulli matching model of sequence alignment", Physical Review E, 72 (2): 020901, 4, arXiv:q-bio/0410012, Bibcode:2005PhRvE..72b0901M, doi:10.1103/PhysRevE.72.020901, MR 2177365, PMID 16196539, S2CID 11390762.
  18. ^ Brodal, G. S., Fagerberg, R., Rysgaard, C. M. (2024). On Finding Longest Palindromic Subsequences Using Longest Common Subsequences. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 35:1–35:16. doi:10.4230/lipics.esa.2024.35.