From Wikipedia, the free encyclopedia

A string literal or anonymous string is a literal for a string value in source code. Commonly, a programming language includes a string literal code construct that is a series of characters enclosed in bracket delimiters – usually quote marks. In many languages, the text "foo" is a string literal that encodes the text foo but there are many other variations.

Syntax

Bracket delimited

A bracketed string literal is delimited by a start and an end character. The language can specify the use of any characters as delimiters.

Quotation is the most common way to delimit a string literal. Many languages support double quotes (e.g. "Hello") and/or single quotes (e.g. 'there'). When both are supported, delimiter collision can be minimized by treating one style of quotes as normal text when enclosed in quotes of the other style. In Python, the literal "Dwayne 'the rock' Johnson" is valid because the outer quotes are double, which makes the inner single quotes regular text.

An empty string is written as "" or ''.

Paired delimiters are two different characters where one is used at the beginning of a literal and the other at the end. With paired delimiters, the language can support embedding quotes in the literal text – as long as they are all paired. For example, PostScript uses parentheses, as in (The quick (brown fox)), and m4 uses a backtick ` at the start and an apostrophe ' at the end. Tcl allows both quotes and braces, as in "The quick brown fox" or {The quick {brown fox}}; this derives from the single quotations in Unix shells and the use of braces in C for compound statements, since blocks of code are syntactically the same thing as string literals in Tcl – that the delimiters are paired is essential for making this feasible.

Quotation is most commonly via unpaired quotes, but some tools and character sets support paired quotes. The Unicode character set includes paired versions.

 “Hi there!”
 ‘Hi there!’
 „Hi there!“
 «Hi there!»

Whitespace delimited

A language might support multi-line strings. In YAML, string literals may be specified by the relative positioning of whitespace and indentation.

    - title: An example multi-line string in YAML
      body : |
        This is a multi-line string.
        "special" metacharacters may
        appear here. The extent of this string is
        represented by indentation.

Word delimited

Some languages, such as Perl and PHP, allow string literals that are delimited the same as words in a natural language. In the following Perl code, for example, red, green, and blue are string literals, even though not quoted:

%map = (red => 0x00f, blue => 0x0f0, green => 0xf00);

Perl treats a non-reserved sequence of alphanumeric characters as a string literal in most contexts. For example, the following two lines of Perl are equivalent:

$y = "x";
$y = x;

Declarative notation

The length of a literal can be encoded at the beginning of the text, which removes the need to mark the beginning and end of the string. For example, in FORTRAN, string literals were written in Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string.

35HAn example Hollerith string literal

A drawback of this technique is that it is relatively error-prone unless length insertion is automated, especially for multi-byte encodings. The advantages are that it removes the need to search for the end delimiter and therefore requires less computational overhead, it prevents delimiter collision issues, and it enables the inclusion of metacharacters that might otherwise be mistaken for commands.
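
The trade-off described above can be sketched in Python. This is only an illustration of length-prefixed parsing, not a model of any real Fortran compiler; the helper names parse_hollerith and to_hollerith are hypothetical, and the sketch counts characters rather than bytes.

```python
DIGITS = "0123456789"


def parse_hollerith(source: str) -> tuple[str, str]:
    """Split a leading Hollerith constant off `source`.

    Returns (string_value, remaining_text). Hypothetical helper for
    illustration only.
    """
    # Read the decimal count that precedes the letter H.
    i = 0
    while i < len(source) and source[i] in DIGITS:
        i += 1
    if i == 0 or i >= len(source) or source[i] not in "Hh":
        raise ValueError("not a Hollerith constant")
    count = int(source[:i])
    start = i + 1
    end = start + count
    if end > len(source):
        raise ValueError("declared length exceeds available text")
    # No delimiter search: the count alone decides where the string ends,
    # so quotes and other metacharacters need no special treatment.
    return source[start:end], source[end:]


def to_hollerith(text: str) -> str:
    """Encode `text`, computing the count automatically to avoid miscounts."""
    return f"{len(text)}H{text}"


value, rest = parse_hollerith("35HAn example Hollerith string literal")
print(value)
```

Generating the count with to_hollerith rather than typing it by hand is what the text means by automating length insertion.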

Delimiter collision

When using quoting, if one wishes to represent the delimiter itself in a string literal, one runs into the problem of delimiter collision. For example, if the delimiter is a double quote, one cannot simply represent a double quote itself by the literal """ as the second quote is interpreted as the end of the string literal, not as the value of the string, and similarly one cannot write "This is "in quotes", but invalid." as the middle quoted portion is instead interpreted as outside of quotes. There are various solutions, the most general-purpose of which is using escape sequences, such as "\"" or "This is \"in quotes\" and properly escaped.", but there are many other solutions.

Paired quotes, such as braces in Tcl, allow nested strings, such as {foo {bar} zork} but do not otherwise solve the problem of delimiter collision, since an unbalanced closing delimiter cannot simply be included, as in {}}.

Doubling up

A number of languages, including Pascal, BASIC, DCL, Smalltalk, SQL, J, and Fortran, avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself:

  'This Pascal string''contains two apostrophes'''
  "I said, ""Can you hear me?"""

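The doubling convention shown above is mechanical enough to sketch in a few lines of Python. The function names quote_doubled and unquote_doubled are hypothetical, and the decoder assumes its input is a well-formed literal.

```python
def quote_doubled(text: str, delim: str = "'") -> str:
    """Wrap `text` in `delim`, doubling every embedded delimiter."""
    return delim + text.replace(delim, delim * 2) + delim


def unquote_doubled(literal: str, delim: str = "'") -> str:
    """Invert quote_doubled; assumes a well-formed literal."""
    if len(literal) < 2 or literal[0] != delim or literal[-1] != delim:
        raise ValueError("literal is not delimited")
    # Inside the outer delimiters, each doubled delimiter stands for one.
    return literal[1:-1].replace(delim * 2, delim)


print(quote_doubled("It's"))
print(unquote_doubled('"I said, ""Can you hear me?"""', '"'))
```
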
Dual quoting

Some languages, such as Fortran, Modula-2, JavaScript, Python, and PHP allow more than one quoting delimiter; in the case of two possible delimiters, this is known as dual quoting. Typically, this consists of allowing the programmer to use either single quotations or double quotations interchangeably – each literal must use one or the other.

  "This is John's apple."
  'I said, "Can you hear me?"'

This does not allow having a single literal with both delimiters in it, however. This can be worked around by using several literals and using string concatenation:

  'I said, "This is ' + "John's" + ' apple."'

Python has string literal concatenation, so consecutive string literals are concatenated even without an operator, so this can be reduced to:

  'I said, "This is '"John's"' apple."'
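
As a quick check in Python, the explicit and implicit forms above produce the same value, as does a single literal that escapes the inner double quotes. The variable names are illustrative only.

```python
# Explicit concatenation of literals that use different delimiters.
with_operator = 'I said, "This is ' + "John's" + ' apple."'

# Adjacent literals are joined without an operator.
juxtaposed = 'I said, "This is '"John's"' apple."'

# The same text as one literal, escaping the inner double quotes instead.
escaped = "I said, \"This is John's apple.\""

print(juxtaposed)
```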

Delimiter quoting

C++11 introduced so-called raw string literals. They consist, essentially, of

R" end-of-string-id ( content ) end-of-string-id ",

that is, after R" the programmer can enter up to 16 characters except whitespace characters, parentheses, or backslash, which form the end-of-string-id (eos id for short; its purpose is to be repeated to signal the end of the string), then an opening parenthesis (to denote the end of the eos id) is required. Then follows the actual content of the literal: any sequence of characters may be used (except that it may not contain a closing parenthesis followed by the eos id followed by a quote), and finally, to terminate the string, a closing parenthesis, the eos id, and a quote are required.
The simplest case of such a literal is with empty content and empty eos id: R"()".
The eos id may itself contain quotes: R""(I asked, "Can you hear me?")"" is a valid literal (the eos id is " here.)
Escape sequences don't work in raw string literals.

D supports a few quoting delimiters, with such strings starting with q" plus an opening delimiter and ending with the respective closing delimiter and ". Available delimiter pairs are (), <>, {}, and []; an unpaired non-identifier delimiter is its own closing delimiter. The paired delimiters nest, so that q"(A pair "()" of parens in quotes)" is a valid literal; an example with the non-nesting / character is q"/I asked, "Can you hear me?"/".
Similar to C++11, D allows here-document-style literals with end-of-string ids:

q" end-of-string-id newline content newline end-of-string-id "

In D, the end-of-string-id must be an identifier (alphanumeric characters).

In some programming languages, such as sh and Perl, there are different delimiters that are treated differently, such as doing string interpolation or not, and thus care must be taken when choosing which delimiter to use; see different kinds of strings, below.

Multiple quoting

A further extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.

For example, in Perl:

qq^I said, "Can you hear me?"^
qq@I said, "Can you hear me?"@
qq§I said, "Can you hear me?"§

all produce the desired result. Although this notation is more flexible, few languages support it; other than Perl, Ruby (influenced by Perl) and C++11 also support these. A variant of multiple quoting is the use of here document-style strings.

Lua (as of 5.1) provides a limited form of multiple quoting, particularly to allow nesting of long comments or embedded strings. Normally one uses [[ and ]] to delimit literal strings (initial newline stripped, otherwise raw), but the opening brackets can include any number of equal signs, and only closing brackets with the same number of signs close the string. For example:

local ls = [=[
This notation can be used for Windows paths: 
local path = [[C:\Windows\Fonts]]
]=]

Multiple quoting is particularly useful with regular expressions that contain usual delimiters such as quotes, as this avoids needing to escape them. An early example is sed, where in the substitution command s/regex/replacement/ the default slash / delimiters can be replaced by another character, as in s,regex,replacement,.

Constructor functions

Another option, which is rarely used in modern languages, is to use a function to construct a string, rather than representing it via a literal. This is generally not used in modern languages because the computation is done at run time, rather than at parse time.

For example, early forms of BASIC did not include escape sequences or any other workarounds listed here, and thus one instead was required to use the CHR$ function, which returns a string containing the character corresponding to its argument. In ASCII the quotation mark has the value 34, so to represent a string with quotes on an ASCII system one would write

"I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)

In C, a similar facility is available via sprintf and the %c "character" format specifier, though in the presence of other workarounds this is generally not used:

char buffer[32];
snprintf(buffer, sizeof buffer, "This is %cin quotes.%c", 34, 34);

These constructor functions can also be used to represent nonprinting characters, though escape sequences are generally used instead. A similar technique can be used in C++ with the std::string stringification operator.

Escape sequences

Escape sequences are a general technique for representing characters that are otherwise difficult to represent directly, including delimiters, nonprinting characters (such as backspaces), newlines, and whitespace characters (which are otherwise impossible to distinguish visually), and have a long history. They are accordingly widely used in string literals, and adding an escape sequence (either to a single character or throughout a string) is known as escaping.

One character is chosen as a prefix to give encodings for characters that are difficult or impossible to include directly. Most commonly this is backslash; in addition to other characters, a key point is that backslash itself can be encoded as a double backslash \\ and for delimited strings the delimiter itself can be encoded by escaping, say by \" for ". A regular expression for such escaped strings can be given as follows, as found in the ANSI C specification:[1][a]

"(\\.|[^\\"])*"

meaning "a quote; followed by zero or more of either an escaped character (backslash followed by something, possibly backslash or quote), or a non-escape, non-quote character; ending in a quote" – the only issue is distinguishing the terminating quote from a quote preceded by a backslash, which may itself be escaped. Multiple characters can follow the backslash, such as \uFFFF, depending on the escaping scheme.
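
To see the pattern in action, the following Python sketch applies the same regular expression with the standard re module. The sample source text is made up for illustration; the pattern is written as a Python raw string so that it needs no additional escaping.

```python
import re

# The pattern from the text, written as a raw string.
STRING_LITERAL = re.compile(r'"(\\.|[^\\"])*"')

# Hypothetical source text containing two double-quoted literals.
source = r'x = "This is \"in quotes\" and properly escaped."; y = "\\";'

literals = [m.group(0) for m in STRING_LITERAL.finditer(source)]
for lit in literals:
    print(lit)
```

The second literal shows the case the text singles out: the backslash before the closing quote is itself escaped, so the quote still terminates the string.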

An escaped string must then itself be lexically analyzed, converting the escaped string into the unescaped string that it represents. This is done during the evaluation phase of the overall lexing of the computer language: the evaluator of the lexer of the overall language executes its own lexer for escaped string literals.
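
A minimal sketch of such an evaluation step, in Python, is shown below. It handles only a handful of single-character escapes plus a two-digit \x form, which is a simplification chosen for this example; the function name unescape and the table SIMPLE_ESCAPES are hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

# A deliberately small subset of escapes, chosen for illustration.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def unescape(body: str) -> str:
    """Convert the text between the quotes into the string it denotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)  # ordinary character: copy through
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt == "x":
            # Simplification: exactly two hex digits follow \x.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("malformed \\x escape")
            out.append(chr(int(digits, 16)))
            i += 4
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            raise ValueError(f"unknown escape \\{nxt}")
    return "".join(out)


print(unescape(r'I said,\t\t\x22Can you hear me?\x22\n'))
```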

Among other things, it must be possible to encode the character that normally terminates the string constant, and there must be some way to specify the escape character itself. Escape sequences are not always pretty or easy to use, so many compilers also offer other means of solving the common problems. Escape sequences, however, solve every delimiter problem, and most compilers interpret them. When an escape character is inside a string literal, it means "this is the start of the escape sequence". Every escape sequence specifies one character which is to be placed directly into the string. The actual number of characters required in an escape sequence varies. The escape character is on the top/left of the keyboard, but the editor will translate it, so it cannot be typed directly into a string. The backslash is used to represent the escape character in a string literal.

Many languages support the use of metacharacters inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.

For instance, in a C string literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline or tab character respectively. Or if the backslash is followed by 1-3 octal digits, then this sequence is interpreted as representing the arbitrary code unit with the specified value in the literal's encoding (for example, the corresponding ASCII code for an ASCII literal). This was later extended to allow more modern hexadecimal character code notation:

"I said,\t\t\x22Can you hear me?\x22\n"

Escape sequence   Unicode                 Literal characters placed into string
\0                U+0000                  null character[2][3] (typically as a special case of \ooo octal notation)
\a                U+0007                  alert[4][5]
\b                U+0008                  backspace[4]
\f                U+000C                  form feed[4]
\n                U+000A                  line feed[4] (or newline in POSIX)
\r                U+000D                  carriage return[4] (or newline in Mac OS 9 and earlier)
\t                U+0009                  horizontal tab[4]
\v                U+000B                  vertical tab[4]
\e                U+001B                  escape character[5] (GCC,[6] clang and tcc)
\u####            U+####                  16-bit Unicode character where #### are four hex digits[3]
\U########        U+######                32-bit Unicode character where ######## are eight hex digits (Unicode character space is currently only 21 bits wide, so the first two hex digits will always be zero)
\u{######}        U+######                21-bit Unicode character where ###### is a variable number of hex digits
\x##              Depends on encoding[b]  8-bit character specification where # is a hex digit. The length of a hex escape sequence is not limited to two digits, instead being of an arbitrary length.[4]
\ooo              Depends on encoding[b]  8-bit character specification where o is an octal digit[4]
\"                U+0022                  double quote (")[4]
\&                                        non-character used to delimit numeric escapes in Haskell[2]
\'                U+0027                  single quote (')[4]
\\                U+005C                  backslash (\)[4]
\?                U+003F                  question mark (?)[4]

Note: Not all sequences in the list are supported by all parsers, and there may be other escape sequences which are not in the list.
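
As one concrete instance of that caveat, the Python sketch below checks the subset of the listed sequences that Python's ordinary string literals accept. Python does not recognize \e, \? or \&, and its \x escape takes exactly two hex digits rather than an arbitrary number. The list name checks is illustrative.

```python
# Each pair is (one-character literal written with an escape, expected code point).
checks = [
    ("\0", 0x0000),          # null, a special case of octal notation
    ("\a", 0x0007),          # alert
    ("\b", 0x0008),          # backspace
    ("\f", 0x000C),          # form feed
    ("\n", 0x000A),          # line feed
    ("\r", 0x000D),          # carriage return
    ("\t", 0x0009),          # horizontal tab
    ("\v", 0x000B),          # vertical tab
    ("\x22", 0x0022),        # hex escape (exactly two digits in Python)
    ("\042", 0x0022),        # octal escape
    ("\u20AC", 0x20AC),      # four hex digits
    ("\U0001F600", 0x1F600), # eight hex digits
    ("\"", 0x0022),          # double quote
    ("\'", 0x0027),          # single quote
    ("\\", 0x005C),          # backslash
]

mismatches = [(s, cp) for s, cp in checks if ord(s) != cp]
print(mismatches)
```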

Nested escaping

When code in one programming language is embedded inside another, embedded strings may require multiple levels of escaping. This is particularly common in regular expressions and SQL queries within other languages, or other languages inside shell scripts. This double-escaping is often difficult to read and author.

Incorrect quoting of nested strings can present a security vulnerability. Untrusted data, such as the data fields of an SQL query, should be passed through prepared statements to prevent a code injection attack. In PHP 2 through 5.3, there was a feature called magic quotes which automatically escaped strings (for convenience and security), but it caused problems and was removed from version 5.4 onward.
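
A minimal sketch of the prepared-statement approach, using Python's standard sqlite3 module with an in-memory database, is shown below. The table name, column name and hostile input are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Untrusted input containing quote characters and SQL syntax.
hostile = "Robert'); DROP TABLE users;--"

# The ? placeholders pass the value separately from the SQL text,
# so no quoting or escaping of the value is needed.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(rows)
```

Because the value never becomes part of the query string, there is no nested string literal to quote incorrectly.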

Raw strings

A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation. This avoids the need for escaping, and yields more legible strings.

Raw strings are particularly useful when a common character needs to be escaped, notably in regular expressions (nested as string literals), where backslash \ is widely used, and in DOS/Windows paths, where backslash is used as a path separator. The profusion of backslashes is known as leaning toothpick syndrome, and can be reduced by using raw strings. Compare escaped and raw pathnames in C#:

 "The Windows path is C:\\Foo\\Bar\\Baz\\"
 @"The Windows path is C:\Foo\Bar\Baz\"

Extreme examples occur when these are combined – Uniform Naming Convention paths begin with \\, and thus an escaped regular expression matching a UNC name begins with 8 backslashes, "\\\\\\\\", due to needing to escape the string and the regular expression. Using raw strings reduces this to 4 (escaping in the regular expression), as in C# @"\\\\".
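
The same reduction from eight backslashes to four can be observed in Python, whose r prefix plays a role similar to C#'s @ prefix for this purpose. The UNC path below is a made-up example.

```python
import re

# Non-raw literal: 8 backslashes in source become 4 in the pattern,
# which the regex engine reads as 2 literal backslashes.
escaped_pattern = "\\\\\\\\"

# Raw literal: only the regex-level escaping remains.
raw_pattern = r"\\\\"

unc_path = r"\\server\share"  # hypothetical UNC path
match = re.match(raw_pattern, unc_path)

print(escaped_pattern == raw_pattern)
print(match.group(0))
```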

In XML documents, CDATA sections allow use of characters such as & and < without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document well formed.

<![CDATA[  if (path!=null && depth<2) { add(path); }  ]]>

Multiline string literals

In many languages, string literals can contain literal newlines, spanning several lines. Alternatively, newlines can be escaped, most often as \n. For example:

echo 'foo
bar'

and

echo -e "foo\nbar"

are both valid bash, producing:

foo
bar

Languages that allow literal newlines include bash, Lua, Perl, PHP, R, and Tcl. In some other languages string literals cannot include newlines.

Two issues with multiline string literals are leading and trailing newlines, and indentation. If the initial or final delimiters are on separate lines, there are extra newlines, while if they are not, the delimiter makes the string harder to read, particularly for the first line, which is often indented differently from the rest. Further, the literal must be unindented, as leading whitespace is preserved – this breaks the flow of the code if the literal occurs within indented code.

The most common solution for these problems is here document-style string literals. Formally speaking, a here document is not a string literal, but instead a stream literal or file literal. These originate in shell scripts and allow a literal to be fed as input to an external command. The opening delimiter is <<END where END can be any word, and the closing delimiter is END on a line by itself, serving as a content boundary – the << is due to redirecting stdin from the literal. Because the delimiter is arbitrary, these also avoid the problem of delimiter collision. Initial tabs can be stripped via the variant syntax <<-END, though leading spaces are not stripped. The same syntax has since been adopted for multiline string literals in a number of languages, most notably Perl; these are also referred to as here documents and retain the syntax, despite being strings and not involving redirection. As with other string literals, these can sometimes have different behavior specified, such as variable interpolation.

Python, whose usual string literals do not allow literal newlines, instead has a special form of string, designed for multiline literals, called triple quoting. These use a tripled delimiter, either ''' or """. These literals are especially used for inline documentation, known as docstrings.
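
The following Python sketch shows a triple-quoted literal used both as a docstring and as ordinary data. It also illustrates the newline and indentation issues mentioned earlier: placing the delimiters on their own lines adds leading and trailing newlines, and the indentation of the surrounding code becomes part of the string. Cleaning this up with textwrap.dedent and str.strip is one possible approach, not the only one; the function name usage is hypothetical.

```python
import textwrap


def usage() -> str:
    """Return help text; this docstring is itself a triple-quoted literal."""
    text = """
        foo
        bar
    """
    # The raw value starts with a newline and keeps the code's indentation.
    # dedent removes the common leading whitespace; strip drops the extra
    # newlines introduced by putting the delimiters on their own lines.
    return textwrap.dedent(text).strip("\n")


print(usage())
```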

Tcl allows literal newlines in strings and has no special syntax to assist with multiline strings, though delimiters can be placed on lines by themselves and leading and trailing newlines stripped via string trim, while string map can be used to strip indentation.

String literal concatenation

A few languages provide string literal concatenation, where adjacent string literals are implicitly joined into a single literal at compile time. This is a feature of C,[7][8] C++,[9] D,[10] Ruby,[11] and Python,[12] which copied it from C.[13] Notably, this concatenation happens at compile time, during lexical analysis (as a phase following initial tokenization), and is contrasted with both run time string concatenation (generally with the + operator)[14] and concatenation during constant folding, which occurs at compile time, but in a later phase (after phrase analysis or "parsing"). Most languages, such as C#, Java[15] and Perl, do not support implicit string literal concatenation, and instead require explicit concatenation, such as with the + operator (this is also possible in D and Python, but illegal in C/C++ – see below); in this case concatenation may happen at compile time, via constant folding, or may be deferred to run time.

Motivation

In C, where the concept and term originate, string literal concatenation was introduced for two reasons:[16]

  • To allow long strings to span multiple lines with proper indentation in contrast to line continuation, which destroys the indentation scheme; and
  • To allow the construction of string literals by macros (via stringizing).[17]

In practical terms, this allows string concatenation in early phases of compilation ("translation", specifically as part of lexical analysis), without requiring phrase analysis or constant folding. For example, the following are valid C/C++:

char *s = "hello, " "world";
printf("hello, " "world");

However, the following are invalid:

char *s = "hello, " + "world";
printf("hello, " + "world");

This is because string literals have array type, char [n] (C) or const char [n] (C++), which cannot be added; this is not a restriction in most other languages.

This is particularly important when used in combination with the C preprocessor, to allow strings to be computed following preprocessing, particularly in macros.[13] As a simple example:

char *file_and_message = __FILE__ ": message";

will (if the file is called a.c) expand to:

char *file_and_message = "a.c" ": message";

which is then concatenated, being equivalent to:

char *file_and_message = "a.c: message";

A common use case is in constructing printf or scanf format strings, where format specifiers are given by macros.[18][19]

A more complex example uses stringification of integers (by the preprocessor) to define a macro that expands to a sequence of string literals, which are then concatenated to a single string literal with the file name and line number:[20]

#define STRINGIFY(x) #x
#define TOSTRING(x) STRINGIFY(x)
#define AT __FILE__ ":" TOSTRING(__LINE__)

Beyond syntactic requirements of C/C++, implicit concatenation is a form of syntactic sugar, making it simpler to split string literals across several lines, avoiding the need for line continuation (via backslashes) and allowing one to add comments to parts of strings. For example, in Python, one can comment a regular expression in this way:[21]

re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )

Problems

Implicit string concatenation is not required by modern compilers, which implement constant folding, and causes hard-to-spot errors due to unintentional concatenation from omitting a comma, particularly in vertical lists of strings, as in:

l = ['foo',
     'bar'
     'zork']

Accordingly, it is not used in most languages, and it has been proposed for deprecation from D[22] and Python.[13] However, removing the feature breaks backwards compatibility, and replacing it with a concatenation operator introduces issues of precedence – string literal concatenation occurs during lexing, prior to operator evaluation, but concatenation via an explicit operator occurs at the same time as other operators, hence precedence is an issue, potentially requiring parentheses to ensure desired evaluation order.
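
The missing-comma pitfall is easy to reproduce in Python. In this sketch the list was meant to have three elements, but the interpreter accepts it silently with two; the variable names are illustrative.

```python
# A comma is missing after 'bar', so 'bar' and 'zork' are joined.
items = ['foo',
         'bar'
         'zork']

# With the comma restored, the list has the intended three elements.
intended = ['foo',
            'bar',
            'zork']

print(len(items), items)
print(len(intended), intended)
```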

A subtler issue is that in C and C++,[23] there are different types of string literals, and concatenation of these has implementation-defined behavior, which poses a potential security risk.[24]

Different kinds of strings

Some languages provide more than one kind of literal, which have different behavior. This is particularly used to indicate raw strings (no escaping), or to disable or enable variable interpolation, but has other uses, such as distinguishing character sets. Most often this is done by changing the quoting character or adding a prefix or suffix. This is comparable to prefixes and suffixes to integer literals, such as to indicate hexadecimal numbers or long integers.

One of the oldest examples is in shell scripts, where single quotes indicate a raw string or "literal string", while double quotes have escape sequences and variable interpolation.

For example, in Python, raw strings are preceded by an r or R – compare 'C:\\Windows' with r'C:\Windows' (though a Python raw string cannot end in an odd number of backslashes). Python 2 also distinguishes two types of strings: 8-bit ASCII ("bytes") strings (the default), explicitly indicated with a b or B prefix, and Unicode strings, indicated with a u or U prefix,[25] while in Python 3 strings are Unicode by default and bytes are a separate bytes type that, when initialized with quotes, must be prefixed with a b.
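
A short Python 3 sketch of these prefixes follows. The variable names are illustrative, and the behavior shown is that of Python 3, not Python 2.

```python
escaped = 'C:\\Windows'   # backslash written as an escape sequence
raw = r'C:\Windows'       # raw string: the backslash is taken literally

text = 'abc'              # str: Unicode by default in Python 3
legacy = u'abc'           # the u prefix is still accepted and also yields str
data = b'abc'             # bytes: a separate type, requires the b prefix

print(escaped == raw)
print(type(text).__name__, type(legacy).__name__, type(data).__name__)
```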

C#'s notation for raw strings is called @-quoting.

@"C:\Foo\Bar\Baz\"

While this disables escaping, it allows double-up quotes, which allow one to represent quotes within the string:

@"I said, ""Hello there."""

C++11 allows raw strings, Unicode strings (UTF-8, UTF-16, and UTF-32), and wide character strings, determined by prefixes. C++14 also adds literals for the existing C++ string type, std::string, which is generally preferred to C-style strings.

In Tcl, brace-delimited strings are literal, while quote-delimited strings have escaping and interpolation.

Perl has a wide variety of strings, which are more formally considered operators, and are known as quote and quote-like operators. These include both a usual syntax (fixed delimiters) and a generic syntax, which allows a choice of delimiters; these include:[26]

''  ""  ``  //  m//  qr//  s///  y///
q{}  qq{}  qx{}  qw{}  m{}  qr{}  s{}{}  tr{}{}  y{}{}

REXX uses suffix characters to specify characters or strings using their hexadecimal or binary code. E.g.,

'20'x
"0010 0000"b
"00100000"b

all yield the space character, avoiding the function call X2C(20).

String interpolation

In some languages, string literals may contain placeholders referring to variables or expressions in the current context, which are evaluated (usually at run time). This is referred to as variable interpolation, or more generally string interpolation. Languages that support interpolation generally distinguish string literals that are interpolated from ones that are not. For example, in sh-compatible Unix shells (as well as Perl and Ruby), double-quoted (quotation-delimited, ") strings are interpolated, while single-quoted (apostrophe-delimited, ') strings are not. Non-interpolated string literals are sometimes referred to as "raw strings", but this is distinct from "raw string" in the sense of escaping. For example, in Python, a string prefixed with r or R has no escaping or interpolation, a normal string (no prefix) has escaping but no interpolation, and a string prefixed with f or F has escaping and interpolation.

For example, the following Perl code:

$name     = "Nancy";
$greeting = "Hello World";
print "$name said $greeting to the crowd of people.";

produces the output:

Nancy said Hello World to the crowd of people.

In this case, the metacharacter ($) (not to be confused with the sigil in the variable assignment statement) is interpreted to indicate variable interpolation, and requires some escaping if it needs to be output literally.

This should be contrasted with the printf function, which produces the same output using notation such as:

printf "%s said %s to the crowd of people.", $name, $greeting;

but does not perform interpolation: the %s is a placeholder in a printf format string, but the variables themselves are outside the string.

This is contrasted with "raw" strings:

print '$name said $greeting to the crowd of people.';

which produce output like:

$name said $greeting to the crowd of people.

Here the $ characters are not metacharacters, and are not interpreted to have any meaning other than plain text.
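
The three Python string kinds mentioned at the start of this section can be compared side by side. This sketch writes the same source characters with each prefix; the variable names are illustrative.

```python
name = "Nancy"

raw = r"{name}\n"        # no escaping, no interpolation
normal = "{name}\n"      # escaping, but no interpolation
formatted = f"{name}\n"  # escaping and interpolation

print(repr(raw))
print(repr(normal))
print(repr(formatted))
```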

Embedding source code in string literals

Languages that lack flexibility in specifying string literals make it particularly cumbersome to write programming code that generates other programming code. This is particularly true when the generation language is the same as or similar to the output language.

For example:

  • writing code to produce quines;
  • generating an output language from within a web template;
  • using XSLT to generate XSLT, or SQL to generate more SQL;
  • generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.

Nevertheless, some languages are particularly well-adapted to produce this sort of self-similar output, especially those that support multiple options for avoiding delimiter collision.

Using string literals as code that generates other code may have adverse security implications, especially if the output is based at least partially on untrusted user input. This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
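
The risk can be sketched with Python's standard `sqlite3` module. The table, its contents, and the hostile input are made up for illustration. The first query splices the input into the SQL text; the second keeps the query text fixed and binds the input as a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

user_input = "nobody' OR '1'='1"  # hostile input

# Unsafe: the input becomes part of the SQL code and changes its meaning.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is passed separately and treated only as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
```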

Notes

  1. ^ The regex given here is not itself quoted or escaped, to reduce confusion.
  2. ^ a b Since this escape sequence represents a specific code unit instead of a specific character, what code point (if any) it represents depends on the encoding of the string literal it is found in.

References

  1. ^ "ANSI C grammar (Lex)". liu.se. Retrieved 22 June 2016.
  2. ^ a b "Appendix B. Characters, strings, and escaping rules". realworldhaskell.org. Retrieved 22 June 2016.
  3. ^ a b "String". mozilla.org. Retrieved 22 June 2016.
  4. ^ a b c d e f g h i j k l m "Escape Sequences (C)". microsoft.com. Retrieved 22 June 2016.
  5. ^ a b "Rationale for International Standard - Programming Languages - C" (PDF). 5.10. April 2003. pp. 52, 153–154, 159. Archived (PDF) from the original on 2025-08-05. Retrieved 2025-08-05.
  6. ^ "6.35 The Character <ESC> in Constants", GCC 4.8.2 Manual, retrieved 2025-08-05
  7. ^ C11 draft standard, WG14 N1570 Committee Draft — April 12, 2011, 5.1.1.2 Translation phases, p. 11: "6. Adjacent string literal tokens are concatenated."
  8. ^ C syntax: String literal concatenation
  9. ^ C++11 draft standard, "Working Draft, Standard for Programming Language C++" (PDF)., 2.2 Phases of translation [lex.phases], p. 17: "6. Adjacent string literal tokens are concatenated." and 2.14.5 String literals [lex.string], note 13, p. 28–29: "In translation phase 6 (2.2), adjacent string literals are concatenated."
  10. ^ D Programming Language, Lexical Analysis, "String Literals": "Adjacent strings are concatenated with the ~ operator, or by simple juxtaposition:"
  11. ^ ruby: The Ruby Programming Language, Ruby Programming Language, 2025-08-05, retrieved 2025-08-05
  12. ^ The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation: "Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation."
  13. ^ a b c Python-ideas, "Implicit string literal concatenation considered harmful?", Guido van Rossum, May 10, 2013
  14. ^ The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation: "Note that this feature is defined at the syntactical level, but implemented at compile time. The ‘+’ operator must be used to concatenate string expressions at run time."
  15. ^ "Strings (The Java™ Tutorials > Learning the Java Language > Numbers and Strings)". Docs.oracle.com. 2025-08-05. Retrieved 2025-08-05.
  16. ^ Rationale for the ANSI C Programming Language. Silicon Press. 1990. p. 31. ISBN 0-929306-07-4., 3.1.4 String literals: "A long string can be continued across multiple lines by using the backslash-newline line continuation, but this practice requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §3.8.3), the Committee introduced string literal concatenation. Two string literals in a row are pasted together (with no null character in the middle) to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash-newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation."
  17. ^ Rationale for the ANSI C Programming Language. Silicon Press. 1990. p. 6566. ISBN 0-929306-07-4., 3.8.3.2 The # operator: "The # operator has been introduced for stringizing. It may only be used in a #define expansion. It causes the formal parameter name following to be replaced by a string literal formed by stringizing the actual argument token sequence. In conjunction with string literal concatenation (see §3.1.4), use of this operator permits the construction of strings as effectively as by identifier replacement within a string. An example in the Standard illustrates this feature."
  18. ^ C/C++ Users Journal, Volume 19, p. 50
  19. ^ "python - Why allow concatenation of string literals?". Stack Overflow. Retrieved 2025-08-05.
  20. ^ "__LINE__ to string (stringify) using preprocessor directives". Decompile.com. 2025-08-05. Retrieved 2025-08-05.
  21. ^ The Python Language Reference, 2. Lexical analysis, 2.4.2. String literal concatenation: "This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings, for example:
  22. ^ DLang's Issue Tracking System – Issue 3827 - Warn against and then deprecate implicit concatenation of adjacent string literals
  23. ^ C++11 draft standard, "Working Draft, Standard for Programming Language C++" (PDF)., 2.14.5 String literals [lex.string], note 13, p. 28–29: "Any other concatenations are conditionally supported with implementation-defined behavior."
  24. ^ "STR10-C. Do not concatenate different type of string literals - Secure Coding - CERT Secure Coding Standards". Archived from the original on July 14, 2014. Retrieved July 3, 2014.
  25. ^ "2. Lexical analysis — Python 2.7.12rc1 documentation". python.org. Retrieved 22 June 2016.
  26. ^ "perlop - perldoc.perl.org". perl.org. Retrieved 22 June 2016.