


A language model is a model of the human brain's ability to produce natural language.[1][2] Language models are useful for a variety of tasks, including speech recognition,[3] machine translation,[4] natural language generation (generating more human-like text), optical character recognition, route optimization,[5] handwriting recognition,[6] grammar induction,[7] and information retrieval.[8][9]

Large language models (LLMs), currently their most advanced form, are predominantly based on transformers trained on larger datasets (frequently using texts scraped from the public internet). They have superseded recurrent neural network-based models, which had previously superseded the purely statistical models, such as the word n-gram language model.

History


Noam Chomsky did pioneering work on language models in the 1950s by developing a theory of formal grammars.[10]

In the 1980s, statistical approaches were explored and found to be more useful for many purposes than rule-based formal grammars. Significant advances were made with discrete representations such as word n-gram language models, which assign probabilities to discrete combinations of words.

In the 2000s, continuous representations of words, such as word embeddings, began to replace discrete representations.[11] Typically, the representation is a real-valued vector that encodes the meaning of the word in such a way that words closer together in the vector space are expected to be similar in meaning, and common relationships between pairs of words, such as plurality or gender, are reflected as consistent offsets between their vectors.

Pure statistical models


In 1980, the first significant statistical language model was proposed, and during the decade IBM performed ‘Shannon-style’ experiments, in which potential sources for language modeling improvement were identified by observing and analyzing the performance of human subjects in predicting or correcting text.[12]

Models based on word n-grams


A word n-gram language model is a purely statistical model of language. It has been superseded by recurrent neural network–based models, which have in turn been superseded by large language models.[13] It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if n − 1 words, an n-gram model.[14] Special tokens are introduced to denote the start and end of a sentence, commonly written ⟨s⟩ and ⟨/s⟩.

To prevent a zero probability being assigned to unseen words, each word's probability is slightly lower than its frequency count in a corpus, so that some probability mass is reserved for unseen words. Various smoothing methods have been used to calculate it, from simple "add-one" smoothing (assign a count of 1 to unseen n-grams, as an uninformative prior) to more sophisticated models, such as Good–Turing discounting or back-off models.
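For concreteness, here is a minimal Python sketch (not from the cited sources) of a bigram model with add-one smoothing; the toy corpus, the boundary tokens written here as <s> and </s>, and the probe words are all illustrative assumptions.

```python
# Sketch of a bigram (2-gram) language model with add-one (Laplace) smoothing.
# The toy corpus is hypothetical; <s> and </s> mark sentence start and end.
from collections import Counter

corpus = [
    ["<s>", "the", "rain", "in", "Spain", "</s>"],
    ["<s>", "the", "plain", "in", "Spain", "</s>"],
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(
    (w1, w2) for sent in corpus for w1, w2 in zip(sent, sent[1:])
)
vocab_size = len(unigram_counts)

def p_add_one(w2, w1):
    """P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + |V|)."""
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

print(p_add_one("rain", "the"))   # seen bigram: (1 + 1) / (2 + 7)
print(p_add_one("Spain", "the"))  # unseen bigram still gets nonzero mass: 1 / 9
```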

Exponential


Maximum entropy language models encode the relationship between a word and the n-gram history using feature functions. The equation is

P(w_m \mid w_1, \ldots, w_{m-1}) = \frac{1}{Z(w_1, \ldots, w_{m-1})} \exp\left(a^{\top} f(w_1, \ldots, w_m)\right)

where Z(w_1, \ldots, w_{m-1}) is the partition function, a is the parameter vector, and f(w_1, \ldots, w_m) is the feature function. In the simplest case, the feature function is just an indicator of the presence of a certain n-gram. It is helpful to use a prior on a or some form of regularization.
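As a rough, illustrative sketch (not from the article), the following Python code evaluates such a log-linear model over a tiny vocabulary; the vocabulary, the two indicator features, and the weights are hypothetical choices made only to show how the partition function normalizes the exponentiated scores.

```python
# Sketch of a maximum entropy (log-linear) language model:
# P(w | history) = exp(a . f(history, w)) / Z(history), with indicator features.
import math

vocab = ["the", "rain", "in", "Spain", "falls"]

# Each feature fires (returns 1.0) for one specific bigram in the history/word pair.
features = [
    lambda h, w: 1.0 if h and h[-1] == "the" and w == "rain" else 0.0,
    lambda h, w: 1.0 if h and h[-1] == "in" and w == "Spain" else 0.0,
]
a = [1.5, 2.0]  # parameter vector: one weight per feature

def score(history, word):
    return sum(weight * f(history, word) for weight, f in zip(a, features))

def prob(history, word):
    z = sum(math.exp(score(history, w)) for w in vocab)  # partition function Z(history)
    return math.exp(score(history, word)) / z

print(prob(["the"], "rain"))   # boosted by the first feature
print(prob(["the"], "falls"))  # gets only the share left after normalization
```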

The log-bilinear model is another example of an exponential language model.

Skip-gram model

1-skip-2-grams for the text "the rain in Spain falls mainly on the plain"

The skip-gram language model is an attempt at overcoming the data sparsity problem that the preceding model (i.e., the word n-gram language model) faced. Words represented in an embedding vector were no longer necessarily consecutive, but could leave gaps that are skipped over (hence the name "skip-gram").[15]

Formally, a k-skip-n-gram is a length-n subsequence where the components occur at distance at most k from each other.

For example, in the input text:

the rain in Spain falls mainly on the plain

the set of 1-skip-2-grams includes all the bigrams (2-grams), and in addition the subsequences

the in, rain Spain, in falls, Spain mainly, falls on, mainly the, and on plain.
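The following Python sketch (an illustration, not taken from the cited source) enumerates skip-bigrams for a given k and reproduces exactly the pairs listed above when k = 1.

```python
# Sketch: enumerate k-skip-bigrams (n = 2). With k = 1, a pair may skip
# at most one intervening word, so indices may differ by at most k + 1.
from itertools import combinations

def skip_bigrams(tokens, k=1):
    """All ordered pairs (tokens[i], tokens[j]) with 0 < j - i <= k + 1."""
    return [
        (tokens[i], tokens[j])
        for i, j in combinations(range(len(tokens)), 2)
        if j - i <= k + 1
    ]

tokens = "the rain in Spain falls mainly on the plain".split()
print(skip_bigrams(tokens, k=1))
# Contains every ordinary bigram plus the skipped pairs listed above,
# e.g. ('the', 'in'), ('rain', 'Spain'), ..., ('on', 'plain').
```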

In the skip-gram model, semantic relations between words are represented by linear combinations, capturing a form of compositionality. For example, in some such models, if v is the function that maps a word w to its n-dimensional vector representation, then

v(king) − v(man) + v(woman) ≈ v(queen)

where ≈ is made precise by stipulating that its right-hand side must be the nearest neighbor of the value of the left-hand side.[16][17]
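A minimal sketch of this nearest-neighbor reading of ≈, using made-up three-dimensional vectors purely for illustration (real embeddings are learned and have hundreds of dimensions):

```python
# Sketch of analogy-by-vector-offset with toy, hand-made "embeddings".
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.9, 0.1]),
    "queen": np.array([0.8, 0.1, 0.9]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def nearest(vec, exclude):
    """Word whose embedding has the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(candidates[w], vec))

target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> queen
```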

Neural models


Recurrent neural network


Continuous representations or embeddings of words are produced in recurrent neural network-based language models (known also as continuous space language models).[18] Such continuous space embeddings help to alleviate the curse of dimensionality, which is the consequence of the number of possible sequences of words increasing exponentially with the size of the vocabulary, further causing a data sparsity problem. Neural networks avoid this problem by representing words as non-linear combinations of weights in a neural net.[19]
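As a rough sketch of what such a model looks like in code (an assumption-laden illustration, not a reference implementation; the layer sizes and the use of PyTorch are arbitrary choices):

```python
# Minimal recurrent neural network language model in PyTorch (illustrative only).
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # continuous word representations
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # scores for the next word

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq_len, embed_dim)
        h, _ = self.rnn(x)             # (batch, seq_len, hidden_dim)
        return self.out(h)             # next-word logits at every position

model = RNNLanguageModel(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (2, 12))   # a batch of two 12-token sequences
probs = model(tokens).softmax(dim=-1)        # (2, 12, 10_000) next-word distributions
```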

Large language models

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pretrained transformers (GPTs), which are largely used in generative chatbots such as ChatGPT, Gemini or Claude. LLMs can be fine-tuned for specific tasks or guided by prompt engineering.[20] These models acquire predictive power regarding syntax, semantics, and ontologies[21] inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.[22]

Although LLMs sometimes match human performance, it is not clear whether they are plausible cognitive models. At least for recurrent neural networks, it has been shown that they sometimes learn patterns that humans do not, but fail to learn patterns that humans typically do.[23]

Evaluation and benchmarks


Evaluation of the quality of language models is mostly done by comparison with human-created sample benchmarks derived from typical language-oriented tasks. Other, less established, quality tests examine the intrinsic character of a language model or compare two such models. Since language models are typically intended to be dynamic and to learn from the data they see, some proposed models investigate the rate of learning, e.g., through inspection of learning curves.[24]

Various data sets have been developed for use in evaluating language processing systems.[25] These include:

  • Massive Multitask Language Understanding (MMLU)[26]
  • Corpus of Linguistic Acceptability[27]
  • GLUE benchmark[28]
  • Microsoft Research Paraphrase Corpus[29]
  • Multi-Genre Natural Language Inference
  • Question Natural Language Inference
  • Quora Question Pairs[30]
  • Recognizing Textual Entailment[31]
  • Semantic Textual Similarity Benchmark
  • SQuAD question answering test[32]
  • Stanford Sentiment Treebank[33]
  • Winograd NLI
  • BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OpenBookQA, NaturalQuestions, TriviaQA, RACE, BIG-bench hard, GSM8k, RealToxicityPrompts, WinoGender, CrowS-Pairs[34]

See also


References

  1. ^ Blank, Idan A. (November 2023). "What are large language models supposed to model?". Trends in Cognitive Sciences. 27 (11): 987–989. doi:10.1016/j.tics.2023.08.006. PMID 37659920."LLMs are supposed to model how utterances behave."
  2. ^ Jurafsky, Dan; Martin, James H. (2021). "N-gram Language Models" (PDF). Speech and Language Processing (3rd ed.). Archived from the original on 22 May 2022. Retrieved 24 May 2022.
  3. ^ Kuhn, Roland, and Renato De Mori (1990). "A cache-based natural language model for speech recognition". IEEE transactions on pattern analysis and machine intelligence 12.6: 570–583.
  4. ^ Andreas, Jacob, Andreas Vlachos, and Stephen Clark (2013). "Semantic parsing as machine translation" Archived 15 August 2020 at the Wayback Machine. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
  5. ^ Liu, Yang; Wu, Fanyou; Liu, Zhiyuan; Wang, Kai; Wang, Feiyue; Qu, Xiaobo (2023). "Can language models be used for real-world urban-delivery route optimization?". The Innovation. 4 (6): 100520. Bibcode:2023Innov...400520L. doi:10.1016/j.xinn.2023.100520. PMC 10587631. PMID 37869471.
  6. ^ Pham, Vu, et al (2014). "Dropout improves recurrent neural networks for handwriting recognition" Archived 11 November 2020 at the Wayback Machine. 14th International Conference on Frontiers in Handwriting Recognition. IEEE.
  7. ^ Htut, Phu Mon, Kyunghyun Cho, and Samuel R. Bowman (2018). "Grammar induction with neural language models: An unusual replication" Archived 14 August 2022 at the Wayback Machine. arXiv:1808.10000.
  8. ^ Ponte, Jay M.; Croft, W. Bruce (1998). A language modeling approach to information retrieval. Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.
  9. ^ Hiemstra, Djoerd (1998). A linguistically motivated probabilistic model of information retrieval. Proceedings of the 2nd European conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.
  10. ^ Chomsky, N. (September 1956). "Three models for the description of language". IRE Transactions on Information Theory. 2 (3): 113–124. doi:10.1109/TIT.1956.1056813. ISSN 2168-2712.
  11. ^ "The Nature Of Life, The Nature Of Thinking: Looking Back On Eugene Charniak's Work And Life". 22 February 2022. Archived from the original on 3 November 2024. Retrieved 5 February 2025.
  12. ^ Rosenfeld, Ronald (2000). "Two decades of statistical language modeling: Where do we go from here?". Proceedings of the IEEE. 88 (8): 1270–1278. doi:10.1109/5.880083. S2CID 10959945.
  13. ^ Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal; Janvin, Christian (1 March 2003). "A neural probabilistic language model". The Journal of Machine Learning Research. 3: 1137–1155 – via ACM Digital Library.
  14. ^ Jurafsky, Dan; Martin, James H. (7 January 2023). "N-gram Language Models". Speech and Language Processing (PDF) (3rd edition draft ed.). Retrieved 24 May 2022.
  15. ^ David Guthrie; et al. (2006). "A Closer Look at Skip-gram Modelling" (PDF). Archived from the original (PDF) on 17 May 2017. Retrieved 27 April 2014.
  16. ^ Mikolov, Tomas; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Efficient estimation of word representations in vector space". arXiv:1301.3781 [cs.CL].
  17. ^ Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg S.; Dean, Jeff (2013). Distributed Representations of Words and Phrases and their Compositionality (PDF). Advances in Neural Information Processing Systems. pp. 3111–3119. Archived (PDF) from the original on 29 October 2020. Retrieved 22 June 2015.
  18. ^ Karpathy, Andrej. "The Unreasonable Effectiveness of Recurrent Neural Networks". Archived from the original on 1 November 2020. Retrieved 27 January 2019.
  19. ^ Bengio, Yoshua (2008). "Neural net language models". Scholarpedia. Vol. 3. p. 3881. Bibcode:2008SchpJ...3.3881B. doi:10.4249/scholarpedia.3881. Archived from the original on 26 October 2020. Retrieved 28 August 2015.
  20. ^ Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (December 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). "Language Models are Few-Shot Learners" (PDF). Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 1877–1901. arXiv:2005.14165. Archived (PDF) from the original on 17 November 2023. Retrieved 14 March 2023.
  21. ^ Fathallah, Nadeen; Das, Arunav; De Giorgis, Stefano; Poltronieri, Andrea; Haase, Peter; Kovriguina, Liubov (26 May 2024). NeOn-GPT: A Large Language Model-Powered Pipeline for Ontology Learning (PDF). Extended Semantic Web Conference 2024. Hersonissos, Greece.
  22. ^ Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870. Archived from the original on 17 November 2023. Retrieved 9 March 2023.
  23. ^ Hornstein, Norbert; Lasnik, Howard; Patel-Grosz, Pritty; Yang, Charles (9 January 2018). Syntactic Structures after 60 Years: The Impact of the Chomskyan Revolution in Linguistics. Walter de Gruyter GmbH & Co KG. ISBN 978-1-5015-0692-5. Archived from the original on 16 April 2023. Retrieved 11 December 2021.
  24. ^ Karlgren, Jussi; Schutze, Hinrich (2015), "Evaluating Learning Language Representations", International Conference of the Cross-Language Evaluation Forum, Lecture Notes in Computer Science, Springer International Publishing, pp. 254–260, doi:10.1007/978-3-319-64206-2_8, ISBN 9783319642055
  25. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (10 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805 [cs.CL].
  26. ^ Hendrycks, Dan (14 March 2023), Measuring Massive Multitask Language Understanding, archived from the original on 15 March 2023, retrieved 15 March 2023
  27. ^ "The Corpus of Linguistic Acceptability (CoLA)". nyu-mll.github.io. Archived from the original on 7 December 2020. Retrieved 25 February 2019.
  28. ^ "GLUE Benchmark". gluebenchmark.com. Archived from the original on 4 November 2020. Retrieved 25 February 2019.
  29. ^ "Microsoft Research Paraphrase Corpus". Microsoft Download Center. Archived from the original on 25 October 2020. Retrieved 25 February 2019.
  30. ^ Aghaebrahimian, Ahmad (2017), "Quora Question Answer Dataset", Text, Speech, and Dialogue, Lecture Notes in Computer Science, vol. 10415, Springer International Publishing, pp. 66–73, doi:10.1007/978-3-319-64206-2_8, ISBN 9783319642055
  31. ^ Sammons, Mark; Vydiswaran, V.G. Vinod; Roth, Dan. "Recognizing Textual Entailment" (PDF). Archived from the original (PDF) on 9 August 2017. Retrieved 24 February 2019.
  32. ^ "The Stanford Question Answering Dataset". rajpurkar.github.io. Archived from the original on 30 October 2020. Retrieved 25 February 2019.
  33. ^ "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank". nlp.stanford.edu. Archived from the original on 27 October 2020. Retrieved 25 February 2019.
  34. ^ "llama/MODEL_CARD.md at main · meta-llama/llama". GitHub. Retrieved 28 December 2024.

Further reading
