
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data.[1] Such algorithms work by making data-driven predictions or decisions,[2] building a mathematical model from input data. The input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.

The model is initially fit on a training data set,[3] which is a set of examples used to fit the parameters (e.g. weights of connections between neurons in artificial neural networks) of the model.[4] The model (e.g. a naive Bayes classifier) is trained on the training data set using a supervised learning method, for example an optimization method such as gradient descent or stochastic gradient descent. In practice, the training data set often consists of pairs of an input vector (or scalar) and the corresponding output vector (or scalar), where the answer key is commonly denoted as the target (or label). The current model is run on each input vector in the training data set, and the result it produces is compared with the corresponding target. Based on the result of the comparison and the specific learning algorithm being used, the parameters of the model are adjusted. The model fitting can include both variable selection and parameter estimation.
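
As a concrete sketch of this fitting loop (not drawn from the cited sources), the following fits the weights of a linear model by gradient descent on a synthetic training set of input–target pairs; the data, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

# Synthetic training set: pairs of input vectors and scalar targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)   # targets (labels)

# Model parameters to be fitted.
w = np.zeros(3)

# Gradient descent on the mean squared error.
learning_rate = 0.1
for step in range(200):
    predictions = X @ w                  # run the current model on the training set
    error = predictions - y              # compare the results with the targets
    gradient = 2 * X.T @ error / len(y)  # gradient of the MSE with respect to w
    w -= learning_rate * gradient        # adjust the parameters

print(w)  # approaches [2.0, -1.0, 0.5]
```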

Subsequently, the fitted model is used to predict the responses for the observations in a second data set called the validation data set.[3] The validation data set provides an unbiased evaluation of a model fit on the training data set while tuning the model's hyperparameters[5] (e.g. the number of hidden units—layers and layer widths—in a neural network[4]). Validation data sets can be used for regularization by early stopping (stopping training when the error on the validation data set increases, since rising validation error is a sign of over-fitting to the training data set).[6] This simple procedure is complicated in practice by the fact that the validation error may fluctuate during training, producing multiple local minima. This complication has led to many ad-hoc rules for deciding when over-fitting has truly begun.[6]

Finally, the test data set is a data set used to provide an unbiased evaluation of a final model fit on the training data set.[5] If the data in the test data set has never been used in training (for example, in cross-validation), the test data set is also called a holdout data set. The term "validation set" is sometimes used instead of "test set" in the literature (e.g., if the original data set was partitioned into only two subsets, the test set might be referred to as the validation set).[5]

Deciding on the sizes and strategies for dividing a data set into training, validation, and test sets depends heavily on the problem and the data available.[7]
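
For instance, one common (but by no means universal) choice is a 60/20/20 split. A minimal sketch using scikit-learn's train_test_split applied twice; the library, the bundled iris data set, the ratios, and the random seed are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the test set (20% of the data), then split the
# remainder into training (60% overall) and validation (20% overall).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```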

Training data set

Simplified example of training a neural network for object detection: The network is trained on multiple images that are known to depict starfish and sea urchins, which are correlated with "nodes" that represent visual features. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and an oval shape. However, the instance of a ring-textured sea urchin creates a weakly weighted association between them.
Subsequent run of the network on an input image (left):[8] The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two intermediate nodes. In addition, a shell that was not included in the training data gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive for sea urchin.
In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.

A training data set is a set of examples used during the learning process to fit the parameters (e.g., the weights) of, for example, a classifier.[9][10]

For classification tasks, a supervised learning algorithm looks at the training data set to determine, or learn, the optimal combinations of variables that will generate a good predictive model.[11] The goal is to produce a trained (fitted) model that generalizes well to new, unknown data.[12] The fitted model is evaluated using “new” examples from the held-out data sets (validation and test data sets) to estimate the model’s accuracy in classifying new data.[5] To reduce the risk of issues such as over-fitting, the examples in the validation and test data sets should not be used to train the model.[5]

Most approaches that search through training data for empirical relationships tend to overfit the data, meaning that they can identify and exploit apparent relationships in the training data that do not hold in general.

When a training set is continuously expanded with new data, this is called incremental learning.

Validation data set


A validation data set is a data set of examples used to tune the hyperparameters (i.e. the architecture) of a model. It is sometimes also called the development set or "dev set".[13] An example of a hyperparameter for artificial neural networks is the number of hidden units in each layer.[9][10] Like the test set (discussed below), the validation set should follow the same probability distribution as the training data set.

To avoid overfitting, when any classification parameter needs to be adjusted, a validation data set is needed in addition to the training and test data sets. For example, if the most suitable classifier for the problem is sought, the training data set is used to train the candidate classifiers, the validation data set is used to compare their performance and decide which one to keep, and, finally, the test data set is used to obtain performance characteristics such as accuracy, sensitivity, specificity, and F-measure. The validation data set functions as a hybrid: it is training data used for testing, but neither as part of the low-level training nor as part of the final testing.

The basic process of using a validation data set for model selection (as part of a training, validation, and test data set split) is:[10][14]

Since our goal is to find the network having the best performance on new data, the simplest approach to the comparison of different networks is to evaluate the error function using data which is independent of that used for training. Various networks are trained by minimization of an appropriate error function defined with respect to a training data set. The performance of the networks is then compared by evaluating the error function using an independent validation set, and the network having the smallest error with respect to the validation set is selected. This approach is called the hold out method. Since this procedure can itself lead to some overfitting to the validation set, the performance of the selected network should be confirmed by measuring its performance on a third independent set of data called a test set.
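
A minimal sketch of this hold-out procedure, assuming scikit-learn, a bundled example data set, and two arbitrary candidate classifiers (none of which appear in the quoted source):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Train each candidate model on the training set only.
candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for model in candidates.values():
    model.fit(X_train, y_train)

# Select the candidate with the best performance on the independent validation set.
best_name = max(candidates,
                key=lambda name: accuracy_score(y_val, candidates[name].predict(X_val)))

# Confirm the selected model's performance on a third independent set: the test set.
print(best_name, accuracy_score(y_test, candidates[best_name].predict(X_test)))
```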

An application of this process is early stopping, in which the candidate models are successive iterations of the same network and training stops when the error on the validation set grows, with the previous model (the one with minimum validation error) being chosen.
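
A schematic sketch of such an early-stopping rule with a "patience" window (one of the ad-hoc stopping criteria mentioned earlier); train_one_epoch and validation_error are caller-supplied callables and, like the default values, are assumptions of this sketch rather than part of any standard API:

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, validation_error,
                            patience=5, max_epochs=1000):
    """Train until the validation error has not improved for `patience`
    epochs, then return the model with the minimum validation error seen."""
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    stale_epochs = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)            # one pass over the training set
        error = validation_error(model)   # error on the validation set

        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)  # remember the best model so far
            stale_epochs = 0
        else:
            # The validation error may fluctuate, so tolerate `patience`
            # non-improving epochs before concluding that over-fitting began.
            stale_epochs += 1
            if stale_epochs >= patience:
                break

    return best_model
```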

Test data set


A test data set is a data set that is independent of the training data set but follows the same probability distribution as the training data set. If a model fit to the training data set also fits the test data set well, minimal overfitting has taken place (see figure below). A better fit on the training data set than on the test data set usually points to over-fitting.

A test set is therefore a set of examples used only to assess the performance (i.e. generalization) of a fully specified classifier.[9][10] To do this, the final model is used to predict classifications of examples in the test set. Those predictions are compared to the examples' true classifications to assess the model's accuracy.[11]

In a scenario where both validation and test data sets are used, the test data set is typically used to assess the final model selected during the validation process. In the case where the original data set is partitioned into two subsets (training and test data sets), the model might be assessed on the test data set only once (e.g., in the holdout method).[15] Note that some sources advise against such a method.[12] However, when using a method such as cross-validation, two partitions can be sufficient and effective, since results are averaged after repeated rounds of model training and testing to help reduce bias and variability.[5][12]


A training set (left) and a test set (right) from the same statistical population are shown as blue points. Two predictive models are fit to the training data. Both fitted models are plotted with both the training and test sets. In the training set, the MSE of the fit shown in orange is 4 whereas the MSE for the fit shown in green is 9. In the test set, the MSE for the fit shown in orange is 15 and the MSE for the fit shown in green is 13. The orange curve severely overfits the training data, since its MSE increases by almost a factor of four when comparing the test set to the training set. The green curve overfits the training data much less, as its MSE increases by less than a factor of 2.
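
The figure's comparison can be reproduced in spirit (not with its exact numbers) by fitting two polynomials of different flexibility to the same training sample and computing each fit's MSE on the training and test sets; the population, noise level, and polynomial degrees below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n points from the same statistical population."""
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = sample(20)  # training set
x_test, y_test = sample(20)    # test set

def mse(coeffs, x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

flexible = np.polyfit(x_train, y_train, deg=9)  # prone to overfitting
modest = np.polyfit(x_train, y_train, deg=3)

for name, fit in [("degree 9", flexible), ("degree 3", modest)]:
    print(f"{name}: train MSE {mse(fit, x_train, y_train):.2f}, "
          f"test MSE {mse(fit, x_test, y_test):.2f}")
# The more flexible fit typically shows the larger jump from train to test MSE.
```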

Confusion in terminology


Testing is trying something to find out about it ("To put to the proof; to prove the truth, genuineness, or quality of by experiment" according to the Collaborative International Dictionary of English), and to validate is to prove that something is valid ("To confirm; to render valid", Collaborative International Dictionary of English). From this perspective, the most common use of the terms test set and validation set is the one described here. However, in both industry and academia they are sometimes used interchangeably: the internal process is seen as testing different models to improve them (the test set acting as a development set), while the final model is the one that must be validated on unseen data before real use (the validation set). "The literature on machine learning often reverses the meaning of 'validation' and 'test' sets. This is the most blatant example of the terminological confusion that pervades artificial intelligence research."[16] Nevertheless, the key point is that the final set, whether called the test or the validation set, should be used only in the final experiment.

Cross-validation


In order to obtain more stable results and use all valuable data for training, a data set can be repeatedly split into several training and validation data sets. This is known as cross-validation. To confirm the model's performance, an additional test data set held out from cross-validation is normally used.
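
A minimal sketch assuming scikit-learn; the data set, model, and fold count are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the validation set.
scores = cross_val_score(model, X_rest, y_rest, cv=5)
print("mean validation accuracy:", scores.mean())

# Confirm performance on the test set held out from cross-validation.
model.fit(X_rest, y_rest)
print("test accuracy:", model.score(X_test, y_test))
```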

It is possible to use cross-validation to form training and validation sets and, within each training set, to run a further inner cross-validation for hyperparameter tuning. This is known as nested cross-validation.
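
A sketch of nested cross-validation, again assuming scikit-learn, with a support vector classifier whose regularization parameter C is tuned in the inner loop; the grid and fold counts are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: 3-fold cross-validation over the training portion of each
# outer fold, tuning the hyperparameter C.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: 5-fold cross-validation estimates the generalization
# performance of the whole tuning procedure.
scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy:", scores.mean())
```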

Causes of error

Comic strip demonstrating a fictional erroneous computer output (heating a coffee to 5 million degrees based on a previous definition of "extra hot"). This can be classified as both a failure in logic and a failure to include various relevant environmental conditions.[17]

Omissions in the training of algorithms are a major cause of erroneous outputs.[17] Types of such omissions include:[17]

  • Particular circumstances or variations not included in the training data
  • Obsolete data
  • Ambiguous input information
  • Inability to adapt to new environments
  • Inability to request help from a human or another AI system when needed

An example of an omission of particular circumstances is a case where a boy was able to unlock his mother's phone because she had registered her face under indoor, nighttime lighting, a condition that was not appropriately included in the training of the system.[17][18]

Usage of relatively irrelevant input can include situations where algorithms use the background rather than the object of interest for object detection, for example an object detector trained on pictures of sheep on grasslands, with the risk that a different object on a grassland will be interpreted as a sheep.[17]


References

  1. ^ Ron Kohavi; Foster Provost (1998). "Glossary of terms". Machine Learning. 30: 271–274. doi:10.1023/A:1007411609915.
  2. ^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. New York: Springer. p. vii. ISBN 0-387-31073-8. Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years.
  3. ^ a b James, Gareth (2013). An Introduction to Statistical Learning: with Applications in R. Springer. p. 176. ISBN 978-1461471370.
  4. ^ a b Ripley, Brian (1996). Pattern Recognition and Neural Networks. Cambridge University Press. p. 354. ISBN 978-0521717700.
  5. ^ a b c d e f Brownlee, Jason (2017). "What is the Difference Between Test and Validation Datasets?". Machine Learning Mastery.
  6. ^ a b Prechelt, Lutz; Geneviève B. Orr (2012). "Early Stopping — But When?". In Grégoire Montavon; Klaus-Robert Müller (eds.). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 53–67. doi:10.1007/978-3-642-35289-8_5. ISBN 978-3-642-35289-8.
  7. ^ "Machine learning - Is there a rule-of-thumb for how to divide a dataset into training and validation sets?". Stack Overflow.
  8. ^ Ferrie, Chris; Kaiser, Sarah (2019). Neural Networks for Babies. Sourcebooks. ISBN 978-1492671206.
  9. ^ a b c Ripley, B.D. (1996) Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press, p. 354
  10. ^ a b c d "Subject: What are the population, sample, training set, design set, validation set, and test set?", Neural Network FAQ, part 1 of 7: Introduction (txt), comp.ai.neural-nets, Sarle, W.S., ed. (1997)
  11. ^ a b Larose, D. T.; Larose, C. D. (2014). Discovering knowledge in data : an introduction to data mining. Hoboken: Wiley. doi:10.1002/9781118874059. ISBN 978-0-470-90874-7. OCLC 869460667.
  12. ^ a b c Xu, Yun; Goodacre, Royston (2018). "On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning". Journal of Analysis and Testing. 2 (3). Springer Science and Business Media LLC: 249–262. doi:10.1007/s41664-018-0068-2. ISSN 2096-241X. PMC 6373628. PMID 30842888.
  13. ^ "Deep Learning". Coursera. Retrieved 2025-08-06.
  14. ^ Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press, p. 372
  15. ^ Kohavi, Ron (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection". Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. Vol. 14.
  16. ^ Ripley, Brian D. (1996). "Glossary". Pattern Recognition and Neural Networks. Cambridge University Press. ISBN 9780521717700. OCLC 601063414.
  17. ^ a b c d e Chanda SS, Banerjee DN (2022). "Omission and commission errors underlying AI failures". AI Soc. 39 (3): 1–24. doi:10.1007/s00146-022-01585-x. PMC 9669536. PMID 36415822.
  18. ^ Greenberg, Andy (2017). "Watch a 10-Year-Old's Face Unlock His Mom's iPhone X". Wired.