Overfitting

From Wikipedia, the free encyclopedia
Figure 1.  The green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new unseen data, illustrated by black-outlined dots, compared to the black line.
Figure 2.  Noisy (roughly linear) data is fitted to a linear function and a polynomial function. Although the polynomial function is a perfect fit, the linear function can be expected to generalize better: If the two functions were used to extrapolate beyond the fitted data, the linear function should make better predictions.
Figure 3.  The blue dashed line represents an underfitted model. A straight line can never fit a parabola. This model is too simple.

In mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably".[1] An overfitted model is a mathematical model that contains more parameters than can be justified by the data.[2] In the special case where the model consists of a polynomial function, these parameters represent the degree of the polynomial. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the noise) as if that variation represented underlying model structure.[3]: 45

Underfitting occurs when a mathematical model cannot adequately capture the underlying structure of the data. An under-fitted model is a model where some parameters or terms that would appear in a correctly specified model are missing.[2] Underfitting would occur, for example, when fitting a linear model to nonlinear data. Such a model will tend to have poor predictive performance.

The possibility of over-fitting exists because the criterion used for selecting the model is not the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability might be determined by its ability to perform well on unseen data; overfitting occurs when a model begins to "memorize" training data rather than "learning" to generalize from a trend.

As an extreme example, if the number of parameters is the same as or greater than the number of observations, then a model can perfectly predict the training data simply by memorizing the data in its entirety. (For an illustration, see Figure 2.) Such a model, though, will typically fail severely when making predictions.
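
A minimal numerical illustration of this extreme case (a Python sketch with synthetic data; the linear trend and noise level are arbitrary assumptions): a polynomial with as many coefficients as observations memorizes the training set exactly, yet errs badly on fresh inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Six noisy observations of an underlying linear trend y = 2x + noise.
x_train = np.linspace(0.0, 1.0, 6)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=6)

# A degree-5 polynomial has six coefficients -- as many parameters as
# observations -- so it can pass through every training point exactly.
coeffs = np.polyfit(x_train, y_train, deg=5)
print(np.allclose(np.polyval(coeffs, x_train), y_train))  # True

# On fresh inputs from the same trend, the memorized fit errs badly,
# especially between and beyond the training points.
x_new = np.linspace(0.0, 1.0, 50)
print(np.max(np.abs(np.polyval(coeffs, x_new) - 2.0 * x_new)))
```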

Overfitting is directly related to approximation error of the selected function class and the optimization error of the optimization procedure. A function class that is too large, in a suitable sense, relative to the dataset size is likely to overfit.[4] Even when the fitted model does not have an excessive number of parameters, it is to be expected that the fitted relationship will appear to perform less well on a new dataset than on the dataset used for fitting (a phenomenon sometimes known as shrinkage).[2] In particular, the value of the coefficient of determination will shrink relative to the original data.
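
The shrinkage of the coefficient of determination can be seen in a short sketch (Python with scikit-learn; the data-generating process is hypothetical): the same fitted model scores a visibly lower R² on a fresh sample than on the sample it was fit to.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

def sample(n=50, k=10):
    """Draw n cases with k regressors, only the first weakly informative."""
    X = rng.normal(size=(n, k))
    y = 0.3 * X[:, 0] + rng.normal(size=n)
    return X, y

X_fit, y_fit = sample()
model = LinearRegression().fit(X_fit, y_fit)

X_new, y_new = sample()                 # a fresh dataset, same process
print(model.score(X_fit, y_fit))        # R^2 on the data used for fitting
print(model.score(X_new, y_new))        # noticeably lower: the fit "shrinks"
```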

To lessen the chance or amount of overfitting, several techniques are available (e.g., model comparison, cross-validation, regularization, early stopping, pruning, Bayesian priors, or dropout). The basis of some techniques is to either (1) explicitly penalize overly complex models or (2) test the model's ability to generalize by evaluating its performance on a set of data not used for training, which is assumed to approximate the typical unseen data that a model will encounter.
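
As a sketch of the second approach, k-fold cross-validation scores each candidate model only on data it was not fit to, so the memorization advantage of overly complex models disappears (Python with scikit-learn; the candidate degrees and data are illustrative).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(30, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2, size=30)

# Each candidate complexity is scored only on held-out folds; the
# simple degree-1 model should win despite fitting the training data
# less closely than the degree-9 model.
for degree in (1, 3, 9):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))
    print(degree, cross_val_score(model, X, y, cv=5).mean())
```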

Statistical inference

In statistics, an inference is drawn from a statistical model, which has been selected via some procedure. Burnham & Anderson, in their much-cited text on model selection, argue that to avoid overfitting, we should adhere to the "Principle of Parsimony".[3] The authors also state the following.[3]: 32–33

Overfitted models ... are often free of bias in the parameter estimators, but have estimated (and actual) sampling variances that are needlessly large (the precision of the estimators is poor, relative to what could have been accomplished with a more parsimonious model). False treatment effects tend to be identified, and false variables are included with overfitted models. ... A best approximating model is achieved by properly balancing the errors of underfitting and overfitting.

Overfitting is more likely to be a serious concern when there is little theory available to guide the analysis, in part because then there tend to be a large number of models to select from. The book Model Selection and Model Averaging (2008) puts it this way.[5]

Given a data set, you can fit thousands of models at the push of a button, but how do you choose the best? With so many candidate models, overfitting is a real danger. Is the monkey who typed Hamlet actually a good writer?

Regression

In regression analysis, overfitting occurs frequently.[6] As an extreme example, if there are p variables in a linear regression with p data points, the fitted line can go exactly through every point.[7] For logistic regression or Cox proportional hazards models, there are a variety of rules of thumb for the number of observations needed per independent variable (e.g. 5–9,[8] 10,[9] and 10–15[10]; the guideline of 10 observations per independent variable is known as the "one in ten rule"). In the process of regression model selection, the mean squared error of the random regression function can be split into random noise, approximation bias, and variance in the estimate of the regression function. Analyzing this bias–variance tradeoff is often used to diagnose and mitigate overfitting.
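
A sketch of the extreme case (Python with synthetic data; p = 5 is an arbitrary choice): with as many coefficients as data points, the residuals are exactly zero even though the response is pure noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# p = 5 observations and p = 5 coefficients (an intercept plus four
# regressors) make the least-squares system square: the fit passes
# exactly through every point, even though y is pure noise.
p = 5
X = np.column_stack([np.ones(p), rng.normal(size=(p, p - 1))])
y = rng.normal(size=p)

beta = np.linalg.solve(X, y)
print(np.allclose(X @ beta, y))  # True: zero residuals, R^2 = 1
```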

With a large set of explanatory variables that actually have no relation to the dependent variable being predicted, some variables will in general be falsely found to be statistically significant and the researcher may thus retain them in the model, thereby overfitting the model. This is known as Freedman's paradox.

Machine learning

Figure 4. Overfitting/overtraining in supervised learning (e.g., a neural network). Training error is shown in blue, and validation error in red, both as a function of the number of training cycles. If the validation error increases (positive slope) while the training error steadily decreases (negative slope), then a situation of overfitting may have occurred. The best predictive and fitted model would be where the validation error has its global minimum.

Usually, a learning algorithm is trained using some set of "training data": exemplary situations for which the desired output is known. The goal is that the algorithm will also perform well on predicting the output when fed "validation data" that was not encountered during its training.

Overfitting is the use of models or procedures that violate Occam's razor, for example by including more adjustable parameters than are ultimately optimal, or by using a more complicated approach than is ultimately optimal. For an example where there are too many adjustable parameters, consider a dataset where training data for y can be adequately predicted by a linear function of two independent variables. Such a function requires only three parameters (the intercept and two slopes). Replacing this simple function with a new, more complex quadratic function, or with a new, more complex linear function on more than two independent variables, carries a risk: Occam's razor implies that any given complex function is a priori less probable than any given simple function. If the new, more complicated function is selected instead of the simple function, and if there was not a large enough gain in training data fit to offset the complexity increase, then the new complex function "overfits" the data and the complex overfitted function will likely perform worse than the simpler function on validation data outside the training dataset, even though the complex function performed as well, or perhaps even better, on the training dataset.[11]

When comparing different types of models, complexity cannot be measured solely by counting how many parameters exist in each model; the expressivity of each parameter must be considered as well. For example, it is nontrivial to directly compare the complexity of a neural net (which can track curvilinear relationships) with m parameters to a regression model with n parameters.[11]

Overfitting is especially likely in cases where learning was performed too long or where training examples are rare, causing the learner to adjust to very specific random features of the training data that have no causal relation to the target function. In this process of overfitting, the performance on the training examples still increases while the performance on unseen data becomes worse.

As a simple example, consider a database of retail purchases that includes the item bought, the purchaser, and the date and time of purchase. It is easy to construct a model that fits the training set perfectly by using the date and time of purchase to predict the other attributes, but this model will not generalize at all to new data, because those past times will never occur again.
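
A toy version of this model (illustrative Python; the transactions are made up) is simply a lookup table keyed on the timestamp: perfect in hindsight, useless in foresight.

```python
# A "model" keyed on the purchase timestamp is just a lookup table: it
# reproduces the training set exactly, but an exact timestamp never
# recurs, so it predicts nothing about new transactions.
train = {
    "2023-01-05 09:14:02": ("alice", "coffee"),
    "2023-01-05 09:17:45": ("bob", "tea"),
}

def predict(timestamp):
    return train.get(timestamp)  # None for every unseen timestamp

print(predict("2023-01-05 09:14:02"))  # ('alice', 'coffee')
print(predict("2023-01-06 10:00:00"))  # None
```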

Generally, a learning algorithm is said to overfit relative to a simpler one if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight). One can intuitively understand overfitting from the fact that information from all past experience can be divided into two groups: information that is relevant for the future, and irrelevant information ("noise"). Everything else being equal, the more difficult a criterion is to predict (i.e., the higher its uncertainty), the more noise exists in past information that needs to be ignored. The problem is determining which part to ignore. A learning algorithm that can reduce the risk of fitting noise is called "robust."

Consequences

A photograph of Anne Graham Lotz included in the training set of Stable Diffusion, a text-to-image model
An image generated by Stable Diffusion using the prompt "Anne Graham Lotz"
Overfitted generative models may produce outputs that are virtually identical to instances from their training set.[12]

The most obvious consequence of overfitting is poor performance on the validation dataset. Other negative consequences include:

  • A function that is overfitted is likely to request more information about each item in the validation dataset than does the optimal function; gathering this additional unneeded data can be expensive or error-prone, especially if each individual piece of information must be gathered by human observation and manual data entry.[11]
  • A more complex, overfitted function is likely to be less portable than a simple one. At one extreme, a one-variable linear regression is so portable that, if necessary, it could even be done by hand. At the other extreme are models that can be reproduced only by exactly duplicating the original modeler's entire setup, making reuse or scientific reproduction difficult.[11]
  • It may be possible to reconstruct details of individual training instances from an overfitted machine learning model's training set. This may be undesirable if, for example, the training data includes sensitive personally identifiable information (PII). This phenomenon also presents problems in the area of artificial intelligence and copyright, with the developers of some generative deep learning models such as Stable Diffusion and GitHub Copilot being sued for copyright infringement because these models have been found to be capable of reproducing certain copyrighted items from their training data.[12][13]

Remedy

The optimal function usually needs verification on bigger or completely new datasets. There are, however, methods such as the minimum spanning tree or the life-time of correlation that exploit the dependence between correlation coefficients and the time-series window width. Once the window width is large enough, the correlation coefficients become stable and no longer depend on the window width. A correlation matrix can therefore be created by calculating a coefficient of correlation between the investigated variables; this matrix can be represented topologically as a complex network in which the direct and indirect influences between variables are visualized.
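
A rough sketch of the window-width criterion (Python; the two correlated series and the chosen widths are illustrative assumptions, not the construction of any particular paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
common = rng.normal(size=n)
x = common + rng.normal(size=n)  # two series sharing a common component;
y = common + rng.normal(size=n)  # their true correlation is 0.5

# Correlation estimated over increasing window widths: once the window
# is wide enough, the coefficient stabilizes near the true value and no
# longer depends on the width.
for width in (20, 100, 500, 2500):
    r = np.corrcoef(x[-width:], y[-width:])[0, 1]
    print(width, round(r, 2))
```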

Dropout regularisation (randomly deactivating a fraction of a layer's units during each training step) can also improve robustness and therefore reduce overfitting, because the network cannot come to rely on any single unit.
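
A minimal sketch of a dropout layer, assuming PyTorch is available (the layer sizes and dropout probability are arbitrary):

```python
import torch
from torch import nn

# During training, nn.Dropout zeroes each unit's activation with
# probability p and rescales the survivors by 1/(1-p); in eval mode it
# is the identity, so no capacity is lost at prediction time.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)
model.train()
out_train = model(x)  # stochastic: a random half of the units dropped
model.eval()
out_eval = model(x)   # deterministic: dropout disabled
```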

Underfitting

Figure 5.  The red line represents an underfitted model of the data points represented in blue. We would expect a parabola-shaped curve to follow the curvature of the data points.
Figure 6.  The blue line represents a fitted model of the data points represented in green.

Underfitting is the inverse of overfitting, meaning that the statistical model or machine learning algorithm is too simplistic to accurately capture the patterns in the data. A sign of underfitting is high bias and low variance in the current model or algorithm (the inverse of overfitting: low bias and high variance). This follows from the bias–variance tradeoff, the decomposition of a model's error into bias error, variance error, and irreducible error. With high bias and low variance, the model misrepresents the data points and is therefore unable to predict future results well (see Generalization error). As shown in Figure 5, the straight line cannot represent all the given data points because it does not follow their curvature. We would instead expect a parabola-shaped curve, as shown in Figure 6 and Figure 1. If we used the model of Figure 5 for analysis, we would get false predictive results, contrary to the results obtained from analyzing Figure 6.
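
The contrast between Figures 5 and 6 can be reproduced in a few lines (a Python sketch with synthetic parabolic data; the noise level is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1.0, 1.0, 40)
y = x**2 + rng.normal(scale=0.05, size=40)  # parabolic data, mild noise

line = np.polyval(np.polyfit(x, y, 1), x)   # Figure 5: too simple to bend
parab = np.polyval(np.polyfit(x, y, 2), x)  # Figure 6: matches the curvature

print(np.mean((y - line) ** 2))   # large residual error: underfit
print(np.mean((y - parab) ** 2))  # near the noise floor
```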

Burnham & Anderson state the following.[3]: 32

... an underfitted model would ignore some important replicable (i.e., conceptually replicable in most other samples) structure in the data and thus fail to identify effects that were actually supported by the data. In this case, bias in the parameter estimators is often substantial, and the sampling variance is underestimated, both factors resulting in poor confidence interval coverage. Underfitted models tend to miss important treatment effects in experimental settings.

Resolving underfitting

There are multiple ways to deal with underfitting:

  1. Increase the complexity of the model: If the model is too simple, it may be necessary to increase its complexity by adding more features, increasing the number of parameters, or using a more flexible model (see the sketch after this list). However, this should be done carefully to avoid overfitting.[14]
  2. Use a different algorithm: If the current algorithm is not able to capture the patterns in the data, it may be necessary to try a different one. For example, a neural network may be more effective than a linear regression model for some types of data.[14]
  3. Increase the amount of training data: If the model is underfitting due to a lack of data, increasing the amount of training data may help. This will allow the model to better capture the underlying patterns in the data.[14]
  4. Regularization: Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function that discourages large parameter values. It can also be used to prevent underfitting by controlling the complexity of the model.[15]
  5. Ensemble Methods: Ensemble methods combine multiple models to create a more accurate prediction. This can help reduce underfitting by allowing multiple models to work together to capture the underlying patterns in the data.
  6. Feature engineering: Feature engineering involves creating new model features from the existing ones that may be more relevant to the problem at hand. This can help improve the accuracy of the model and prevent underfitting.[14]
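
A sketch combining items 1 and 6 (Python with scikit-learn; the parabolic data are illustrative): adding a quadratic feature gives an otherwise underfitting linear model just enough flexibility to capture the curvature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = X[:, 0] ** 2 + rng.normal(scale=0.05, size=40)

# Underfit: a straight line through parabolic data.
print(LinearRegression().fit(X, y).score(X, y))    # low R^2

# Remedy (items 1 and 6): engineer a quadratic feature so the model
# gains just enough flexibility to follow the curvature.
X2 = PolynomialFeatures(degree=2).fit_transform(X)
print(LinearRegression().fit(X2, y).score(X2, y))  # close to 1.0
```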

Benign overfitting

Benign overfitting describes the phenomenon of a statistical model that seems to generalize well to unseen data, even when it has been fit perfectly on noisy training data (i.e., obtains perfect predictive accuracy on the training set). The phenomenon is of particular interest in deep neural networks, but is studied from a theoretical perspective in the context of much simpler models, such as linear regression. In particular, it has been shown that overparameterization is essential for benign overfitting in this setting. In other words, the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.[16]
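
A sketch of this setting (Python; the spiked design is one illustrative choice satisfying the "many unimportant directions" condition, not the cited paper's exact construction): a minimum-norm linear interpolator fits noisy labels exactly yet still predicts well.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 50, 1000  # many more parameters than samples

# Spiked design: the signal lives in one high-variance direction, while
# the other 999 directions are unimportant for prediction.
scales = np.ones(d)
scales[0] = 10.0
X = rng.normal(size=(n, d)) * scales
w_true = np.zeros(d)
w_true[0] = 1.0
y = X @ w_true + rng.normal(size=n)  # noisy labels

# The minimum-norm interpolator fits the noisy labels exactly ...
w_hat = np.linalg.pinv(X) @ y
print(np.allclose(X @ w_hat, y))  # True: zero training error

# ... yet predicts well, because the absorbed noise is spread thinly
# across the many unimportant directions in parameter space.
X_test = rng.normal(size=(2000, d)) * scales
mse = np.mean((X_test @ w_hat - X_test @ w_true) ** 2)
print(mse)  # small compared with the signal variance of 100
```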

Notes

  1. Definition of "overfitting" at OxfordDictionaries.com: this definition is specifically for statistics.
  2. Everitt, B. S.; Skrondal, A. (2010), Cambridge Dictionary of Statistics, Cambridge University Press.
  3. Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference (2nd ed.), Springer-Verlag.
  4. Bottou, Léon; Bousquet, Olivier, "The Tradeoffs of Large-Scale Learning", Optimization for Machine Learning, The MIT Press, pp. 351–368, doi:10.7551/mitpress/8996.003.0015, ISBN 978-0-262-29877-3.
  5. Claeskens, G.; Hjort, N. L. (2008), Model Selection and Model Averaging, Cambridge University Press.
  6. Harrell, F. E. Jr. (2001), Regression Modeling Strategies, Springer.
  7. Smith, Martha K. "Overfitting". University of Texas at Austin.
  8. Vittinghoff, E.; McCulloch, C. E. (2007). "Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression". American Journal of Epidemiology. 165 (6): 710–718. doi:10.1093/aje/kwk052. PMID 17182981.
  9. Draper, Norman R.; Smith, Harry (1998). Applied Regression Analysis (3rd ed.). Wiley. ISBN 978-0471170822.
  10. Frost, Jim. "The Danger of Overfitting Regression Models".
  11. Hawkins, Douglas M. (2004). "The problem of overfitting". Journal of Chemical Information and Modeling. 44 (1): 1–12. doi:10.1021/ci0342472. PMID 14741005. S2CID 12440383.
  12. Lee, Timothy B. (3 April 2023). "Stable Diffusion copyright lawsuits could be a legal earthquake for AI". Ars Technica.
  13. Vincent, James. "The lawsuit that could rewrite the rules of AI copyright". The Verge.
  14. "ML | Underfitting and Overfitting". GeeksforGeeks.
  15. Nusrat, Ismoilov; Jang, Sung-Bong (November 2018). "A Comparison of Regularization Techniques in Deep Neural Networks". Symmetry. 10 (11): 648. Bibcode:2018Symm...10..648N. doi:10.3390/sym10110648. ISSN 2073-8994.
  16. Bartlett, P. L.; Long, P. M.; Lugosi, G.; Tsigler, A. (2019). "Benign overfitting in linear regression". Proceedings of the National Academy of Sciences. 117: 30063–30070.
