
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" form into another format, with the intent of making it more appropriate and valuable for downstream purposes such as analytics. The goal of data wrangling is to ensure that the data is of high quality and useful. Data analysts typically spend more of their time wrangling data than analyzing it.

Data wrangling may include further munging, data visualization, data aggregation, and training a statistical model, among many other potential uses. It typically follows a set of general steps: extracting the data in raw form from the data source, "munging" the raw data (e.g. sorting) or parsing it into predefined data structures, and finally depositing the resulting content into a data sink for storage and future use.[1] It is closely aligned with the ETL process.
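The extract, munge, and deposit steps can be illustrated with a minimal Python sketch. The sample CSV text, the field names, and the function names `extract`, `munge`, and `deposit` are assumptions made for illustration, and an in-memory stream stands in for a real data sink.

```python
import csv
import io
import json

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """city,temperature_c
Oslo,4.5
Cairo,31.0
Lima,19.2
"""

def extract(raw_text):
    """Read raw CSV text into a list of dictionaries (one per row)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def munge(rows):
    """Parse fields into typed values and sort the rows by temperature."""
    parsed = [
        {"city": row["city"], "temperature_c": float(row["temperature_c"])}
        for row in rows
    ]
    return sorted(parsed, key=lambda row: row["temperature_c"])

def deposit(rows, sink):
    """Write the munged rows to a data sink, here any writable text stream."""
    json.dump(rows, sink)

sink = io.StringIO()  # stands in for a file or database
deposit(munge(extract(RAW_CSV)), sink)
stored = json.loads(sink.getvalue())
```

Keeping each stage as a separate function lets the source or the sink be swapped without touching the munging logic.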

Background

The non-technical term "wrangler" is often said to derive from work done by the United States Library of Congress's National Digital Information Infrastructure and Preservation Program (NDIIPP) and its program partner, the MetaArchive Partnership based at the Emory University Libraries. The term "mung" has roots in munging as described in the Jargon File.[2] The term "data wrangler" was also suggested as the best analogy to describe someone working with data.[3]

One of the first mentions of data wrangling in a scientific context was by Donald Cline during the NASA/NOAA Cold Lands Processes Experiment.[4] Cline stated that the data wranglers "coordinate the acquisition of the entire collection of the experiment data." Cline also specified duties typically handled by a storage administrator for working with large amounts of data. Such work occurs in areas like major research projects and the making of films with a large amount of complex computer-generated imagery. In research, it involves both data transfer from a research instrument to a storage grid or storage facility and data manipulation for re-analysis via high-performance computing instruments or access via cyberinfrastructure-based digital libraries.

With the rise of artificial intelligence in data science, it has become increasingly important for automated data wrangling to have strict checks and balances, which is why the munging process has not been handed over entirely to machine learning. Data munging requires more than an automated solution: it requires knowledge of what information should be removed, and artificial intelligence has not reached the point of understanding such things.[5]

Connection to data mining

Data wrangling is a superset of data mining and involves processes that some, but not all, data mining also uses. Data mining aims to find patterns within large data sets, whereas data wrangling transforms data in order to deliver insights about that data. That data wrangling is a superset of data mining does not mean that data mining makes no use of it; there are many use cases for data wrangling in data mining. Data wrangling can benefit data mining by removing data that does not benefit the overall set or is not formatted properly, which yields better results for the overall data mining process.

An example of data mining that is closely related to data wrangling is ignoring data in a set that is not connected to the goal. Suppose a data set covers the state of Texas and the goal is to get statistics on the residents of Houston. The data on the residents of Dallas is not useful for that goal and can be removed before processing to improve the efficiency of the data mining process.
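The Houston example can be sketched in a few lines of Python. The records, the field names, and the helper `keep_city` are hypothetical and only illustrate filtering out rows that are unrelated to the goal before computing a statistic.

```python
# Hypothetical records for a Texas data set; the field names are assumptions.
residents = [
    {"name": "Ana", "city": "Houston", "age": 34},
    {"name": "Ben", "city": "Dallas", "age": 51},
    {"name": "Cho", "city": "Houston", "age": 28},
    {"name": "Dev", "city": "Austin", "age": 45},
]

def keep_city(records, city):
    """Return only the records whose city matches, ignoring case and spaces."""
    wanted = city.strip().lower()
    return [r for r in records if r["city"].strip().lower() == wanted]

# Drop everything unrelated to the goal, then compute the statistic.
houston = keep_city(residents, "Houston")
average_age = sum(r["age"] for r in houston) / len(houston)
```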

Benefits

As the volume of raw data increases, so does the amount of data that is not inherently useful. This increases the time spent cleaning and organizing data before it can be analyzed, which is where data wrangling comes into play. The result of data wrangling can provide important metadata statistics for further insights about the data; it is important to ensure that the metadata is consistent, because inconsistencies can cause roadblocks. Data wrangling allows analysts to analyze more complex data more quickly and achieve more accurate results, which in turn supports better decisions. Many businesses have adopted data wrangling because of the success it has brought.

Core ideas

Figure: Turning messy data into useful statistics.

The main steps in data wrangling are as follows:

  1. Data discovery
    An all-encompassing term for understanding the data. This first step is about becoming familiar with it.
  2. Structuring
    Organize the data. Raw data is typically unorganized, and much of it may not be useful for the end product. Structuring makes computation and analysis easier in the later steps.
  3. Cleaning
    Cleaning takes many forms, such as converting dates to a consistent format, removing outliers that would skew results, and handling null values consistently. This step is important for assuring the overall quality of the data.
  4. Enriching
    Determine whether additional data that could easily be added would benefit the data set.
  5. Validating
    This step is similar to structuring and cleaning. Apply validation rules repeatedly to assure data consistency, quality, and security. An example of a validation rule is confirming the accuracy of fields by cross-checking data.
  6. Publishing
    Prepare the data set for downstream use by people or software. Be sure to document the steps and logic applied during wrangling.

These steps are an iterative process that should yield a clean and usable data set that can then be used for analysis. This process is tedious but rewarding as it allows analysts to get the information they need out of a large set of data that would otherwise be unreadable.
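The validating step can be sketched as a list of named rules that is applied to every record. Both rules, the field names `phone` and `birth_date`, and the function `validate` are illustrative assumptions, not a standard rule set.

```python
import re
from datetime import date

# Each rule is a (description, predicate) pair; both rules are illustrative.
RULES = [
    ("phone has 10 digits", lambda r: len(re.sub(r"\D", "", r["phone"])) == 10),
    ("birth date is not in the future", lambda r: r["birth_date"] <= date.today()),
]

def validate(record, rules=RULES):
    """Return the descriptions of every rule the record fails."""
    return [name for name, check in rules if not check(record)]

good = {"phone": "445-881-4478", "birth_date": date(1989, 8, 12)}
bad = {"phone": "156-4896", "birth_date": date(1989, 8, 12)}
```

Returning the names of the failed rules, rather than a single boolean, makes it easier to document why a record was rejected, which supports the publishing step.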

Starting data

Name           Phone             Birth date        State
John, Smith    445-881-4478      August 12, 1989   Maine
Jennifer Tal   +1-189-456-4513   11/12/1965        Tx
Gates, Bill    (876)546-8165     June 15, 72       Kansas
Alan Fitch     5493156648        2-6-1985          Oh
Jacob Alan     156-4896          January 3         Alabama

Result

Name           Phone          Birth date   State
John Smith     445-881-4478   1989-08-12   Maine
Jennifer Tal   189-456-4513   1965-11-12   Texas
Bill Gates     876-546-8165   1972-06-15   Kansas
Alan Fitch     549-315-6648   1985-02-06   Ohio

After the data wrangling process, this small data set is significantly easier to read. All names are now formatted the same way, {first name last name}; phone numbers are also formatted the same way, {area code-XXX-XXXX}; dates are formatted numerically, {YYYY-mm-dd}; and states are no longer abbreviated. The entry for Jacob Alan did not have fully formed data (the area code on the phone number was missing and the birth date had no year), so it was discarded from the data set. Now that the resulting data set is cleaned and readable, it is ready to be either deployed or evaluated.
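The phone, date, and state clean-up described above can be sketched in Python using only the standard library. The sketch rests on several assumptions: numeric dates are read month-first (US style), two-digit years follow Python's `%y` convention (69 to 99 map to 1969 to 1999), month names are parsed in an English locale, and the state lookup covers only the abbreviations in the sample. Names are left out on purpose, because "John, Smith" and "Gates, Bill" cannot both be fixed by a single mechanical rule.

```python
import re
from datetime import datetime

# Only the abbreviations that occur in the sample; a real table would be complete.
STATE_NAMES = {"tx": "Texas", "oh": "Ohio"}
# Assumed input formats; numeric dates are treated as month-first.
DATE_FORMATS = ("%B %d, %Y", "%B %d, %y", "%m/%d/%Y", "%m-%d-%Y")

def clean_phone(raw):
    """Return AAA-XXX-XXXX, or None if no 10-digit number can be recovered."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the +1 country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def clean_date(raw):
    """Return YYYY-mm-dd, or None if no known format (with a year) matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_state(raw):
    """Expand a known abbreviation; otherwise keep the trimmed value."""
    return STATE_NAMES.get(raw.strip().lower(), raw.strip())

def wrangle(rows):
    """Clean each (phone, birth date, state) row; drop rows that stay incomplete."""
    result = []
    for phone, birth, state in rows:
        cleaned = (clean_phone(phone), clean_date(birth), clean_state(state))
        if None not in cleaned:
            result.append(cleaned)
    return result

rows = [
    ("445-881-4478", "August 12, 1989", "Maine"),
    ("+1-189-456-4513", "11/12/1965", "Tx"),
    ("(876)546-8165", "June 15, 72", "Kansas"),
    ("5493156648", "2-6-1985", "Oh"),
    ("156-4896", "January 3", "Alabama"),
]
cleaned_rows = wrangle(rows)
```

The last input row is dropped because its phone number has only seven digits and its date has no year, matching the treatment of the Jacob Alan entry.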

Typical use

The data transformations are typically applied to distinct entities (e.g. fields, rows, columns, data values, etc.) within a data set, and could include such actions as extractions, parsing, joining, standardizing, augmenting, cleansing, consolidating, and filtering to create desired wrangling outputs that can be leveraged downstream.
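A few of these transformations (standardizing, joining, consolidating, and filtering) can be shown together in a short Python sketch. The two input lists, their field names, and the functions `standardize` and `join_totals` are hypothetical.

```python
# Two hypothetical inputs that share a customer id.
customers = [
    {"id": 1, "name": "  alice SMITH "},
    {"id": 2, "name": "Bob jones"},
    {"id": 3, "name": "carol WU"},
]
orders = [
    {"customer_id": 1, "total": "19.99"},
    {"customer_id": 3, "total": "5.00"},
    {"customer_id": 3, "total": "12.50"},
]

def standardize(customer):
    """Trim whitespace and use title case for the name field."""
    return {**customer, "name": customer["name"].strip().title()}

def join_totals(customers, orders):
    """Join orders to customers and consolidate order totals per customer."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + float(order["total"])  # parse, then sum
    return [
        {**standardize(c), "order_total": round(totals[c["id"]], 2)}
        for c in customers
        if c["id"] in totals  # filter out customers without orders
    ]

report = join_totals(customers, orders)
```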

The recipients of the wrangled data could be individuals, such as data architects or data scientists who will investigate the data further, business users who will consume the data directly in reports, or systems that will further process the data and write it into targets such as data warehouses, data lakes, or downstream applications.

Modus operandi

Depending on the amount and format of the incoming data, data wrangling has traditionally been performed manually (e.g. via spreadsheets such as Excel), with tools like KNIME, or via scripts in languages such as Python or SQL. R, a language often used in data mining and statistical data analysis, is now also sometimes used for data wrangling.[6] Data wranglers typically have skills in R or Python, SQL, PHP, Scala, and other languages commonly used for analyzing data.

Visual data wrangling systems were developed to make data wrangling accessible for non-programmers, and simpler for programmers. Some of these also include embedded AI recommenders and programming by example facilities to provide user assistance, and program synthesis techniques to autogenerate scalable dataflow code. Early prototypes of visual data wrangling tools include OpenRefine and the Stanford/Berkeley Wrangler research system;[7] the latter evolved into Trifacta.

Other terms for these processes have included data franchising,[8] data preparation, and data munging.

Example

Given a set of data that contains information on medical patients, suppose your goal is to find correlations for a disease. Before you start iterating through the data, make sure you understand the desired result: are you looking for patients who have the disease? Are there other diseases that could be the cause? Once the desired outcome is understood, the data wrangling process can begin.

Start by determining the structure of the outcome: what is important for understanding the disease diagnosis?

Once a final structure is determined, clean the data by removing any data points that are not helpful or are malformed. This could include patients who have not been diagnosed with any disease.

After cleaning, look at the data again: is there anything already known that could be added to the data set and would benefit it? An example could be the most common diseases in the area; America and India differ greatly in which diseases are most common.

Next comes the validation step. Determine validation rules for the data points that need to be checked for validity; this could include the date of birth or checks for specific diseases.

After the validation step, the data should be organized and prepared for either deployment or evaluation. This process can be beneficial for determining correlations for disease diagnosis, as it reduces a vast amount of data to something that can be easily analyzed for an accurate result.
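The cleaning, enriching, and validating steps of this example can be sketched in Python. The patient records, the field names, and the regional lookup table are made-up placeholders and do not represent real epidemiological data; `prepare` is a hypothetical helper.

```python
from datetime import date

# Hypothetical patient records; all field names and values are made up.
patients = [
    {"id": 1, "born": date(1980, 5, 1), "diagnosis": "asthma", "region": "US"},
    {"id": 2, "born": date(1975, 2, 9), "diagnosis": None, "region": "US"},
    {"id": 3, "born": date(2090, 1, 1), "diagnosis": "asthma", "region": "IN"},
    {"id": 4, "born": date(1992, 7, 30), "diagnosis": "asthma", "region": "IN"},
]
# Placeholder enrichment table, not real epidemiological data.
COMMON_IN_REGION = {"US": {"asthma"}, "IN": {"malaria"}}

def prepare(records, today):
    """Clean, enrich, and validate the records, in that order."""
    prepared = []
    for rec in records:
        # Cleaning: drop patients without any diagnosis.
        if rec["diagnosis"] is None:
            continue
        # Enriching: flag whether the diagnosis is common in the patient's region.
        common = rec["diagnosis"] in COMMON_IN_REGION.get(rec["region"], set())
        enriched = {**rec, "common_in_region": common}
        # Validating: a birth date must not lie in the future.
        if enriched["born"] > today:
            continue
        prepared.append(enriched)
    return prepared

ready = prepare(patients, today=date(2024, 1, 1))
```

Passing `today` as a parameter keeps the validation rule reproducible when the sketch is rerun later.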

References

  1. "What Is Data Munging?". Archived from the original on 2025-08-05. Retrieved 2025-08-05.
  2. "mung". Mung. Jargon File. Archived from the original on 2025-08-05. Retrieved 2025-08-05.
  3. "As coder is for code, X is for data". Open Knowledge Foundation blog post. Archived 2025-08-05 at the Wayback Machine.
  4. Parsons, M. A.; Brodzik, M. J.; Rutter, N. J. (2004). "Data management for the Cold Land Processes Experiment: improving hydrological science". Hydrological Processes. 18 (18): 3637–3653. Bibcode:2004HyPr...18.3637P. doi:10.1002/hyp.5801. S2CID 129774847.
  5. "What Is Data Wrangling? What are the steps in data wrangling?". Express Analytics. 2025-08-05. Archived from the original on 2025-08-05. Retrieved 2025-08-05.
  6. Wickham, Hadley; Grolemund, Garrett (2016). "Chapter 9: Data Wrangling Introduction". R for data science : import, tidy, transform, visualize, and model data (First ed.). Sebastopol, CA: O'Reilly. ISBN 978-1491910399. Archived from the original on 2025-08-05. Retrieved 2025-08-05.
  7. Kandel, Sean; Paepcke, Andreas (May 2011). "Wrangler: Interactive visual specification of data transformation scripts". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. pp. 3363–3372. doi:10.1145/1978942.1979444. ISBN 978-1-4503-0228-9. S2CID 11133756.
  8. "What is Data Franchising?" (2003 and 2017 IRI). Archived 2025-08-05 at the Wayback Machine.