AutoML 2.0: Are Data Scientists Obsolete?

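Artificial intelligence can now drive feature engineering, letting users automatically discover and create the features used in data science. This opens up an entirely new approach to data science, one that appears to threaten the role of the data scientist.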

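AutoML has grown rapidly over the past few years. With a recession now looking unavoidable, the idea of automating artificial intelligence (AI) and machine learning development is bound to become more attractive, and the new platforms now reaching the market offer more automation than before.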


Figure: The traditional data science process (credit: dotData, Inc)


**AutoML 2.0, More Automation for Data Science**


First-generation AutoML platforms focused on automating the machine learning part of the data science process. In a traditional data science workflow, however, the longest and most challenging part is the highly manual step known as feature engineering. Feature engineering involves connecting data sources and building a flat "feature table" with a rich, diverse set of "features" that is then evaluated against multiple machine learning algorithms. The challenge is that it takes a high level of domain expertise to "ideate" new features, and the work is highly iterative, since each feature is evaluated and then rejected or chosen.

New platforms have recently emerged that provide additional capabilities and automation aimed at this challenge. Platforms with "Automated Feature Engineering" capabilities now allow feature tables to be created automatically from relational data sources as well as flat files. This ability to "auto-generate" features is a game-changing capability. "Citizen" data scientists, meaning Business Intelligence (BI) analysts, data engineers, and other technically savvy members of the organization with deep domain knowledge, can become valuable contributors to an organization's development of ML and AI models. Through Automated Feature Engineering, BI teams can develop sophisticated predictive analytics algorithms in days, significantly accelerating their productivity with minimal help from data scientists.
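To make the idea of a flat "feature table" concrete, here is a minimal sketch in plain Python. It is an illustration, not the method used by any particular platform. The tables, the column names, and the `build_feature_table` helper are all hypothetical. The sketch flattens a one-to-many relationship (customers and their transactions) into one row per customer by aggregating the child rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical relational source: one row per customer, many rows per transaction.
customers = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
    {"customer_id": 3, "region": "east"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]

# Each aggregation applied to the child column yields one feature.
AGGREGATIONS = {"count": len, "sum": sum, "mean": mean, "max": max}


def build_feature_table(parents, children, key, value_column):
    """Flatten child rows into one feature row per parent (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value_column])

    table = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features = dict(parent)
        for name, func in AGGREGATIONS.items():
            # Parents with no child rows get 0 so the table stays rectangular.
            features[f"{value_column}_{name}"] = func(values) if values else 0
        table.append(features)
    return table


feature_table = build_feature_table(customers, transactions, "customer_id", "amount")
for row in feature_table:
    print(row)
```

In a manual workflow, someone has to decide which aggregations and joins are worth trying. Automated feature engineering, as described above, aims to take over that enumeration.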




**Automating Data Science: Democratization**


One of the chief benefits of AutoML 2.0 platforms is true data science democratization. When data science automation can accelerate and automate the process of discovering and creating features, a larger and more diverse group of users can contribute to the data science process. Automating feature creation allows the "citizen" data scientist to build useful, highly optimized use cases. Because citizen data scientists typically have a high degree of domain expertise, they can focus on use cases that are of high value to the organization with minimal, if any, assistance from the data science team. A further benefit is that the business can expand its use of data science without having to hire armies of data scientists. The ability to empower new data science contributors is especially significant given the difficulty organizations in the US have had in hiring data scientists, as examined in [a 2018 LinkedIn study](report-august-2018). With economic uncertainty facing the global community, enabling a new class of AI/ML developers with minimal investment becomes a game-changing value proposition for maintaining or increasing competitive advantage.



**Automating Data Science: Productivity, Not Replacement**


Any conversation about AutoML 2.0 platforms, however, is misplaced if it focuses on replacing or displacing the data scientist. Most data scientists see feature engineering as one of the most significant obstacles in their work. Automation helps by speeding up that process, delivering productivity gains that would not otherwise be possible. By leveraging AutoML 2.0, data scientists can often accelerate their work dramatically, from months to days. In addition, AI-based feature engineering in AutoML 2.0 platforms allows data scientists to discover features that they would never have considered. AI-based feature engineering automatically builds, evaluates, and exposes features by combining data from multiple columns, often across different tables and sources. The ability of AutoML 2.0 to self-discover features allows data scientists to explore the so-called "unknown unknowns," the features they would never have considered because of a lack of either time or domain expertise.
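The "build, evaluate, and expose" loop can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how any AutoML 2.0 platform actually works. The dataset, the choice of pairwise products and differences as candidate features, and the use of absolute Pearson correlation as the score are all hypothetical stand-ins.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy dataset: equal-length columns plus a target to predict.
columns = {
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "quantity": [5.0, 1.0, 4.0, 2.0, 3.0],
    "discount": [1.0, 3.0, 2.0, 5.0, 4.0],
}
target = [50.0, 20.0, 120.0, 80.0, 150.0]  # equals price * quantity by construction


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def generate_candidates(cols):
    """Build candidate features by combining every pair of columns."""
    candidates = dict(cols)  # keep the raw columns as baselines
    for a, b in combinations(cols, 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(cols[a], cols[b])]
        candidates[f"{a}-{b}"] = [x - y for x, y in zip(cols[a], cols[b])]
    return candidates


def rank_features(cols, y):
    """Score each candidate by |correlation| with the target, best first."""
    scored = [
        (name, abs(pearson(values, y)))
        for name, values in generate_candidates(cols).items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Here the combined feature `price*quantity` ranks first because the target was constructed from it, while neither raw column alone explains the target as well. That is a small-scale version of surfacing a feature nobody listed by hand.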


**AutoML 2.0: Creating A More Productive, More Inclusive AI/ML Program**


Rather than being a threat to the livelihood of data scientists, AutoML 2.0 platforms are, in fact, an enabling technology that helps accelerate and democratize the data science process. AutoML 2.0 provides the acceleration and automation data scientists need to be more productive, giving them the ability to scale their work and deliver an even greater benefit to the business. This two-fold advantage of democratizing and accelerating the data science process is the most significant selling point of AutoML 2.0 platforms and the key to scaling data science in the modern organization.










