As a Data Scientist, I keep talking about the terms Data Science and Data Scientist, but one day I sat wondering when the term was first introduced, who first used it, and what its history is. And how did it become the sexiest job of the 21st century?
Looking for answers to these questions, I googled them and found some interesting points that I thought I should share with everyone.
In 1962, John Tukey wrote a paper titled The Future of Data Analysis and described a shift in the world of statistics, saying, “… as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…I have come to feel that my central interest is in data analysis…” Tukey was referring to the merging of statistics and computers, at a time when computers were first being used to solve mathematical problems and work with statistics, rather than doing the work by hand.
In 1974, Peter Naur authored the Concise Survey of Computer Methods, repeatedly using the term “Data Science.” Naur presented his own convoluted definition of the new concept:
“The usefulness of data and data processes derives from their application in building and handling models of reality.”
In 1977, the International Association for Statistical Computing (IASC) was formed. The first phrase of its mission statement reads, “It is the mission of the IASC to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.”
In 1977, Tukey published a second work, the book Exploratory Data Analysis, arguing for the importance of using data to select “which” hypotheses to test, and that confirmatory data analysis and exploratory data analysis should work hand-in-hand.
In 1989, the first Knowledge Discovery in Databases (KDD) workshop was organized; it would later mature into the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
In 1994, Business Week ran the cover story Database Marketing, revealing the ominous news that companies had started gathering large amounts of personal information, with plans to start strange new marketing campaigns. The flood of data was, at best, confusing to many company managers, who were trying to decide what to do with so much disconnected information.
In 1999, in Mining Data for Nuggets of Knowledge, Jacob Zahavi pointed out the need for new tools to handle the massive, and continuously growing, amounts of data available to businesses. He wrote:
“Scalability is a huge issue in data mining… Conventional statistical methods work well with small data sets. Today’s databases, however, can involve millions of rows and scores of columns of data… Another technical challenge is developing models that can do a better job analyzing data, detecting non-linear relationships and interaction between elements… Special data mining tools may have to be developed to address web-site decisions.”
In 2001, Software-as-a-Service (SaaS) was created. This was the precursor to using cloud-based applications.
In 2001, William S. Cleveland laid out plans for training data scientists to meet the needs of the future. He presented an action plan titled Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. It described how to increase the technical experience and range of data analysts and specified six areas of study for university departments. It promoted developing specific resources for research in each of the six areas. His plan also applies to government and corporate research.
In 2002, the International Council for Science’s Committee on Data for Science and Technology (CODATA) began publishing the Data Science Journal, a publication focused on issues such as the description of data systems, their publication on the internet, applications, and legal issues. Articles for the Data Science Journal are selected by its editors and must follow specific guidelines.
In 2006, Hadoop 0.1.0, an open-source framework for distributed, non-relational storage and processing of data, was released. Hadoop grew out of Nutch, an open-source web search engine project. Two problems with big data are storing huge amounts of data and then processing that stored data. (Relational database management systems (RDBMS) are not designed to handle non-relational data.) Hadoop addressed both problems. Apache Hadoop is now an open-source software library that allows for the distributed processing of large data sets across clusters of computers.
In 2008, the title “data scientist” became a buzzword, and eventually a part of the language. DJ Patil and Jeff Hammerbacher, of LinkedIn and Facebook respectively, are given credit for initiating its use as a buzzword. (In 2012, the Harvard Business Review declared that data scientist was the sexiest job of the twenty-first century.)
In 2009, the term NoSQL was reintroduced (a variation had been used since 1998) by Johan Oskarsson, when he organized a discussion on “open-source, non-relational databases”.
In 2011, job listings for data scientists increased by 15,000%. There was also an increase in seminars and conferences devoted specifically to Data Science and big data. Data Science had proven itself to be a source of profits and had become a part of corporate culture.
Also in 2011, James Dixon, CTO of Pentaho, promoted the concept of data lakes rather than data warehouses. Dixon stated that the difference between the two is that a data warehouse pre-categorizes the data at the point of entry, wasting time and energy, while a data lake accepts the information using a non-relational database (NoSQL) and does not categorize the data, but simply stores it.
In 2013, IBM shared statistics showing that 90% of the data in the world had been created within the previous two years.
In 2015, using Deep Learning techniques, Google’s speech recognition, Google Voice, experienced a dramatic performance jump of 49 percent.
In 2015, Bloomberg’s Jack Clark wrote that it had been a landmark year for artificial intelligence (AI). Within Google, the number of software projects using AI increased from “sporadic usage” to more than 2,700 projects over the year.
This timeline of Data Science is what I found in my research, and I am sure there is a lot more that we need to know. I am looking forward to hearing what you guys think, and I would also be interested in knowing what attracts you guys to this field and why.
For more interesting content like this, join us on the LinkedIn Data Scientists Group.