Hello Data Scientists, my name is Troy Sadkowsky and I am the owner of DataScientists.com and the creator of The Extraordinary Data Scientist Program. I’ve been sharing my thoughts and learnings on the road to becoming a Data Scientist since my first YouTube video, “How to Position Yourself as a Data Scientist”, which I uploaded on the 20th of July 2011. Since then I have observed that the publicly understood responsibilities of the Data Scientist have floated between things like big data management, data engineering, data analytics, programming, sensor data collection/IoT, predictive statistics, artificial intelligence, machine learning, data visualization, human-computer interaction and business development, to name a few. However, no matter what our own definition may be, some fundamentals have remained the same. Let me explain…
Theories Behind Data Science
If you’d like to perform data science there are several theories and principles that you need to understand. Once you understand them, you can learn a set of practices that build the skills essential for this new and challenging role of the data scientist. Without them, you won’t be able to understand the practices or gain the skills. So first let me teach you a few of the theories and principles involved, and once you understand the theoretical elements, I can teach you a simple step-by-step method for doing data science.
Firstly let’s talk about database theory. Database theory is about organising data in a way that makes storing and retrieving it efficient. Data can be categorised into objects, objects can be put into collections, and objects and collections can have relationships with each other and with themselves. The one thing you need to know about this theory is that the way you organise your data will impact the effort required to get answers from it.
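To make that last point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sensor data are hypothetical, made up purely for illustration. It asks SQLite how it plans to answer the same question before and after the data is organised with an index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it intends to answer a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Two collections of objects with a relationship between them
# (hypothetical example data: sensors and their readings).
conn.execute("CREATE TABLE sensors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE readings ("
    "id INTEGER PRIMARY KEY, "
    "sensor_id INTEGER REFERENCES sensors(id), "
    "value REAL)"
)
conn.executemany(
    "INSERT INTO sensors (id, name) VALUES (?, ?)",
    [(1, "kitchen"), (2, "garage")],
)
conn.executemany(
    "INSERT INTO readings (sensor_id, value) VALUES (?, ?)",
    [(1 + i % 2, float(i)) for i in range(1000)],
)

sql = "SELECT value FROM readings WHERE sensor_id = ?"

# Without an index, every row has to be inspected to answer the question.
plan_before = query_plan(conn, sql, (1,))

# Organising the data with an index lets SQLite jump to the matching rows.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor_id)")
plan_after = query_plan(conn, sql, (1,))

print("before:", plan_before)
print("after: ", plan_after)
```

The question and the answer are the same in both cases. Only the organisation of the data changed, and with it the amount of work needed to reach the answer.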
Now let’s talk about the Agile Manifesto. The Agile Manifesto is a set of values and principles aimed at producing high-quality outputs in environments subject to high levels of change and ambiguity. Agile methods overcome rapid change and ambiguity by adopting an iterative development process. They rely on self-managed teams, and those who are passionate about technological advancement are drawn to them like scientists to the Big Bang theory. The Agile Manifesto looks to remove cultural barriers between developer, client and end user, and focuses on making things simple, but not simpler. The one thing you need to know about this set of principles is that all things change, and the longer you take to test your solution in the live environment, the higher the risk of failure.
The last theory I’d like to touch on is Spiral Dynamics Theory. Spiral Dynamics is a theory of human development and behaviour and explains why humans do what we do. It explains the psychology behind why we get out of bed in the morning, why we feel compelled to create things and why we seek to better ourselves and better serve our loved ones. The theory talks about two mental states, one of “facts” and one of “values”. Facts are what we believe. Our beliefs are based on the knowledge we currently have and the environment we are currently in. Values are what we desire. Our desires are driven by our intentions and/or concerns which are also based on the knowledge we currently have and the environment we are currently in. The one thing you need to know about this theory is that our facts and our desires come from what data is presented to us.
Data Scientists perform data science. They use technology and skills to increase awareness, clarity and direction for those working with data. The data scientist role is here to accommodate the rapid changes that occur in our modern-day environment, and is given the task of minimising the disruption that technology and data are having on the way we work, play and learn. Data Scientists don’t just present data; they present data with an intelligent awareness of the consequences of presenting that data.
How To Do Data Science
The three components involved in data science are organising, packaging and delivering data. Organising is where the physical location and structure of the data are planned and executed. Packaging is where the prototypes are built, the statistics are performed and the visualisation is created. Delivering is where the story gets told and the value is obtained. However, what separates the data scientist from all other existing roles is the need for a continual awareness of What, How, Who and Why. A data scientist needs to know what the output of the data science process will be and have a clear vision of this output. A data scientist needs a clearly defined plan for how this output will be achieved within the constraints of available resources and time. A data scientist needs to deeply understand who the people are that will be involved in creating the output. And most of all, the data scientist must know why there is a motivation behind attempting to manifest the creative visualisation.
The Simple 3 Step Data Science Method
Step 1. Organise Data.
Organising data involves the physical storage and format of data and incorporates best practices in data management.
Step 2. Package Information.
Packaging information involves logically manipulating and joining the underlying raw data into a new representation and package.
Step 3. Deliver Validated Insights.
Delivering validated insights involves ensuring that insights discovered are valid and correctly interpreted.
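As a rough illustration only, the three steps could be sketched in plain Python like this. The sensor readings, the function names and the min_samples threshold are all hypothetical choices of mine, and a real project would look quite different in the details.

```python
from statistics import mean

# Hypothetical raw data, as it might arrive from a CSV export.
RAW_READINGS = [
    {"sensor": "kitchen", "celsius": "21.5"},
    {"sensor": "kitchen", "celsius": "22.5"},
    {"sensor": "garage", "celsius": "15.0"},
    {"sensor": "garage", "celsius": ""},  # a missing value
]


def organise(raw):
    """Step 1: store the data in one consistent, typed format."""
    organised = []
    for record in raw:
        try:
            organised.append((record["sensor"], float(record["celsius"])))
        except (KeyError, ValueError):
            # Skip records that cannot be parsed into the agreed format.
            continue
    return organised


def package(organised):
    """Step 2: combine the raw records into a new representation."""
    grouped = {}
    for sensor, value in organised:
        grouped.setdefault(sensor, []).append(value)
    return {
        sensor: {"mean_celsius": mean(values), "n": len(values)}
        for sensor, values in grouped.items()
    }


def deliver(packaged, min_samples=2):
    """Step 3: mark which insights have enough data behind them."""
    report = []
    for sensor, summary in sorted(packaged.items()):
        report.append(
            {"sensor": sensor, **summary, "validated": summary["n"] >= min_samples}
        )
    return report


report = deliver(package(organise(RAW_READINGS)))
for line in report:
    print(line)
```

The validation rule here is deliberately simple: an average built on fewer than min_samples readings is still reported, but flagged as not validated so that nobody over-interprets it.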
Of course, this is just what I think. Please join in on the discussion at Data Scientists on LinkedIn.