The 4 tools I used to teach myself data science without spending a dollar
I had just walked away from 8 years of study and hard work with no plan. You might be wondering why someone would do that. My boss was crushing my spirit, and I knew that I needed to make a change.
My boyfriend suggested becoming a data scientist. I said, ‘You’re crazy!’ I didn’t know the first thing about programming. Surely he was overestimating what I was capable of. Imposter syndrome strikes again.
About two weeks later, my friend Anna suggested the exact same thing. I thought about it some more and began to entertain the idea. Why not? I decided to become a beginner again and reinvent myself as a data scientist.
I wanted to learn at my own pace, so I decided to take online courses. I figured that with a PhD in neuroscience I probably had enough formal training to get a data science job. I just needed the practical skills.
This story describes the 4 courses that I took and how they led to a data science job at a healthcare startup in Silicon Valley.
At the time, most of the online courses I came across were free. So I challenged myself to gain the skills I needed without spending any money. What can I say, I am pretty stingy 😜
When I quit my postdoc at UCSF, I had zero programming experience. I had used statistics throughout my research, but only on a small scale. Every dataset I had analysed was one I had generated myself in the lab, so the number of observations was always very small. I needed to learn to code and to analyse data on a much larger scale.
Getting started coding
When I decided I wanted to become a data scientist, the first thing I wanted to learn was how to write computer code. Since I had never coded before, it was a complete unknown. I figured that if I really hated writing code, data science would not be a great fit for me, so it seemed like a good place to start.
I was lucky that my partner Ben has worked in many areas of tech and was able to point me in the right direction. He suggested that Python might suit me best. Python is versatile, excellent for data analysis and handles large datasets well, so that is where I began.
Learning to Code
1. Codecademy
To begin learning to code, I used Codecademy. I started with the Introduction to Python course back in 2014, and I am not sure it still exists in the same form. If I were starting to learn Python with Codecademy now, I would probably choose the Analyze Data with Python course.
I found Codecademy an excellent starting point. The main advantage for me was being able to write code right in my browser. Correctly installing programming environments on my computer is still my Achilles’ heel, so I was pleased to avoid it when starting off. It was comforting to know that if my code didn’t work, it was because of the syntax and not because of an environment setup error.
I also liked how you could do a few minutes of work at a time with Codecademy. If I had some spare time I would log in and do a few problems because it was all there waiting for me. This piecemeal progression meant that I was not too intimidated to get stuck into it.
At the time that I completed the course, there were only a handful of Codecademy courses on offer and this one was free. I was amazed at the quality of the free courses available online.
Once I had learned the basics of Python, I needed to level up my statistics skills and learn to analyse data on a larger scale.
Learning Data Analysis
2. Coursera Data Science Specialisation from Johns Hopkins
Next, I took the Coursera Data Science Specialisation from Johns Hopkins. At the time, you could earn an honour code certificate for free and only had to pay if you wanted the verified certificate.
The verified certificate did not seem important to me. What I needed was to be able to demonstrate the skills taught in the course during tech interviews, so I took the free version of the specialisation.
One drawback for me was that this series of courses is taught in R. R is an excellent programming language for statistical analysis and is favoured by academia. However, I wanted to learn Python for data science. I thought Python would be more useful in the startups where I wanted to work.
I looked into a few data analysis courses in Python, but they seemed to assume quite a bit of knowledge that I did not yet have. I believe most of them were aimed at software engineers who wanted to transition into data science, so they assumed solid programming skills and that you already knew how to set up your Python environment.
What I liked most about the Coursera Data Science Specialisation is that it started from the very beginning. The first course included step-by-step instructions on how to install R and RStudio. This made it easy to tackle the subsequent courses knowing that there would not be any technical issues.
Another aspect of the Johns Hopkins Data Science Specialisation that suited me is that it was taught by the Public Health department. My health science domain expertise made it easy to follow their examples, which included the impact of air quality on asthma and other healthcare-related datasets. I could therefore focus on the course content rather than on understanding the scenarios presented for data analysis.
This series of courses gave me a solid baseline understanding of the main aspects of data science work. It covered programming in R, basic data cleaning, analysis, regression and machine learning. I really enjoyed learning to code and using code to analyse data, which encouraged me to keep learning.
At this stage of my retraining, I started asking people in my network to introduce me to people who had made the transition from academia to data science in San Francisco. A few were able to connect me, so I set up as many informational interviews as I could.
A friend introduced me to a data scientist from ModCloth who had taken a similar pathway to mine. She used to be a neuroscientist, and I found her advice particularly helpful.
Her major recommendation was to learn SQL.
Learning to Query Databases
3. DB5 SQL Stanford Online
She explained that most of her day-to-day work was querying databases to extract insights for the business development and marketing teams. Only a small portion of her time was spent on statistical analysis and machine learning. SQL had not been covered at all in the Coursera Data Science Specialisation from Johns Hopkins.
I took her advice and started a self-paced SQL course with Stanford Online. Of all the courses I did, this was my favourite. The instructor was excellent, used simple examples to explain the concepts and explained each concept in several different ways.
I have since recommended this course to many people, because I think a good foundation in SQL is essential for any data scientist. The data science courses I have come across do not cover how to get data out of a database using SQL, which I think is a huge oversight. Most courses give students a prepared CSV file, but in my experience that is rarely the case in industry data science jobs.
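Here is a minimal sketch of what "getting the data yourself" can look like. It uses Python's built-in `sqlite3` module, with an in-memory database standing in for a company database. The `visits` table, its columns and its rows are all hypothetical and exist only for illustration. The SQL query does the grouping and aggregation before the data reaches your analysis code, which is exactly the step that a prepared CSV hides.

```python
import sqlite3

# An in-memory SQLite database stands in for a real company database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table of doctor visits, made up for this example.
# In an industry job, a table like this would already exist.
cur.execute("CREATE TABLE visits (patient_id INTEGER, specialty TEXT, cost REAL)")
cur.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "cardiology", 250.0),
        (2, "cardiology", 310.0),
        (3, "dermatology", 120.0),
        (4, "dermatology", 80.0),
        (5, "pediatrics", 95.0),
    ],
)

# The kind of query written before any analysis starts:
# number of visits and average cost per specialty, most expensive first.
query = """
    SELECT specialty, COUNT(*) AS n_visits, AVG(cost) AS avg_cost
    FROM visits
    GROUP BY specialty
    ORDER BY avg_cost DESC
"""
rows = cur.execute(query).fetchall()

for specialty, n_visits, avg_cost in rows:
    print(f"{specialty}: {n_visits} visits, average cost ${avg_cost:.2f}")

conn.close()
```

In practice you would connect to the company's database server instead of building a toy table, but the `SELECT`, `GROUP BY` and `ORDER BY` pattern stays the same.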
Once I had completed the Stanford SQL course, I started applying for data science positions. By then I was living back in Australia, so I did Skype interviews with startups in the San Francisco Bay Area. While interviewing, I wanted to keep developing my skills.
4. edX Foundations of Data Analysis
Next, I took the Foundations of Data Analysis course on edX, which also uses R. It was very helpful for revising many of the concepts I had already learned in the Coursera course.
I am a firm believer that learning concepts from different teachers can provide new insights. The statistics and machine learning concepts were much easier to follow the second time around, and I felt I gained a deeper understanding through this course.
While I was finishing the course, my interview with Amino, a healthcare startup in San Francisco, was successful. I went on to get a working visa and move to the USA.
Getting the data science job
I think I was successful in that final interview because I had passable coding skills and a decent understanding of statistics. More importantly, I had healthcare domain knowledge and expertise in experimental design and the scientific method.
In my opinion, these additional strengths are what led this startup to take a chance on me. I was very junior and needed a lot of on-the-job training. The courses I did were just enough to get the hiring team to consider me, and my healthcare-specific experience got me over the line.
So if you are looking to change career paths into data science, I would recommend looking for a company where your existing domain knowledge is valuable.
What I wish I had learned
The major gap in my knowledge that I wish I had filled before starting my new data science job was using Git from the command line. I had never used the terminal before, and I had no idea how to use Git to commit my code to the company’s GitHub repository.
It took several engineers quite a bit of time to get me up to speed. I wish I had at least had a basic idea of how to use it before I started, so that I did not take up so much of their valuable time. My colleagues were awesome and didn’t seem to mind teaching me, but I did feel like a bit of a burden in the first few days.
I did eventually catch up and found Learn Code the Hard Way Command Line extremely useful.
If you are thinking of following a similar pathway into data science, I encourage you to go for it! It was absolutely the right choice for me. Different people learn in different ways, but if you have the self-discipline to study and finish what you start, it is certainly feasible to teach yourself data science through online courses. If that is your goal, I wish you the best of luck, and I am happy to answer any questions I can.
In addition to data, my other passion is painting. You can find my wildlife art at www.katemarielewis.com