With a tightening job market and breakthroughs in AI, how can you stay ahead of the game as a Data Scientist?

As we approach the holidays, it’s a good time to look at where we stand in Data Science. This year marked the beginnings of some fundamental changes in the dynamics of the job market. After many years of rapid growth in opportunities and roles, it looks like we are entering our first Data Science downturn. Of course, this is just one component of a broader softening of the tech market, and there’s a good chance Data Science won’t be nearly as badly affected as some other tech fields. While this was always going to happen sometime, it’s still new and a little scary to the majority of data scientists, most of whom entered the job market during the boom.
Add to this the recent release of ChatGPT, with its frankly remarkable ability to code on the fly, and it’s fair to say that many Data Scientists will be feeling nervous. They can no longer rely simply on their knowledge of a programming language to differentiate their skills. Not only is the job market on the turn, but could this kind of intelligence lead to a greater reliance on AI to perform some of the work of Data Scientists? It’s a perfectly fair and valid question, and there’s no doubt that some employers will already be considering it.
Lazy data scientists may be in trouble
If you regularly read my writing, you’ll know that I’ve always been a proponent of working hard to stay ahead of the game. It’s always been my point of view that in Data Science a knowledge advantage only lasts about 6 months. If you think you are on the cutting edge of a topic or method, it’s probably only half a year before some new method, package or technology pops up that knocks you back a step on the learning curve.
There are plenty of lazy data scientists out there who rely on a few tricks they learned several years ago, who have survived on simply being able to do some basic data wrangling in Python or R, and who have not invested in staying ahead of the game. They’ve been totally fine until now, because there have been more open jobs than data scientists to fill them. But now they may well be in trouble. Jobs are becoming scarcer, and if you haven’t learned anything new since April 2021 (ChatGPT’s training cut-off), there’s a good chance that AI can do a lot of what you currently do.
But I don’t want to use this article to round on ‘lazy data scientists’. Instead, I want to share my thoughts on three areas where data scientists should focus their learning and development in 2023, both to succeed in a tightening job market and to differentiate their skills in ways that AI cannot match.
Priority 1: Math, Method and Judgment
For all the amazing development of AI, it’s still really bad at understanding and articulating underlying mathematical theory and methods, and it has very little credible judgment when it comes to linking a method to a real-life problem. But this is exactly what employers demand most from their data scientists. The top technical skills desired by employers often relate to methodological knowledge rather than programming ability, with statistics and linear algebra frequently near the top of the list.
Another rare skill which employers often seek in their ideal data scientists is the ability to select the right methodology for a business problem. This requires excellent methodological knowledge, but also judgment about the decision-making environment and about which methods have the best chance of being accepted and understood. Growing suspicion of black-box models means that data scientists need a variety of options for tackling a problem, with inferential statistics and ‘explainable AI’ methods top of the list.
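As a toy illustration of what an interpretable alternative to a black-box model looks like (the feature names and numbers below are entirely made up for the example), an ordinary least squares fit produces coefficients that map directly onto the input features and can be read and defended in front of decision-makers:

```python
# Sketch: an interpretable baseline fitted with plain NumPy.
# Each coefficient has a direct reading ("one unit more ad spend
# is associated with ~3 units more sales"), unlike a black box.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical features: ad spend and price; outcome: weekly sales.
X = rng.normal(size=(200, 2))
true_coefs = np.array([3.0, -1.5])
y = X @ true_coefs + rng.normal(scale=0.1, size=200)

# Add an intercept column and solve least squares directly.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(f"intercept={beta[0]:.2f}, ad_spend={beta[1]:.2f}, price={beta[2]:.2f}")
```

The point is not that least squares beats a gradient-boosted model on accuracy; it is that you can explain exactly why it made a prediction, which is often what gets a method accepted.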
It’s important to call out that many learning programs and degree courses index on the implementation of methods through code, at the expense of proper teaching time on the mathematical foundations of those methods and the kinds of real-life problems they apply to. This will need to change as coding becomes more automatable, but it won’t change overnight, so right now it is incumbent on individuals who feel they lack this knowledge to go out and find it for themselves.
Priority 2: Cloud Architecture and Deployment
The growth in the capacity and variety of cloud services has been the pre-eminent feature of the tech space in the past few years. In 2022, 57% of organizations were moving their workloads to the cloud and this is projected to continue to grow rapidly.
In Data Science, the biggest roadblock to cloud adoption remains lack of knowledge. Numerous organizations still have data scientists working locally, constrained by local services and architectures, which leads to data science products being deployed on sub-optimal tech stacks that underperform and slow down development.
Increasingly, Data Scientists with a strong understanding of cloud services and with knowledge of how to deploy specific workloads on the cloud will be in demand from employers who are trying to make the shift. In a future competitive data science job market where coding alone will not be enough, knowledge of how to deploy code in a performant and cost-efficient way will be an important differentiator.
This should be regarded as an exciting opportunity for up-and-coming data scientists because cloud services are available to everyone and many opportunities exist to develop skills through learning projects where you can build your own cloud applications. With the correct knowledge and planning, these projects can often be deployed on free tier services or with minimal costs. Companies like AWS have also led the way on formal certifications in cloud knowledge and skills, which employers are increasingly looking to see on resumes.
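As one small learning project of this kind, a prediction endpoint can be sketched with nothing but the Python standard library. The `predict` function below is a placeholder model, and a real deployment would typically sit behind a framework such as Flask or FastAPI on a managed cloud service; this is only a minimal sketch of the shape of such an app:

```python
# A toy prediction service: the kind of app you might containerize
# and deploy to a cloud free tier as a learning project.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Placeholder "model": a fixed linear score over two inputs.
    return 0.5 * features["x1"] + 0.25 * features["x2"]

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and return the model's score.
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve locally, uncomment:
# HTTPServer(("", 8080), PredictHandler).serve_forever()
```

Wrapping something like this in a container and getting it running on a free-tier cloud service teaches you far more about deployment than running notebooks locally ever will.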
Priority 3: Staying ‘Code Current’
As impressive as ChatGPT’s coding ability is, it stops dead in April 2021. There are a full eighteen months of code development that it has had no access to in its training.
Eighteen months is a really long time in data science. Entire new versions of packages are released, new coding conventions are born, and security flaws are discovered that could put deployments at risk. If you make the effort to stay abreast of recent developments in your particular space, you give yourself the distinct advantage of being able to write code that is more modern, and often more performant and secure. There is nothing more impressive to a technical interviewer than a candidate who is just as up to date as they are, or more so.
But staying up to date takes effort. You need to read the latest articles and follow key figures on social media. You need to write code regularly, just to practice new methods or approaches, even if you don’t have a current project or use case for them. You need to check in frequently on your GitHub repos and ask yourself ‘is my code up to date?’. If you have the opportunity, consider contributing to open-source software in your field. In short, you need to be the exact opposite of a lazy data scientist.
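A concrete starting point for the ‘is my code up to date?’ question is simply auditing which versions of your dependencies you actually have installed. This sketch uses only the standard library, and the package names passed in are just examples:

```python
# Sketch: report installed versions of the packages a project depends
# on, as a first step in checking whether your stack has fallen behind.
from importlib import metadata

def installed_versions(names):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # flag missing packages explicitly
    return versions

# Example: audit a hypothetical project's core dependencies.
print(installed_versions(["pip", "numpy", "not-a-real-package"]))
```

From there, `pip list --outdated` will compare what you have installed against the latest published releases, which makes "have I fallen behind?" a question you can answer in seconds rather than a vague worry.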
We are entering a unique period where the market for data scientists will become increasingly competitive and where the ability to code will no longer be a primary differentiator. For those with the commitment and desire to stay ahead of the game, this should be viewed as an exciting opportunity to build a skills advantage. Don’t be left behind.
What will you be working on getting better at in 2023? Feel free to comment.