This pathway is a common way into data science for those looking for a quicker route into the career. Keep in mind that it can become a limitation longer-term, as the majority of senior roles require a bachelor’s degree at minimum, and often a master’s degree in a technical field, just to get you in the door. Don’t be dissuaded by this! There are still many opportunities where self-taught data scientists can thrive. It may also be useful to do this type of study alongside a degree, getting into the industry early and completing the degree part-time. There are many different approaches!
In this pathway, there are two types of study: study to understand methods and approaches, and study to complete courses and earn more formally recognised qualifications. To start with, boot camps are a topic that often comes up. Boot camps can be quite expensive for the time you spend on them, and the knowledge (whilst structured to assist learning) is all available for free elsewhere. There is no special knowledge in data science; everything you need to know is free. With that in mind, boot camps can be beneficial if a structured program suits your learning style, but keep in mind that boot camps are not a degree, and the qualification coming out of them has a mixed reputation in the industry.
A great place to start for an overview of machine learning in particular is ‘The Hundred-Page Machine Learning Book’ by Andriy Burkov, which is available online and as a hard copy. This book gives a good overview of what you should know in statistics and calculus and goes into reasonable depth on these topics. ‘Machine Learning Yearning’ by Andrew Ng is also a fantastic book, available as a draft copy on Andrew’s website for free at the time of writing. These books should be used as a guide of sorts: when they reference deeper concepts, study those outside of the book as you go along. There are literally thousands of textbooks and materials to study with. For a collated and up-to-date list of where you can start, have a look at the Reading section above.
The best course to start with is ‘Machine Learning’ by Andrew Ng. It has been around for quite some time, but Andrew makes an effort to keep it updated. It will give you a taste of the various areas of machine learning and clues about other things to research. As with study materials, many courses are popping up in this space. Below are some that we have personally done and found value in; this is by no means an exhaustive list.
- https://www.coursera.org/learn/machine-learning This is a free course that is very well taught and offers insight into the facets of machine learning.
- https://www.edx.org/micromasters/mitx-statistics-and-data-science This is a paid course by MIT that covers the foundations of data science.
- http://cs109.github.io/2015/pages/videos.html This is a free course on data science made available by Harvard. It was produced in 2015, so some aspects are out of date, but the content is incredibly well taught and useful.
- https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp This course focuses on learning Python for data science specifically and covers the major components of how to interact with model APIs and so on.
- There are also professionally produced training courses from the major cloud providers (Microsoft Azure, Google Cloud Platform, and Amazon Web Services). These teach data science in their cloud environments, lead to recognised industry certificates, and are highly recommended.
When evaluating a course, find out who produced it and who wrote the content, to make sure they have a good background in data science. Some providers try to take advantage of the newfound popularity of this career by making cookie-cutter content that takes time and costs money but offers little real value.
GitHub & Kaggle
The key thing you will have to do is prove your study and understanding of this space. Get yourself a GitHub account (if you want to show code) or a Kaggle account (if you want to compete with other data scientists). It may even be worth building a simple personal website to illustrate the business use case around what you are creating, and not just the code. Consider your two audiences when going for a data science role:
Recruiter/Business stakeholder: Recruiters want to see the right buzzwords and evidence that you have the abilities you are talking about. Business stakeholders want to see that you can take a business concept or use case and implement it with a technical solution, so displaying this on a website (or as a presentation in the README) is a prudent step.
Technical/Data Scientists: They want to see that the code is well implemented and that you are writing usable and understandable code (never forget the code comments!). To do this, research the coding standards for the language you are writing in (for example, PEP 8 or the Google Python Style Guide for Python).
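As a rough illustration of what a technical reviewer might look for in a portfolio repository, here is a minimal sketch of a small helper written with PEP 8 naming and line lengths, type hints, a docstring, and a comment explaining the logic. The function name `mean_absolute_error` and the example values are hypothetical and chosen only for illustration; they are not taken from any specific project or library.

```python
"""Illustrative module: a small, documented metric helper."""

from typing import Sequence


def mean_absolute_error(
    actual: Sequence[float],
    predicted: Sequence[float],
) -> float:
    """Return the mean absolute error between two equal-length sequences.

    Raises:
        ValueError: If the sequences are empty or differ in length.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")

    # Sum the absolute differences pairwise, then average them.
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return total_error / len(actual)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 3.0, 5.0]))
```

Descriptive names, explicit error handling, and a short docstring tend to make code like this easier for a reviewer to skim than a long uncommented notebook cell.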
Using the study and courses above, as well as many others that will provide value and content for your journey, you can craft a narrative around your study. As it’s a little difficult at times to articulate how much you have studied, pairing this with practical examples on your GitHub, Kaggle, or personal website, and potentially explaining the business use cases, brings your work to life and helps the person evaluating you understand that you are knowledgeable in the field. This is extra effort, but it may help you overcome any bias against this pathway.
For the other parts of this article where we dive into the other pathways, have a look at the links below!
Jeremiah is a Director at PwC leading a Data Advisory team and the founder of the AI Specialist Blog. He has received the ACS ICT Professional of the Year (2019) and Top 25 Analytics Professionals Australia (2021, 2018) awards. He has written articles for the AFR, IBM, and LearnDataSci.
Please get in touch if your business needs any help in the Data Science & Strategic advisory space!