Now that we have gone through the benefits of the cloud, and when you should use it versus on-premises systems, let's take a look at some of the ways that you as an AI Specialist can start to learn and understand the often confusing world of the cloud. Luckily, several tools are used both locally and in the cloud, and learning these puts you in a great position to be adaptable. Let's have a look at the ways of getting started with data science tools in the cloud.
Are there certifications I can take?
All the major cloud providers have a certification pathway that you can take, either sponsored by your workplace or paid for out of your own pocket. The exams are quite cheap compared to many certifications, and the training is usually free. To obtain these certifications you need to pass a formal exam. The entry-level certifications are generally multiple-choice exams and take 1-2 weeks to study for. After this entry level, each cloud provider has individual tracks that you can take.
AWS has a dedicated data scientist path (link) which you can follow after the initial Cloud Practitioner certification (link). GCP has a Data & Machine Learning track (link) that gives you all the skills needed to use machine learning on Google's platform. Azure also has a fantastic certification (link) that covers all the fundamentals of using the Azure platform for data science.
I would recommend picking one cloud to focus on initially, as they all have different terms and approaches that are specific to their environments. For example, serverless compute is called Lambda on AWS, Cloud Functions on GCP, and Azure Functions on Azure, which can get confusing when you are starting out. If you want to use the embedded AI tools within a cloud platform, GCP or Azure would be the best place to start. If you are planning on going it alone, AWS and GCP are both cost-effective for raw compute, with AWS a little easier to understand initially.
How do I study for the exams?
All of the cloud providers offer their own training materials and study guidance. Still, there are many third-party websites that offer fantastic in-depth training to help you not just pass the exam but understand exactly what you can achieve with these platforms. Some of the ones I have personally used and would recommend are:
What tools should I focus on?
There are two main types of tools: open source and closed source. You will find that the majority of data scientists use open-source tooling. There are many reasons for this, which are worthy of a post in themselves, but essentially open source promotes the ability to share and collaborate with others and helps ensure that the tool you are learning will be available in the future.
Most clouds have Jupyter Notebook instances that you can deploy. For example, on Azure you can use Azure Notebooks, and on AWS you can use SageMaker to quickly and easily spin up a Jupyter Notebook to do your work. Jupyter is open-source software that cloud providers have adopted into services they control, which means you don't get direct control over versions (but you still get to control features like kernels). Becoming familiar with Jupyter Notebooks in your preferred language enables you to work in cloud environments as well as locally.
It would also be useful to become familiar with the basic commands of the Linux operating system (all clouds allow you to deploy various flavours of Linux) so that you can use SSH tunnels to control your own instances.
All the various clouds have their own proprietary tooling for undertaking data science and AI work. Most of it is accessible through graphical interfaces or REST APIs, so becoming familiar with REST APIs is an excellent place to start for accessing most of these services. If you want to become familiar with a specific tool on a cloud (for example Azure's Machine Learning Studio), keep in mind that this can be somewhat of a gamble: if you end up at an employer that doesn't use the Azure cloud, you may need to learn another tool.
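To make the REST API point a little more concrete, here is a minimal sketch in Python using only the standard library. The endpoint URL, the API key, and the shape of the JSON payload are all hypothetical placeholders I have made up for illustration; every provider documents its own URLs, authentication scheme, and request format, so treat this as the general pattern rather than any specific cloud's API.

```python
import json
import urllib.request

# Hypothetical endpoint and key: replace both with values from your provider's docs.
ENDPOINT = "https://example.com/v1/models/sentiment:predict"
API_KEY = "YOUR_API_KEY"


def build_prediction_request(text, endpoint=ENDPOINT, api_key=API_KEY):
    """Build (but do not send) a JSON POST request for a hosted model."""
    # The payload layout is an assumption; real services define their own schema.
    payload = json.dumps({"instances": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_prediction_request("The cloud is less confusing than I expected.")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect exactly what will go over the wire before you point it at a real, billable service. Once you have a real endpoint and key, calling `send(request)` would perform the actual call.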
Where do I start?
After reading through both parts of this series of articles, you should start to get a sense of what you need to focus on: whether you require scale for your work, or whether you want to use any of the specific tools from the various cloud providers. Choosing a cloud to focus on first can be as simple as selecting the cloud your company is primarily interested in, or the cloud that works for your data security needs. Once you have chosen, undertaking the provider's training and additional third-party training is a great place to start. Alongside this, start experimenting with running your data science process in the cloud. If you get stuck, there are thousands of articles and Stack Overflow posts to help you along.
One quick tip before you start becoming a cloud extraordinaire: always keep an eye on how much you are spending! It's easy to leave a compute instance on and run up a big bill!
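As a rough illustration of how quickly a forgotten instance adds up, here is a small back-of-the-envelope calculation in Python. The hourly rate is a made-up figure purely for the arithmetic, not a real price from any provider; check your cloud's pricing page for actual numbers.

```python
def estimate_monthly_cost(hourly_rate, hours_per_day, days=30):
    """Estimate the cost of running one instance for a month."""
    return round(hourly_rate * hours_per_day * days, 2)


# Hypothetical on-demand rate for a GPU instance, in dollars per hour.
GPU_RATE = 3.00

# Left running around the clock for 30 days.
always_on = estimate_monthly_cost(GPU_RATE, hours_per_day=24)

# Switched on only for 8 hours on each of 22 working days.
working_hours = estimate_monthly_cost(GPU_RATE, hours_per_day=8, days=22)

print(f"Always on:     ${always_on:,.2f}")
print(f"Working hours: ${working_hours:,.2f}")
```

With these assumed numbers, leaving the instance on costs roughly four times as much as shutting it down when you are not using it.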
Check out the previous article in this series:
And check out the previous series on the various ways to transition into data science:
- Transitioning and Changing Careers – Getting into Data Science & AI
- University and Formal Study – Getting into Data Science & AI
- Courses, Bootcamps, and Self Study – Getting into Data Science & AI
Jeremiah is an Associate Director at Capgemini Invent leading a Data Science Advisory team and founder of the AI Specialist Blog. He has received the ACS ICT Professional of the Year (2019) and Top 25 Analytics Professionals Australia (2018) awards. He has written articles for the AFR, IBM, and LearnDataSci.
Please get in touch if your business needs any help in the Data Science & Strategic advisory space!