Data-related jobs have been gaining popularity in the past few years. Many students aspire to become data analysts or data scientists, either by taking university courses on related topics (data mining, machine learning, and so on) or by taking online courses (free or paid, coding or theoretical) to earn certifications as proof of their expertise in the domain. Many professionals are following the same trend.
A career shift into the field of data has become the new normal. However, many people do not know how to approach this field once they are certified, or how to increase their chances of landing their dream job. In what follows, I will explain the easiest way to get first hands-on experience in the field of data. This will help you highlight the skills you have newly acquired from online coding platforms or theoretical MOOCs.
What is a portfolio?
The term portfolio is highly popular in the arts and creative industries. You will almost always hear a photographer, a graphic designer, or an architect talking about their portfolio. Before the internet existed, a portfolio was a filing folder containing sketches, drafts, or finished projects to show to a recruiter. Today, a portfolio is usually a website containing these works. Many websites (free or paid) compete to give talented people a platform on which to shine and share their work.
What is a portfolio in the data field?
A data-related portfolio serves the same purpose, but for data analysts and data scientists. People with little or no real-life experience in data can create their own data-related projects and publish them on the internet. Some prefer to host their own website and deploy their machine-learning projects or dashboards there, while others publish them on GitHub (for free).
Some people also like to work directly on Kaggle by creating notebooks of their work. They might choose Kaggle for several reasons: the datasets are easier to access, the notebooks can be published for the public to see, it is a free cloud service like Google Colab, and the work improves their Kaggle score, which can be highly relevant for some recruiters.
What is Google Colab?
Google Colab can be thought of as an online Jupyter Notebook (for those familiar with Python). It is a free tool provided by Google. It gives access to GPUs and high RAM, which helps when you are dealing with deep learning models, large matrix multiplications, or tensors. Although not as rich as Kaggle, it includes a few datasets on which to practice machine learning or data analysis projects.
The purpose of a portfolio in data science and analysis
- Show recruiters that you have mastered more than just the theory behind the concepts that you learned (machine learning algorithms, visualization tools, and so on)
- Practice newly acquired skills after finishing an online or a university course
- In some cases, to have ready-to-use prototypes for future needs (for experienced people)
Ultimately, you should have a diversified portfolio. That means you should try not to use the same algorithms in every project and try not to specialize in one topic (e.g., deep learning only), unless you want to apply for a very specific position (computer vision engineer for self-driving cars, machine-learning engineer in health care, and so on).
Keep in mind that more is not better. Two to three projects are more than enough to highlight your skills, especially if you are applying for junior positions. Recruiters do not expect you to know everything. In addition, what you do online as practice and in competitions often does not reflect the real-life job: you will not join a company that trains "dogs versus cats" classifiers. In other words, a practice portfolio rarely shows how you would generate value or impact for a business.
What should I have in my portfolio (key elements)?
- Include a table of contents in your notebook to give it a clear structure
- Introduce every part of your code so that the assessor can follow the logic
- Objectives
- Original input and expected output
- Justify your choice of tools (e.g., algorithms, techniques, graph types, hyperparameters)
- For both DA and DS projects, try to include images or graphs. Non-technical recruiters will look at them, and if they look good, you might earn additional points while your work is being assessed
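The table of contents from the first point does not have to be typed by hand. The following is a minimal sketch, not a definitive tool: it assumes the notebook has already been loaded as a dictionary in the standard `.ipynb` JSON layout (a `cells` list whose entries carry `cell_type` and `source`), and it assumes that heading anchors are formed by replacing spaces with hyphens, which may differ between notebook viewers. The function name `build_toc` is hypothetical.

```python
import re

# Matches Markdown headings such as "# Title" or "### Sub-section".
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def build_toc(notebook: dict) -> str:
    """Return a nested Markdown list of the headings found in a notebook."""
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # comments in code cells are not headings
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files may store source as a list of lines
            source = "".join(source)
        for line in source.splitlines():
            match = HEADING.match(line)
            if match is None:
                continue
            level = len(match.group(1))
            title = match.group(2)
            # Assumption: anchors replace spaces with hyphens.
            anchor = title.replace(" ", "-")
            indent = "  " * (level - 1)
            lines.append(f"{indent}- [{title}](#{anchor})")
    return "\n".join(lines)


# Small in-memory example standing in for a real notebook file.
example_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Objectives\n", "Some text\n"]},
        {"cell_type": "code", "source": "# not a heading\nprint('hello')"},
        {"cell_type": "markdown", "source": "## Input and output"},
    ]
}
print(build_toc(example_notebook))
```

For a real notebook, you could load the file with `json.load` and paste the printed list into the first Markdown cell. Note that this sketch does not skip lines starting with `#` inside fenced code blocks in Markdown cells.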
Why should you not overdo your data portfolio?
Unless you have no formal education, creating a huge portfolio is not worth it. The recruiter will be overwhelmed and will not know which project to start with. In addition, it takes a decent amount of time to produce a quality project. Ten or twenty projects cannot be completed in a few days unless they are very superficial.
Most recruiters want to know that you cover the basics (if you are applying for a junior position). In addition, the company that you join will usually assign someone to support and guide you throughout your journey at work. Furthermore, keep in mind that most (average) companies are still at the beginning of their journey in the data science world. Many data science positions are in reality data analysis positions that involve implementing basic machine learning models. In other words, they do not require the full stack of skills that a real data scientist should have.
Why should you use Notebooks?
- Markdown can be used to write down your thoughts and to include images or equations
- Everything written (code-wise) can be instantly tested and displayed for a sanity check
- Algorithm chunks can be written separately to avoid having to run the full code every time
- A very nice tool to show visualizations
- Very easy to get used to and to use
- Presentable (think PPT but with code)
Make your project stand out
As mentioned before, you can deploy your machine-learning model online. This works best when the model is a classifier. For example, you can train a model to classify cats and dogs using deep learning. You then create a website where a visitor can upload an image and receive a prediction of whether it shows a cat or a dog. Other interesting projects along the same lines are easy to imagine.
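To give a feeling for what the web side of such a project involves, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a recommended production setup: `classify_image` is a hypothetical placeholder that does not look at the image content in any meaningful way, and you would replace it with a call to your own trained model. The names `app` and `serve` and the port number are also arbitrary choices.

```python
import json
from wsgiref.simple_server import make_server


def classify_image(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for a trained cat-versus-dog model.

    This is NOT a real classifier: it derives a fake score from the raw
    bytes so that the example runs without any machine-learning library.
    """
    score = (sum(image_bytes) % 100) / 100 if image_bytes else 0.5
    label = "dog" if score >= 0.5 else "cat"
    return {"label": label, "score": score}


def app(environ, start_response):
    """WSGI application: POST raw image bytes, receive a JSON prediction."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST an image to this endpoint\n"]

    # Read exactly the number of bytes announced by the client.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    image_bytes = environ["wsgi.input"].read(length)

    body = json.dumps(classify_image(image_bytes)).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Start a local development server (blocks until interrupted)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server to which an image can be posted as the raw request body. A real project would add an upload form, input validation, and a proper hosting setup, and many people would reach for a web framework instead of plain WSGI.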
Is it better to have a portfolio or an internship on the résumé?
Having both would be optimal. However, a junior portfolio is not always enough. Real-life datasets are not Kaggle-like datasets: the ten minutes you invest in cleaning a Kaggle dataset can turn into hours of work with real-life data. Therefore, in my opinion, an internship is always superior to a portfolio when applying for jobs. Recruiters generally prefer people who already have a feel for what it is like to work at a company, especially when they are recruiting for jobs rather than internships.
Can a portfolio help me secure a job?
A portfolio can help in securing an internship (almost always). For jobs, however, something extra might be needed. Nevertheless, you should not feel demotivated. Most internships can turn into a job offer. In addition, most data-related internships are paid. An internship is not only for learning purposes: once the training is done (the first few weeks), you will start to get your hands dirty with real-life projects and move closer to your career goals.
DataCamp portfolio
DataCamp provides guided and unguided projects as part of its learning curriculum. Once finished, the projects can be saved and published on GitHub. However, keep in mind that thousands of other students complete the same projects, so they might not bring as much value as a project that you create from scratch.
Thoughts...
Many data scientists, computer scientists, and others keep developing their portfolios even after securing a job. However, after a few years of experience, the portfolio becomes less useful. It turns into a gallery of your previous work, and recruiters will give it little weight in the recruiting process. Not in a million years will a perfect portfolio built by a junior compete against a person with ten years of experience in the same field. Keep in mind that I am talking about portfolios whose projects are irrelevant to real-life scenarios or businesses.
Finally, you might be wondering why I did not provide any examples of good projects. The reason is that plenty of websites share and copy the same ideas, which influences everyone's creativity and narrows it down to the same projects. If you google 'covid-19 data analysis projects', you will find thousands of people who have done the same project with minor tweaks here and there.
I will conclude with an image showing the search trends for some terms related to data science portfolios.
Increase in data-portfolios' popularity in the past decade