The Portfolio of Data Scientists and Analysts: An Overview


Data-related jobs have been gaining popularity in the past few years. Many students aspire to become data analysts or data scientists, either by taking university courses on related topics (data mining, machine learning…) or by taking online courses (free or paid, coding or theoretical) to earn certifications as proof of their expertise. Many professionals are following the same trend.

A career shift into the field of data has become the new normal. However, many people do not know how to approach the field once they are certified, or how to increase their chances of landing their dream job. In the following, I will therefore explain the easiest way to get a first hands-on experience in data. It will help you highlight the skills you have acquired from online coding platforms or theoretical MOOCs.

What is a portfolio?

The term portfolio is highly popular in the arts and creative industries. You will almost always hear a photographer, a graphic designer, or an architect talk about their portfolio. Back in the day, before the internet existed, a portfolio was a filing folder containing sketches, drafts, or finished projects to show to a recruiter. Today, it is a website containing these works. Many websites (free or paid) compete to give these talented people a platform to shine and share their work.

What is a portfolio in the data field?

A data-related portfolio serves the same purpose, but for data analysts and data scientists. People with minimal to no real-life experience in data can create their own data-related projects and publish them on the internet. Some prefer to host their own website and deploy their machine-learning projects or dashboards there, while others publish them on GitHub (for free).

Some people also like to work directly on Kaggle by creating notebooks of their work. They might use Kaggle for several reasons: the datasets are easier to access, notebooks can be published for the public to see, it is a free cloud service like Google Colab, and it lets them improve their Kaggle score, which can be highly relevant for some recruiters.

What is Google Colab?

Google Colab can be thought of as an online Jupyter Notebook (for those familiar with Python). It is a free tool provided by Google that gives access to GPUs and high RAM, which helps when you are dealing with deep learning models or large matrix and tensor multiplications. Although it is not as rich as Kaggle, it comes with a few datasets to practice on for machine learning or data analysis projects.

The purpose of a portfolio in data science and analysis

  1. Show and prove to the recruiters that you cover more than just the theoretical components of the concepts that you learned (machine learning algorithms, visualization tools...)
  2. Practice newly acquired skills after finishing an online or a university course 
  3. In some cases, to have ready-to-use prototypes for future needs (for experienced people)

Ultimately, you should have a diversified portfolio. That means not using the same algorithms in every project and not specializing in a single topic (e.g. deep learning only), unless you want to apply for a very specific position (computer vision engineer for self-driving cars, machine-learning engineer in health care...).

Keep in mind that more is not better. Two to three projects are more than enough to highlight your skills, especially if you are applying for junior positions. Recruiters do not expect you to know everything. In addition, what you do online as practice or in competitions often does not reflect the real-life job: you will not join a company that trains 'dogs versus cats' classifiers... In other words, your portfolio is unlikely to showcase ways of generating value or impact for a business.

What should I have in my portfolio (key elements)?

Assume you have uploaded your project to GitHub and linked your GitHub profile on your CV. The most important thing to keep in mind is that non-technical recruiters will also check your work. Therefore, you need to impress them with what they can see, even if they do not understand the code behind it.
  1. Make sure to have a table of contents in your notebook to show its structure
  2. Introduce every part of your code so that the assessor can follow the logic
    • Objectives
    • Original input and expected output
    • Justify your choice of tools (e.g. algorithms, techniques, graph types, hyperparameters)
  3. For both DA and DS projects, try to include images or graphs. Non-technical recruiters will look at them, and if they look nice, you might earn additional points when your work is assessed
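
The table of contents from item 1 does not have to be typed by hand. The snippet below is a minimal sketch that builds one as Markdown links from a list of section titles. The section titles and the helper names (`slugify`, `build_toc`) are made up for illustration, and the anchor format (spaces replaced by hyphens) is an assumption about how your notebook viewer links to headings, so check it against the tool you use.

```python
def slugify(title):
    """Turn a heading into an anchor. Assumption: the notebook viewer
    builds anchors by replacing spaces with hyphens."""
    return title.strip().replace(" ", "-")


def build_toc(headings):
    """Build a Markdown table of contents from (level, title) pairs."""
    lines = []
    for level, title in headings:
        indent = "  " * (level - 1)  # two spaces per nesting level
        lines.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(lines)


# Hypothetical section titles for a small analysis notebook
sections = [
    (1, "Objectives"),
    (1, "Input and expected output"),
    (2, "Data cleaning"),
    (1, "Results"),
]
toc = build_toc(sections)
print(toc)
```

Paste the printed text into the first Markdown cell of the notebook, and keep the titles identical to the actual headings so the links resolve.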
For a data analysis project, you can start with an assumption and check whether the data confirms or rejects it. That way, your notebook gets a storyline that you can tell in the interview.
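
One possible way to check such an assumption is a permutation test, sketched below with the Python standard library only. The assumption tested here ("customers spend more on weekends") and all the numbers are invented for illustration, and the function name `permutation_test` is my own. A real project would load its own data and might well use a different test.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often a random relabelling of the data gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Made-up basket values, for illustration only
weekday = [21, 19, 24, 20, 22, 18, 23, 21]
weekend = [27, 30, 26, 29, 31, 28, 25, 30]

p_value = permutation_test(weekday, weekend)
print(f"Difference in means: {mean(weekend) - mean(weekday):.2f}")
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value means the observed difference would rarely appear if the weekday and weekend labels did not matter, which supports the assumption. A large one means the data does not back it up, which is just as good a story to tell.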
Finally, I would say that recruiters (technical and non-technical) would rather see a very neat project showing basic hard skills than a chaotic project with top-notch coding skills. The former candidate will have to learn new hard skills, which are easy to acquire. The latter will need to learn soft skills, which are not easy to acquire at all.

Why should you not overdo your data portfolio?

Unless you have no formal education, creating a huge portfolio is not worth it. The recruiter will not know which project to start with. In addition, a quality project takes a decent amount of time to produce: ten or twenty projects cannot be done in a few days unless they are very superficial.

Most recruiters want to know that you cover the basics (if you are applying for a junior position). In addition, the company you join will assign someone to support and guide you throughout your journey at work. Furthermore, keep in mind that most (average) companies are still at the beginning of their journey in the data science world. Many data science positions are in reality data analysis positions that involve implementing basic machine learning models. In other words, they do not require the full stack of skills that a real data scientist should have.

Why should you use Notebooks?

The reason why a notebook should be used when creating a portfolio is its flexibility.
  • Markdown can be used to write your thoughts, include images, or include equations...
  • Everything written (code-wise) can be instantly tested and displayed for a sanity check
  • Algorithm chunks can be written separately to avoid having to run the full code every time
  • A very nice tool to show visualizations
  • Very easy to get used to and to use
  • Presentable (think PPT but with code)
To sum up, I do not see any other way to have an understandable (self-explaining to the readers) project for your portfolio without a notebook.

Make your project stand out

As mentioned before, you can deploy your machine-learning model online. This works best when you have a classifier. For example, you can train a deep learning model to classify cats and dogs, then create a website where visitors upload an image and receive a prediction of whether it shows a cat or a dog. You can come up with other interesting projects along the same lines.
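
To give an idea of what sits behind such a website, here is a minimal sketch of a prediction endpoint built with Python's standard `http.server` module. The `classify` function is a hypothetical placeholder that returns an arbitrary label so the sketch runs without a trained model; in a real project it would call your own classifier. The names `PredictHandler` and `make_server` are also my own, and a real deployment would more likely use a proper web framework and a hosting service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(image_bytes):
    """Hypothetical stand-in for a trained model.
    Placeholder rule only: this is NOT a real classifier."""
    return "cat" if len(image_bytes) % 2 == 0 else "dog"


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the uploaded image from the request body
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)

        # Send the prediction back as JSON
        body = json.dumps({"prediction": classify(image_bytes)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    """Create a server that listens on localhost only."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)


# To try it locally, call: make_server().serve_forever()
```

Once the server is running, any page or script that sends an image in a POST request to `http://127.0.0.1:8000` gets a small JSON answer back, which the website can then display to the visitor.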

Is it better to have a portfolio or an internship on the résumé?

Having both would be optimal. However, a junior portfolio is not always enough. Real-life datasets are not Kaggle-like datasets: the ten minutes you invest in cleaning a Kaggle dataset can turn into hours of work on real-life data. Therefore, in my opinion, an internship is always superior to a portfolio when applying for jobs. Recruiters generally prefer people who already have a feeling for what it is like to work at a company, especially when they recruit for jobs rather than internships.

Can a portfolio help me secure a job?

A portfolio can help in securing an internship (almost always). However, for jobs, something extra might be needed. Nevertheless, one should not feel demotivated. Most internships can turn into a job offer. In addition, most data-related internships are paid. An internship is not only for learning purposes; once the training is done (the first few weeks), you will start to get your hands dirty with real-life projects: getting closer to your career goals!

DataCamp portfolio

DataCamp provides guided and unguided projects as part of their learning curriculum. Once finished, the projects can be saved and published on GitHub. However, keep in mind that those projects are done by thousands of other students and might not bring as much value as a project that you create from scratch.

Thoughts...

Many data scientists, computer scientists... keep developing their portfolios even after securing a job. However, after a few years of experience, the portfolio becomes less useful. It is now a gallery of your previous work, and no recruiter will give it much weight in the recruiting process. Not in a million years will a perfect portfolio built by a junior compete against a person with ten years of experience in the same field. Keep in mind that I mean portfolios whose projects are irrelevant to real-life scenarios or businesses.
Finally, you might be wondering why I did not provide any examples of good projects. The reason is, plenty of websites share and copy the same ideas, influencing everyone's creativity and narrowing it into the same projects. If you google 'covid-19 data analysis projects' you will find thousands of people who have done the same project with minor tweaks here and there.
I will conclude with an image showing the search trends for some terms related to data science portfolios.

[Image: Google Trends chart. Caption: Increase in data-portfolios' popularity in the past decade]
