

Learning Python in 2021 For Data Analysis [Beginner to Intermediate]


Many people approaching the field of data science and data analysis ask about programming languages. The two most frequent pieces of advice are to study either R or Python. However, saying "Python" without much context is unhelpful and can lead the aspiring data analyst or scientist to waste time learning tools they do not need. To each field its tools and priorities: the following explains what is required to start your journey as a data analyst with Python.


Why learn Python in 2021?

Python is a general-purpose programming language. You can do almost anything related to coding with Python: software development (engineering), game development, data science and data analysis, dashboarding (and many more…). In addition, Python is relatively simple to learn due to its English-like syntax. In my humble opinion, Python is worth learning for the sake of understanding what coding is about, even if you will not use it daily.

How to learn Python in 2021? (For beginners)

In the following, I highlight how to best approach Python. The list is limited to the essentials that will let you, at least, write your first function. However, please do not limit yourself to the proposed list. Enrich your understanding of Python as much as possible so that you feel at ease when playing around with data.

Calculator

The first step in learning Python is to use it as a calculator. Try some basic calculations to familiarize yourself with the operators and how they are written in Python (e.g. exponentiation, often written ‘^’ elsewhere, is ‘**’ in Python, where ‘^’ means bitwise XOR). You can find the list of operators on W3Schools.
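As a small illustration of the calculator idea, here is a minimal sketch of the basic arithmetic operators. The numbers and variable names are arbitrary and chosen only for this example.

```python
# Python as a calculator: the basic arithmetic operators.
total = 7 + 3        # addition
difference = 7 - 3   # subtraction
product = 7 * 3      # multiplication
quotient = 7 / 3     # true division, always returns a float
floor_q = 7 // 3     # floor division, drops the fractional part
remainder = 7 % 3    # modulo, the remainder of the division
power = 2 ** 10      # exponentiation
xor = 2 ^ 10         # bitwise XOR, NOT a power

print(total, difference, product, quotient)
print(floor_q, remainder, power, xor)
```

The last two lines of the block are the ones worth remembering: writing ‘^’ when you mean a power does not raise an error, it just quietly computes something else.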

Creating variables

Creating variables is one of the most important things to learn in Python. As a matter of fact, almost everything you do in Python will revolve around defining variables.

Data Types

Data types are important to cover. It is essential that you understand how and why data types can or cannot work together. As part of this, you can learn the different ways to define numbers (integers, floats and complex numbers) and understand how casting works (e.g. transforming a number to a string or vice versa). Finally, you will explore how Booleans work and how important they are, especially for creating masks in data analysis.
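The sketch below puts variables, number types, casting and Booleans together. All names and values are made up for illustration, and the mask is built with plain Python lists rather than with a data analysis library.

```python
# Variables of different types
age = 30            # int
height = 1.75       # float
z = 2 + 3j          # complex
name = "Ada"        # str

# Casting between types
age_text = str(age)      # int -> str
parsed = int("42")       # str -> int
ratio = float("3.5")     # str -> float

# Types that cannot work together raise a TypeError
try:
    label = name + age   # str + int is not allowed
except TypeError as err:
    error_message = str(err)

# After casting, the concatenation works
label = name + " is " + age_text

# Booleans as a mask: keep only the scores of 60 or more
scores = [55, 82, 47, 91]
mask = [s >= 60 for s in scores]
passed = [s for s, keep in zip(scores, mask) if keep]

print(label, mask, passed)
```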

Data structure

Lists, tuples, sets, dictionaries and arrays. Learning these at a high level is crucial: without them, you cannot write even beginner-level code. In-depth knowledge of these concepts is beneficial for intermediate to advanced programming. However, a deeper understanding of their internals requires a bigger time investment. It is not worth trying to master data structures from the beginning, but they should be revisited multiple times along the way.
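For a high-level feel of these structures, here is a minimal sketch with made-up values. The array shown is the one from Python's standard library `array` module, picked as an assumption for this example; in data analysis, arrays usually mean NumPy arrays.

```python
from array import array

# list: ordered and mutable
months = ["Jan", "Feb", "Mar"]
months.append("Apr")

# tuple: ordered and immutable
point = (48.85, 2.35)

# set: unordered, duplicates are dropped
tags = {"python", "sql", "python"}

# dictionary: key -> value lookup
sales = {"Jan": 120, "Feb": 95}
sales["Mar"] = 130

# array: a compact sequence where every item has the same type ("d" = float)
readings = array("d", [1.5, 2.5, 4.0])

print(months, point, tags, sales, sum(readings))
```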

Loops (for/while) and conditions (if/else)

Loops are also essential in Python. You need to know how to iterate over your data. The most important looping skill to have is the list comprehension. Once you have covered list comprehensions, you can be assured that you are on the right track.
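To make this concrete, the sketch below shows a for loop with a condition, a while loop, and the same filtering written as a list comprehension. The temperature values are invented for the example.

```python
temperatures_c = [12.5, 18.0, 21.3, 9.8]

# for loop with an if condition: collect the warm days
warm_days = []
for t in temperatures_c:
    if t >= 18:
        warm_days.append(t)

# while loop: halve a value until it drops to 10 or below
value = 100
halvings = 0
while value > 10:
    value /= 2
    halvings += 1

# list comprehension: the same filter as the for loop, in one line
warm_days_short = [t for t in temperatures_c if t >= 18]

# list comprehension: transform every item (Celsius to Fahrenheit)
temperatures_f = [t * 9 / 5 + 32 for t in temperatures_c]

print(warm_days, halvings, warm_days_short, temperatures_f)
```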

Writing functions

Learning Python is all about writing concise and helpful functions to automate your tasks. With everything discussed so far, you should now be able to write very basic functions that perform simple tasks.
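As an example of the kind of basic function meant here, the sketch below combines variables, a dictionary and a condition. The function name `summarize` and its output format are hypothetical choices made for this illustration.

```python
def summarize(values):
    """Return the count, mean, min and max of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    count = len(values)
    return {
        "count": count,
        "mean": sum(values) / count,
        "min": min(values),
        "max": max(values),
    }


summary = summarize([2, 4, 6])
print(summary)
```

Once a task like this lives in a function, you can reuse it on any list of numbers instead of rewriting the same lines.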

Learn how to find help and answers to errors

Google and StackOverflow are the two best resources to find the answers to your coding-related questions. In fact, googling is a very valuable skill. The better you are at googling the easier your coding experience will get with time. Try to check a few videos on YouTube to improve your googling skills; you will be thankful for it.

Best books and resources to learn python

To avoid writing an extensive guide on learning python, I will share my favorite resources.
  • Mosh Programming (YouTube)
  • Automate the Boring Stuff with Python (Book)
  • W3Schools
  • DataCamp

What does 'Python for data analysis' mean?

Personally, when I say Python for DA, I mean learning to use notebooks (Jupyter Notebook or JupyterLab). Notebooks are very helpful for performing analysis and showcasing results easily. However, using notebooks to write scripts or anything related to software development is strongly discouraged.

Here are my top reasons to use notebooks:

  • Easy to use
  • Writing algorithm chunks
  • Run smaller code chunks
  • Good for visualization
  • Good for displaying data (tabular data in particular)

What are the essential packages and libraries for data analysis?

Four packages are the most used in data analysis: pandas, NumPy, Matplotlib and Seaborn. Pandas and NumPy help with cleaning and manipulating datasets, while Matplotlib and Seaborn help with visualizing the data. These four are the pillars of data analysis in Python, and it is hard to get much done without them.

Pandas for data analysis

Pandas (conventionally imported as pd) is a foundational package for data analysis. Using pandas, you will learn how to read data from files (Excel, CSV…). You will also learn how to manipulate and transform datasets. Pandas is mainly used for tabular data. Therefore, if you are familiar with Excel transformations, pandas will cover you well! In the following, I will mention some important pandas functions.

  • Reading files (pandas.read_excel, pandas.read_csv…)
  • Pivoting (pandas.pivot, pandas.pivot_table)
  • Merging and joining (pandas.DataFrame.join, pandas.merge)
  • Applying a function (pandas.DataFrame.apply)
  • DataFrames for tabular data (pandas.DataFrame)
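The functions listed above can be sketched on a tiny dataset. Everything here is made up for illustration: the CSV is read from an in-memory string instead of a real file, and the column names and the `managers` table are hypothetical.

```python
import io

import pandas as pd

# Reading: a small CSV held in memory stands in for a file on disk
csv_text = """region,month,units
North,Jan,10
North,Feb,12
South,Jan,7
South,Feb,9
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Pivoting: one row per region, one column per month
wide = sales.pivot_table(index="region", columns="month",
                         values="units", aggfunc="sum")

# Merging: attach a second table on the shared "region" column
managers = pd.DataFrame({"region": ["North", "South"],
                         "manager": ["Lee", "Kim"]})
merged = pd.merge(sales, managers, on="region", how="left")

# Applying a function to every value of a column
merged["units_doubled"] = merged["units"].apply(lambda u: u * 2)

print(wide)
print(merged)
```

With a real file, the only change to the reading step would be passing a file path to `pd.read_csv` instead of the in-memory string.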

Numpy for data analysis

NumPy is the standard package for numerical computing in Python. With its vectorized operations, NumPy allows fast operations on matrices and vectors (known as arrays in NumPy lingo), which is beneficial for dataframes and linear algebra applications. There are also add-ons that make NumPy even more efficient (however, these tools are beyond the scope of a beginner's learning). To get a sense of NumPy's potential:

  • Creating matrices and vectors (numpy.array)
  • Creating distributions and random numbers (numpy.random.normal…)
  • Dot products and more (numpy.dot)
  • ...
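A short sketch of the NumPy calls named above, with arbitrary values. The seed is set only so that the random draw is repeatable, and the sample size of 1000 is an assumption for this example.

```python
import numpy as np

# Creating a vector and a matrix
vector = np.array([1.0, 2.0, 3.0])
matrix = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])

# Vectorization: the operation applies to every element, no loop needed
scaled = vector * 2

# Random numbers drawn from a normal distribution (mean 0, std 1)
np.random.seed(0)
samples = np.random.normal(loc=0.0, scale=1.0, size=1000)

# Dot products: matrix times vector, and vector times vector
matrix_times_vector = np.dot(matrix, vector)
vector_dot_vector = np.dot(vector, vector)

print(scaled, matrix_times_vector, vector_dot_vector, samples.mean())
```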

Matplotlib and Seaborn for data analysis

Matplotlib and Seaborn are useful for creating visuals, and the syntax of both is simple and intuitive. The main difference between the two is that Seaborn, which is built on top of Matplotlib, makes more advanced statistical visualizations easier. For example, it is easier to draw multiple categories at once for comparison purposes. In addition, Seaborn comes with polished themes that improve the overall quality of visuals.
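Here is a minimal Matplotlib sketch that draws two categories on the same axes. The data is invented, and the "Agg" backend and in-memory buffer are assumptions made so the example runs without a display; in a notebook you would normally just let the figure render.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
north = [10, 12, 15]
south = [7, 9, 8]

fig, ax = plt.subplots()
# One plot call per category
ax.plot(months, north, marker="o", label="North")
ax.plot(months, south, marker="o", label="South")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Units sold per region")
ax.legend()

# Save the figure as PNG bytes instead of writing a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

With Seaborn, the two plot calls could typically be replaced by a single `seaborn.lineplot` call using its `hue` argument on a long-format DataFrame, which is the kind of multi-category convenience mentioned above.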

A common workflow in pandas for data analysis

The first step in any data analysis project is preparing the data for the actual analysis. Data is usually very messy, which means a large amount of time will be dedicated to cleaning it. The following is a basic workflow for each dataset that you work with:

  1. Importing the data correctly (headers are correct, index is correct…)
  2. Checking for missing values to understand how good or bad the data is
  3. Dropping duplicates to avoid issues along the analysis
  4. Performing transformations (wide to long, long to wide…)
  5. Verifying that the values of your transformed data are still good and nothing is missing
  6. Checking that all the values are of the right type
  7. Plotting a few graphs for a final check
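The workflow above can be sketched on a toy dataset as follows. The data, column names and checks are all assumptions made for illustration, and step 7 is replaced by a numeric summary so that the sketch only depends on pandas; in practice you would plot the result instead.

```python
import io

import pandas as pd

raw_csv = """id,city,temp_jan,temp_feb
1,Paris,5.0,6.5
2,Oslo,-3.0,
3,Rome,8.0,9.5
3,Rome,8.0,9.5
"""

# 1. Import the data with the right header and index
df = pd.read_csv(io.StringIO(raw_csv), index_col="id")

# 2. Count missing values per column
missing_per_column = df.isna().sum()

# 3. Drop duplicated rows
df = df.drop_duplicates()

# 4. Transform from wide to long: one row per city and month
long_df = df.reset_index().melt(id_vars=["id", "city"],
                                var_name="month", value_name="temp")

# 5. Verify the transformed data: row count and missing values
expected_rows = len(df) * 2  # two month columns per city
rows_ok = len(long_df) == expected_rows
missing_after = long_df["temp"].isna().sum()

# 6. Check that the values have the right type
temp_is_numeric = pd.api.types.is_numeric_dtype(long_df["temp"])

# 7. Final check (a numeric summary stands in for the plots here)
print(long_df["temp"].describe())
```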
