How to Become a Data Engineer?


If you’ve ever felt surrounded by a variety of resources, some paid and others free, some in-depth and lengthy and others short and to the point, I will suggest some good resources and a plan to get you started.

Before you start exploring all those resources, take a moment to really grasp the full range of responsibilities that come with being a data engineer. This will give you a solid foundation to make the most of the various learning materials available to you. You can check out my other post, Who is a data engineer?

So what skills are required?

Becoming a proficient data engineer typically requires these essential skills:

  1. Database Management and Design: A strong grasp of database systems, both relational and non-relational, is paramount. This includes understanding data modeling, normalization, indexing, and structuring databases for efficient querying and storage.
  2. Programming Proficiency: Proficiency in programming languages like Python, Java, Scala, or others is vital. You’ll need to create scripts, automate processes, and manipulate data to build and maintain data pipelines.
  3. Big Data Technologies: Familiarity with big data frameworks and technologies such as Hadoop, Spark, and Kafka is crucial. These tools enable you to process, transform, and manage large-scale data effectively.
  4. ETL and Data Modeling: Skill in building ETL pipelines to extract, transform, and load data, alongside creating effective data models.
  5. Cloud Platforms: Knowledge of cloud services (AWS, Azure, etc.).
  6. Soft Skills: Communication skills, problem-solving abilities, and collaboration are essential for working in cross-functional teams and understanding business needs. I feel this is highly underrated.

I personally feel that jobs in the market differ mainly in how the work is split across these skills. Some jobs may require, for example, 40% of #1, 25% of #2, 25% of #3, and the remaining 10% from elsewhere on the list; these are usually Data Operations kinds of roles.

Overall, you will require all of these skills to get some interesting work.
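To make skill #4 a bit more concrete, here is a minimal sketch of an ETL pipeline using only the Python standard library. Everything in it is made up for illustration: the `RAW_CSV` sample data, the `orders` table, and the function names are assumptions, and a real pipeline would read from files, APIs, or queues and load into a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, API, or queue.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,BOB,
3,carol,7.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount, normalise names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(row["amount"]))
        )
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a SQLite table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract -> transform -> load, then return (row count, total amount)."""
    load(conn, transform(extract(raw_text)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, conn))
```

Keeping extract, transform, and load as separate functions is one common way to make each step easy to test on its own; it is a design choice, not the only way to structure a pipeline.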

Resources

1) A well-curated list of paid and free resources is documented here: https://dataengineering.wiki/Learning+Resources

2) Datartic! View my data series, which demonstrates real-life practical work as a Data Engineer, along with some good projects implemented from scratch, end to end, for you to learn from.

3) Plenty of YouTube videos, plus a curated list of links from individuals and around the internet:

https://dataschool.com/data-governance/

https://www.geeksforgeeks.org/data-warehouse-development-life-cycle-model/

https://docs.microsoft.com/en-us/power-query/dataflows/best-practices-for-dimensional-model-using-dataflows

Kimball in the context of the modern data warehouse

Star schema or single table in PowerBi

4) Someone recommended this: https://github.com/ossu/computer-science (see the Core applications section). I haven’t personally tried it out, but you can check whether it works for you.

How I got a Data Engineering Job with just 3 months of prep from scratch


Things that worked for me:
1) Started the journey by building a basic understanding from https://dataengineering.wiki/Guides/Getting+Started+With+Data+Engineering
2) Started with SQL and Python (1 hour each, every day). About 1 month of this gave me a good grip. (This playlist is golden: https://youtube.com/playlist?list=PL08903FB7ACA1C2FB&si=LU9k1vZz-G5kUkEo. I haven’t watched all the videos, but the essential concepts are very well explained.)
3) Studied relational databases, data pipelines, data modeling, indexing and query optimization (intermediate), batch data processing (intermediate), and data warehousing. 2 to 3 hours every day for a month should give you a deep understanding.
4) Complete 1 or 2 projects that cover most of what you have learned. You should find some good resources in my posts that cover an end-to-end data pipeline I have worked on in a professional job.
5) Start applying for entry-level jobs, and meanwhile try getting a certification, preferably in a cloud platform such as AWS, GCP, or Azure. This is not a requirement, but it is good to have while you are applying for positions.
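If you want a small exercise for the indexing and query optimization part of step 3, here is a rough sketch using Python's built-in `sqlite3` module. The `events` table, the `idx_events_user_id` index, and the helper names are all hypothetical, and the exact wording of the query plan differs between SQLite versions, so treat the printed text as something to read rather than a fixed output.

```python
import sqlite3

# Query we want to speed up; the table and column names are made up for the demo.
QUERY = "SELECT COUNT(*) FROM events WHERE user_id = ?"


def build_demo_db():
    """Create an in-memory table with 1000 rows spread over 100 user ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (user_id, kind) VALUES (?, ?)",
        [(i % 100, "click") for i in range(1000)],
    )
    return conn


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is column 3)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


if __name__ == "__main__":
    conn = build_demo_db()
    # Without an index, SQLite has to scan the whole table.
    print("before:", query_plan(conn, QUERY, (42,)))
    conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
    # With the index, the plan should mention idx_events_user_id.
    print("after: ", query_plan(conn, QUERY, (42,)))
```

Comparing the plan before and after creating the index is a quick way to build intuition for when an index actually gets used.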

Remember, it’s about consistency!
