If you’ve ever felt overwhelmed by the variety of resources out there, some paid and others free, some in-depth and lengthy and others short and to the point, I will suggest some good resources and a plan to get you started.
Before you start exploring all those resources, take a moment to understand the full range of responsibilities that come with being a data engineer. This will give you a solid foundation for making the most of the learning materials available to you. You can check out my other post, “Who is a data engineer?”
So what skills are required?
Becoming a proficient data engineer typically requires these essential skills:
- Database Management and Design: A strong grasp of database systems, both relational and non-relational, is paramount. This includes understanding data modeling, normalization, indexing, and structuring databases for efficient querying and storage.
- Programming Proficiency: Proficiency in programming languages like Python, Java, Scala, or others is vital. You’ll need to create scripts, automate processes, and manipulate data to build and maintain data pipelines.
- Big Data Technologies: Familiarity with big data frameworks and technologies such as Hadoop, Spark, and Kafka is crucial. These tools enable you to process, transform, and manage large-scale data effectively.
- ETL and Data Modeling: Skill in building ETL pipelines to extract, transform, and load data, alongside creating effective data models.
- Cloud Platforms: Knowledge of cloud services (AWS, Azure, etc.).
- Soft Skills: Communication skills, problem-solving abilities, and collaboration are essential for working in cross-functional teams and understanding business needs. I feel this is highly underrated.
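To make the ETL bullet above a bit more concrete, here is a minimal sketch of an extract, transform, load flow using only Python’s standard library. Everything in it is made up for illustration: the sample CSV, the `orders` table and the function names are assumptions, and a real pipeline would read from files, APIs or other databases rather than a hard-coded string.

```python
import csv
import io
import sqlite3

# Hypothetical raw input, hard-coded so the example is self-contained.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text):
    """Run the three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_text)))
    return conn


conn = run_pipeline(RAW_CSV)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions is a design choice, not a rule: it makes each step easy to test on its own, which is roughly what orchestration tools formalize at a larger scale.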
I personally feel that jobs in the market differ mainly in what percentage of the work draws on each of these skills. Counting the skills above in order, some jobs may require around 40% of #1, 25% of #2, 25% of #3 and the remaining 10% from the rest of the list; these are usually Data Operations kinds of roles.
Overall, you will require all of these skills to get some interesting work.
Resources
1) A well-curated list of paid and free resources is documented here: https://dataengineering.wiki/Learning+Resources
2) Datartic! View my data series, which demonstrates real-life practical work as a data engineer, along with some good projects implemented from scratch, end to end, for you to learn from.
3) Lots of YouTube, plus a curated list from individuals and the internet:
https://dataschool.com/data-governance/
https://www.geeksforgeeks.org/data-warehouse-development-life-cycle-model/
Kimball in the context of the modern data warehouse
Star schema or single table in PowerBi
4) Someone recommended this: https://github.com/ossu/computer-science (see the Core applications section). I haven’t personally tried it, but you can check whether it works for you.
How I got a Data Engineering Job with just 3 months of prep from scratch
Things that worked for me:
1) Started the journey by working through the guide at https://dataengineering.wiki/Guides/Getting+Started+With+Data+Engineering
2) Started with SQL and Python (1 hour each, every day). About 1 month gave me a good grip. (This playlist is golden: https://youtube.com/playlist?list=PL08903FB7ACA1C2FB&si=LU9k1vZz-G5kUkEo I haven’t watched all the videos, but the essential concepts are very well explained.)
3) Worked on understanding relational databases, data pipelines, data modeling, indexing and query optimization (intermediate), batch data processing (intermediate) and data warehouses. 2 to 3 hours every day for a month should give you a deep understanding.
4) Complete 1 or 2 projects that cover most of what you have learned. You should find some good resources in my posts, which cover an end-to-end data pipeline I have worked on in a professional job.
5) Start applying for entry-level jobs and, in the meantime, try getting a certification, preferably in a cloud platform such as AWS, GCP or Azure. This is not a requirement, but it is good to have while you are applying for positions.
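If you want a quick hands-on exercise for the indexing and query optimization part of step 3, the sketch below uses SQLite (which ships with Python) to compare the query plan for the same query before and after adding an index. The `events` table, its columns and the index name are hypothetical, and the exact wording of the plan differs between SQLite versions, so treat it as something to experiment with rather than a reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with 10,000 rows spread across 100 users.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, action) VALUES (?, ?)",
    [(i % 100, "click") for i in range(10_000)],
)


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


QUERY = "SELECT * FROM events WHERE user_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Reading query plans like this on small tables is a cheap way to build intuition before you meet the same ideas in a production database or warehouse.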