Tips For Data Engineers

☁️ Ümit Eroğlu 🌍🛰
4 min read · Jul 18, 2023

https://unsplash.com/photos/jo9eYA750V0

1. A (Book) Case for Eventual Consistency
2. A/B and How to Be
3. About the Storage Layer
4. Analytics as the Secret Glue for Microservice Architectures
5. Automate Your Infrastructure
6. Automate Your Pipeline Tests
7. Be Intentional About the Batching Model in Your Data Pipelines
8. Beware of Silver-Bullet Syndrome
9. Building a Career as a Data Engineer
10. Business Dashboards for Data Pipelines
11. Caution: Data Science Projects Can Turn into the Emperor’s New Clothes
12. Change Data Capture
13. Column Names as Contracts
14. Consensual, Privacy-Aware Data Collection
15. Cultivate Good Working Relationships with Data Consumers
16. Data Engineering != Spark
17. Data Engineering for Autonomy and Rapid Innovation
18. Data Engineering from a Data Scientist’s Perspective
19. Data Pipeline Design Patterns for Reusability and Extensibility
20. Data Quality for Data Engineers
21. Data Security for Data Engineers
22. Data Validation Is More Than Summary Statistics
23. Data Warehouses Are the Past, Present, and Future
24. Defining and Managing Messages in Log-Centric Architectures
25. Demystify the Source and Illuminate the Data Pipeline
26. Develop Communities, Not Just Code
27. Effective Data Engineering in the Cloud World
28. Embrace the Data Lake Architecture
29. Embracing Data Silos
30. Engineering Reproducible Data Science Projects
31. Five Best Practices for Stable Data Processing
32. Focus on Maintainability and Break Up Those ETL Tasks
33. Friends Don’t Let Friends Do Dual-Writes
34. Fundamental Knowledge
35. Getting the “Structured” Back into SQL
36. Give Data Products a Frontend with Latent Documentation
37. How Data Pipelines Evolve
38. How to Build Your Data Platform like a Product
39. How to Prevent a Data Mutiny
40. Know the Value per Byte of Your Data
41. Know Your Latencies
42. Learn to Use a NoSQL Database, but Not like an RDBMS
43. Let the Robots Enforce the Rules
44. Listen to Your Users — but Not Too Much
45. Low-Cost Sensors and the Quality of Data
46. Maintain Your Mechanical Sympathy
47. Metadata ≥ Data
48. Metadata Services as a Core Component of the Data Platform
49. Mind the Gap: Your Data Lake Provides No ACID Guarantees
50. Modern Metadata for the Modern Data Stack
51. Most Data Problems Are Not Big Data Problems
52. Moving from Software Engineering to Data Engineering
53. Observability for Data Engineers
54. Perfect Is the Enemy of Good
55. Pipe Dreams
56. Preventing the Data Lake Abyss
57. Prioritizing User Experience in Messaging Systems
58. Privacy Is Your Problem
59. QA and All Its Sexiness
60. Seven Things Data Engineers Need to Watch Out for in ML Projects
61. Six Dimensions for Picking an Analytical Data Warehouse
62. Small Files in a Big Data World
63. Streaming Is Different from Batch
64. Tardy Data
65. Tech Should Take a Back Seat for Data Project Success
66. Ten Must-Ask Questions for Data-Engineering Projects
67. The Data Pipeline Is Not About Speed
68. The Dos and Don’ts of Data Engineering
69. The End of ETL as We Know It
70. The Haiku Approach to Writing Software
71. The Hidden Cost of Data Input/Output
72. The Holy War Between Proprietary and Open Source Is a Lie
73. The Implications of the CAP Theorem
74. The Importance of Data Lineage
75. The Many Meanings of Missingness
76. The Six Words That Will Destroy Your Career
77. The Three Invaluable Benefits of Open Source for Testing Data Quality
78. The Three Rs of Data Engineering
79. The Two Types of Data Engineering and Data Engineers
80. The Yin and Yang of Big Data Scalability
81. Threading and Concurrency in Data Processing
82. Three Important Distributed Programming Concepts
83. Time (Semantics) Won’t Wait
84. Tools Don’t Matter, Patterns and Practices Do
85. Total Opportunity Cost of Ownership
86. Understanding the Ways Different Data Domains Solve Problems
87. What Is a Data Engineer? Clue: We’re Data Science Enablers
88. What Is a Data Mesh, and How Not to Mesh It Up
89. What Is Big Data?
90. What to Do When You Don’t Get Any Credit
91. When Our Data Science Team Didn’t Produce Value
92. When to Avoid the Naive Approach
93. When to Be Cautious About Sharing Data
94. When to Talk and When to Listen
95. Why Data Science Teams Need Generalists, Not Specialists
96. With Great Data Comes Great Responsibility
97. Your Data Tests Failed! Now What?

Source:

Turkish:
