Senior Software Engineer, DEA Core
- Los Gatos, California
- Data Engineering and Infrastructure
Netflix is at the vanguard of data-driven companies. From optimizing global content spend to delivering bufferless streaming around the world, data is used to power decisions and fuel experiments across the business. Every day, we log hundreds of billions of events and run tens of thousands of Pig and Spark jobs. And we’re just getting started. We want to dramatically increase the quantity, complexity, and quality of analytics while also increasing self-serve capabilities for our 60PB data warehouse. To enable this, we’re investing in centralized solutions for our data engineers.
In this role, you’ll be contributing to Netflix’s industry-leading data engineering infrastructure. You’ll collaborate with members of our data engineering and data platform teams on opportunities to innovate, automate, and centralize. You’ll develop shared libraries and elegant components for our paved path -- which we’ll open source whenever possible. And you’ll provide support for engineers onboarding to your solutions.
What will you be doing?
- Develop Python libraries to support innovative new ways to build data pipelines and manage the data lifecycle
- Build a library of common Spark UDFs in Scala and Python to increase consistency, reduce development time, and improve job performance
- Curate contributions to shared libraries, ensuring code is efficient and implements best practices
- Evangelize our paved path to data engineers and provide support for the solutions you build
- Identify opportunities to improve existing solutions or create new ones
Who are you?
- Although you understand and value analytics in all its forms, you thrive at the intersection of big data and software engineering.
- You are a passionate advocate for the developer’s experience. You identify, and relentlessly pursue, opportunities to simplify all phases of the data development lifecycle.
- You love freedom and hate being micromanaged. Given context, you're capable of self-direction.
- You deliver results quickly through iteration rather than waiting for perfection, and you actively solicit feedback.
- You seek to constantly improve your knowledge of existing technologies and explore new ones.
- You create code that is simple to understand and maintain, and you take pride in its quality.
What do you know? (This is not a rigorous litmus test.)
- You understand Hadoop technologies and data engineering concepts, and you have written or supported batch ETL jobs in a distributed environment (e.g. Hadoop, Spark, or MPP databases).
- You have achieved mastery of at least one major programming language -- bonus points if it’s Python, Scala, or Java.
- You have written reusable ETL components in Python or a similar language -- for example, UDFs or components for job orchestration, data quality, or lifecycle management.
- You have worked within a virtualized, containerized, and/or cloud computing environment at some point.
- You understand how to work and effectively communicate with both analytics practitioners and engineers.
- Experience with advanced analytics or data science is a plus.