What you'd actually do

Help design, build, and facilitate adoption of a modern Data+ML platform

Modularize complex ML code into standardized and repeatable components

Establish and facilitate adoption of repeatable patterns for model development, deployment, and monitoring

Build a platform that scales to thousands of users and offers self-service capability to build ML experimentation pipelines

Use workflow orchestration tools to run complex data and ML pipelines efficiently and at scale

Skills

Required

Python
distributed computing
orchestration technologies (Kubernetes, Airflow)
infrastructure-as-code tools (Terraform, FluxCD)
CI/CD frameworks (GitHub Actions)
containerization frameworks
ML Platform tools (Jupyter Notebooks, MLflow, Ray, Vertex AI)
data platform tools (Apache Spark, Flink)
Machine Learning concepts
Distributed Systems Knowledge
Data Platform Experience

Nice to have

Java
Scala
Go
Iceberg
Pinot
Jenkins
Parquet
Protocol Buffers/gRPC

What the JD emphasized

B.S. in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field and 10+ years of related experience; or M.S. with 8+ years of experience; or Ph.D. with 6+ years of experience

3+ years of experience developing and deploying machine learning solutions to production

3+ years of experience with ML Platform tools like Jupyter Notebooks, NVIDIA AI Workbench, MLflow, Ray, Vertex AI, etc.

Experience building data platform product(s) or features with one of Apache Spark, Flink, or comparable tools on GCP

Expert level experience with Python

Expert level experience with CI/CD frameworks such as GitHub Actions

Expert level experience with containerization frameworks

Distributed Systems Knowledge

Data Platform Experience

Machine Learning concepts

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed: we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform. We work on large-scale distributed systems that process almost 3 trillion events per day, and this traffic is growing daily. Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers. We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About the Role:

The charter of the Data & ML Platform team is to harness all the data ingested and cataloged within the Data Lakehouse for exploration, insights, model development, ML Engineering, and Insights Activation. This team sits within the larger Data Platform group, one of the core pillars of our company. We process data at immense scale. This data spans several facets, including threat events collected via telemetry, associated metadata, IT asset information, and contextual information about threat exposure derived from additional processing. Together these facets make up the overall data platform, which currently exceeds 200 PB and is maintained in a hyperscale Data Lakehouse built and owned by the Data Platform team. Ingestion runs through both batch and near-real-time streams, which form the core Threat Analytics Platform used for insights, threat hunting, incident investigations, and more.

As an engineer on this team, you will play an integral role as we build out our ML Experimentation Platform from the ground up. You will collaborate closely with Data Platform Software Engineers, Data Scientists & Threat Analysts to design, implement, and maintain scalable ML pipelines for Data Preparation, Cataloging, Feature Engineering, Model Training, and Model Serving, which influence critical business decisions. You’ll be a key contributor in a production-focused culture that bridges the gap between model development and operational success. Future plans include generative AI investments for use cases such as modeling attack paths for IT assets.

Location: Bangalore (Hybrid). Candidates should be comfortable visiting the office twice a week.

What You’ll Do:

Help design, build, and facilitate adoption of a modern Data+ML platform
Modularize complex ML code into standardized and repeatable components
Establish and facilitate adoption of repeatable patterns for model development, deployment, and monitoring
Build a platform that scales to thousands of users and offers self-service capability to build ML experimentation pipelines
Use workflow orchestration tools to run complex data and ML pipelines efficiently and at scale
Review code changes from data scientists and champion software development best practices
Leverage cloud services like Kubernetes, blob storage, and queues in our cloud-first environment

What You’ll Need:

B.S. in Computer Science, Data Science, Statistics, Applied Mathematics, or a related field and 10+ years of related experience; or M.S. with 8+ years of experience; or Ph.D. with 6+ years of experience.
3+ years of experience developing and deploying machine learning solutions to production. Familiarity with typical machine learning algorithms from an engineering perspective (how they are built and used, not necessarily the theory); familiarity with supervised / unsupervised approaches: how, why, and when labeled data is created and used
3+ years of experience with ML Platform tools like Jupyter Notebooks, NVIDIA AI Workbench, MLflow, Ray, Vertex AI, etc.
Experience building data platform product(s) or features with one of Apache Spark, Flink, or comparable tools on GCP. Experience with Iceberg is highly desirable.
Proficiency in distributed computing and orchestration technologies (Kubernetes, Airflow, etc.)
Production experience with infrastructure-as-code tools such as Terraform and FluxCD
Expert level experience with Python; Java/Scala exposure is recommended. Ability to write standardized, simplified Python interfaces that let data scientists use internal CrowdStrike tools
Expert level experience with CI/CD frameworks such as GitHub Actions
Expert level experience with containerization frameworks
Strong analytical and problem-solving skills, capable of working in a dynamic environment
Exceptional interpersonal and communication skills; ability to work with stakeholders across multiple teams and synthesize their needs into software interfaces and processes.

Critical Skills Needed for Role:

Distributed Systems Knowledge
Data Platform Experience
Machine Learning concepts

Bonus Points:

Go
Iceberg
Pinot or other time-series/OLAP-style database
Jenkins
Parquet
Protocol Buffers/gRPC

Benefits of Working at CrowdStrike:

Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer. We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed. We support veterans and individuals with disabilities through our affirmative action program.

CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment. The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law. We base all employment decisions (including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs) on valid job requirements.

If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.