Mid-level Data Engineer - Python, AWS, Spark (Hybrid)

State Farm · Insurance · Bloomington, IL +3 · Technology and UX

State Farm is seeking an experienced Data Engineer to join its Property & Casualty Data Pipeline team. The role involves building and scaling data products and pipelines that support better business decisions, customer retention, and operational efficiency. Responsibilities include developing automated data assets, cleansing and transforming data, and investigating new data resources and technologies. The position requires proficiency in Python, Spark SQL, AWS services, and distributed data processing frameworks.

What you'd actually do

  1. Utilizes industry-adopted languages and frameworks in coding, testing, security, DevOps, DataOps and data engineering practices
  2. Develops and maintains reusable, scalable, and compliant data solutions across multiple platforms and compute environments
  3. Responsible for the identification, acquisition, cleansing, profiling, and ETL (extraction, transformation, and loading) of data used in analytic discovery and production solution deployment across multiple platforms
  4. Establishes business domain knowledge for existing State Farm data sources and investigates, recommends, and initiates acquisition of data resources, both internal and external
  5. Identifies and consults on emerging technologies and critical core systems, including techniques, tools, data sources, and platforms in the data engineering field

Skills

Required

  • 2-4 years of professional experience as a Data Engineer
  • Proficiency in languages such as Python, Spark SQL (or PySpark), R, Java, and Bash.
  • Hands-on experience with AWS services, including ETL tools (Glue, EMR Serverless), Lambda, Step Functions, EventBridge, S3, DynamoDB, Kinesis Firehose, Redshift, and SageMaker, as well as the Apache Iceberg table format.
  • Experience with distributed data processing frameworks such as Apache Spark, including on platforms such as Databricks.
  • Experience with infrastructure-as-code tools such as OpenTofu (an open-source fork of Terraform) or Terraform for managing cloud resources and deployments.
  • Familiarity with CI/CD pipelines, including automated testing and security scans, and with workflow orchestration tools such as Airflow.

Nice to have

  • Experience in, or the ability to rapidly gain knowledge of, the P&C data domain, including rating, underwriting, and/or claims.
  • Experience with relational databases and data warehouses such as DB2, Postgres, and Redshift.
  • Experience with Git-based version control and hosting platforms such as GitHub or GitLab.
  • Data access skills using SQL and Amazon Athena.
  • Experience in designing, building, and maintaining data pipelines for automated data processing.
  • Knowledge of data modeling techniques such as star schema and snowflake schema, with an understanding of data architecture.