Senior Software Engineer - Python and Data Ecosystem

ClickHouse · Data AI · Israel +2 · Engineering

ClickHouse is hiring a Senior Software Engineer to own and evolve its Python connector and SDK ecosystem, focusing on integrations with orchestration platforms, transformation tools, and the AI/LLM ecosystem. The role involves designing connectors for RAG architectures, ML feature pipelines, and LLM-powered data applications, making ClickHouse a natural fit for next-generation AI and data systems. It requires deep experience with the Python data ecosystem and hands-on AI/ML work in data engineering contexts.

What you'd actually do

  1. Own and evolve ClickHouse's Python connector and SDK ecosystem, raising the bar on performance, reliability, and API design
  2. Build and maintain integrations with orchestration platforms (Airflow, Dagster, Prefect) and transformation tools (dbt) to enterprise-grade quality standards
  3. Drive the AI/LLM integration strategy: designing connectors and patterns that make ClickHouse a natural fit in RAG architectures, ML feature pipelines, and LLM-powered data applications
  4. Engage actively with the open-source community: triage issues, support contributors, advocate for users, and shape the roadmap based on real-world feedback
  5. Collaborate with Product, Cloud, and other engineering teams to align integration work with broader platform priorities
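
As a rough illustration of the RAG-style integration patterns mentioned above, here is a minimal, hypothetical sketch of vector similarity search in pure Python. The function names (`cosine_similarity`, `top_k`) and the toy embeddings are illustrative assumptions, not ClickHouse APIs; a real connector would push this ranking down to the database's own vector functions rather than compute it client-side.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, docs, k=2):
    # Rank stored (doc_id, embedding) pairs by similarity to the query
    # embedding and return the ids of the k closest documents.
    scored = sorted(docs, key=lambda d: cosine_similarity(query, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy two-dimensional "embeddings" standing in for real model output.
docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
print(top_k([1.0, 0.1], docs, k=2))  # -> ['a', 'c']
```

In a production RAG pipeline the embeddings would come from a model, and retrieval would be a single database query over an indexed vector column.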

Skills

Required

  • 7+ years of software development experience, including hands-on time as a Data Engineer, Data Scientist, or ML Engineer
  • Deep, proven experience designing, building, and maintaining production-grade Python connectors, SDKs, or integrations for at least one major platform (orchestration, BI, MLOps, or data transformation)
  • Solid experience with the Python data ecosystem: Pandas, NumPy, Pydantic, and related libraries
  • Prior contributions to, or deep practical experience with, popular data orchestration tools (Airflow, Dagster, or Prefect)
  • Hands-on experience with AI/ML in data engineering contexts: embedding generation, vector search, feature pipelines, or LLM-powered tooling in production, not just experimentation
  • Strong understanding of database fundamentals: SQL, data modeling, query optimization, and familiarity with OLAP/analytical databases
  • Solid experience with concurrent Python: threading, multiprocessing, and async patterns
  • Outstanding written and verbal communication skills; comfortable collaborating across engineering functions and with open-source communities
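
The concurrent-Python bullet above can be sketched with a minimal asyncio fan-out. This is an illustrative stand-in, not a real driver call: `run_query` only simulates an async database round-trip, and the SQL strings are placeholders.

```python
import asyncio

async def run_query(sql: str) -> str:
    # Stand-in for an async database call; a real connector would await
    # a network round-trip here instead of sleeping.
    await asyncio.sleep(0.01)
    return f"result of {sql!r}"

async def run_all(queries):
    # Fan the queries out concurrently; gather preserves input order.
    return await asyncio.gather(*(run_query(q) for q in queries))

results = asyncio.run(run_all(["SELECT 1", "SELECT 2"]))
print(results)
```

The same fan-out/gather shape applies whether the awaited work is an HTTP request, a native-protocol query, or a batch insert.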

Nice to have

  • Experience deploying AI/ML models in production, including inference APIs and vector databases
  • Prior experience as a Data Engineer or Data Scientist in a product-facing or platform role
  • Familiarity with ClickHouse or similar high-performance OLAP platforms
  • Familiarity with the JVM ecosystem
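
To make the OLAP-familiarity point above concrete, here is a toy analytical-style aggregation. The stdlib `sqlite3` module stands in for a real analytical database purely so the snippet is self-contained; the table name and data are invented for illustration.

```python
import sqlite3

# In-memory table standing in for an analytical events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", 10.0), ("u1", 5.0), ("u2", 7.0)],
)

# A GROUP BY aggregation: the canonical OLAP query shape.
rows = conn.execute(
    "SELECT user_id, SUM(value) FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # -> [('u1', 15.0), ('u2', 7.0)]
```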

What the JD emphasized

  • Lived the Data Engineer or Data Scientist experience firsthand and operated within those roles
  • Hands-on experience with AI/ML in data engineering contexts, in production rather than just experimentation

Other signals

  • AI-powered workflows
  • vector stores for RAG pipelines
  • backends for LLM-powered agents
  • ML feature stores
  • LLM-powered data applications
  • embedding pipelines
  • retrieval-augmented generation
  • ML feature pipelines
  • LLM-powered tooling