Senior Engineer - AI Inference

Bank of America · Banking · Addison +3

Senior Engineer focused on designing, building, and serving Generative AI inferencing capabilities for a large enterprise platform. This role involves deploying and performance-tuning models (e.g., using vLLM/Triton), implementing RAG, and establishing monitoring and evaluation frameworks for AI/ML services.

What you'd actually do

  1. Ensures that the design and engineering approach for complex features are consistent with the larger portfolio solution
  2. Defines the technology tool stack for the solution, and evaluates and adapts new testing tools, frameworks, and practices for team(s)
  3. Enables team(s)/applications with Continuous Integration/Continuous Delivery (CI/CD) capabilities and engages with other technical stakeholders on the efficient functioning of the CI/CD pipeline
  4. Guides and influences team(s) on design and best practices for high code performance, e.g. pairing, code reviews
  5. Provides end-to-end delivery of complex features, including automation, for either a single team or multiple teams, at the program level

Skills

Required

  • Python development on Linux
  • Model Ops
  • AI/ML
  • advanced analytics
  • design patterns
  • software engineering practices
  • fundamental algorithms
  • code optimization
  • vLLM
  • Triton Inference Server
  • performance tuning
  • inference metrics
  • monitoring
  • observability
  • serving multiple tenants/clients
  • secure boundaries
  • Authentication & Authorization
  • Policy as Code
  • Systems Integration
  • Model Routing
  • Model Evaluation frameworks
  • RAG
  • Model Monitoring
  • Model Drift
  • KPIs
  • Test Driven Development
  • continuous integration
  • clean code principles

Nice to have

  • open-source data science platform architecture
  • storage & compute separation
  • interactive development workbenches
  • containers
  • Jupyter
  • VSCode
  • Redis
  • Solr
  • Postgres DB
  • FAISS
  • Teradata
  • Oracle
  • SQL Server
  • Hadoop
  • Gen AI training and Inferencing platform
  • open-source model
  • Gen AI Mode

What the JD emphasized

  • Minimum 8 years of relevant experience required
  • Experience in Model Ops, design, and software development, with proven effectiveness in delivering technology in a fast-paced, demanding, industry-driven environment for AI/ML and advanced analytics
  • Hands-on experience in Python development on Linux
  • Experience with deploying models using vLLM/Triton Inference Server
  • Performance tuning of those models and deployments to provide higher throughput
  • Experience with various inference metrics, and related monitoring and observability
  • Experience with serving multiple tenants/clients with model endpoints with secure boundaries
  • Model Evaluation frameworks to evaluate different models and their tradeoffs between efficiency and metrics
  • Experience building RAG for various knowledge bases and document types
  • Model Monitoring – Ability to collect metrics to measure things like Model Drift, KPIs

Other signals

  • Gen AI platform
  • inferencing capabilities
  • scalable
  • high-performance AI
  • design, build, and serve Gen AI inferencing
  • Model Ops
  • deploying models using vLLM/Triton Inference Server
  • Performance Tuning
  • serving multiple tenants/clients with model endpoints
  • Model Evaluation frameworks
  • building RAG
  • Model Monitoring