Software Engineer III - Gen AI Inferencing

Bank of America · Banking · Addison, Charlotte

Software Engineer III focused on building and operating reusable toolkits for Gen AI RAG capabilities within an enterprise AI platform. The role involves designing, developing, and deploying models using inference frameworks such as vLLM and Triton Inference Server, alongside MLOps practices and fine-tuning techniques, with a strong emphasis on CI/CD and performance tuning for production environments. Experience with RAG processes and application development in related technologies is required.
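
As context for the deployment stack the posting names (vLLM in containers, in production), here is a minimal, hypothetical sketch of serving a model behind vLLM's OpenAI-compatible HTTP server in a container; the image tag, model name, port, and flags are illustrative assumptions, not details from the posting.

```shell
# Hypothetical example: launch vLLM's OpenAI-compatible server in a
# container. Model name, port, and context length are placeholders.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 8192

# Query the OpenAI-style chat completions endpoint it exposes:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

In a setup like this, performance tuning largely means adjusting batching and context-length parameters against throughput and latency targets, which is the kind of work the role describes.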

What you'd actually do

  1. Codes solutions and unit tests to deliver a requirement/story per the defined acceptance criteria and compliance requirements
  2. Designs, develops, and modifies architecture components, application interfaces, and solution enablers while ensuring principal architecture integrity is maintained
  3. Mentors other software engineers and coaches the team on Continuous Integration and Continuous Deployment (CI/CD) practices and automation of the tool stack
  4. Executes story refinement, definition of requirements, and estimation of the work necessary to realize a story through the delivery lifecycle
  5. Performs spikes/proofs of concept as necessary to mitigate risk or implement new ideas

Skills

Required

  • OOP in Python/Scala/Java
  • AI/ML/GenAI lifecycle management and development
  • MLOps
  • Fine-tuning techniques
  • Inference frameworks
  • vLLM/Triton Inference Server
  • Containers
  • Production deployment
  • Automation
  • Performance tuning
  • Python/Unix-based systems
  • Generative AI RAG process
  • Chunking, embedding, retrieval, reranking, and summarization
  • Application development
  • MongoDB
  • Redis
  • Angular/React frameworks
  • Containerization
  • Building API-based applications leveraging FastAPI services
  • JWT integration
  • API gateway
  • CI/CD practices
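
To make the RAG stages in this list concrete, here is a minimal, self-contained sketch of chunking, embedding, retrieval, reranking, and summarization. It is purely illustrative: a real pipeline would use a neural embedding model, a vector store (e.g. MongoDB or Redis, which the posting mentions), a cross-encoder reranker, and an LLM call for summarization; the toy bag-of-words vectors and sample text below are assumptions made to show the data flow.

```python
# Toy sketch of the five RAG stages named in the skills list.
# Bag-of-words vectors stand in for real embeddings.
import math
from collections import Counter

def chunk(text: str, size: int = 8) -> list[str]:
    """Chunking: split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Embedding (toy): lowercase term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Retrieval: top-k chunks by cosine similarity to the query."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Reranking: exact query-term overlap (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)

def summarize(chunks: list[str]) -> str:
    """Summarization stub: in practice an LLM call over the context."""
    return " ".join(chunks[:1])

docs = ("vLLM serves large language models with high throughput. "
        "Triton Inference Server deploys models in production. "
        "Redis caches embeddings for fast retrieval.")
chunks = chunk(docs)
query = "how are models deployed in production"
context = rerank(query, retrieve(query, chunks))
print(summarize(context))
```

Each function maps one-to-one onto a stage the posting names, which is roughly what a "reusable toolkit for Gen AI RAG capabilities" would wrap behind a stable interface.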

Nice to have

  • Scala
  • Java
  • Unix
  • Angular
  • React
  • MongoDB
  • Redis

What the JD emphasized

  • Gen AI Inferencing
  • Gen AI platform
  • AI/ML/GenAI Lifecycle Management and Development
  • MLOps
  • Fine-tuning techniques
  • Inference Frameworks
  • vLLM/Triton Inference Server
  • production
  • automation
  • Performance Tuning
  • generative AI RAG process
  • chunking, embedding, retrieval, reranking and summarization
  • application development
  • FastAPI services
  • AI/ML and GenAI work
  • Continuous Integration (CI)
  • Continuous Deployment (CD)

Other signals

  • Design, build, and operation of reusable toolkits for Gen AI RAG capabilities
  • Developing and delivering complex requirements to accomplish business goals
  • Experience with AI/ML/GenAI lifecycle management and development and its ecosystem
  • Building frameworks using MLOps, fine-tuning techniques, and inference frameworks
  • Deploying models using vLLM/Triton Inference Server in containers in production with automation
  • Hands-on experience with and knowledge of the generative AI RAG process for various use cases, including chunking, embedding, retrieval, reranking, and summarization