Senior Software Developer - Retrieval-Augmented Generation (RAG) System
Elsevier
linkedin
Philadelphia, PA
5-10 years
Not Disclosed
Full time
29 April 2026
Top Skills:
Access Control, AI, API, Architecture, Audit, Authentication, Authorization, AWS, Azure, CI/CD, Cloud, Cloud Infrastructure, Code Review, Compliance, Cost Optimization, Dashboard, Data Governance, Data Pipeline, Data Processing, Design Review, Docker, Document Retrieval, Feed, GCP, Governance, Healthcare, Hit, Kubernetes, New Relic, NLP, Performance Optimization, Pipeline, Production System, Python, QA, Reproducibility, Security Control, Spark, SQL, Tooling


Senior Software Engineer – Retrieval-Augmented Generation (RAG) System


We are seeking an engineer to join a team building and supporting a healthcare-centered, production-scale RAG system that combines document retrieval with response generation to deliver accurate, context-aware answers. This engineer will be expected to design, implement, and operate end-to-end RAG pipelines, including LLM interaction, API creation, and high-performance, secure delivery of knowledge-grounded capabilities. You will collaborate with data engineers, platform teams, and product partners to ship reliable, scalable, and observable systems.


Role and responsibilities

  • Architect, implement, test, and operate end-to-end RAG workflows:
    • Ingest and normalize documents from diverse sources
    • Generate and manage embeddings; index and query vector databases
    • Retrieve relevant passages, apply reranking or fusion strategies, and feed prompts to LLMs
  • Build scalable, low-latency services and APIs (Python preferred; other languages acceptable) and ensure production-grade reliability (monitoring, tracing, alerting)
  • Integrate with vector databases and embedding pipelines and optimize for latency, throughput, and cost
  • Design and implement ML Ops workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans
  • Develop robust data pipelines and governance around ingestion, provenance, quality checks, and access controls
  • Collaborate with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implement evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)
  • Implement monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)
  • Ensure security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)
  • Optimize for scalability and reliability in cloud environments (AWS/GCP/Azure) and containerized deployments (Docker, Kubernetes)
  • Contribute to architecture decisions, drive technical debt reduction, and mentor junior engineers
  • Collaborate with product, design, and data teams to translate requirements into robust software solutions
  • Document APIs, runbooks, and architectural decisions; participate in code reviews and design reviews
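To make the workflow in the first responsibility concrete, the core retrieval flow (ingest, embed, retrieve, build a prompt) can be sketched as a toy, dependency-free Python example. The bag-of-words "embedding" and in-memory dictionary index are illustrative stand-ins for a real embedding model and vector database, and the sample documents are invented:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (stand-in for a real embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: dict, k: int = 2) -> list:
    # Rank indexed documents by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, index[doc]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list) -> str:
    # Ground the LLM prompt in the retrieved passages.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Ingest and index a few (hypothetical) healthcare documents.
docs = [
    "Prior authorization is required for specialty medications.",
    "Patients may refill prescriptions every thirty days.",
    "Claims must be submitted within ninety days of service.",
]
index = {d: embed(d) for d in docs}

query = "When must claims be submitted?"
passages = retrieve(query, index)
prompt = build_prompt(query, passages)
```

In a production pipeline the reranking and fusion steps mentioned above would sit between `retrieve` and `build_prompt`, and the prompt would be sent to an LLM rather than returned directly.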
Required qualifications

    • 5+ years of professional software engineering experience designing and delivering production systems
    • Strong programming skills (Python required; Node.js a plus)
    • Deep understanding of retrieval-augmented or application-scale NLP systems and practical experience building RAG-like pipelines
    • Hands-on experience with ML workflow tooling and MLOps concepts (model serving, versioning, experiments, feature stores, reproducibility)
    • Proficiency with cloud infrastructure and modern software practices (AWS/GCP/Azure; Docker; Kubernetes; CI/CD)
    • Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
    • Familiarity with data governance, privacy, and security best practices


    Preferred qualifications

    • Experience with agentic workflow tools (LangGraph) and familiarity with prompt engineering for LLMs
    • Exposure to working with and evaluating different LLMs
    • Knowledge of evaluation methodologies for retrieval and QA systems and the ability to set up A/B tests and dashboards
    • Experience with data processing frameworks (SQL, Pandas, Spark) and working with large-scale data pipelines
    • Background in performance optimization for low-latency AI services (MLflow)
    • Experience with monitoring and logging via New Relic, K9s, Portkey, etc.
    • Experience with minimizing token usage and cost optimization
    • Comfortable with design and implementation of security controls for data-intensive AI systems
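As one illustration of the evaluation methodologies mentioned above, mean reciprocal rank (MRR) for a retrieval system can be computed in a few lines of Python; the query results and gold labels here are hypothetical:

```python
def mean_reciprocal_rank(results: list, relevant: list) -> float:
    # MRR: average over queries of 1/rank of the first relevant document.
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for i, doc in enumerate(ranked, start=1):
            if doc == gold:
                total += 1.0 / i
                break
    return total / len(results)

# Two queries: gold document at rank 1 and rank 2 -> MRR = (1 + 0.5) / 2 = 0.75
runs = [["d1", "d2"], ["d3", "d2"]]
gold = ["d1", "d2"]
mrr = mean_reciprocal_rank(runs, gold)
```

Precision/recall and QA accuracy follow the same pattern: compare ranked or generated outputs against labeled references and aggregate per query.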