СберЗдоровье

Data Scientist (NLP|LLM)

6.0/10

Not specified
Hybrid
mid
9 days ago
aitech, Python, NLP, ML, PyTorch, HuggingFace, MLflow, ClearML

AI Summary

The vacancy is strong in task clarity and requirements but lacks compensation details and company links.

Description

What you'll do

  • Design and maintain the full cycle of improving medical LLMs: data collection, cleaning, versioning, training, and fine-tuning.
  • Build datasets and labeling workflows: schemas and guidelines, consistency control, synthetic data generation, self-training, error and bias analysis.
  • Develop LLM-based pipelines and agents for medical tasks: RAG over clinical guidelines, tool-calling, routing, multi-step workflows, orchestration (LangGraph and multi-agent frameworks), guardrails.
  • Build and evolve an evaluation system: test sets and benchmarks, automatic metrics and LLM-as-a-judge where appropriate, expert validation with doctors, red-teaming, regression runs, A/B testing in production.
  • Run research iterations: formulate hypotheses, run experiments, perform ablation studies, document results, write scientific papers and see them through to publication.

Conditions

  • Strong team of professionals passionate about their work;
  • Opportunity for growth in a leading MedTech company in Russia;
  • Cozy office in the City with a panoramic view, hybrid work format;
  • Corporate equipment;
  • Medical program including telemedicine consultations, in-person visits to clinics, psychologists, dentistry, laboratory and instrumental diagnostics;
  • Paid English language courses;
  • Support for an active lifestyle: choose the sports activities you enjoy (corporate squash, running, and football in Moscow, plus reimbursement of your sports subscription);
  • SberUniversity and payment for specialized training and courses.

Requirements

  • 3+ years of experience in NLP/ML and proficiency in Python: typing, testing, profiling, clean production code.
  • Practical experience training and fine-tuning transformers with PyTorch and HuggingFace; understanding of Accelerate, DeepSpeed, or similar tools.
  • Experience building data pipelines and reproducible experiments: datasets, versions, configs, tracking (MLflow or ClearML), and the ability to compare results correctly.
  • Understanding of LLM systems: retrieval, tool-calling, agents, quality degradation, hallucinations, production limitations.
  • Quality assessment skills: metrics, benchmarks, error analysis, ablations, working with labeling and expert validation.