Data Scientist (NLP|LLM)
6.0/10
СберЗдоровье
Not specified
Hybrid
mid
9 days ago
AI, tech, Python, NLP, ML, PyTorch, HuggingFace, MLflow, ClearML
AI Summary
The vacancy is strong in task clarity and requirements but lacks compensation details and company links.
Description
What you'll do
- Design and maintain the full improvement cycle for medical LLMs: data collection, cleaning, versioning, training, and fine-tuning.
- Build datasets and labeling pipelines: schemas and guidelines, consistency control, synthetic data generation, self-training, and error and bias analysis.
- Develop LLM-based pipelines and agents for medical tasks: RAG over clinical guidelines, tool calling, routing, multi-step workflows, orchestration (LangGraph and multi-agent frameworks), and guardrails.
- Build and evolve the evaluation system: test sets and benchmarks, automatic metrics and LLM-as-a-judge where appropriate, expert validation with doctors, red-teaming, regression runs, and A/B testing in production.
- Run research iterations: formulate hypotheses, run experiments and ablation studies, document results, and prepare scientific papers through to publication.
Conditions
- A strong team of professionals who are passionate about their work;
- Room to grow at a leading MedTech company in Russia;
- A cozy office in Moscow City with panoramic views, plus a hybrid work format;
- Company-provided equipment;
- A medical program covering telemedicine consultations, in-person clinic visits, psychologists, dentistry, and laboratory and instrumental diagnostics;
- Paid English language courses;
- Support for an active lifestyle: choose the sports you enjoy (corporate squash, running, and football in Moscow, plus reimbursement of your sports membership);
- Access to SberUniversity and paid specialized training and courses.
Requirements
- 3+ years of experience in NLP/ML; strong Python skills: typing, testing, profiling, and clean production code.
- Hands-on experience training and fine-tuning transformers with PyTorch and Hugging Face; familiarity with Accelerate, DeepSpeed, or similar tools.
- Experience building data pipelines and reproducible experiments (datasets, versioning, configs, experiment tracking with MLflow or ClearML) and the ability to make sound, like-for-like comparisons.
- Understanding of LLM systems: retrieval, tool calling, agents, quality degradation, hallucinations, and production constraints.
- Quality assessment skills: metrics, benchmarks, error analysis, ablations, and working with data labeling and expert validation.