Data Engineer for VLM Training Data (GigaChat Vision)
Overview
Join Sber as a Data Engineer working on training data for Vision-Language Models (VLMs), with a focus on data pipelines and the ML team's data needs. Sber is a leading financial institution in Russia, providing a wide range of banking and financial services.
What you'll do
- Gather and structure the ML team's data needs for training, fine-tuning, evaluating, and improving VLMs.
- Propose and implement ideas for data cleaning, filtering, deduplication, categorization, and generation pipelines.
- Be well versed in modern practices for building datasets for Vision-Language Models: image-text pairs, synthetic data, filtering, quality scoring, data mixture design, and dataset versioning.
- Be responsible for the infrastructure for data storage and preparation, including:
  - importing data from various sources: production, Common Crawl, open-source datasets, generated data;
  - validating and controlling data quality;
  - storing and versioning datasets;
  - exporting data in formats suitable for model training.
- Design and implement data processing pipelines at scale, handling up to tens of billions of images.
- Develop pipelines for generating synthetic data to train and improve VLMs.
- Collect statistics on the data, and build reports and visualizations for analyzing the composition, quality, and coverage of datasets.
- Ensure the reproducibility, observability, and reliability of data processes.
- Work closely with ML engineers, researchers, and the infrastructure team.