Data Engineer for VLM Training Data (GigaChat Vision)

6.0/10

Sber

Not specified

Office / on-site

mid

about 3 hours ago

ai, dev, tech, fintech, Python, data engineering, data pipelines, S3, YTsaurus, DVC, Git, Docker

AI Summary (verified by Aipplify AI)

The vacancy is strong in task clarity and requirements but lacks compensation details.

AI quality score: 6.5 / 10

Overview

Join Sber as a Data Engineer working on training data for vision-language models (VLMs), with a focus on data pipelines and the ML team's data needs. Sber is a leading financial institution in Russia that provides a wide range of banking and financial services.

What you'll do

•Gather and structure the ML team's data requirements for training, fine-tuning, evaluating, and improving the VLM.
•Propose and implement ideas for data cleaning, filtering, deduplication, categorization, and generation pipelines.
•Be well versed in modern practices for building datasets for Vision-Language Models: image-text pairs, synthetic data, filtering, quality scoring, data mixture design, and dataset versioning.
•Be responsible for the infrastructure for data storage and preparation, including:
  •importing data from various sources: production, Common Crawl, open-source datasets, generated data;
  •validating and controlling data quality;
  •storing and versioning datasets;
  •exporting data in formats suitable for model training.
•Design and implement data processing pipelines that scale to tens of billions of images.
•Develop pipelines that generate synthetic data for training and improving the VLM.
•Collect dataset statistics and build reports and visualizations for analyzing dataset composition, quality, and coverage.
•Ensure reproducibility, observability, and reliability of data processes.
•Work closely with ML engineers, researchers, and the infrastructure team.

Skills

Python, data engineering, data pipelines, S3, YTsaurus, DVC, Git, Docker, PostgreSQL

Salary not listed

Market range for similar roles

Based on 363 comparable Data Engineer openings (annual, USD)

$77k–$160k

Typical midpoint $113k

Company Info

Sber

Banking

Sber (formerly Sberbank of Russia) is the largest bank in Russia and Central and Eastern Europe, providing commercial and retail banking services including corporate loans, asset management, online banking, and financial services to institutions.

Moscow, Russia

1000+ employees

Founded 1841
