
Sber and Yandex veterans teach systematic LLM testing on May 28 | UncensoredHub



School of Higher Mathematics webinar walks developers through structured evaluation: from raw logs and user feedback to automated regression checks and measurable improvement cycles for AI features in production.

By Alex Sokoloff · May 27, 2026


School of Higher Mathematics is hosting a webinar on May 28 at 19:30 Moscow time focused on moving LLM product teams from ad-hoc testing to systematic quality evaluation. The session targets developers, ML engineers, product managers, and team leads shipping AI features to production who currently assess model improvements by feel rather than measurement.

Andrey Kiselev, head of product at an AI company and a former Revolut and Yandex engineer, and Fedor Azarov, head of data research at Sber CIB, lead the session. The format is a live demo built around a reusable framework that attendees can apply to commercial or side projects.

What stands out

01 Raw log collection and feedback loops. Capture interaction data and convert subjective user feedback into measurable signals.
02 Metric design for LLM outputs. Identify metrics that predict user satisfaction rather than merely correlating with token count or prompt length.
03 Automated regression suites. Flag, without manual review, when a new prompt or model version breaks existing scenarios.
04 Structured before-and-after testing. Use A/B or holdout methods to confirm whether a change genuinely improved the feature or merely redistributed errors.
05 End-to-end improvement cycle. Link logs → metrics → automation → deployment into a repeatable process.
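To make item 03 concrete, here is a minimal sketch of an automated regression check in Python. It assumes each saved scenario pairs a logged prompt with a simple pass/fail check on the output; the names `Scenario`, `run_regression`, `old_model`, and `new_model` are hypothetical illustrations, not part of the webinar's framework or any particular library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """One saved case (hypothetical): an input prompt plus a pass/fail check on the output."""
    name: str
    prompt: str
    check: Callable[[str], bool]


def run_regression(model: Callable[[str], str], scenarios: List[Scenario]) -> List[str]:
    """Run every scenario through `model` and return the names of those whose check fails."""
    failures = []
    for s in scenarios:
        output = model(s.prompt)
        if not s.check(output):
            failures.append(s.name)
    return failures


# Hypothetical scenarios; a real suite would be built from logged production interactions.
SCENARIOS = [
    Scenario("refund_mentions_policy", "How do I get a refund?",
             lambda out: "policy" in out.lower()),
    Scenario("no_empty_answer", "Summarize my last order.",
             lambda out: len(out.strip()) > 0),
]


def old_model(prompt: str) -> str:
    # Stub standing in for the current production prompt/model version.
    return "Per our refund policy, contact support." if "refund" in prompt else "You ordered 2 items."


def new_model(prompt: str) -> str:
    # Stub standing in for a candidate version that dropped the policy reference.
    return "Contact support." if "refund" in prompt else "You ordered 2 items."


if __name__ == "__main__":
    print("old:", run_regression(old_model, SCENARIOS))
    print("new:", run_regression(new_model, SCENARIOS))
```

Here the candidate version fails `refund_mentions_policy` while the current one passes everything, so the break is caught without anyone rereading outputs by hand. Production suites may use richer checks (structured-field matching, rubric scoring, or a model-based judge), but the one-verdict-per-scenario structure stays the same.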
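Item 04 can be sketched with a standard two-proportion z-test, one common way to check whether a before-and-after difference in a success rate is larger than chance. The sketch assumes the feature logs a binary satisfaction signal (for example, a thumbs-up) for a control arm and an arm that sees the change; the function name and the counts below are hypothetical, and the webinar may well use a different statistical method.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int, success_b: int, total_b: int):
    """Compare success rates of arm A (control) and arm B (change).

    Returns (difference in rates B - A, z statistic, two-sided p-value)
    using the pooled normal approximation.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # All outcomes identical (all successes or all failures): the test is undefined.
        raise ValueError("pooled rate is 0 or 1; z-test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical counts: positive ratings out of rated sessions in each arm.
    diff, z, p = two_proportion_z_test(400, 1000, 460, 1000)
    print(f"lift={diff:.3f} z={z:.2f} p={p:.4f}")
```

A higher overall rate does not by itself rule out redistributed errors, so comparing per-scenario pass/fail results between the two versions is a sensible complement to a single aggregate test.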
