Vladimir Malinovskii

I am an ML Resident at Yandex Research, specializing in quantization algorithms for Large Language Models. I am pursuing a Master's degree in Computer Science at the Higher School of Economics (HSE), through a joint program with the Yandex School of Data Analysis.

Previously, I was a Software Engineer in the Infrastructure team at Yandex, contributing to the development and maintenance of high-scale deployment systems.

I hold a Bachelor of Science in Applied Mathematics and Physics with a minor in Data Analysis from the Moscow Institute of Physics and Technology (MIPT).

Email / CV / Scholar / GitHub / LinkedIn


Publications

PV‑Tuning: Beyond Straight‑Through Estimation for Extreme LLM Compression

Vladimir Malinovskii*, Denis Mazur*, Ivan Ilin*, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, and Peter Richtarik

NeurIPS 2024, Oral

arXiv, Code

Pet projects

Llama-3.1-8B CPU inference in the browser (demo)

I implemented a WebAssembly LLM inference engine for AQLM+PV-Tuning, the quantization algorithm my colleagues and I developed. It runs on the CPU in almost any modern browser, given sufficient RAM. No GPU required!