Media Summary: In this episode of the AI Research Roundup, host Alex delves into a new approach for enhancing large language model ... Strengthen your technical foundations with Brilliant! Visit https://brilliant.org/AdamLucek/ to start learning for free and save 20% off ... This lecture is part of the MDLI community's GenML 2025 conference. You can watch the other lectures and slides here: https://mdli.co.il/en25. Training ...

Teaching LLMs With RL From - Detailed Analysis & Overview

Teaching LLMs to Search Smarter with RL

In this episode of the AI Research Roundup, host Alex delves into a new approach for enhancing large language model ...

Reinforcement Learning with Verifiable Rewards - Teaching LLMs to Solve Problems

Strengthen your technical foundations with Brilliant! Visit https://brilliant.org/AdamLucek/ to start learning for free and save 20% off ...

Teaching LLMs with RL: From Scratch to GRPO and Beyond

This lecture is part of the MDLI community's GenML 2025 conference. You can watch the other lectures and slides here: https://mdli.co.il/en25. Training ...

Reinforcement Learning (RL) for LLMs

Lecture on reinforcement learning (

TTRL: LLMs Self-Improve with RL

In this episode of the AI Research Roundup, host Alex explores a groundbreaking paper on unsupervised model improvement: ...

DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs

In this video, I break down DeepSeek's Group Relative Policy Optimization (GRPO) from first principles, without assuming prior ...

The Fundamental Problem With LLMs – Richard Sutton

Full episode: https://youtu.be/21EYKqUsPfg Me on twitter: https://x.com/dwarkesh_sp Richard Sutton is the father of reinforcement ...

How to Train LLMs to "Think" (o1 & DeepSeek-R1)

Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: ...

Reinforcement Learning from scratch

How does Reinforcement Learning work? A short cartoon that intuitively explains this amazing machine learning approach, and ...

How to finetune LLMs to THINK with Reinforcement Learning (GRPO from scratch!)

In this hands-on tutorial video, I am explaining Reasoning

Reinforcement Learning with LLMs: a new era of AI agents

Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: ...

The RL Irony in LLMs (and its insane new meta)

Start learning cyber security with TryHackMe: https://tryhackme.com/bycloud Use my code "BYCLOUD25" to get 25% off on annual ...

Beyond Gemini: Using RL to unlock reliable AI agents with open LLMs

Julien Launay, CEO, Adaptive ML About the Speaker: Julien is the CEO and co-founder of Adaptive ML, a company focused on ...