Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors. vLLM can be used to generate the completions for RLHF.
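
For example, here is a minimal sketch of the rollout step using vLLM's offline `LLM` API to sample several candidate completions per prompt for preference labeling; the model name, prompts, and sampling settings are illustrative placeholders, not a prescribed setup.

```python
# Minimal sketch: sample candidate completions with vLLM for preference labeling.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder; any model vLLM supports

prompts = [
    "Explain the difference between supervised fine-tuning and RLHF.",
    "Summarize the benefits of preference-based alignment.",
]

# Sample several completions per prompt so annotators (or a reward model)
# can compare and rank them against each other.
sampling_params = SamplingParams(n=4, temperature=0.8, top_p=0.95, max_tokens=256)

outputs = llm.generate(prompts, sampling_params)
for request_output in outputs:
    print(f"Prompt: {request_output.prompt!r}")
    for i, completion in enumerate(request_output.outputs):
        print(f"  candidate {i}: {completion.text[:80]!r}")
```

The ranked candidates then feed the preference dataset (or reward model) used by the RL training framework of your choice.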

The following open-source RL libraries use vLLM for fast rollouts (listed alphabetically; this is not an exhaustive list):

If you don't want to use an existing library, see the following basic examples to get started:

See the following notebooks showing how to use vLLM for GRPO:
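
As a rough illustration of the idea behind those notebooks (not a replacement for them), the sketch below shows GRPO's core step: computing group-relative advantages over a group of vLLM-sampled completions. The model name and `reward_fn` are hypothetical placeholders for whatever reward model or verifier your setup uses.

```python
# Rough GRPO sketch: group-relative advantages over vLLM-sampled completions.
import numpy as np
from vllm import LLM, SamplingParams

def reward_fn(prompt: str, completion: str) -> float:
    """Hypothetical reward, e.g. a verifier or reward-model score."""
    return float(len(completion.split()))  # placeholder heuristic only

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # illustrative model choice
group_size = 8
sampling_params = SamplingParams(n=group_size, temperature=1.0, max_tokens=256)

prompts = ["Solve: what is 17 * 24?"]
for request_output in llm.generate(prompts, sampling_params):
    completions = [o.text for o in request_output.outputs]
    rewards = np.array([reward_fn(request_output.prompt, c) for c in completions])

    # GRPO normalizes each reward against its own sampling group instead of
    # using a learned value function: advantage = (r - mean) / std.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    # These advantages weight the policy-gradient update on the corresponding
    # completion tokens inside the training framework.
    print(advantages)
```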