vllm.model_executor.layers.logits_processor
A layer that computes logits from hidden_states.
LogitsProcessor
Bases: CustomOp
Process logits and apply logits processors from sampling metadata.
This layer does the following:
1. Gather logits from model hidden_states.
2. Scale logits if needed.
3. Apply logits processors (if any).
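The three steps above can be sketched in plain Python. This is a minimal illustration, not the vLLM implementation: the names `compute_logits`, `process_logits`, and `ban_token` are hypothetical, the projection is assumed to be a dot product of the hidden state with each vocabulary row plus an optional bias, and a logits processor is assumed to be a callable that takes a list of logits and returns a new one.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical processor type: takes logits, returns modified logits.
Processor = Callable[[Vector], Vector]


def compute_logits(
    hidden: Vector,
    weight: Sequence[Vector],
    bias: Optional[Vector] = None,
) -> Vector:
    """Step 1: project one hidden state onto the vocabulary."""
    # One dot product per vocabulary row (assumed layout: [vocab, hidden]).
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]
    if bias is not None:
        logits = [l + b for l, b in zip(logits, bias)]
    return logits


def process_logits(
    hidden: Vector,
    weight: Sequence[Vector],
    scale: float = 1.0,
    processors: Sequence[Processor] = (),
) -> Vector:
    logits = compute_logits(hidden, weight)
    # Step 2: scale only when the factor is not the identity.
    if scale != 1.0:
        logits = [l * scale for l in logits]
    # Step 3: apply each processor in order, if any were supplied.
    for processor in processors:
        logits = processor(logits)
    return logits


def ban_token(token_id: int) -> Processor:
    """Hypothetical processor that makes one token unselectable."""
    def apply(logits: Vector) -> Vector:
        out = list(logits)
        out[token_id] = float("-inf")
        return out
    return apply
```

For example, `process_logits([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], scale=0.5, processors=[ban_token(0)])` projects a two-dimensional hidden state onto a three-token vocabulary, halves the result, and then masks token 0.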
Source code in vllm/model_executor/layers/logits_processor.py
__init__
__init__(
vocab_size: int,
org_vocab_size: Optional[int] = None,
scale: float = 1.0,
logits_as_input: bool = False,
soft_cap: Optional[float] = None,
) -> None
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | float | A scaling factor to apply to the logits. | 1.0 |
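The signature also accepts `soft_cap`, which the table does not describe. The sketch below shows what scaling and one common form of soft-capping look like on a list of logits. It assumes the cap is the usual `cap * tanh(x / cap)` squashing; that formula, the helper names, and the choice to keep the two operations separate are assumptions for illustration, and nothing here states the order in which vLLM applies them.

```python
import math
from typing import List


def scale_logits(logits: List[float], scale: float = 1.0) -> List[float]:
    """Multiply every logit by a constant factor."""
    if scale == 1.0:
        return list(logits)  # identity: nothing to do
    return [x * scale for x in logits]


def soft_cap_logits(logits: List[float], soft_cap: float) -> List[float]:
    """Squash logits smoothly into [-soft_cap, soft_cap] (assumed tanh form)."""
    return [soft_cap * math.tanh(x / soft_cap) for x in logits]
```

With `soft_cap=30.0`, small logits pass through almost unchanged while very large ones saturate near 30, so no single token can dominate by an unbounded margin.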
_gather_logits
Gather or all-gather the logits tensor across the model-parallel group.
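To illustrate what gathering across a model-parallel group means, the sketch below simulates it in a single process. It assumes each rank holds the logits for one contiguous slice of the vocabulary and that the combined vocabulary may carry padding beyond `org_vocab_size`. The function name and the list-of-shards representation are hypothetical; a real deployment would use a distributed collective rather than a Python loop.

```python
from typing import List, Optional


def gather_vocab_shards(
    shards: List[List[float]],
    org_vocab_size: Optional[int] = None,
) -> List[float]:
    """Concatenate per-rank logit shards along the vocabulary dimension.

    shards[r] is the slice of logits held by rank r (assumed layout).
    """
    full: List[float] = []
    for shard in shards:  # rank order determines token-id order
        full.extend(shard)
    if org_vocab_size is not None:
        # Drop logits of padding entries added to make the shards equal in size.
        full = full[:org_vocab_size]
    return full
```

In an all-gather every rank ends up with the full result, whereas in a plain gather only one destination rank does; the concatenation itself is the same in both cases.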
_get_logits
_get_logits(
hidden_states: Tensor,
lm_head: VocabParallelEmbedding,
embedding_bias: Optional[Tensor],
) -> Optional[Tensor]
forward
forward(
lm_head: VocabParallelEmbedding,
hidden_states: Tensor,
embedding_bias: Optional[Tensor] = None,
) -> Optional[Tensor]