vllm.v1.worker.dp_utils ¶
_get_device_and_group ¶
_get_device_and_group(parallel_config: ParallelConfig)
Source code in vllm/v1/worker/dp_utils.py
_post_process_ubatch ¶
Source code in vllm/v1/worker/dp_utils.py
_run_ar ¶
_run_ar(
should_ubatch: bool,
orig_num_tokens_per_ubatch: int,
padded_num_tokens_per_ubatch: int,
parallel_config: ParallelConfig,
) -> Tensor
Source code in vllm/v1/worker/dp_utils.py
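The source body is collapsed here. As a point of reference, here is a minimal sketch of what an all-reduce of per-rank batch metadata across the DP group could look like, assuming torch.distributed and taking the device and group explicitly (the real helper presumably derives them from parallel_config via _get_device_and_group). All names below are illustrative, not the vLLM implementation:

```python
import torch
import torch.distributed as dist


def run_ar_sketch(
    should_ubatch: bool,
    orig_num_tokens_per_ubatch: int,
    padded_num_tokens_per_ubatch: int,
    dp_group: dist.ProcessGroup,
    dp_size: int,
    dp_rank: int,
    device: torch.device,
) -> torch.Tensor:
    # One column per DP rank; each rank fills in only its own column.
    tensor = torch.zeros((3, dp_size), dtype=torch.int32, device=device)
    tensor[0][dp_rank] = orig_num_tokens_per_ubatch
    tensor[1][dp_rank] = padded_num_tokens_per_ubatch
    tensor[2][dp_rank] = 1 if should_ubatch else 0
    # Summing the zero-initialized columns acts as an all-gather of the metadata,
    # so every rank ends up with the full (3, dp_size) view.
    dist.all_reduce(tensor, group=dp_group)
    return tensor
```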
_synchronize_dp_ranks ¶
_synchronize_dp_ranks(
num_tokens_unpadded: int,
num_tokens_padded: int,
should_attempt_ubatching: bool,
parallel_config: ParallelConfig,
) -> tuple[bool, Optional[Tensor]]
- Decides if each DP rank is going to microbatch. Either all ranks run with microbatching or none of them do.
- Determines the total number of tokens that each rank will run. All ranks will be padded out so that they run with the same number of tokens.
Returns:

Name | Type | Description |
---|---|---|
should_ubatch | bool | Are all DP ranks going to microbatch. |
num_tokens_after_padding | Optional[Tensor] | A tensor containing the total number of tokens per-microbatch for each DP rank, including padding. |
Source code in vllm/v1/worker/dp_utils.py
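A hedged sketch of the all-or-nothing decision described above, consuming a (3, dp_size) metadata tensor like the one produced by the _run_ar sketch. The row layout and helper name are assumptions, not the vLLM implementation:

```python
from typing import Optional

import torch


def synchronize_dp_ranks_sketch(
    ar_result: torch.Tensor,
) -> tuple[bool, Optional[torch.Tensor]]:
    # Row 0 holds the unpadded token counts (unused in this simplified sketch),
    # row 1 the locally padded counts, row 2 each rank's microbatching vote.
    padded_tokens = ar_result[1]
    ubatch_votes = ar_result[2]

    # Microbatch only if every DP rank voted for it.
    should_ubatch = bool(torch.all(ubatch_votes == 1))
    if not should_ubatch:
        return False, None

    # Pad every rank up to the largest padded count so that all ranks run
    # the same number of tokens per microbatch.
    max_tokens = int(padded_tokens.max())
    num_tokens_after_padding = torch.full_like(padded_tokens, max_tokens)
    return True, num_tokens_after_padding
```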
coordinate_batch_across_dp ¶
coordinate_batch_across_dp(
num_scheduled_tokens_per_request: ndarray,
num_tokens_unpadded: int,
num_tokens_padded: int,
parallel_config: ParallelConfig,
allow_microbatching: bool,
uniform_decode: bool,
) -> tuple[Optional[UBatchSlices], Optional[Tensor]]
Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
Returns:

Name | Type | Description |
---|---|---|
ubatch_slices | Optional[UBatchSlices] | If this is set then all DP ranks have agreed to microbatch. |
num_tokens_after_padding | Optional[Tensor] | A tensor containing the total number of tokens per-microbatch for each DP rank, including padding. |
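A usage sketch of how a model-runner-style caller might invoke this coordination step and act on its result. Only coordinate_batch_across_dp and its signature come from this module; the surrounding values (the request sizes, the padded bucket, and the way parallel_config is constructed) are illustrative assumptions.

```python
import numpy as np

from vllm.config import ParallelConfig
from vllm.v1.worker.dp_utils import coordinate_batch_across_dp

# Illustrative inputs; a real caller takes parallel_config from its VllmConfig
# and runs inside an initialized DP process group.
parallel_config = ParallelConfig(data_parallel_size=2)
num_scheduled_tokens = np.array([128, 64, 32], dtype=np.int32)  # tokens per request

ubatch_slices, num_tokens_after_padding = coordinate_batch_across_dp(
    num_scheduled_tokens_per_request=num_scheduled_tokens,
    num_tokens_unpadded=int(num_scheduled_tokens.sum()),
    num_tokens_padded=256,  # e.g. rounded up to a CUDA-graph bucket
    parallel_config=parallel_config,
    allow_microbatching=True,
    uniform_decode=False,
)

if ubatch_slices is not None:
    # All DP ranks agreed to microbatch: execute each slice separately.
    pass
else:
    # No agreement (or microbatching disallowed): run the batch in one pass.
    pass
```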