vllm.attention.backends.registry ¶
Attention backend registry
BACKEND_MAP module-attribute ¶
BACKEND_MAP = {
FLASH_ATTN: "vllm.v1.attention.backends.flash_attn.FlashAttentionBackend",
TRITON_ATTN: "vllm.v1.attention.backends.triton_attn.TritonAttentionBackend",
XFORMERS: "vllm.v1.attention.backends.xformers.XFormersAttentionBackend",
ROCM_ATTN: "vllm.v1.attention.backends.rocm_attn.RocmAttentionBackend",
ROCM_AITER_MLA: "vllm.v1.attention.backends.mla.rocm_aiter_mla.AiterMLABackend",
ROCM_AITER_FA: "vllm.v1.attention.backends.rocm_aiter_fa.AiterFlashAttentionBackend",
TORCH_SDPA: "vllm.v1.attention.backends.cpu_attn.TorchSDPABackend",
FLASHINFER: "vllm.v1.attention.backends.flashinfer.FlashInferBackend",
FLASHINFER_MLA: "vllm.v1.attention.backends.mla.flashinfer_mla.FlashInferMLABackend",
TRITON_MLA: "vllm.v1.attention.backends.mla.triton_mla.TritonMLABackend",
CUTLASS_MLA: "vllm.v1.attention.backends.mla.cutlass_mla.CutlassMLABackend",
FLASHMLA: "vllm.v1.attention.backends.mla.flashmla.FlashMLABackend",
FLASHMLA_SPARSE: "vllm.v1.attention.backends.mla.flashmla_sparse.FlashMLASparseBackend",
FLASH_ATTN_MLA: "vllm.v1.attention.backends.mla.flashattn_mla.FlashAttnMLABackend",
PALLAS: "vllm.v1.attention.backends.pallas.PallasAttentionBackend",
FLEX_ATTENTION: "vllm.v1.attention.backends.flex_attention.FlexAttentionBackend",
TREE_ATTN: "vllm.v1.attention.backends.tree_attn.TreeAttentionBackend",
ROCM_AITER_UNIFIED_ATTN: "vllm.v1.attention.backends.rocm_aiter_unified_attn.RocmAiterUnifiedAttentionBackend",
}
_Backend ¶
Bases: Enum
Source code in vllm/attention/backends/registry.py
backend_name_to_enum ¶
Convert a string backend name to a _Backend enum value.
Returns:
Name | Type | Description |
---|---|---|
_Backend | Optional[_Backend] | Enum value if backend_name is a valid in-tree backend name. |
None | Optional[_Backend] | None if the name is not a valid in-tree backend, or an out-of-tree platform is loaded. |
backend_to_class ¶
backend_to_class_str ¶
register_attn_backend ¶
Decorator: register a custom attention backend into BACKEND_MAP.
- If class_path is provided, use it.
- Otherwise, auto-generate the path from the class object.
Validation only checks that backend is a valid _Backend enum member; overwriting existing mappings is allowed. This enables other hardware platforms to plug in custom out-of-tree backends.