
Need a concise example of Tensor Parallelism (TP) training using Trainer/SFTTrainer. #41141

@meet-minimalist


Feature request

I have checked the code, and there are a few places that mention TP. I saw that the model's from_pretrained method accepts tp_plan and device_mesh arguments. I also found that TrainingArguments can take a parallelism_config, which defines the TP/CP plan along with FSDP. However, I have not been able to stitch these pieces together into a working TP-only training setup. Please help.

Ref:
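This does not answer the API question. For orientation, here is a minimal, framework-free Python sketch of what tensor parallelism does to a single MLP block. It assumes the Megatron-style pattern, where the first projection is split column-wise and the second row-wise. As far as I understand, the "colwise"/"rowwise" entries in a model's tp_plan describe this pattern, but treat that as an assumption. Ranks are simulated in a loop and the all-reduce is a plain sum. All function names are hypothetical. The sketch uses no transformers, torch, or Trainer APIs.

```python
# Hypothetical, framework-free sketch of Megatron-style tensor parallelism
# for one MLP block. Each "rank" is simulated in a loop; no torch or GPUs.


def matmul(x, w):
    """Row vector x (length k) times matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def relu(v):
    """Elementwise ReLU; because it is elementwise, it needs no cross-rank communication."""
    return [max(0, a) for a in v]


def shard_colwise(w, world_size):
    """Give each rank a contiguous block of columns (output features)."""
    step = len(w[0]) // world_size
    assert step * world_size == len(w[0]), "columns must divide evenly"
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def shard_rowwise(w, world_size):
    """Give each rank a contiguous block of rows (input features)."""
    step = len(w) // world_size
    assert step * world_size == len(w), "rows must divide evenly"
    return [w[r * step:(r + 1) * step] for r in range(world_size)]


def mlp_reference(x, w1, w2):
    """Unsharded MLP: relu(x @ w1) @ w2."""
    return matmul(relu(matmul(x, w1)), w2)


def mlp_tensor_parallel(x, w1, w2, world_size):
    """Same MLP with w1 sharded column-wise and w2 sharded row-wise."""
    w1_shards = shard_colwise(w1, world_size)
    w2_shards = shard_rowwise(w2, world_size)
    partial_outputs = []
    for rank in range(world_size):
        # Each rank computes only its slice of the hidden activations...
        hidden_slice = relu(matmul(x, w1_shards[rank]))
        # ...and multiplies it by the matching rows of w2, giving a partial sum.
        partial_outputs.append(matmul(hidden_slice, w2_shards[rank]))
    # Simulated all-reduce (sum): the only communication step in this block.
    return [sum(p[j] for p in partial_outputs) for j in range(len(partial_outputs[0]))]


if __name__ == "__main__":
    x = [1, -2, 3]
    w1 = [[1, 0, 2, -1], [0, 1, -1, 2], [3, -1, 0, 1]]
    w2 = [[1, 2], [3, -1], [0, 1], [-2, 4]]
    print(mlp_reference(x, w1, w2))           # [10, 24]
    print(mlp_tensor_parallel(x, w1, w2, 2))  # [10, 24]
```

In a real multi-GPU run, each rank holds only its own shards, and the final sum is a torch.distributed all-reduce. The column-wise-then-row-wise pairing is chosen so the intermediate hidden activations never have to be gathered across ranks.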

Motivation

I need to run TP-only training, but no tutorial or example is available.

Your contribution

With some guidance on how these pieces fit together, I can contribute a clean example and documentation for it.
