Feature request
I have looked through the code and found a few places that mention tensor parallelism (TP). The model's from_pretrained method accepts tp_plan and device_mesh, and TrainingArguments can take a parallelism_config that defines the TP/CP plan along with FSDP. However, I have not been able to stitch these pieces together to get TP-only training working. Please help.
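To make the request concrete, here is a rough, unverified sketch of the kind of TP-only script I am trying to arrive at. It uses only from_pretrained with tp_plan="auto" and a hand-written training step, and deliberately leaves out Trainer, parallelism_config, and device_mesh because I do not know how they should be combined. The model id, the file name, the helper tp_degree, and the foreach=False precaution are my own assumptions, not something taken from the documentation.

```python
import os


def tp_degree(env=os.environ) -> int:
    """Hypothetical helper: with TP only, every launched process is one TP rank."""
    return int(env.get("WORLD_SIZE", "1"))


def main() -> None:
    # Heavy imports stay local so the helper above is importable without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.2-1B"  # placeholder; any model with a TP plan

    # tp_plan="auto" asks transformers to shard supported layers across all ranks.
    model = AutoModelForCausalLM.from_pretrained(model_id, tp_plan="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Assumption: foreach=False avoids batched optimizer kernels in case
    # sharded and unsharded parameters end up as different tensor types.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, foreach=False)

    # Every TP rank must see the same batch (there is no data parallelism here).
    text = ["Tensor parallelism splits each layer across devices."]
    batch = tokenizer(text, return_tensors="pt").to(model.device)

    model.train()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if int(os.environ["RANK"]) == 0:
        print(f"completed one step with tp_degree={tp_degree()}")

    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()


# Only run when launched by torchrun, which sets RANK and WORLD_SIZE.
if __name__ == "__main__" and "RANK" in os.environ:
    main()
```

I would expect to launch this with something like `torchrun --nproc-per-node 4 tp_only_train.py`, where the file name is a placeholder. What I am missing is confirmation that this is the intended pattern, and how the same setup should be expressed through TrainingArguments and Trainer.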
Ref:
Motivation
I need to enable TP-only training, but no tutorial or example is available.
Your contribution
Given a proper understanding and some guidance, I can contribute a clean example and documentation for this.