PEFT-FLANT5
2022
In this notebook, we fine-tune a FLAN-T5 model to generate less toxic content, using Meta AI's hate speech reward model. The reward model is a binary classifier that predicts either "not hate" or "hate" for a given text. We use Proximal Policy Optimization (PPO) to fine-tune the model and reduce its toxicity.
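To make the reward signal concrete, here is a minimal sketch of how a binary "not hate" / "hate" classifier output could be turned into a scalar reward for PPO. It uses only the standard library; the label order (`NOT_HATE_INDEX = 0`), the function names, and the example logits are assumptions for illustration, and the notebook itself may derive the reward differently.

```python
import math

# Assumption: index 0 is "not hate" and index 1 is "hate".
# Check the real classifier's label mapping before relying on this order.
NOT_HATE_INDEX = 0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toxicity_reward(logits, use_logit=True):
    """Return a scalar that is higher when the text looks less toxic.

    use_logit=True returns the raw "not hate" logit;
    use_logit=False returns the "not hate" probability instead.
    """
    if use_logit:
        return logits[NOT_HATE_INDEX]
    return softmax(logits)[NOT_HATE_INDEX]


# Hypothetical classifier outputs for two generated texts.
benign_logits = [3.1, -2.4]
toxic_logits = [-1.7, 2.2]

print("benign reward:", toxicity_reward(benign_logits))
print("toxic reward:", toxicity_reward(toxic_logits))
```

PPO then updates the policy to make higher-reward generations more likely, which is what pushes the model away from toxic outputs.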
- Notebook
- Platform: Google Colab
- Stack: PyTorch, Transformers, PEFT, Evaluate
graph TD
%% Base Model & PEFT Setup
subgraph BaseSetup [Model Preparation]
BM["🧠 Pre-trained FLAN-T5 Model
(Base Model)"]
PEFT_A["⚙️ PEFT Adapter Layers
(LoRA, etc.)"]
BM --> PEFT_A
PEFT_A --- ParamSplit{ }
ParamSplit --> FP["❄️ Frozen Parameters"]
ParamSplit --> TP["🔓 Trainable Parameters"]
BM & TP --> PEFT_Mod["PEFT-modified
FLAN-T5 Model"]
end
%% Phase 1: Supervised Fine-Tuning
subgraph SFT [Phase 1: Supervised Fine-Tuning]
DSD["📄 Dialog Summarization Dataset
(Input: Dialog, Output: Summary)"]
Train_SFT["📈 Training Process
(Forward & Backward Pass)"]
FT_Model["📝 Fine-tuned FLAN-T5 Model
for Dialog Summarization"]
PEFT_Mod --> Train_SFT
DSD --> Train_SFT
Train_SFT --> FT_Model
end
%% Phase 2: RLHF
%% Connecting the Model from SFT to the entire RLHF Subgraph
FT_Model --> RLHF
subgraph RLHF [Phase 2: RLHF Alignment]
PPO["⚖️ PPO (RLHF)
Detoxification"]
Aligned["✨ Aligned Model"]
PPO --> Aligned
end
%% Styling - Dark theme compatible colors
style BM fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
style PEFT_A fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
style ParamSplit fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
style FP fill:#2d3748,stroke:#718096,color:#e2e8f0
style TP fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
style PEFT_Mod fill:#2d3748,stroke:#14b8a6,color:#e2e8f0,stroke-width:2px
style DSD fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
style Train_SFT fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
style FT_Model fill:#2d3748,stroke:#9f7aea,color:#e2e8f0,stroke-width:2px
style PPO fill:#2d3748,stroke:#f59e0b,color:#e2e8f0
style Aligned fill:#2d3748,stroke:#10b981,color:#e2e8f0,stroke-width:3px
style BaseSetup fill:#1a202c,stroke:#14b8a6,color:#14b8a6
style SFT fill:#1a202c,stroke:#9f7aea,color:#9f7aea
style RLHF fill:#1a202c,stroke:#10b981,color:#10b981
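The frozen/trainable split in the Model Preparation stage of the diagram can be illustrated with a small back-of-the-envelope calculation. The sketch below assumes LoRA is the adapter type (the diagram lists "LoRA, etc.") and counts parameters for a single weight matrix: the original `d_out x d_in` weight stays frozen, while two low-rank factors of shapes `rank x d_in` and `d_out x rank` are trained. The dimensions and rank are hypothetical and are not taken from the notebook's configuration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Count frozen and trainable parameters for one LoRA-adapted weight matrix.

    frozen:    the original weight W with shape (d_out, d_in)
    trainable: the low-rank factors A (rank, d_in) and B (d_out, rank)
    """
    frozen = d_out * d_in
    trainable = rank * d_in + d_out * rank
    return frozen, trainable


# Hypothetical layer size and rank, chosen only for illustration.
frozen, trainable = lora_param_counts(d_out=768, d_in=768, rank=32)
share = 100 * trainable / (frozen + trainable)

print(f"frozen: {frozen:,}")
print(f"trainable: {trainable:,}")
print(f"trainable share: {share:.2f}%")
```

Because only a subset of a model's layers is typically adapted, the trainable share of the whole model is usually much smaller than this per-matrix figure.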