
PEFT-FLANT5

2022

In this notebook, we fine-tune a FLAN-T5 model to generate less toxic content, using Meta AI's hate speech classifier as the reward model. The reward model is a binary classifier that labels a given text as either "not hate" or "hate". We use Proximal Policy Optimization (PPO) to fine-tune the model and reduce its toxicity.
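
To make the reward signal concrete, here is a minimal sketch of how a binary "not hate" / "hate" classifier output could be turned into a scalar reward for PPO. It assumes the classifier returns two logits ordered as [not hate, hate]. The function names, the index constant, and the example logits are hypothetical, and the notebook may derive its reward differently.

```python
import math

# Assumption: index 0 is the "not hate" class, index 1 is the "hate" class.
NOT_HATE_INDEX = 0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def not_hate_logit_reward(logits):
    """Use the raw "not hate" logit as the reward (higher means less toxic)."""
    return logits[NOT_HATE_INDEX]


def not_hate_probability(logits):
    """Alternative reward: the "not hate" probability, bounded to [0, 1]."""
    return softmax(logits)[NOT_HATE_INDEX]


# Hypothetical logits for a benign and a toxic completion.
benign_logits = [3.0, -2.0]
toxic_logits = [-1.0, 2.0]

print(not_hate_logit_reward(benign_logits) > not_hate_logit_reward(toxic_logits))
print(not_hate_probability(benign_logits) > not_hate_probability(toxic_logits))
```

Either quantity ranks the benign completion above the toxic one. The raw logit is unbounded, while the probability stays in [0, 1], and which one suits PPO better depends on the training setup.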

Notebook: Notebook
Platform: Google Colab
Stack: PyTorch, Transformers, PEFT, Evaluate
graph TD
    %% Base Model & PEFT Setup
    subgraph BaseSetup [Model Preparation]
        BM["🧠 Pre-trained FLAN-T5 Model<br/>(Base Model)"]
        PEFT_A["⚙️ PEFT Adapter Layers<br/>(LoRA, etc.)"]
        BM --> PEFT_A
        PEFT_A --- ParamSplit{ }
        ParamSplit --> FP["❄️ Frozen Parameters"]
        ParamSplit --> TP["🔓 Trainable Parameters"]
        BM & TP --> PEFT_Mod["PEFT-modified<br/>FLAN-T5 Model"]
    end

    %% Phase 1: Supervised Fine-Tuning
    subgraph SFT [Phase 1: Supervised Fine-Tuning]
        DSD["📄 Dialog Summarization Dataset<br/>(Input: Dialog, Output: Summary)"]
        Train_SFT["📈 Training Process<br/>(Forward & Backward Pass)"]
        FT_Model["📝 Fine-tuned FLAN-T5 Model<br/>for Dialog Summarization"]
        PEFT_Mod --> Train_SFT
        DSD --> Train_SFT
        Train_SFT --> FT_Model
    end

    %% Phase 2: RLHF
    %% Connecting the Model from SFT to the entire RLHF Subgraph
    FT_Model --> RLHF
    subgraph RLHF [Phase 2: RLHF Alignment]
        PPO["⚖️ PPO (RLHF)<br/>Detoxification"]
        Aligned["✨ Aligned Model"]
        PPO --> Aligned
    end

    %% Styling - Dark theme compatible colors
    style BM fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style PEFT_A fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style ParamSplit fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style FP fill:#2d3748,stroke:#718096,color:#e2e8f0
    style TP fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style PEFT_Mod fill:#2d3748,stroke:#14b8a6,color:#e2e8f0,stroke-width:2px
    style DSD fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
    style Train_SFT fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
    style FT_Model fill:#2d3748,stroke:#9f7aea,color:#e2e8f0,stroke-width:2px
    style PPO fill:#2d3748,stroke:#f59e0b,color:#e2e8f0
    style Aligned fill:#2d3748,stroke:#10b981,color:#e2e8f0,stroke-width:3px
    style BaseSetup fill:#1a202c,stroke:#14b8a6,color:#14b8a6
    style SFT fill:#1a202c,stroke:#9f7aea,color:#9f7aea
    style RLHF fill:#1a202c,stroke:#10b981,color:#10b981
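
The split between frozen and trainable parameters in the diagram can be illustrated with a small back-of-the-envelope sketch. It assumes a LoRA-style adapter that adds two low-rank matrices, A (rank × d_in) and B (d_out × rank), next to a single frozen weight matrix of shape d_out × d_in. The dimensions and rank below are hypothetical and are not taken from the notebook's actual configuration.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Share of trainable parameters for one adapted weight matrix."""
    frozen = d_in * d_out  # original weight matrix, kept frozen
    trainable = lora_param_count(d_in, d_out, rank)
    return trainable / (frozen + trainable)


# Hypothetical layer size and rank, chosen only for illustration.
d_in, d_out, rank = 768, 768, 8

print(lora_param_count(d_in, d_out, rank))
print(f"{trainable_fraction(d_in, d_out, rank):.2%}")
```

For this hypothetical layer, the adapter trains only a few percent of the parameters that a full update of the same matrix would touch, which is the motivation for the frozen/trainable split shown in the diagram.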