PEFT-FLANT5

2022

In this notebook, we fine-tune a FLAN-T5 model to generate less toxic content, using Meta AI's hate speech classifier as the reward model. The reward model is a binary classifier that predicts either "not hate" or "hate" for a given text. The model is first adapted for dialogue summarization with parameter-efficient fine-tuning (PEFT), and then further fine-tuned with Proximal Policy Optimization (PPO) to reduce its toxicity.
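The sketch below shows how a binary hate speech classifier can be turned into a scalar reward, and how PPO's clipped surrogate objective uses that signal. It assumes the reward is the classifier's raw logit for the "not hate" class, and that this class sits at index 0. Check the classifier's label mapping before relying on that index. The functions `toxicity_reward` and `clipped_surrogate` and the sample logits are hypothetical. The code uses plain Python instead of PyTorch so the arithmetic is easy to follow.

```python
import math

# Assumption: label order is ["nothate", "hate"]; verify against the classifier's id2label.
NOT_HATE_INDEX = 0


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toxicity_reward(logits):
    """Hypothetical reward: the raw 'not hate' logit (higher means less toxic)."""
    return logits[NOT_HATE_INDEX]


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical classifier outputs for two generated summaries.
polite_logits = [3.1, -2.8]  # classifier strongly predicts "not hate"
toxic_logits = [-1.5, 1.9]   # classifier leans toward "hate"

for name, logits in [("polite", polite_logits), ("toxic", toxic_logits)]:
    probs = softmax(logits)
    print(f"{name}: reward={toxicity_reward(logits):.2f}, "
          f"P(not hate)={probs[NOT_HATE_INDEX]:.3f}")

# A large policy update (ratio 1.5) with a positive advantage is capped at (1 + eps) * A.
print(clipped_surrogate(ratio=1.5, advantage=2.0))
```

Because the objective takes the minimum, it is a pessimistic bound. The policy gains nothing from pushing the probability ratio outside [1 - ε, 1 + ε], which keeps each PPO update close to the previous policy.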

Notebook: Notebook
Platform: Google Colab
Stack: PyTorch, Transformers, PEFT, Evaluate

graph TD
    %% Base Model & PEFT Setup
    subgraph BaseSetup [Model Preparation]
        BM["🧠 Pre-trained FLAN-T5 Model
(Base Model)"]
        PEFT_A["⚙️ PEFT Adapter Layers
(LoRA, etc.)"]

        BM --> PEFT_A

        PEFT_A --- ParamSplit{ }
        ParamSplit --> FP["❄️ Frozen Parameters"]
        ParamSplit --> TP["🔓 Trainable Parameters"]

        BM & TP --> PEFT_Mod["PEFT-modified
FLAN-T5 Model"]
    end

    %% Phase 1: Supervised Fine-Tuning
    subgraph SFT [Phase 1: Supervised Fine-Tuning]
        DSD["📄 Dialog Summarization Dataset
(Input: Dialog, Output: Summary)"]
        Train_SFT["📈 Training Process
(Forward & Backward Pass)"]
        FT_Model["📝 Fine-tuned FLAN-T5 Model
for Dialog Summarization"]

        PEFT_Mod --> Train_SFT
        DSD --> Train_SFT
        Train_SFT --> FT_Model
    end

    %% Phase 2: RLHF
    %% Connecting the Model from SFT to the entire RLHF Subgraph
    FT_Model --> RLHF

    subgraph RLHF [Phase 2: RLHF Alignment]
        PPO["⚖️ PPO (RLHF)
Detoxification"]
        Aligned["✨ Aligned Model"]

        PPO --> Aligned
    end

    %% Styling - Dark theme compatible colors
    style BM fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style PEFT_A fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style ParamSplit fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style FP fill:#2d3748,stroke:#718096,color:#e2e8f0
    style TP fill:#2d3748,stroke:#14b8a6,color:#e2e8f0
    style PEFT_Mod fill:#2d3748,stroke:#14b8a6,color:#e2e8f0,stroke-width:2px
    style DSD fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
    style Train_SFT fill:#2d3748,stroke:#9f7aea,color:#e2e8f0
    style FT_Model fill:#2d3748,stroke:#9f7aea,color:#e2e8f0,stroke-width:2px
    style PPO fill:#2d3748,stroke:#f59e0b,color:#e2e8f0
    style Aligned fill:#2d3748,stroke:#10b981,color:#e2e8f0,stroke-width:3px
    style BaseSetup fill:#1a202c,stroke:#14b8a6,color:#14b8a6
    style SFT fill:#1a202c,stroke:#9f7aea,color:#9f7aea
    style RLHF fill:#1a202c,stroke:#10b981,color:#10b981
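In the diagram, the Model Preparation stage splits the parameters into a frozen base and a small set of trainable adapter parameters. The sketch below is a minimal LoRA-style illustration of that split. It assumes a single linear layer whose weight W stays frozen, plus a trainable low-rank update B·A scaled by alpha / r. The layer sizes, rank, and helper names (`lora_forward`, `lora_param_counts`) are hypothetical. A real run would apply the PEFT library to FLAN-T5 rather than use nested Python lists.

```python
import random


def matmul(a, b):
    """Multiply matrices given as nested lists: (n x k) @ (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_param_counts(d_out, d_in, r):
    """Return (frozen, trainable) parameter counts for one LoRA-adapted weight."""
    frozen = d_out * d_in           # original weight W stays frozen
    trainable = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
    return frozen, trainable


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x), with x as a column vector (d_in x 1)."""
    base = matmul(W, x)
    update = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[base[i][0] + scale * update[i][0]] for i in range(len(base))]


random.seed(0)
d_out, d_in, r, alpha = 4, 4, 2, 4  # hypothetical tiny layer

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, small random init
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, zero init
x = [[1.0] for _ in range(d_in)]

# With B initialised to zero, the adapted layer starts out identical to the base layer.
assert lora_forward(W, A, B, x, alpha, r) == matmul(W, x)

# Hypothetical 768 x 768 projection with rank 32: only a small fraction is trainable.
frozen, trainable = lora_param_counts(d_out=768, d_in=768, r=32)
print(frozen, trainable, f"{trainable / frozen:.1%}")
```

Initialising B to zero means fine-tuning starts from the unmodified pre-trained model. Only A and B receive gradient updates, which is why a PEFT model has far fewer trainable parameters than the frozen base.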