notes

Personal notes
git clone git://git.laack.co/notes.git
Log | Files | Refs

ConstitutionalAI.md (1040B)


      1 # Constitutional AI
      2 
      3 **Definition:** Constitutional AI is an alignment strategy where a model is trained, in part, using a short list of principle or instructions laid out prior to the beginning the stage of training.
      4 
      5 ## Harmlessness from AI Feedback
      6 
      7 ### Supervised Stage
      8 
      9 1) Generate responses to harmful prompts using a helpful model
     10 2) Ask the model to critique the responses based on a principle in the constitution
     11 3) Revise responses in a sequence, drawing from principles in the constitution randomly
     12 4) Finetune on revised responses
     13 
     14 ### RL Stage
     15 
     16 1) AI assistant trained with supervised learning from first stage generates pairs of responses to each prompt
     17 2) Formulate each prompt and pair into multiple choice questions where we ask which response is best according to a constitutional principle.
     18 3) Use these results as the preference dataset mixed with human feedback helpfulness dataset
     19 4) Train preference model on the dataset
     20 5) Finetune supervised learning model from the first stage via RL against this preference model