TL;DR: Once trained, AI will be able to generate new data comparable in quality to its training data, thereby rendering that training data absolutely worthless. The time to sell data at a reasonable price is now, and those locking their data behind huge financial barriers (such as Twitter and Reddit) are stupidly HODLing a rapidly depreciating asset.
You don’t need to use all of the output for training if you can separate out the good parts. OpenAI reportedly paid for RLHF to do this filtering (and now gets it for free from users); Anthropic is developing RLAIF to achieve the same.
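One common way to "separate the good parts" is best-of-n rejection sampling: generate several candidate outputs and keep only the one a reward model scores highest. A minimal sketch below, where `generate_candidates` and `reward` are stubs standing in for a real LLM and a real RLHF-trained reward model:

```python
def generate_candidates(prompt, n=4):
    # Stub generator: real code would sample n completions from an LLM.
    return [f"{prompt} -> draft {i}" for i in range(n)]

def reward(text):
    # Stub reward model: real code would score the text with an
    # RLHF-trained model. Here we just read the trailing draft index.
    return int(text.rsplit(" ", 1)[-1])

def best_of_n(prompt, n=4):
    # Keep only the highest-scoring candidate for the training set.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=reward)

print(best_of_n("Explain RLHF"))
```

The same loop works whether the reward signal comes from humans (RLHF) or from another model judging against a set of principles (RLAIF); only the `reward` function changes.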
Look into WizardLM. The researchers who trained it basically gave ChatGPT a bunch of algorithm-defined prompts, scraped the chat logs, and used them to train another model. Here is a link to their paper describing the process in detail.
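The loop they describe can be sketched in a few lines: repeatedly rewrite seed instructions with evolution rules, send the evolved prompts to a teacher model, and keep the (prompt, answer) pairs as new training data. This is a toy illustration, not WizardLM's actual code; `teacher()` is a stand-in for a real ChatGPT API call, and the rewrite templates are invented examples:

```python
import random

# Hypothetical rewrite rules in the spirit of WizardLM's prompt evolution.
EVOLUTIONS = [
    "Add one concrete constraint to: {}",
    "Rewrite to require multi-step reasoning: {}",
    "Make this rarer and more specific: {}",
]

def evolve(instruction, rng):
    # Apply one randomly chosen rewrite rule to the instruction.
    return rng.choice(EVOLUTIONS).format(instruction)

def teacher(prompt):
    # Stub for the teacher model (e.g. ChatGPT) that answers each prompt.
    return f"answer to: {prompt}"

def build_dataset(seeds, rounds=2, rng=None):
    # Evolve the prompt pool for several rounds, collecting
    # (prompt, answer) pairs to train a student model on.
    rng = rng or random.Random(0)
    pool = list(seeds)
    pairs = []
    for _ in range(rounds):
        pool = [evolve(p, rng) for p in pool]
        pairs += [(p, teacher(p)) for p in pool]
    return pairs

data = build_dataset(["Sort a list in Python"])
print(len(data), "training pairs")
```

The key point for the thread: the "new" training data here is manufactured from an existing model's outputs, not bought from a data holder.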
Ah, that makes sense. So new data is being added, just in a different form.