New Release: LLaMA
Besides scaling up to 65B parameters, LLaMA stands out for being built on open-source training data, and the models are made available to researchers, while its main competitor, OpenAI's GPT-3, is not open.


Here are a few Meta employees tweeting about it.
Guillaume Lample, @GuillaumeLample
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/


Joelle Pineau, @jpineau1
Introducing LLaMA: A foundational, 65-billion-parameter large language model
And an analysis by a non-Meta employee.
Guido Appenzeller, @appenz
1/5 Meta launched their GPT-3 competitor LLaMA today. Here is a quick analysis of how it stacks up, how open it is and how it changes the industry landscape.

Community: Stability AI
Stability AI, @StabilityAI, has started an effort (a newsletter on Substack) to share what the community is creating. I think this is a great idea.
Their first issue is an EXCLUSIVE interview w/ THE amazing @remi_molettee


Tutorial:
Misha Laskin, @MishaLaskin
Starting a blog about the engineering + scientific ideas behind training large models (e.g. transformers).
First post covers data parallelism, a simple and common technique for parallelizing computation across multiple devices.
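For readers who haven't seen the idea before, here is a minimal sketch of data parallelism, not code from Misha's post: each device holds a full copy of the parameters and processes its own shard of the batch, then gradients are averaged across devices so every replica applies the same update. The toy linear model, `loss_fn`, and the learning rate are all placeholders for illustration.

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model with squared error, just to have something to differentiate.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, x, y):
    # Each device computes gradients on its own shard of the batch...
    grads = jax.grad(loss_fn)(params, x, y)
    # ...then gradients are averaged across devices, keeping all replicas in sync.
    grads = jax.lax.pmean(grads, axis_name="devices")
    # Plain SGD update with a placeholder learning rate.
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

n_devices = jax.local_device_count()

# Replicate the parameters onto every device.
params = {"w": jnp.ones((4, 1)), "b": jnp.zeros((1,))}
params = jax.device_put_replicated(params, jax.local_devices())

# Shard the batch: leading axis = device, second axis = per-device batch of 8.
x = jnp.ones((n_devices, 8, 4))
y = jnp.zeros((n_devices, 8, 1))

params = train_step(params, x, y)
```

The key step is the `pmean` over the device axis; without it each replica would drift apart after the first update.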