Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask! Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor! A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the 𝕏 platform. It will also answer spicy questions that are rejected by most other AI systems. Grok is still a very early beta product – the best we could do with 2 months of training – so expect it to improve rapidly with each passing week with your help.
Notes: "On these benchmarks, Grok-1 displayed strong results, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1. It is only surpassed by models that were trained with a significantly larger amount of training data and compute resources like GPT-4" Per table, Grok-1 is surpassed by Palm 2, Claude 2, GPT-4, so it required less compute than these three models. Palm 2 was trained on 7e24 FLOP. GPT-3.5 is ~2.6e24. Inflection-1's compute is not public/known by us but Inflection says Inflection-1 compute was <= Palm-540B's (which was ~2.5e24). For optimal training, our current working hypothesis is that you still need something like Chinchilla scaling on the total number of parameters in the model, even for MoE models, so optimal dataset size would be 20*310B tokens. With 25%*314B params active per forward pass, this would be around 3e24 FLOP. https://www.wolframalpha.com/input?i=20*310+billion+*+6+*+25%25+*+314+billion
Size Notes: (Speculative confidence, see compute notes)
Notes: "314B parameter Mixture-of-Experts model with 25% of the weights active on a given token". So effectively 78B parameters Mixture of 8 experts: https://github.com/xai-org/grok-1