We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show strong correlation between perplexity and SSA. The fact that the best perplexity end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.
Notes: https://arxiv.org/ftp/arxiv/papers/2104/2104.10350.pdf (Table 4)

In the paper: "We trained our best model for 30 days on a TPUv3 Pod (2,048 TPU cores) on the Meena dataset containing 40B words (or 61B BPE tokens) [...] by the end of training, the model had traversed the full training set 164 times (or epochs) and observed a total of about 10T tokens"

Hardware method: 30 days * 24 * 3600 s * (2048 cores / 2 cores per chip) * 1.23e14 peak FLOP/s per chip * 0.3 utilization = 9.794e22 FLOP
Ops-counting method: 6 FLOP per parameter per token * 10T tokens * 2.6B parameters = 1.56e23 FLOP
Geometric mean: sqrt(9.794e22 * 1.56e23) = 1.24e23 FLOP, very close to the figure in the link above.
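A minimal sketch of the arithmetic above in Python; the per-chip peak of 1.23e14 FLOP/s and the 0.3 utilization factor are this note's assumptions, not figures stated in the paper:

```python
import math

# Hardware method: training time * chip count * peak FLOP/s * assumed utilization.
seconds = 30 * 24 * 3600            # 30 days of training
chips = 2048 / 2                    # 2,048 TPUv3 cores, 2 cores per chip
peak_flops_per_chip = 1.23e14       # assumed TPUv3 peak throughput per chip
utilization = 0.3                   # assumed utilization factor
hardware_flop = seconds * chips * peak_flops_per_chip * utilization
print(f"hardware estimate:     {hardware_flop:.3e}")  # ~9.794e22

# Ops-counting method: ~6 FLOP per parameter per training token.
tokens = 10e12                      # ~10T tokens observed over 164 epochs
params = 2.6e9                      # 2.6B parameters
ops_flop = 6 * tokens * params
print(f"ops-counting estimate: {ops_flop:.3e}")       # ~1.56e23

# Combine the two independent estimates with a geometric mean.
combined = math.sqrt(hardware_flop * ops_flop)
print(f"geometric mean:        {combined:.3e}")       # ~1.24e23
```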
Size Notes: "The final Meena dataset contains 341GB of text (40B words)". Converting 341 GB to words at an assumed ~5 bytes per word yields 6.8e10, which is the same OOM as the reported 40B (4e10) words.
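A quick sanity check of that conversion; the ~5 bytes per word average is an assumption of this note, not a figure from the paper:

```python
# Dataset size in bytes vs. reported word count.
dataset_bytes = 341e9      # "341GB of text"
bytes_per_word = 5         # assumed average (~4 characters + 1 space)
words = dataset_bytes / bytes_per_word
print(f"{words:.1e}")      # ~6.8e10, same order of magnitude as the reported 4e10 words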
Notes: "We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize perplexity of the next token."