Welcome to APUS-xDAN-4.0-MOE, the first high-performance 100B+-parameter MoE-architecture LLM trained jointly by xDAN and APUS.

This is a high-performance MoE model that scores 79% on GSM8K (CoT) for math and 75% on MMLU for reasoning. Feel free to use it following the inference code.

APUS-xDAN-4.0-MOE leverages the Mixture of Experts (MoE) architecture, incorporating components from dense language models. Specifically, it inherits its capabilities from the highly performant xDAN-L2 series. With a total of 136 billion parameters, of which only 30 billion are activated at runtime, APUS-xDAN-4.0-MOE is highly efficient. Through advanced quantization techniques, the open-source version occupies a mere 42 GB, making it compatible with consumer-grade GPUs such as the RTX 4090 and RTX 3090.
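As a quick reference, below is a minimal inference sketch using the Hugging Face Transformers API. The repository id and loading options are assumptions, not the official inference code; adjust them to match the released checkpoint and your hardware.

```python
# Minimal inference sketch (assumed usage) with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xDAN-AI/APUS-xDAN-4.0-MOE"  # hypothetical repo id; replace with the actual path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; the quantized release is ~42 GB
    device_map="auto",          # shard across available GPUs (e.g. RTX 3090/4090)
    trust_remote_code=True,     # assumption: custom MoE modeling code may be required
)

prompt = "Explain the Mixture of Experts architecture in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```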
Notes: According to reports, APUS-xDAN-4.0 (MoE) is the first open-source large model in China with a MoE architecture exceeding 100 billion parameters, at a parameter scale of 136 billion, which also makes it the largest open-source model in China. Among previously released open-source models in China, the largest was Alibaba's Qianwen (Qwen-72B), with 72 billion parameters.