
Chip Bans Can't Stop China's AI "Little Tiger" From Overtaking OpenAI, With Model Costs a Fraction of Silicon Valley's

Author: 未来图灵 | Published: 2024-12-18

In artificial intelligence, Chinese companies are catching up with, and in places overtaking, their Western counterparts at a remarkable pace, and 01.ai (零一万物) is a standout among them. Led by Lee Kai-Fu himself, 01.ai has used a series of technical innovations to cut both model training and inference costs by roughly 97%, far undercutting Western giants such as OpenAI, xAI, and Google on cost.

This week, in a large-model ranking released by researchers at UC Berkeley SkyLab and LMSYS, 01.ai's Yi-Lightning model placed it joint third among LLM companies, alongside xAI with its Grok-2 model, and ahead of OpenAI's GPT-4o (2024-05-13). It is the first time a Chinese large model has surpassed GPT-4o, a milestone result.

Even more surprising, 01.ai's training costs were low enough to catch even Elon Musk off guard. xAI trained Grok-2 on 20,000 GPUs over four months, while 01.ai's Yi-Lightning used just 2,000 GPUs for one month, at a training cost of only US$3 million: roughly 2,000 GPU-months of compute against Grok-2's 80,000, or about 2.5% of the total. And the result? Lee Kai-Fu's model fought Musk's to a draw, the two tying for sixth place on the model leaderboard.

In an interview with the Financial Times, 01.ai CEO Lee Kai-Fu revealed the secret behind the upset. Facing restrictions on access to cutting-edge chips, 01.ai trained its AI models on smaller data sets, adopted a mixture-of-experts architecture, and developed its own "model-infrastructure co-design" (模基共建) methodology, vertically integrating and optimizing hardware, chips, memory, and AI infrastructure software to build Yi-Lightning. Training the model cost 97.5% less than xAI's Grok-2, and its inference is 31 times cheaper than OpenAI's GPT-4o. 01.ai thus kept the model's performance in the world's first tier while drastically cutting both training and inference costs. Lee added that 01.ai's research "north star" is not to build the world's best model at any price or scale, but to build a first-tier model at ultra-low cost: a high-value model that lets developers build applications without being crushed by inference bills.

01.ai adopted a mixture-of-experts (MoE) architecture, an approach first proposed by American researchers but carried forward by Chinese companies. By combining multiple neural networks, each trained on industry-specific data, an MoE model can match the intelligence of a dense model with less computing power; researchers regard the architecture as a key technique for doing exactly that. But the method is more likely to fail in training. Meta's Llama, for instance, has yet to produce a world-leading MoE model, while Chinese companies appear to have mastered the architecture; 01.ai and DeepSeek in particular have built some of the world's fastest MoE models.
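To make the idea concrete, here is a minimal sketch of a top-k-routed MoE layer in PyTorch. It is illustrative only: the expert count, layer sizes, and top-2 routing are assumptions chosen for the example, not details of Yi-Lightning's actual design. The point it shows is that each token activates only top_k of the n_experts feed-forward networks, so per-token compute scales with k rather than with the total parameter count.

```python
# Minimal sketch of a mixture-of-experts (MoE) layer.
# Hypothetical sizes and routing; not Yi-Lightning's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is a small feed-forward network; only top_k of them
        # run per token, so compute scales with k, not with n_experts.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = MoELayer()
y = layer(torch.randn(16, 512))  # 16 tokens in, 16 tokens out
```

A dense layer with the same total parameter count would run every weight for every token; here, with top_k=2 of 8 experts, each token touches only a quarter of the expert parameters.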

01.ai has also invested heavily in data collection. Lee told the Financial Times: "Our data collection methods go far beyond traditional web scraping, including scanning books, crawling deep-web data, and using novel synthetic-data techniques." This distinctive approach gives 01.ai's models traits and advantages its competitors lack, much as better textbooks give a child a better education.

Herein also lies an advantage unique to Chinese model teams: China has a deep pool of technically excellent, reasonably priced engineering talent. Lee told the Financial Times: "There is a lot of thankless gruntwork for engineers in labeling and ranking data, and China is better placed for it than the US." This talent dividend lets 01.ai reach technical breakthroughs, and ship applications, in less time.

These combined technical advantages ultimately drove Yi-Lightning's inference cost to an extremely low level. The numbers are striking: Yi-Lightning's inference costs just 14 US cents (RMB 0.99) per million tokens, while OpenAI's larger GPT-4o costs US$4.40 per million tokens, a gap of roughly 31x ($4.40 / $0.14 ≈ 31). Lee said: "China's strength is making truly affordable inference engines, and that is the single most important thing for AI applications to flourish."
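As a sanity check on the reported gap, the arithmetic is straightforward; the per-million-token prices below come from the article, while the monthly traffic figure is a hypothetical workload chosen only to show what the price difference means for an application's bill:

```python
# Reported per-million-token prices; the token volume is hypothetical.
yi_lightning = 0.14   # USD per 1M tokens (reported)
gpt_4o = 4.40         # USD per 1M tokens (reported)

print(f"price ratio: {gpt_4o / yi_lightning:.1f}x")   # ≈ 31.4x

tokens_per_month = 5_000_000_000   # hypothetical app: 5B tokens/month
for name, price in [("Yi-Lightning", yi_lightning), ("GPT-4o", gpt_4o)]:
    print(f"{name}: ${tokens_per_month / 1_000_000 * price:,.0f}/month")
# Yi-Lightning: $700/month vs GPT-4o: $22,000/month
```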

"China's strength is generally not in making unprecedented research breakthroughs with an unlimited budget. But from the mobile internet to AI 1.0, we have seen China deploy better, faster, more reliably, and at lower cost. We are now on the eve of an application explosion, and I am very optimistic about the future of Chinese AI applications," Lee said.

Chinese AI groups get creative to drive down cost of models

01.ai, Alibaba and ByteDance have cut ‘inference’ costs despite Washington curbs on accessing cutting-edge chips

Eleanor Olcott in Beijing October 19 2024

Chinese artificial intelligence companies are driving down costs to create competitive models, as they contend with US chip restrictions and smaller budgets than their Western counterparts.

Start-ups such as 01.ai and DeepSeek have reduced prices by adopting strategies such as focusing on smaller data sets to train AI models and hiring cheap but skilled computer engineers.

Bigger technology groups such as Alibaba, Baidu and ByteDance have also engaged in a pricing war to cut “inference” costs, the price of calling upon large language models to generate a response, by more than 90 per cent and to a fraction of that offered by US counterparts.

This is despite Chinese companies having to navigate Washington’s ban on exports of the highest-end Nvidia AI chips, seen as crucial to developing the most cutting edge models in the US.

Beijing-based 01.ai, led by Lee Kai-Fu, the former head of Google China, said it has cut inference costs by building a model trained on smaller amounts of data that requires less computing power and by optimising its hardware.

“China’s strength is to make really affordable inference engines and then to let applications proliferate,” Lee told the Financial Times.

This week, 01.ai’s Yi-Lightning model came joint third among LLM companies alongside xAI’s Grok-2, but behind OpenAI and Google, in a ranking released by researchers at UC Berkeley SkyLab and LMSYS.

The evaluations are based on users who score different models’ answers to queries. Other Chinese players, including ByteDance, Alibaba and DeepSeek, have also crept up the LLM ranking boards.

Many Chinese AI groups, including 01.ai, DeepSeek, MiniMax and Stepfun, have adopted a so-called “mixture-of-experts” approach, a strategy first popularised by US researchers.

Rather than training one “dense model” at once on a vast database that has scraped data from the internet and other sources, the approach combines many neural networks trained on industry-specific data.

Researchers view the mixture-of-experts approach as a key way to achieve the same level of intelligence as a dense model but with less computing power. But the approach can be more prone to failure, as engineers have to orchestrate the training process across multiple “experts” rather than within one model.
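One standard way engineers manage that orchestration risk is an auxiliary load-balancing loss that discourages the router from sending all tokens to a few experts, in the style popularised by Google's Switch Transformer work. A minimal sketch follows; the shapes and formulation are assumptions for illustration, not any of these companies' actual training code:

```python
# Switch-Transformer-style load-balancing loss: penalize the router when
# token traffic and routing probability concentrate on a few experts.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, n_experts: int) -> torch.Tensor:
    # router_logits: (n_tokens, n_experts) raw scores from the gating network
    probs = F.softmax(router_logits, dim=-1)       # routing probabilities
    assigned = probs.argmax(dim=-1)                # top-1 expert per token
    # f[e]: fraction of tokens actually dispatched to expert e
    f = torch.bincount(assigned, minlength=n_experts).float() / router_logits.size(0)
    # p[e]: mean routing probability mass given to expert e
    p = probs.mean(dim=0)
    # Minimized when both distributions are uniform (1 / n_experts each);
    # typically added to the main loss with a small coefficient.
    return n_experts * torch.sum(f * p)

loss = load_balancing_loss(torch.randn(1024, 8), n_experts=8)
```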

Given the difficulty in securing a steady and ample supply of high-end AI chips, Chinese AI players have been competing over the past year to develop the highest-quality data sets to train these “experts” to set themselves apart from the competition.

Lee said 01.ai has approaches to data collection beyond the traditional method of scraping the internet, including scanning books and crawling articles on the messaging app WeChat that are inaccessible on the open web.

“There is a lot of thankless gruntwork” for engineers to label and rank data, he said, but added China — with its vast pool of cheap engineering talent — is better placed to do that than the US.

“China’s strength is not doing the best breakthrough research that no one has done before where the budget has no limit,” said Lee. “China’s strength is to build well, build fast, build reliably and build cheap.”

Additional reporting by Cristina Criddle in San Francisco

