EVA: A Chinese Open-Domain Dialogue Pre-Trained Model


EVA is currently the largest Chinese open-domain dialogue pre-trained model, with 2.8 billion parameters, pre-trained on WDC-Dialogue, a dataset containing 1.4 billion multi-domain context-response pairs. Experiments show that EVA outperforms other existing Chinese pre-trained dialogue models on both automatic and human evaluation metrics.

Official website: 智源开源开放平台 (wudaoai.cn)

GitHub: https://github.com/BAAI-WuDao/EVA

Paper link: https://arxiv.org/abs/2108.01547

2 Dataset

We construct a dataset named WDC-Dialogue from Chinese social media to train EVA. Specifically, conversations from various sources are gathered and a rigorous data cleaning pipeline is designed to ensure the quality of WDC-Dialogue. We mainly focus on three categories of textual interaction data, i.e., reposts on social media, comments/replies on various online forums, and online question-and-answer (Q&A) exchanges. Each round of these textual interactions yields a dialogue session via well-designed parsing rules. The following table shows statistics of the filtered WDC-Dialogue dataset and other Chinese dialogue datasets.
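
The parsing rules themselves are described in the paper rather than reproduced here. As a rough illustration of the idea, a repost, comment, or Q&A chain can be flattened into context-response pairs, where each turn's context is the concatenation of all preceding turns. The Python sketch below shows this under that simplification; the function name and separator are hypothetical, and the real WDC-Dialogue pipeline additionally applies the cleaning rules mentioned above.

from typing import List, Tuple

def chain_to_pairs(turns: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    # Illustrative only: flatten one interaction chain into (context, response)
    # pairs; the actual WDC-Dialogue parsing and filtering rules are stricter.
    pairs = []
    for i in range(1, len(turns)):
        context = sep.join(turns[:i])      # all preceding turns form the context
        pairs.append((context, turns[i]))  # the current turn is the response
    return pairs

# A three-turn comment chain yields two context-response pairs
print(chain_to_pairs(["今天天气真好", "是啊,适合出去玩", "那走,去公园吧"]))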


3 Model

EVA is a Transformer-based dialogue model with a bi-directional encoder and a uni-directional decoder. We present EVA's model details and a comparison with previous large-scale Chinese pre-trained dialogue models in the following table.

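EVA's real architecture and hyper-parameters are listed in the paper and the table referenced above; the following PyTorch sketch only illustrates the general pattern of a bidirectional encoder over the dialogue context paired with a causally masked (uni-directional) decoder over the response. All sizes below are small placeholders, not EVA's actual configuration.

import torch
import torch.nn as nn

class TinyEncDecDialogueModel(nn.Module):
    # Schematic encoder-decoder dialogue model; NOT EVA's real code or sizes.
    def __init__(self, vocab_size=32768, d_model=256, nhead=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, context_ids, response_ids):
        # The encoder attends bidirectionally over the context; the decoder
        # uses a causal mask so each position only sees earlier response tokens.
        causal_mask = self.transformer.generate_square_subsequent_mask(response_ids.size(1))
        hidden = self.transformer(self.embed(context_ids), self.embed(response_ids), tgt_mask=causal_mask)
        return self.lm_head(hidden)  # next-token logits over the vocabulary

model = TinyEncDecDialogueModel()
logits = model(torch.randint(0, 32768, (1, 16)), torch.randint(0, 32768, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 32768])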

The model can be downloaded from BAAI's repository. The downloaded folder should have the following structure (a small checkpoint-inspection sketch follows the listing):



eva/
├── 222500
│   └── mp_rank_00_model_states.pt
├── latest_checkpointed_iteration.txt
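
To sanity-check the download, the checkpoint file can be opened directly with PyTorch. The sketch below assumes the usual DeepSpeed layout in which the model weights sit under a "module" key; the exact structure may differ across checkpoint versions.

import torch

# Load the DeepSpeed model-state file on CPU and peek at a few parameters.
state = torch.load("eva/222500/mp_rank_00_model_states.pt", map_location="cpu")
print(list(state.keys()))
weights = state.get("module", state)  # "module" is assumed; fall back to the dict itself
for name, tensor in list(weights.items())[:5]:
    print(name, tuple(tensor.shape))
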
4 Experiment

We compare EVA with Chinese pre-trained models including CDial-GPT and CPM. Results of the automatic evaluation, including uni-gram F1, ROUGE-L, BLEU-4 and distinct n-grams, are shown as follows:
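
For reference, uni-gram F1 and distinct-n are simple to reproduce from their definitions; the helpers below are a sketch of one common formulation, not the exact evaluation scripts used for the paper (ROUGE-L and BLEU-4 are usually taken from standard packages).

from collections import Counter
from typing import List

def unigram_f1(prediction: List[str], reference: List[str]) -> float:
    # Token-level F1 between one predicted response and one reference response.
    overlap = sum((Counter(prediction) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(prediction), overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def distinct_n(responses: List[List[str]], n: int) -> float:
    # distinct-n: ratio of unique n-grams to total n-grams over all responses.
    ngrams = [tuple(r[i:i + n]) for r in responses for i in range(len(r) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

print(unigram_f1("我 也 喜欢 猫".split(), "我 喜欢 猫".split()))   # about 0.857
print(distinct_n(["我 喜欢 猫".split(), "我 喜欢 狗".split()], 2))  # 0.75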

We also present an example of multi-turn generation results in the interactive human evaluation:


5 Run the Code

We provide the inference code of EVA; the source code is located in src/.

5.1 Environment

The inference code occupies only about 7,000 MB of GPU memory, so a single GPU is generally enough. We provide two options for setting up the environment, and we recommend using our Docker image directly to avoid the bugs in DeepSpeed.


Option 1: Docker

docker pull gyxthu17/eva:1.2

Since the environment is already set up in the Docker image, you don't need to set any environment variables. You may need to mount this directory to a directory inside the container. For example, to mount it to /mnt, run the following command to start the Docker image:


docker run -ti -v ${PWD}:/mnt gyxthu17/eva:1.2 /bin/bash

Option 2: Set up DeepSpeed

If you insist on setting up DeepSpeed yourself, please make sure the version is v0.3.9. It can be installed from its repo. Since there are some bugs in DeepSpeed, you need to make a few small modifications to the package; refer to https://github.com/TsinghuaAI/CPM-2-Finetune/issues/11 for more information. Specifically, you need to modify several lines of code in deepspeed/runtime/zero/stage1.py and deepspeed/runtime/engine.py. We provide the modified stage1.py and engine.py in our repo; simply replace deepspeed/runtime/zero/stage1.py and deepspeed/runtime/engine.py with these files.
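
If you take this route, one way to apply the replacement is to locate the installed deepspeed package and overwrite the two files. The Python sketch below assumes the patched stage1.py and engine.py from this repo are in the current working directory.

import os
import shutil
import deepspeed  # must be v0.3.9, installed beforehand

# Find the installed package and overwrite the two buggy files with the
# patched versions shipped in this repo (assumed to be in the current dir).
ds_dir = os.path.dirname(deepspeed.__file__)
shutil.copy("stage1.py", os.path.join(ds_dir, "runtime", "zero", "stage1.py"))
shutil.copy("engine.py", os.path.join(ds_dir, "runtime", "engine.py"))
print("patched DeepSpeed at", ds_dir)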


5.2 Run

Before running the code, please change WORKING_DIR in the script to the path of this EVA directory and change CKPT_PATH to the path where the pre-trained weights are stored. You also need to change node-0 in ${WORKING_DIR}/src/configs/host_files/hostfile to the ssh node name (or IP) of the machine where you run the distributed job; each line of a DeepSpeed hostfile typically has the form hostname slots=N. Please refer to the DeepSpeed documentation for more detailed information on this configuration.


Run the following command:

cd src/
bash scripts/infer_enc_dec_interactive.sh

After running the command, please first make sure the pre-trained weights are loaded. If they are loaded, the log printed to stdout should contain messages like "successfully loaded /path-to-checkpoint/eva/mp_rank_01_model_states.pt". Otherwise, you will see a warning like "WARNING: could not find the metadata file latest_checkpointed_iteration.txt, will not load any checkpoints and will start from random". Note that even when the model is loaded successfully, you will see messages like "The following zero checkpoints paths are missing: ['/path-to-checkpoint/eva/200000/zero_pp_rank_0_mp_rank_00_optim_states.pt', ...", which mean the optimizer states are not loaded. This does NOT affect model inference and you can simply ignore it.


If things go well, you will eventually enter an interactive interface. Have fun talking to EVA!

