1. Convert the pretrained model to a SavedModel (.pb)
from transformers import TFAutoModel
import tensorflow as tf

class WrappedModel(tf.Module):
    def __init__(self):
        super(WrappedModel, self).__init__()
        self.model = TFAutoModel.from_pretrained("bert-base-chinese")

    @tf.function
    def __call__(self, x):
        return self.model(x)

model = WrappedModel()
saved_model_path = "test_model"
# Trace the call with a dict of TensorSpecs so each serving input is bound by
# name; a positional tuple risks swapping token_type_ids and attention_mask,
# because the model unpacks tuple inputs by position, not by name.
call = model.__call__.get_concrete_function({
    "input_ids": tf.TensorSpec([None, None], tf.int32, name="input_ids"),
    "token_type_ids": tf.TensorSpec([None, None], tf.int32, name="token_type_ids"),
    "attention_mask": tf.TensorSpec([None, None], tf.int32, name="attention_mask"),
})
tf.saved_model.save(model, saved_model_path, signatures=call)
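Before deploying, it is worth confirming that the exported signature exposes the three input names above. The saved_model_cli tool that ships with TensorFlow prints the signature:

saved_model_cli show --dir test_model --tag_set serve --signature_def serving_default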
2. Load the model and run inference
from transformers import AutoTokenizer

test_model = tf.saved_model.load("test_model")
infer = test_model.signatures["serving_default"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
text = "这是一个测试句子"  # any example sentence
inputs = tokenizer(text, return_tensors="tf")
instance = dict(inputs)  # keys: input_ids, token_type_ids, attention_mask
outputs = infer(**instance)
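The signature returns a dict of tensors; the exact output keys can vary with the transformers version, so print them rather than assuming. A quick check:

for name, tensor in outputs.items():
    print(name, tensor.shape)  # for BERT, typically last_hidden_state and pooler_output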
3. Pull the TensorFlow Serving image
docker pull tensorflow/serving:latest
4. Deploy
docker run -t --rm -p 8501:8501 \
-v "/path/to/model:/models/test_model" \
-e MODEL_NAME=test_model \
tensorflow/serving:latest &
The mounted test_model directory must contain a numeric version subdirectory, because TensorFlow Serving loads model versions rather than a bare SavedModel:
test_model/
└── 1/
    ├── assets/
    ├── variables/
    │   ├── variables.data-00000-of-00001
    │   └── variables.index
    └── saved_model.pb
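The export step above wrote directly into test_model/, so one way to produce this layout is to move the SavedModel into a version folder before mounting it (the paths below are illustrative). Once the container is up, the model status endpoint confirms the model loaded:

mkdir -p /path/to/model/1
mv test_model/* /path/to/model/1/
curl http://localhost:8501/v1/models/test_model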
5. Test the endpoint
Requests can use either the row format ("instances") or the columnar format ("inputs"):
Row format:
curl -d '{"instances": [{"input_ids": [101, 3330, 4635, 6121, 6662, 1071, 671, 677, 102], "token_type_ids": [0, 0, 0, 0, 0, 0, 0, 0, 0], "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1]}]}' -X POST http://localhost:8501/v1/models/test_model:predict
Columnar format:
curl -d '{"inputs": {
  "input_ids": [[101, 3330, 4635, 6121, 6662, 1071, 671, 677, 102]],
  "token_type_ids": [[0, 0, 0, 0, 0, 0, 0, 0, 0]],
  "attention_mask": [[1, 1, 1, 1, 1, 1, 1, 1, 1]]
}}' -X POST http://localhost:8501/v1/models/test_model:predict
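The same request can be built from Python with the tokenizer, so the token IDs never need to be typed by hand. A minimal sketch using the requests library (model name and port match the deployment above):

import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
enc = tokenizer("这是一个测试句子")  # returns plain Python lists by default
payload = {"inputs": {k: [v] for k, v in enc.items()}}  # columnar format, batch of 1
resp = requests.post("http://localhost:8501/v1/models/test_model:predict", json=payload)
print(resp.json())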