A while back I trained two networks, LeNet-5 and ResNet-18, on the MNIST, CIFAR-10, and ImageNet-2012 datasets for NVDLA, then compiled and ran them in NVDLA's virtual platform (VP) environment; the resulting models can be downloaded below.

NVDLA's documentation states that it supports 8-bit inference, but there is very little official guidance on how to quantize a floating-point model to 8 bits and run it on NVDLA. This post documents the quantization procedure and fixes for a few problems in the official code.
## Pretrained Caffe Models

- caffe_lenet_mnis
- resnet18-cifar10-caffe
- resnet18-imagenet-caffe
## NVDLA Quantization

The NVDLA repository on GitHub sketches the general direction for quantization in LowPrecision.md.

That document points out that we need to use TensorRT to quantize the model, ending up with a calibration table. For how TensorRT does INT8 quantization, the short version is that it derives a scaling factor that maps floating-point values onto fixed-point ones; the algorithm for choosing that factor (an entropy/KL-divergence threshold search) only needs a passing understanding, and this Zhihu post explains it well: https://zhuanlan.zhihu.com/p/58182172
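To make the scaling factor concrete, here is a minimal sketch of symmetric INT8 quantization once a calibration threshold has been chosen. This is an illustration only, not TensorRT's actual implementation; the threshold search itself is what the calibrator automates:

```python
import numpy as np

def quantize_int8(x, threshold):
    """Symmetric linear quantization: map [-threshold, threshold] onto [-127, 127]."""
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy example: activations calibrated to a threshold of 6.0.
acts = np.array([0.5, -3.2, 5.9, 7.1], dtype=np.float32)
q, scale = quantize_int8(acts, threshold=6.0)
print(q, scale)   # values beyond the threshold saturate at +/-127
print(q * scale)  # dequantized approximation of the original floats
```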
The concrete steps are as follows:
## Generating a Calibration Table with TensorRT

I recommend using the official Docker image:

```bash
docker pull nvcr.io/nvidia/tensorrt:20.12-py3
```
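Then start a container from the image; something like the following works (the workspace mount is just an example, and --gpus requires the NVIDIA container toolkit to be installed):

```bash
docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:20.12-py3
```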
I originally thought that installing TensorRT just to generate a calibration table file would be overkill, so I first tried BUG1989's project: https://github.com/BUG1989/caffe-int8-convert-tools. Unfortunately, it does not work here.

./nvdla_compiler does not actually read the generated calibration table (a plain-text file) but a JSON file. The conversion script is calib_txt_to_json.py, and it only understands tables produced by TensorRT, not the ones generated by that repository.

In practice, TensorRT's scheme also turned out to be more sophisticated: it calibrates with a batch of data drawn from the validation set, and it can report the accuracy of the quantized model after calibration, which is very useful.
Recent versions of TensorRT expose the INT8 quantization flow through a Python API. The official sample lives in /opt/tensorrt/samples/python/int8_caffe_mnist, and there is a corresponding C++ version. I recommend the Python one: it is more readable and easier to rework, and the C++ version ultimately calls into the same .so libraries anyway.

To build your own quantization flow, the core step is to define a class that inherits from trt.IInt8EntropyCalibrator2; the details are in the MNIST sample's calibrator.py. We need to supply three things: the dataset used at training time, deploy.prototxt, and the caffemodel, and then write our own parsing code. For example, to quantize the ResNet-18 network for ImageNet, I wrote it like this:
```python
import os
import cv2 as cv
import numpy as np

def load_data(filepath):
    test_imgs = []
    global fileList
    fileList = os.listdir(filepath)
    # Per-channel BGR means used at training time (Caffe convention).
    mean = np.ones([3, 224, 224], dtype=np.float32)  # np.float is deprecated; use np.float32
    mean[0, :, :] = 104
    mean[1, :, :] = 117
    mean[2, :, :] = 123
    for img_path in fileList:
        img_path = filepath + '/' + img_path
        img = cv.imread(img_path)
        img = crop_img(img, [224, 224])
        img = img.transpose((2, 0, 1))  # HWC -> CHW
        img = img - mean
        test_imgs.append(img)
    return np.ascontiguousarray(test_imgs).astype(np.float32)

def load_labels(filepath):
    global fileList
    test_labels = []
    labels_mapping = {}
    with open(filepath, 'r') as f:
        lines = f.readlines()
        for each in lines:
            imageName, labels = each.strip('\n').split(' ')
            labels_mapping[imageName] = int(labels)
    for each in fileList:
        test_labels.append(labels_mapping[each])
    return np.ascontiguousarray(test_labels)
```
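The crop_img helper is not shown in the original code; here is a minimal center-crop sketch consistent with how it is called above (my assumption about its behavior, not the original implementation):

```python
import cv2 as cv

def crop_img(img, size):
    # Resize so the short side covers the target, then center-crop to size.
    th, tw = size
    h, w = img.shape[:2]
    ratio = max(th / h, tw / w)
    img = cv.resize(img, (int(round(w * ratio)), int(round(h * ratio))))
    h, w = img.shape[:2]
    y, x = (h - th) // 2, (w - tw) // 2
    return img[y:y + th, x:x + tw]
```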
And the custom EntropyCalibrator looks like this:
```python
import os
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes a CUDA context

class CustomEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, training_data, cache_file, batch_size=64):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        self.data = load_data(training_data)
        self.batch_size = batch_size
        self.current_index = 0
        # Device buffer large enough for one whole batch.
        self.device_input = cuda.mem_alloc(self.data[0].nbytes * self.batch_size)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.current_index + self.batch_size > self.data.shape[0]:
            return None  # tells TensorRT the calibration data is exhausted
        current_batch = int(self.current_index / self.batch_size)
        if current_batch % 10 == 0:
            print("Calibrating batch {:}, containing {:} images".format(current_batch, self.batch_size))
        batch = self.data[self.current_index:self.current_index + self.batch_size].ravel()
        cuda.memcpy_htod(self.device_input, batch)
        self.current_index += self.batch_size
        return [self.device_input]

    def read_calibration_cache(self):
        # Reuse an existing cache so repeated builds can skip calibration.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```
All four of these methods are called automatically during the build_engine stage. Pay particular attention to get_batch: the batch handed to the device must be flattened from a contiguous buffer (hence the np.ascontiguousarray in load_data), otherwise the calibration result is badly distorted.
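For context, wiring the calibrator into the build looks roughly like this, in the style of the official sample (a sketch against the TensorRT 7.x Python API; the file paths and the "prob" output blob name are assumptions):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(deploy_file, model_file, calibrator, batch_size=64):
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         builder.create_builder_config() as config, \
         trt.CaffeParser() as parser:
        builder.max_batch_size = batch_size
        config.max_workspace_size = 1 << 30
        config.set_flag(trt.BuilderFlag.INT8)
        # build_engine() drives get_batch()/write_calibration_cache() on this object.
        config.int8_calibrator = calibrator
        model_tensors = parser.parse(deploy=deploy_file, model=model_file,
                                     network=network, dtype=trt.float32)
        network.mark_output(model_tensors.find("prob"))  # "prob" assumed output blob
        return builder.build_engine(network, config)
```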
Output after running the program:
```
root@ea6ecad42f26:/opt/tensorrt# python samples/python/int8_caffe_cifar10/sample.py
Calibrating batch 0, containing 64 images
Calibrating batch 10, containing 64 images
Validating batch 10
Validating batch 20
Validating batch 30
Total Accuracy: 58.54145854145854%
```
The cache file generated in the working directory is the calibration table:
```
TRT-7202-EntropyCalibration2
data: 3f9839e4
(Unnamed Layer* 0) [Convolution]_output: 404efcc8
(Unnamed Layer* 1) [Scale]_output: 3dad1056
(Unnamed Layer* 2) [Scale]_output: 3c612a39
conv1: 3c35ea42
pool1: 3c35ea42
(Unnamed Layer* 5) [Convolution]_output: 3c44ec17
(Unnamed Layer* 6) [Scale]_output: 3d79ddee
res2a_branch1: 3c71cc22
(Unnamed Layer* 8) [Convolution]_output: 3cbea256
(Unnamed Layer* 9) [Scale]_output: 3db5bbfc
(Unnamed Layer* 10) [Scale]_output: 3c893a25
res2a_branch2a: 3bfa7ac1
(Unnamed Layer* 12) [Convolution]_output: 3bb2c9bf
(Unnamed Layer* 13) [Scale]_output: 3d7d559f
res2a_branch2b: 3c431934
(Unnamed Layer* 15) [ElementWise]_output: 3caeb939
res2a: 3c53f5bb
(Unnamed Layer* 17) [Convolution]_output: 3c10df08
(Unnamed Layer* 18) [Scale]_output: 3d87d76c
(Unnamed Layer* 19) [Scale]_output: 3c5511f5
res2b_branch2a: 3c2a88bb
(Unnamed Layer* 21) [Convolution]_output: 3bb6625d
(Unnamed Layer* 22) [Scale]_output: 3d88ba43
res2b_branch2b: 3c6ad191
(Unnamed Layer* 24) [ElementWise]_output: 3c996872
res2b: 3c90d5e3
(Unnamed Layer* 26) [Convolution]_output: 3b7d9a5e
(Unnamed Layer* 27) [Scale]_output: 3d74307d
res3a_branch1: 3bfdfab7
(Unnamed Layer* 29) [Convolution]_output: 3c2fe025
(Unnamed Layer* 30) [Scale]_output: 3d7def72
(Unnamed Layer* 31) [Scale]_output: 3c31f779
res3a_branch2a: 3c2c8afe
(Unnamed Layer* 33) [Convolution]_output: 3bbf4a3a
(Unnamed Layer* 34) [Scale]_output: 3d83d04d
res3a_branch2b: 3c572369
(Unnamed Layer* 36) [ElementWise]_output: 3c69c835
res3a: 3c8064f2
(Unnamed Layer* 38) [Convolution]_output: 3c14239e
(Unnamed Layer* 39) [Scale]_output: 3d8554e6
(Unnamed Layer* 40) [Scale]_output: 3c31028f
res3b_branch2a: 3c169d63
(Unnamed Layer* 42) [Convolution]_output: 3b8624f8
(Unnamed Layer* 43) [Scale]_output: 3d9474e5
res3b_branch2b: 3c3b2d4d
(Unnamed Layer* 45) [ElementWise]_output: 3c62c909
res3b: 3c3576b1
(Unnamed Layer* 47) [Convolution]_output: 3b206632
(Unnamed Layer* 48) [Scale]_output: 3d814cf0
res4a_branch1: 3b9986ae
(Unnamed Layer* 50) [Convolution]_output: 3c17f128
(Unnamed Layer* 51) [Scale]_output: 3d827168
(Unnamed Layer* 52) [Scale]_output: 3c2b6742
res4a_branch2a: 3c1b4ed9
(Unnamed Layer* 54) [Convolution]_output: 3bab8f82
(Unnamed Layer* 55) [Scale]_output: 3d885e72
res4a_branch2b: 3c3ce0a1
(Unnamed Layer* 57) [ElementWise]_output: 3c4deb0b
res4a: 3c63cff2
(Unnamed Layer* 59) [Convolution]_output: 3bdc2dea
(Unnamed Layer* 60) [Scale]_output: 3d6cc90a
(Unnamed Layer* 61) [Scale]_output: 3c2cbe15
res4b_branch2a: 3c05ab17
(Unnamed Layer* 63) [Convolution]_output: 3b636e4c
(Unnamed Layer* 64) [Scale]_output: 3d857fd8
res4b_branch2b: 3c5bd401
(Unnamed Layer* 66) [ElementWise]_output: 3c6a9868
res4b: 3c5cbc54
(Unnamed Layer* 68) [Convolution]_output: 3ad16737
(Unnamed Layer* 69) [Scale]_output: 3d662897
res5a_branch1: 3bab30a7
(Unnamed Layer* 71) [Convolution]_output: 3bbe5321
(Unnamed Layer* 72) [Scale]_output: 3d82074f
(Unnamed Layer* 73) [Scale]_output: 3c2da30b
res5a_branch2a: 3c0fe297
(Unnamed Layer* 75) [Convolution]_output: 3b4dae0e
(Unnamed Layer* 76) [Scale]_output: 3d7a5e58
res5a_branch2b: 3c59f7ca
(Unnamed Layer* 78) [ElementWise]_output: 3c7a4a3e
res5a: 3c69fa84
(Unnamed Layer* 80) [Convolution]_output: 3bea45a4
(Unnamed Layer* 81) [Scale]_output: 3d891fa0
(Unnamed Layer* 82) [Scale]_output: 3c4d88a0
res5b_branch2a: 3bc901ec
(Unnamed Layer* 84) [Convolution]_output: 3af65712
(Unnamed Layer* 85) [Scale]_output: 3ddd4245
res5b_branch2b: 3e07db30
(Unnamed Layer* 87) [ElementWise]_output: 3e04055e
res5b: 3e1b53d4
pool5: 3e1b53d4
fc1000: 3e1f8999
prob: 3acc1705
```
There are a lot of Unnamed Layer entries in there :blonde_woman: we will deal with those shortly.
## Converting the Calibration Table to JSON

The compiler expects its calibration input in JSON format, so we convert the table with a script (note that we are no longer inside the TensorRT Docker container at this point; switch to the nvdla/vp environment):

```bash
cd /usr/local/nvdla
git clone https://github.com/nvdla/sw
# cd into the directory that holds the calibration table file
python /usr/local/nvdla/sw/umd/utils/calibdata/calib_txt_to_json.py cifar10_calibration.cache resnet18-cifar10-int8.json
```
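For reference, the converted JSON maps each tensor name to its quantization parameters. Judging from the fields the fix-up script below reads (scale, min, max, offset), and decoding the hex float scales in the cache above (3f9839e4 is roughly 1.1893, 3c35ea42 roughly 0.0111), an entry looks approximately like this (illustrative, not a verbatim dump):

```json
{
  "data":  { "scale": 1.1893, "min": 0, "max": 0, "offset": 0 },
  "conv1": { "scale": 0.0111, "min": 0, "max": 0, "offset": 0 }
}
```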
This JSON file can now be passed to the compiler as the calibtable argument. However, because the table we generated is full of Unnamed Layer entries, many layers fail to find their scales and the compiler errors out. This bug was reported as far back as June 2020 (https://github.com/nvdla/sw/issues/201), and to this day nobody in the thread has posted a solution; NVDLA's original maintainers appear to have given up on the project. Comparing against the history of the official resnet50.json, it turns out the official table once had exactly the same problem. Their fix was never published either, but it amounts to replacing each scale entry's name with the name of the corresponding network node, so I wrote a Python script to do the same (I considered opening a PR against the original project, but parsing the prototxt conveniently requires an extra dependency, so I dropped the idea):
```python
import json
from collections import OrderedDict
from google.protobuf import text_format
import caffe.proto.caffe_pb2 as caffe_pb2  # caffe_pb2 is generated by compiling caffe.proto

caffeprototxt_path = "./deploy.prototxt"
calibletable_json_path = "./resnet18-cifar10-int8.json"

# load deploy.prototxt
net = caffe_pb2.NetParameter()
text_format.Merge(open(caffeprototxt_path).read(), net)

# load the json file
with open(calibletable_json_path, "r") as f:
    calible = json.load(f, object_pairs_hook=OrderedDict)

_scales = []
_mins = []
_maxs = []
_offsets = []
_new = OrderedDict()

for key, value in calible.items():
    _scales.append(value['scale'])
    _mins.append(value['min'])
    _maxs.append(value['max'])
    _offsets.append(value['offset'])

# rename each entry after the corresponding layer in the prototxt
for idx, _layer in enumerate(net.layer):
    _tempDict = OrderedDict({
        "scale": _scales[idx],
        "min": _mins[idx],
        "max": _maxs[idx],
        "offset": _offsets[idx],
    })
    _new[_layer.name] = _tempDict

with open('resnet18-cifar10-int8-fixed.json', 'w') as f:
    json.dump(_new, f)
```

With this script, the fixed file generated for my ImageNet run was imagenet_resnet18_calibration.cache. Note that running it requires a Caffe environment with the pycaffe interface compiled.
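One caveat: the script assumes the calibration entries and the prototxt layers line up one-to-one by index. A few lines of my own (not in the original) could be appended before the renaming loop to fail fast when that assumption breaks:

```python
# Sanity check (my addition): the renaming loop indexes the calibration entries
# by prototxt layer position, so there must be at least as many entries as layers;
# exact one-to-one alignment is still worth eyeballing in the output.
assert len(calible) >= len(net.layer), (
    "only {} calibration entries for {} prototxt layers".format(
        len(calible), len(net.layer)))
```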
## Compile and Runtime

Compiling for FP16:

```bash
./nvdla_compiler --prototxt resnet18-imagenet-caffe/deploy.prototxt --caffemodel resnet18-imagenet-caffe/resnet-18.caffemodel --profile imagenet-fp16
```

Running the FP16 loadable:

```bash
./nvdla_runtime --loadable imagenet-fp16.nvdla --image resnet18-imagenet-caffe/resized/0_tench.jpg --mean 104,117,123 --rawdump
```
```
# cat output.dimg
0.97998 0.00159931 1.13845e-05 3.57628e-05 8.67844e-05 0.000154614 2.14577e-05 2.20537e-06
1.3113e-06 1.78814e-07 5.90086e-06 1.03116e-05 3.03984e-06 1.37091e-06 1.49012e-06 9.23872e-06 ...
[... remaining entries of the 1000-class probability vector omitted ...]
```
That is exhausting to read; the top 5 can be extracted directly:

```bash
cat output.dimg | sed "s#\ #\n#g" | cat -n | sort -gr -k2,2 | head -5
     1  0.97998
   390  0.00881195
   759  0.00323105
     2  0.00159931
   392  0.00135708
```
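The same top 5 can also be pulled out in Python and, with an ImageNet label file at hand, mapped to class names (a sketch; synset_words.txt is an assumed label file with one class per line):

```python
import numpy as np

probs = np.loadtxt("output.dimg")  # the flat probability vector dumped by the runtime
labels = [line.strip() for line in open("synset_words.txt")]  # assumed label file
for i in probs.argsort()[::-1][:5]:
    # i + 1 matches the 1-based indices printed by `cat -n` above
    print("{:4d}  {:.6g}  {}".format(i + 1, probs[i], labels[i]))
```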
Compiling for INT8:

```bash
# compile per-kernel
./nvdla_compiler --prototxt resnet18-imagenet-caffe/deploy.prototxt --caffemodel resnet18-imagenet-caffe/resnet-18.caffemodel --profile imagenet-int8.kernel --cprecision int8 --calibtable resnet18-imagenet-caffe/resnet18.imagenet.fixed.int8.json

# compile per-filter, mainly to check whether the two modes give different results
./nvdla_compiler --prototxt resnet18-imagenet-caffe/deploy.prototxt --caffemodel resnet18-imagenet-caffe/resnet-18.caffemodel --profile imagenet-int8.filter --cprecision int8 --calibtable resnet18-imagenet-caffe/resnet18.imagenet.fixed.int8.json --quantizationMode per-filter

./nvdla_runtime --loadable imagenet-int8.filter.nvdla --image resnet18-imagenet-caffe/resized/0_tench.jpg --rawdump
```
The INT8 runtime output:

```
127 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
[... the rest of the 1000-entry INT8 vector is 0, apart from a single 2 partway through ...]
```