Memory Synchronization Mode

Original:

AscendSession::RunOpImplOrigin()
			
AscendSession::LoadInputData()  // load input data to device
			
for (size_t i = 0; i < inputs.size(); ++i) {
    AscendDeviceAddress::SyncHostToDevice()
}
			
		SyncStream()
		SyncMemory()
			
		rtMemcpy()

After the change:

AscendSession::RunOpImplOrigin()
			
AscendSession::LoadInputData()  // load input data to device
			
for (size_t i = 0; i < inputs.size(); ++i) {
    AscendDeviceAddress::SyncHostToDevice()
}
			
		SyncMemory()
			
		MemcpyAsync()
			
		rtMemcpyAsync()

The comparison is shown below:

[Figure: image-20220801150230828]

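LoadInputData loops over the inputs and, for each one, calls SyncHostToDevice -> CopyHostMemToDeviceAsync -> cudaMemcpyAsync to copy host memory to the device. However, a SyncStream call is added at the end, which on GPU ultimately calls cudaStreamSynchronize. The net effect is a synchronous memory copy for every input, which hurts performance.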
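To make the cost concrete, here is a minimal sketch in plain C++ that models the two patterns without any device runtime. `FakeStream`, `LoadInputsSyncEach`, and `LoadInputsAsync` are hypothetical names invented for this illustration; they are not MindSpore, CUDA, or Ascend runtime APIs. The sketch only counts how many times the host would block. It also assumes that in the modified path a single synchronization point (or in-stream ordering with later kernels) is enough before the data is consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a device stream: copies are queued and only
// executed when Synchronize() drains the queue. Not a real runtime API.
class FakeStream {
 public:
  // Models an asynchronous copy: record the work and return immediately.
  void MemcpyAsync(void *dst, const void *src, std::size_t bytes) {
    if (bytes == 0) return;
    pending_.push([dst, src, bytes] { std::memcpy(dst, src, bytes); });
  }
  // Models a stream synchronization: the host waits for all queued copies.
  void Synchronize() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
    ++sync_count_;
  }
  int sync_count() const { return sync_count_; }

 private:
  std::queue<std::function<void()>> pending_;
  int sync_count_ = 0;
};

using Buffer = std::vector<char>;

// Original pattern: every input is copied and then the stream is synchronized,
// so the host blocks once per input.
int LoadInputsSyncEach(const std::vector<Buffer> &host, std::vector<Buffer> *device) {
  assert(device != nullptr && device->size() == host.size());
  FakeStream stream;
  for (std::size_t i = 0; i < host.size(); ++i) {
    (*device)[i].resize(host[i].size());
    stream.MemcpyAsync((*device)[i].data(), host[i].data(), host[i].size());
    stream.Synchronize();  // per-input stall
  }
  return stream.sync_count();
}

// Modified pattern: all copies are queued first; the host blocks only once
// (assumption: one synchronization point before the data is used).
int LoadInputsAsync(const std::vector<Buffer> &host, std::vector<Buffer> *device) {
  assert(device != nullptr && device->size() == host.size());
  FakeStream stream;
  for (std::size_t i = 0; i < host.size(); ++i) {
    (*device)[i].resize(host[i].size());
    stream.MemcpyAsync((*device)[i].data(), host[i].data(), host[i].size());
  }
  stream.Synchronize();  // single stall for the whole batch
  return stream.sync_count();
}
```

With N inputs, the first function reports N host stalls and the second reports one, while both leave the same data in the destination buffers. In a real runtime each stall also prevents the copy from overlapping with other host work, which is the overhead described above.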

This post is licensed under CC BY 4.0 by the author.
