* Do not mutate cgraph for fused ADDs
1. We should minimize in-place changes to the incoming
ggml_cgraph where possible (those belong in graph_optimize)
2. Modifying the graph in place triggers an additional, unnecessary
graph capture step, because the CUDA backend stores the graph
properties before the graph is modified in place
* Assert ggml_tensor is trivially copyable
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Aman Gupta <redacted>
 n_fuse++;
 if (n_fuse > 1) {
+    ggml_tensor fused_add_node;
+    memcpy(&fused_add_node, node, sizeof(ggml_tensor));
     for (int j = 0; j < n_fuse - 1; ++j) {
-        node->src[j + 2] = cgraph->nodes[i + j + 1]->src[1];
+        fused_add_node.src[j + 2] = cgraph->nodes[i + j + 1]->src[1];
     }
-    cgraph->nodes[i + n_fuse - 1]->data = node->data;
-    ggml_cuda_op_fused_add(*cuda_ctx, node, n_fuse);
+    fused_add_node.data = cgraph->nodes[i + n_fuse - 1]->data;
+    ggml_cuda_op_fused_add(*cuda_ctx, &fused_add_node, n_fuse);
     i += n_fuse - 1;
     continue;
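
The copy-then-modify pattern in the hunk above can be sketched in isolation as follows. This is a minimal illustration, not the backend's actual code: `toy_tensor`, `MAX_SRC`, and `make_fused_add` are hypothetical stand-ins introduced here, and the real `ggml_tensor` has many more fields. The sketch assumes only what the diff shows: the first node is copied with `memcpy`, the extra sources are collected from `src[1]` of the following nodes, and the output pointer is taken from the last fused node. The `static_assert` mirrors the "trivially copyable" check mentioned in the commit message, since a byte-wise copy is only well-defined for such types.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for ggml_tensor: only the fields this sketch needs.
constexpr int MAX_SRC = 8;

struct toy_tensor {
    toy_tensor * src[MAX_SRC];  // operands of this node
    void *       data;          // output buffer of this node
};

// A byte-wise copy via memcpy is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<toy_tensor>::value,
              "toy_tensor must be trivially copyable to be duplicated with memcpy");

// Builds a fused view of n_fuse consecutive ADD nodes starting at nodes[i].
// Nothing in the nodes array is written to; only the local copy is changed.
toy_tensor make_fused_add(toy_tensor * const * nodes, int i, int n_fuse) {
    assert(n_fuse > 1 && n_fuse < MAX_SRC);

    toy_tensor fused;
    std::memcpy(&fused, nodes[i], sizeof(toy_tensor));

    // Gather the second operand of each following ADD into the copy.
    for (int j = 0; j < n_fuse - 1; ++j) {
        fused.src[j + 2] = nodes[i + j + 1]->src[1];
    }

    // The fused result is written where the last ADD in the chain would write.
    fused.data = nodes[i + n_fuse - 1]->data;
    return fused;
}
```

A caller would pass the returned copy (by address) to the fused kernel launcher, leaving every node in the graph byte-for-byte identical to what was recorded before the call.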