* convert : use n_groups instead of hardcoded values in reshape
This commit modifies the conversion script for NemotronHModel to use
the 'n_groups' hyperparameter and to let reshape infer the last
dimension, by passing -1, when reshaping the 'mixer.norm.weight' tensor.
* use self.n_group instead of self.hparams["n_groups"]
return [(mapped_name, reshaped_data)]
if name.endswith("mixer.norm.weight"):
- reshaped_data = data_torch.reshape(8, 512)
+ reshaped_data = data_torch.reshape(self.n_group, -1)
mapped_name = self.map_tensor_name(name)
return [(mapped_name, reshaped_data)]
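
To illustrate why passing -1 is safe here, the following is a minimal pure-Python sketch of how a reshape-style call resolves a single -1 entry from the total element count. The helper name `infer_reshape` and the element counts are hypothetical and exist only for this illustration; the real conversion script relies on the tensor's own reshape method.

```python
from typing import Tuple


def infer_reshape(numel: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Resolve at most one -1 entry in `shape` from the element count.

    Hypothetical helper that mirrors the usual reshape convention; it is
    not part of the conversion script.
    """
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")

    # Product of all explicitly given dimensions.
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim

    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        return shape

    if known == 0 or numel % known != 0:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")

    # Replace -1 with whatever makes the element count match.
    return tuple(numel // known if dim == -1 else dim for dim in shape)


# A 4096-element norm weight with 8 groups gives the old hardcoded shape.
print(infer_reshape(4096, (8, -1)))
# An assumed model with a different group count no longer needs a code change.
print(infer_reshape(4096, (4, -1)))
```

With 8 groups and 4096 elements the inferred shape equals the previously hardcoded (8, 512), so existing models convert identically, while a model whose element count is not divisible by its group count fails loudly instead of being reshaped incorrectly.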