Supported ONNX operators#
OnnxAbs#
- class mlprodict.npy.xop_auto_import_.OnnxAbs(*args, **kwargs)#
Version
name: Abs (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Absolute takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the absolute value, y = abs(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
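The elementwise semantics above can be checked against NumPy. This is an illustrative sketch of the operator's behavior, not the mlprodict implementation; the function name is hypothetical.

```python
import numpy as np

def onnx_abs_reference(x: np.ndarray) -> np.ndarray:
    # Reference semantics of ONNX Abs: y = abs(x), elementwise,
    # with the output dtype matching the input dtype.
    return np.abs(x)

x = np.array([-2.5, 0.0, 3.0], dtype=np.float32)
y = onnx_abs_reference(x)
print(y.tolist())  # [2.5, 0.0, 3.0]
```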
OnnxAbs_1#
- class mlprodict.npy.xop_auto_import_.OnnxAbs_1(*args, **kwargs)#
Version
name: Abs (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
Absolute takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the absolute value, y = abs(x), is applied to the tensor elementwise.
Attributes
consumed_inputs: legacy optimization attribute.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAbs_13#
- class mlprodict.npy.xop_auto_import_.OnnxAbs_13(*args, **kwargs)#
Version
name: Abs (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Absolute takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the absolute value, y = abs(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxAbs_6#
- class mlprodict.npy.xop_auto_import_.OnnxAbs_6(*args, **kwargs)#
Version
name: Abs (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
Absolute takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the absolute value, y = abs(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxAcos#
- class mlprodict.npy.xop_auto_import_.OnnxAcos(*args, **kwargs)#
Version
name: Acos (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arccosine (inverse of cosine) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arccosine of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
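A NumPy sketch of the arccosine semantics; the function name is hypothetical and this is not the mlprodict implementation.

```python
import numpy as np

def onnx_acos_reference(x: np.ndarray) -> np.ndarray:
    # Elementwise arccosine; inputs must lie in [-1, 1],
    # outputs fall in [0, pi].
    return np.arccos(x)

x = np.array([1.0, 0.0, -1.0], dtype=np.float64)
print(onnx_acos_reference(x))  # roughly [0, pi/2, pi]
```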
OnnxAcos_7#
- class mlprodict.npy.xop_auto_import_.OnnxAcos_7(*args, **kwargs)#
Version
name: Acos (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arccosine (inverse of cosine) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arccosine of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAcosh#
- class mlprodict.npy.xop_auto_import_.OnnxAcosh(*args, **kwargs)#
Version
name: Acosh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arccosine of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arccosine values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
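The hyperbolic-arccosine semantics can likewise be sketched with NumPy; the function name is hypothetical.

```python
import numpy as np

def onnx_acosh_reference(x: np.ndarray) -> np.ndarray:
    # Elementwise inverse hyperbolic cosine; defined for x >= 1,
    # so arccosh(cosh(t)) recovers t for t >= 0.
    return np.arccosh(x)

x = np.array([1.0, np.cosh(2.0)], dtype=np.float64)
print(onnx_acosh_reference(x))  # roughly [0, 2]
```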
OnnxAcosh_9#
- class mlprodict.npy.xop_auto_import_.OnnxAcosh_9(*args, **kwargs)#
Version
name: Acosh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arccosine of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arccosine values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAdd#
- class mlprodict.npy.xop_auto_import_.OnnxAdd(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 14
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 14.
Summary
Performs element-wise binary addition (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
(Opset 14 change): Extend supported types to include uint8, int8, uint16, and int16.
Inputs
A (heterogeneous) - T: First operand.
B (heterogeneous) - T: Second operand.
Outputs
C (heterogeneous) - T: Result, has same element type as two inputs
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
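Since ONNX multidirectional broadcasting follows NumPy's rules, plain NumPy addition illustrates the behavior; this is a semantics sketch, not the mlprodict implementation.

```python
import numpy as np

# Multidirectional (NumPy-style) broadcasting: a (2, 3) operand and a
# (3,) operand align on the trailing dimension, so b is added to each row.
a = np.arange(6, dtype=np.int32).reshape(2, 3)   # [[0 1 2], [3 4 5]]
b = np.array([10, 20, 30], dtype=np.int32)
c = a + b
print(c.tolist())  # [[10, 21, 32], [13, 24, 35]]
```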
OnnxAdd_1#
- class mlprodict.npy.xop_auto_import_.OnnxAdd_1(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
Performs element-wise binary addition (with limited broadcast support).
If necessary the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. When broadcasting is specified, the second tensor can either be of element size 1 (including a scalar tensor and any tensor with rank equal to or smaller than the first tensor), or having its shape as a contiguous subset of the first tensor’s shape. The starting of the mutually equal shape is specified by the argument “axis”, and if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet.
For example, the following tensor shapes are supported (with broadcast=1):
shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is a 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0
Attribute broadcast=1 needs to be passed to enable broadcasting.
Attributes
axis: If set, defines the broadcast dimensions. See doc for details.
broadcast: Pass 1 to enable broadcasting. Default value is 0.
consumed_inputs: legacy optimization attribute.
Inputs
A (heterogeneous) - T: First operand, should share the type with the second operand.
B (heterogeneous) - T: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
C (heterogeneous) - T: Result, has same dimensions and type as A
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
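The legacy unidirectional broadcast described above (the "contiguous subset" rule, with `axis` anchoring and suffix matching) can be emulated in NumPy by reshaping B before the addition. This is a hypothetical sketch under that rule, not the mlprodict or ONNX Runtime implementation.

```python
import numpy as np

def legacy_broadcast_add(a: np.ndarray, b: np.ndarray, axis=None) -> np.ndarray:
    # Emulates Add-1 with broadcast=1: B's shape is a contiguous subset
    # of A's shape, anchored at `axis` (suffix matching when axis is None).
    if b.ndim == 0:                      # scalar tensor broadcasts trivially
        return a + b
    if axis is None:
        axis = a.ndim - b.ndim           # suffix matching
    shape = (1,) * axis + b.shape + (1,) * (a.ndim - axis - b.ndim)
    return a + b.reshape(shape)

a = np.zeros((2, 3, 4, 5), dtype=np.float32)
b = np.ones((3, 4), dtype=np.float32)
print(legacy_broadcast_add(a, b, axis=1).shape)  # (2, 3, 4, 5)
```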
OnnxAdd_13#
- class mlprodict.npy.xop_auto_import_.OnnxAdd_13(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Performs element-wise binary addition (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First operand.
B (heterogeneous) - T: Second operand.
Outputs
C (heterogeneous) - T: Result, has same element type as two inputs
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64) ): Constrain input and output types to high-precision numeric tensors.
OnnxAdd_14#
- class mlprodict.npy.xop_auto_import_.OnnxAdd_14(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 14
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 14.
Summary
Performs element-wise binary addition (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
(Opset 14 change): Extend supported types to include uint8, int8, uint16, and int16.
Inputs
A (heterogeneous) - T: First operand.
B (heterogeneous) - T: Second operand.
Outputs
C (heterogeneous) - T: Result, has same element type as two inputs
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxAdd_6#
- class mlprodict.npy.xop_auto_import_.OnnxAdd_6(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
Performs element-wise binary addition (with limited broadcast support).
If necessary the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. When broadcasting is specified, the second tensor can either be of element size 1 (including a scalar tensor and any tensor with rank equal to or smaller than the first tensor), or having its shape as a contiguous subset of the first tensor’s shape. The starting of the mutually equal shape is specified by the argument “axis”, and if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet.
For example, the following tensor shapes are supported (with broadcast=1):
shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar tensor
shape(A) = (2, 3, 4, 5), shape(B) = (1, 1), i.e. B is a 1-element tensor
shape(A) = (2, 3, 4, 5), shape(B) = (5,)
shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0
Attribute broadcast=1 needs to be passed to enable broadcasting.
Attributes
axis: If set, defines the broadcast dimensions. See doc for details.
broadcast: Pass 1 to enable broadcasting. Default value is 0.
Inputs
A (heterogeneous) - T: First operand, should share the type with the second operand.
B (heterogeneous) - T: Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
C (heterogeneous) - T: Result, has same dimensions and type as A
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64) ): Constrain input and output types to high-precision numeric tensors.
OnnxAdd_7#
- class mlprodict.npy.xop_auto_import_.OnnxAdd_7(*args, **kwargs)#
Version
name: Add (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Performs element-wise binary addition (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First operand.
B (heterogeneous) - T: Second operand.
Outputs
C (heterogeneous) - T: Result, has same element type as two inputs
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int32), tensor(int64), tensor(uint32), tensor(uint64) ): Constrain input and output types to high-precision numeric tensors.
OnnxAiOnnxMlArrayFeatureExtractor#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlArrayFeatureExtractor(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Select elements of the input tensor based on the indices passed.
The indices are applied to the last axes of the tensor.
Inputs
X (heterogeneous) - T: Data to be selected
Y (heterogeneous) - tensor(int64): The indices, based on 0 as the first index of any dimension.
Outputs
Z (heterogeneous) - T: Selected output data as an array
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string) ): The input must be a tensor of a numeric type or string. The output will be of the same tensor type.
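A NumPy sketch of the selection semantics: indices gather along the last axis. The function name is hypothetical; this is not the mlprodict implementation.

```python
import numpy as np

def array_feature_extractor(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Gather the requested indices along the last axis, as the
    # operator description above specifies.
    return np.take(x, y, axis=-1)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Y = np.array([0, 2], dtype=np.int64)
print(array_feature_extractor(X, Y).tolist())  # [[1.0, 3.0], [4.0, 6.0]]
```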
OnnxAiOnnxMlArrayFeatureExtractor_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlArrayFeatureExtractor_1(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Select elements of the input tensor based on the indices passed.
The indices are applied to the last axes of the tensor.
Inputs
X (heterogeneous) - T: Data to be selected
Y (heterogeneous) - tensor(int64): The indices, based on 0 as the first index of any dimension.
Outputs
Z (heterogeneous) - T: Selected output data as an array
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string) ): The input must be a tensor of a numeric type or string. The output will be of the same tensor type.
OnnxAiOnnxMlBinarizer#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlBinarizer(*args, **kwargs)#
Version
name: Binarizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Maps the values of the input tensor to either 0 or 1, element-wise, based on the outcome of a comparison against a threshold value.
Attributes
threshold: Values greater than this are mapped to 1, others to 0. Default value is 0.0.
Inputs
X (heterogeneous) - T: Data to be binarized
Outputs
Y (heterogeneous) - T: Binarized output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type. The output will be of the same tensor type.
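The thresholding rule reduces to a single NumPy comparison; a sketch of the semantics, with a hypothetical function name.

```python
import numpy as np

def binarizer(x: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    # Values strictly greater than the threshold map to 1, others to 0;
    # the output keeps the input's tensor type.
    return (x > threshold).astype(x.dtype)

x = np.array([-1.5, 0.0, 0.5, 2.0], dtype=np.float32)
print(binarizer(x, threshold=0.0).tolist())  # [0.0, 0.0, 1.0, 1.0]
```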
OnnxAiOnnxMlBinarizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlBinarizer_1(*args, **kwargs)#
Version
name: Binarizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Maps the values of the input tensor to either 0 or 1, element-wise, based on the outcome of a comparison against a threshold value.
Attributes
threshold: Values greater than this are mapped to 1, others to 0. Default value is 0.0.
Inputs
X (heterogeneous) - T: Data to be binarized
Outputs
Y (heterogeneous) - T: Binarized output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type. The output will be of the same tensor type.
OnnxAiOnnxMlCastMap#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlCastMap(*args, **kwargs)#
Version
name: CastMap (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Converts a map to a tensor. The map key must be an int64 and the values will be ordered in ascending order based on this key. The operator supports dense packing or sparse packing. If using sparse packing, the key cannot exceed the max_map-1 value.
Attributes
cast_to: A string indicating the desired element type of the output tensor, one of ‘TO_FLOAT’, ‘TO_STRING’, ‘TO_INT64’. Default value is 'TO_FLOAT'.
map_form: Indicates whether to only output as many values as are in the input (dense), or position the input based on using the key of the map as the index of the output (sparse). One of ‘DENSE’, ‘SPARSE’. Default value is 'DENSE'.
max_map: If the value of map_form is ‘SPARSE’, this attribute indicates the total length of the output tensor. Default value is 1.
Inputs
X (heterogeneous) - T1: The input map that is to be cast to a tensor
Outputs
Y (heterogeneous) - T2: A tensor representing the same data as the input map, ordered by their keys
Type Constraints
T1 in ( map(int64, float), map(int64, string) ): The input must be an integer map to either string or float.
T2 in ( tensor(float), tensor(int64), tensor(string) ): The output is a 1-D tensor of string, float, or integer.
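The dense/sparse packing rule can be sketched in plain Python. This is a hypothetical illustration of the map_form logic only; the cast_to conversion is omitted for brevity.

```python
def cast_map(m: dict, map_form: str = "DENSE", max_map: int = 1) -> list:
    # DENSE: emit the values ordered by ascending int64 key.
    # SPARSE: the key is the output index; keys must stay below max_map.
    keys = sorted(m)
    if map_form == "DENSE":
        return [m[k] for k in keys]
    out = [0.0] * max_map
    for k in keys:
        out[k] = m[k]
    return out

print(cast_map({2: 7.0, 0: 5.0}))                                # [5.0, 7.0]
print(cast_map({2: 7.0, 0: 5.0}, map_form="SPARSE", max_map=4))  # [5.0, 0.0, 7.0, 0.0]
```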
OnnxAiOnnxMlCastMap_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlCastMap_1(*args, **kwargs)#
Version
name: CastMap (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Converts a map to a tensor. The map key must be an int64 and the values will be ordered in ascending order based on this key. The operator supports dense packing or sparse packing. If using sparse packing, the key cannot exceed the max_map-1 value.
Attributes
cast_to: A string indicating the desired element type of the output tensor, one of ‘TO_FLOAT’, ‘TO_STRING’, ‘TO_INT64’. Default value is 'TO_FLOAT'.
map_form: Indicates whether to only output as many values as are in the input (dense), or position the input based on using the key of the map as the index of the output (sparse). One of ‘DENSE’, ‘SPARSE’. Default value is 'DENSE'.
max_map: If the value of map_form is ‘SPARSE’, this attribute indicates the total length of the output tensor. Default value is 1.
Inputs
X (heterogeneous) - T1: The input map that is to be cast to a tensor
Outputs
Y (heterogeneous) - T2: A tensor representing the same data as the input map, ordered by their keys
Type Constraints
T1 in ( map(int64, float), map(int64, string) ): The input must be an integer map to either string or float.
T2 in ( tensor(float), tensor(int64), tensor(string) ): The output is a 1-D tensor of string, float, or integer.
OnnxAiOnnxMlCategoryMapper#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlCategoryMapper(*args, **kwargs)#
Version
name: CategoryMapper (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Converts strings to integers and vice versa.
Two sequences of equal length are used to map between integers and strings, with strings and integers at the same index detailing the mapping.
Each operator converts either integers to strings or strings to integers, depending on which default value attribute is provided. Only one default value attribute should be defined.
If the string default value is set, it will convert integers to strings. If the int default value is set, it will convert strings to integers.
Attributes
cats_int64s: The integers of the map. This sequence must be the same length as the ‘cats_strings’ sequence.
cats_strings: The strings of the map. This sequence must be the same length as the ‘cats_int64s’ sequence
default_int64: An integer to use when an input string value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is -1.
default_string: A string to use when an input integer value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is '_Unused'.
Inputs
X (heterogeneous) - T1: Input data
Outputs
Y (heterogeneous) - T2: Output data. If strings are input, the output values are integers, and vice versa.
Type Constraints
T1 in ( tensor(int64), tensor(string) ): The input must be a tensor of strings or integers, either [N,C] or [C].
T2 in ( tensor(int64), tensor(string) ): The output is a tensor of strings or integers. Its shape will be the same as the input shape.
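A plain-Python sketch of the two mapping directions and the default fallback; the function name is hypothetical and this is not the mlprodict implementation.

```python
def category_mapper(values, cats_int64s, cats_strings,
                    default_int64=-1, default_string="_Unused"):
    # Direction is chosen by the input element type: strings map to
    # integers, integers map to strings; misses fall back to the default.
    if values and isinstance(values[0], str):
        table = dict(zip(cats_strings, cats_int64s))
        return [table.get(v, default_int64) for v in values]
    table = dict(zip(cats_int64s, cats_strings))
    return [table.get(v, default_string) for v in values]

cats_i, cats_s = [0, 1, 2], ["cat", "dog", "bird"]
print(category_mapper(["dog", "fish"], cats_i, cats_s))  # [1, -1]
print(category_mapper([2, 9], cats_i, cats_s))           # ['bird', '_Unused']
```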
OnnxAiOnnxMlCategoryMapper_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlCategoryMapper_1(*args, **kwargs)#
Version
name: CategoryMapper (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Converts strings to integers and vice versa.
Two sequences of equal length are used to map between integers and strings, with strings and integers at the same index detailing the mapping.
Each operator converts either integers to strings or strings to integers, depending on which default value attribute is provided. Only one default value attribute should be defined.
If the string default value is set, it will convert integers to strings. If the int default value is set, it will convert strings to integers.
Attributes
cats_int64s: The integers of the map. This sequence must be the same length as the ‘cats_strings’ sequence.
cats_strings: The strings of the map. This sequence must be the same length as the ‘cats_int64s’ sequence
default_int64: An integer to use when an input string value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is -1.
default_string: A string to use when an input integer value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is '_Unused'.
Inputs
X (heterogeneous) - T1: Input data
Outputs
Y (heterogeneous) - T2: Output data. If strings are input, the output values are integers, and vice versa.
Type Constraints
T1 in ( tensor(int64), tensor(string) ): The input must be a tensor of strings or integers, either [N,C] or [C].
T2 in ( tensor(int64), tensor(string) ): The output is a tensor of strings or integers. Its shape will be the same as the input shape.
OnnxAiOnnxMlDictVectorizer#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlDictVectorizer(*args, **kwargs)#
Version
name: DictVectorizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Uses an index mapping to convert a dictionary to an array.
Given a dictionary, each key is looked up in the vocabulary attribute corresponding to the key type. The index into the vocabulary array at which the key is found is then used to index the output 1-D tensor ‘Y’ and insert into it the value found in the dictionary ‘X’.
The key type of the input map must correspond to the element type of the defined vocabulary attribute. Therefore, the output array will be equal in length to the index mapping vector parameter. All keys in the input dictionary must be present in the index mapping vector. For each item in the input dictionary, insert its value in the output array. Any keys not present in the input dictionary will be zero in the output array.
For example: if the string_vocabulary parameter is set to ["a", "c", "b", "z"], then an input of {"a": 4, "c": 8} will produce an output of [4, 8, 0, 0].
Attributes
int64_vocabulary: An integer vocabulary array. One and only one of the vocabularies must be defined.
string_vocabulary: A string vocabulary array. One and only one of the vocabularies must be defined.
Inputs
X (heterogeneous) - T1: A dictionary.
Outputs
Y (heterogeneous) - T2: A 1-D tensor holding values from the input dictionary.
Type Constraints
T1 in ( map(int64, double), map(int64, float), map(int64, string), map(string, double), map(string, float), map(string, int64) ): The input must be a map from strings or integers to either strings or a numeric type. The key and value types cannot be the same.
T2 in ( tensor(double), tensor(float), tensor(int64), tensor(string) ): The output will be a tensor of the value type of the input map. Its shape will be [1,C], where C is the length of the input dictionary.
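The worked example above reduces to one line of Python; a semantics sketch with a hypothetical function name.

```python
def dict_vectorizer(x: dict, vocabulary: list) -> list:
    # Output slot i holds x[vocabulary[i]]; absent keys yield 0.
    return [x.get(k, 0) for k in vocabulary]

# Reproduces the documented example.
print(dict_vectorizer({"a": 4, "c": 8}, ["a", "c", "b", "z"]))  # [4, 8, 0, 0]
```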
OnnxAiOnnxMlDictVectorizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlDictVectorizer_1(*args, **kwargs)#
Version
name: DictVectorizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Uses an index mapping to convert a dictionary to an array.
Given a dictionary, each key is looked up in the vocabulary attribute corresponding to the key type. The index into the vocabulary array at which the key is found is then used to index the output 1-D tensor ‘Y’ and insert into it the value found in the dictionary ‘X’.
The key type of the input map must correspond to the element type of the defined vocabulary attribute. Therefore, the output array will be equal in length to the index mapping vector parameter. All keys in the input dictionary must be present in the index mapping vector. For each item in the input dictionary, insert its value in the output array. Any keys not present in the input dictionary will be zero in the output array.
For example: if the string_vocabulary parameter is set to ["a", "c", "b", "z"], then an input of {"a": 4, "c": 8} will produce an output of [4, 8, 0, 0].
Attributes
int64_vocabulary: An integer vocabulary array. One and only one of the vocabularies must be defined.
string_vocabulary: A string vocabulary array. One and only one of the vocabularies must be defined.
Inputs
X (heterogeneous) - T1: A dictionary.
Outputs
Y (heterogeneous) - T2: A 1-D tensor holding values from the input dictionary.
Type Constraints
T1 in ( map(int64, double), map(int64, float), map(int64, string), map(string, double), map(string, float), map(string, int64) ): The input must be a map from strings or integers to either strings or a numeric type. The key and value types cannot be the same.
T2 in ( tensor(double), tensor(float), tensor(int64), tensor(string) ): The output will be a tensor of the value type of the input map. Its shape will be [1,C], where C is the length of the input dictionary.
OnnxAiOnnxMlFeatureVectorizer#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlFeatureVectorizer(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Concatenates input tensors into one continuous output.
All input shapes are 2-D and are concatenated along the second dimension. 1-D tensors are treated as [1,C]. Inputs are copied to the output maintaining the order of the input arguments.
All inputs must be integers or floats, while the output will be all floating point values.
Attributes
inputdimensions: The size of each input in the input list
Inputs
Between 1 and 2147483647 inputs.
X (variadic, heterogeneous) - T1: An ordered collection of tensors, all with the same element type.
Outputs
Y (heterogeneous) - tensor(float): The output array, elements ordered as the inputs.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
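The concatenation rule can be sketched with NumPy: 1-D inputs are promoted to [1, C], everything is cast to float, and the pieces are joined along the second axis. A hypothetical sketch, not the mlprodict implementation.

```python
import numpy as np

def feature_vectorizer(*inputs: np.ndarray) -> np.ndarray:
    # 1-D inputs are treated as [1, C]; everything is cast to float
    # and concatenated along the second dimension, preserving order.
    mats = [np.atleast_2d(x).astype(np.float32) for x in inputs]
    return np.concatenate(mats, axis=1)

a = np.array([[1, 2], [3, 4]], dtype=np.int64)
b = np.array([[0.5], [1.5]], dtype=np.float64)
print(feature_vectorizer(a, b).tolist())  # [[1.0, 2.0, 0.5], [3.0, 4.0, 1.5]]
```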
OnnxAiOnnxMlFeatureVectorizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlFeatureVectorizer_1(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Concatenates input tensors into one continuous output.
All input shapes are 2-D and are concatenated along the second dimension. 1-D tensors are treated as [1,C]. Inputs are copied to the output maintaining the order of the input arguments.
All inputs must be integers or floats, while the output will be all floating point values.
Attributes
inputdimensions: The size of each input in the input list
Inputs
Between 1 and 2147483647 inputs.
X (variadic, heterogeneous) - T1: An ordered collection of tensors, all with the same element type.
Outputs
Y (heterogeneous) - tensor(float): The output array, elements ordered as the inputs.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
OnnxAiOnnxMlImputer#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlImputer(*args, **kwargs)#
Version
name: Imputer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Replaces inputs that equal one value with another, leaving all other elements alone.
This operator is typically used to replace missing values in situations where they have a canonical representation, such as -1, 0, NaN, or some extreme value.
One and only one of imputed_value_floats or imputed_value_int64s should be defined – floats if the input tensor holds floats, integers if the input tensor holds integers. The imputed values must all fit within the width of the tensor element type. One and only one of the replaced_value_float or replaced_value_int64 should be defined, which one depends on whether floats or integers are being processed.
The imputed_value attribute length can be 1 element, or it can have one element per input feature. In other words, if the input tensor has the shape [*,F], then the length of the attribute array may be 1 or F. If it is 1, then it is broadcast along the last dimension and applied to each feature.
Attributes
imputed_value_floats: Value(s) to change to.
imputed_value_int64s: Value(s) to change to.
replaced_value_float: A value that needs replacing. Default value is 0.0.
replaced_value_int64: A value that needs replacing. Default value is 0.
Inputs
X (heterogeneous) - T: Data to be processed.
Outputs
Y (heterogeneous) - T: Imputed output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type, either [N,C] or [C]. The output type will be of the same tensor type and shape.
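The replace-and-broadcast rule above can be sketched with NumPy: compare against the sentinel value and substitute the per-feature (or broadcast scalar) imputed values. A hypothetical sketch, not the mlprodict implementation.

```python
import numpy as np

def imputer(x: np.ndarray, replaced_value, imputed_values) -> np.ndarray:
    # imputed_values has length 1 (broadcast over features) or length F
    # for an input of shape [*, F]. Note: a NaN sentinel would need
    # np.isnan instead of this equality test.
    imputed = np.broadcast_to(
        np.asarray(imputed_values, dtype=x.dtype), x.shape)
    return np.where(x == replaced_value, imputed, x)

X = np.array([[0.0, 2.0], [3.0, 0.0]], dtype=np.float32)
print(imputer(X, replaced_value=0.0, imputed_values=[9.0, 7.0]).tolist())
# [[9.0, 2.0], [3.0, 7.0]]
```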
OnnxAiOnnxMlImputer_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlImputer_1(*args, **kwargs)#
Version
name: Imputer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Replaces inputs that equal one value with another, leaving all other elements alone.
This operator is typically used to replace missing values in situations where they have a canonical representation, such as -1, 0, NaN, or some extreme value.
One and only one of imputed_value_floats or imputed_value_int64s should be defined – floats if the input tensor holds floats, integers if the input tensor holds integers. The imputed values must all fit within the width of the tensor element type. One and only one of the replaced_value_float or replaced_value_int64 should be defined, which one depends on whether floats or integers are being processed.
The imputed_value attribute length can be 1 element, or it can have one element per input feature. In other words, if the input tensor has the shape [*,F], then the length of the attribute array may be 1 or F. If it is 1, then it is broadcast along the last dimension and applied to each feature.
Attributes
imputed_value_floats: Value(s) to change to
imputed_value_int64s: Value(s) to change to.
replaced_value_float: A value that needs replacing. Default value is 0.0.
replaced_value_int64: A value that needs replacing. Default value is 0.
Inputs
X (heterogeneous) - T: Data to be processed.
Outputs
Y (heterogeneous) - T: Imputed output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type, either [N,C] or [C]. The output type will be of the same tensor type and shape.
OnnxAiOnnxMlLabelEncoder#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLabelEncoder(*args, **kwargs)#
Version
name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 2
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 2 of domain ai.onnx.ml.
Summary
Maps each element in the input tensor to another value.
The mapping is determined by the two parallel attributes, ‘keys_*’ and ‘values_*’ attribute. The i-th value in the specified ‘keys_*’ attribute would be mapped to the i-th value in the specified ‘values_*’ attribute. It implies that input’s element type and the element type of the specified ‘keys_*’ should be identical while the output type is identical to the specified ‘values_*’ attribute. If an input element can not be found in the specified ‘keys_*’ attribute, the ‘default_*’ that matches the specified ‘values_*’ attribute may be used as its output value.
Let’s consider an example which maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].
Since this operator is a one-to-one mapping, its input and output shapes are the same. Notice that only one of ‘keys_*’/’values_*’ can be set.
For key look-up, bit-wise comparison is used so even a float NaN can be mapped to a value in ‘values_*’ attribute.
Attributes
default_float: A float. Default value is -0.0.
default_int64: An integer. Default value is -1.
default_string: A string. Default value is '_Unused'.
keys_floats: A list of floats.
keys_int64s: A list of ints.
keys_strings: A list of strings. One and only one of ‘keys_*’s should be set.
values_floats: A list of floats.
values_int64s: A list of ints.
values_strings: A list of strings. One and only one of ‘value_*’s should be set.
Inputs
X (heterogeneous) - T1: Input data. It can be either tensor or scalar.
Outputs
Y (heterogeneous) - T2: Output data.
Type Constraints
T1 in ( tensor(float), tensor(int64), tensor(string) ): The input type is a tensor of any shape.
T2 in ( tensor(float), tensor(int64), tensor(string) ): Output type is determined by the specified ‘values_*’ attribute.
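The key/value mapping above (including the default for unknown keys) can be sketched with a plain dictionary. This mirrors the worked example in the summary; the function name is illustrative, not the mlprodict API.

```python
import numpy as np

def label_encoder(X, keys, values, default):
    """NumPy sketch of ai.onnx.ml LabelEncoder (since_version 2).

    The i-th key maps to the i-th value; any element not found among
    the keys maps to the default value. Shapes are preserved.
    """
    table = dict(zip(keys, values))
    flat = np.ravel(np.asarray(X, dtype=object))
    out = [table.get(x, default) for x in flat]
    return np.asarray(out).reshape(np.shape(X))

Y = label_encoder(["Dori", "Amy", "Amy", "Sally", "Sally"],
                  keys=["Amy", "Sally"], values=[5, 6], default=-1)
```

This reproduces the summary's example: "Dori" is absent from 'keys_strings', so it maps to 'default_int64'.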
OnnxAiOnnxMlLabelEncoder_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLabelEncoder_1(*args, **kwargs)#
Version
name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Converts strings to integers and vice versa.
If the string default value is set, it will convert integers to strings. If the int default value is set, it will convert strings to integers.
Each operator converts either integers to strings or strings to integers, depending on which default value attribute is provided. Only one default value attribute should be defined.
When converting from integers to strings, the string is fetched from the ‘classes_strings’ list, by simple indexing.
When converting from strings to integers, the string is looked up in the list and the index at which it is found is used as the converted value.
Attributes
classes_strings: A list of labels.
default_int64: An integer to use when an input string value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is -1.
default_string: A string to use when an input integer value is not found in the map. One and only one of the ‘default_*’ attributes must be defined. Default value is '_Unused'.
Inputs
X (heterogeneous) - T1: Input data.
Outputs
Y (heterogeneous) - T2: Output data. If strings are input, the output values are integers, and vice versa.
Type Constraints
T1 in ( tensor(int64), tensor(string) ): The input type must be a tensor of integers or strings, of any shape.
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, and will have the same shape as the input.
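Both directions of the version-1 operator reduce to simple list indexing and lookup, as described above. A minimal sketch, assuming the default attribute values; function names are invented for the example.

```python
def encode_strings(X, classes_strings, default_int64=-1):
    """String-to-integer direction of LabelEncoder-1: each string is
    looked up in classes_strings and its index is the output."""
    return [classes_strings.index(x) if x in classes_strings else default_int64
            for x in X]

def decode_ints(X, classes_strings, default_string='_Unused'):
    """Integer-to-string direction: simple indexing into classes_strings,
    falling back to default_string for out-of-range indices."""
    return [classes_strings[i] if 0 <= i < len(classes_strings) else default_string
            for i in X]

labels = ["cat", "dog"]
encoded = encode_strings(["dog", "bird"], labels)   # unknown string -> -1
decoded = decode_ints([0, 5], labels)               # out-of-range -> '_Unused'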
OnnxAiOnnxMlLabelEncoder_2#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLabelEncoder_2(*args, **kwargs)#
Version
name: LabelEncoder (GitHub)
domain: ai.onnx.ml
since_version: 2
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 2 of domain ai.onnx.ml.
Summary
Maps each element in the input tensor to another value.
The mapping is determined by the two parallel attributes, ‘keys_*’ and ‘values_*’ attribute. The i-th value in the specified ‘keys_*’ attribute would be mapped to the i-th value in the specified ‘values_*’ attribute. It implies that input’s element type and the element type of the specified ‘keys_*’ should be identical while the output type is identical to the specified ‘values_*’ attribute. If an input element can not be found in the specified ‘keys_*’ attribute, the ‘default_*’ that matches the specified ‘values_*’ attribute may be used as its output value.
Let’s consider an example which maps a string tensor to an integer tensor. Assume ‘keys_strings’ is [“Amy”, “Sally”], ‘values_int64s’ is [5, 6], and ‘default_int64’ is ‘-1’. The input [“Dori”, “Amy”, “Amy”, “Sally”, “Sally”] would be mapped to [-1, 5, 5, 6, 6].
Since this operator is a one-to-one mapping, its input and output shapes are the same. Notice that only one of ‘keys_*’/’values_*’ can be set.
For key look-up, bit-wise comparison is used so even a float NaN can be mapped to a value in ‘values_*’ attribute.
Attributes
default_float: A float. Default value is -0.0.
default_int64: An integer. Default value is -1.
default_string: A string. Default value is '_Unused'.
keys_floats: A list of floats.
keys_int64s: A list of ints.
keys_strings: A list of strings. One and only one of ‘keys_*’s should be set.
values_floats: A list of floats.
values_int64s: A list of ints.
values_strings: A list of strings. One and only one of ‘value_*’s should be set.
Inputs
X (heterogeneous) - T1: Input data. It can be either tensor or scalar.
Outputs
Y (heterogeneous) - T2: Output data.
Type Constraints
T1 in ( tensor(float), tensor(int64), tensor(string) ): The input type is a tensor of any shape.
T2 in ( tensor(float), tensor(int64), tensor(string) ): Output type is determined by the specified ‘values_*’ attribute.
OnnxAiOnnxMlLinearClassifier#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLinearClassifier(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Linear classifier
Attributes
classlabels_ints: Class labels when using integer labels. One and only one ‘classlabels’ attribute must be defined.
classlabels_strings: Class labels when using string labels. One and only one ‘classlabels’ attribute must be defined.
coefficients (required): A collection of weights of the model(s).
intercepts: A collection of intercepts.
multi_class: Indicates whether to do OvR or multinomial (0=OvR is the default). Default value is 0.
post_transform: Indicates the transform to apply to the scores vector. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
Inputs
X (heterogeneous) - T1: Data to be classified.
Outputs
Y (heterogeneous) - T2: Classification outputs (one class per example).
Z (heterogeneous) - tensor(float): Classification scores ([N,E] - one score for each class and example).
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type, and of shape [N,C] or [C]. In the latter case, it will be treated as [1,C]
T2 in ( tensor(int64), tensor(string) ): The output will be a tensor of strings or integers.
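The scoring can be sketched as a matrix product over the flattened coefficients. This assumes post_transform='NONE' and picks the label by argmax; the real operator's binary-class handling and post-transforms are more involved, and the function name is illustrative.

```python
import numpy as np

def linear_classifier(X, coefficients, intercepts, classlabels):
    """NumPy sketch of ai.onnx.ml LinearClassifier with post_transform='NONE'.

    coefficients is a flattened [E, C] weight matrix (E classes, C features);
    scores Z = X @ W.T + b, and Y is the argmax label per example.
    """
    X = np.atleast_2d(np.asarray(X, dtype=np.float32))
    E = len(classlabels)
    W = np.asarray(coefficients, dtype=np.float32).reshape(E, -1)
    b = np.asarray(intercepts, dtype=np.float32)
    Z = X @ W.T + b                       # [N, E] scores
    Y = [classlabels[i] for i in Z.argmax(axis=1)]
    return Y, Z

Y, Z = linear_classifier([[1.0, 2.0]],
                         coefficients=[1.0, 0.0, 0.0, 1.0],  # identity weights
                         intercepts=[0.0, 0.0],
                         classlabels=["a", "b"])
```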
OnnxAiOnnxMlLinearClassifier_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLinearClassifier_1(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Linear classifier
Attributes
classlabels_ints: Class labels when using integer labels. One and only one ‘classlabels’ attribute must be defined.
classlabels_strings: Class labels when using string labels. One and only one ‘classlabels’ attribute must be defined.
coefficients (required): A collection of weights of the model(s).
intercepts: A collection of intercepts.
multi_class: Indicates whether to do OvR or multinomial (0=OvR is the default). Default value is 0.
post_transform: Indicates the transform to apply to the scores vector. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
Inputs
X (heterogeneous) - T1: Data to be classified.
Outputs
Y (heterogeneous) - T2: Classification outputs (one class per example).
Z (heterogeneous) - tensor(float): Classification scores ([N,E] - one score for each class and example).
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type, and of shape [N,C] or [C]. In the latter case, it will be treated as [1,C]
T2 in ( tensor(int64), tensor(string) ): The output will be a tensor of strings or integers.
OnnxAiOnnxMlLinearRegressor#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLinearRegressor(*args, **kwargs)#
Version
name: LinearRegressor (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Generalized linear regression evaluation.
If targets is set to 1 (default) then univariate regression is performed.
If targets is set to M then M sets of coefficients must be passed in as a sequence and M results will be output for each input n in N.
The coefficients array is of length n, and the coefficients for each target are contiguous. Intercepts are optional but if provided must match the number of targets.
Attributes
coefficients: Weights of the model(s).
intercepts: Weights of the intercepts, if used.
post_transform: Indicates the transform to apply to the regression output vector. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
targets: The total number of regression targets, 1 if not defined. Default value is 1.
Inputs
X (heterogeneous) - T: Data to be regressed.
Outputs
Y (heterogeneous) - tensor(float): Regression outputs (one per target, per example).
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
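The contiguous-coefficient layout described above can be sketched as a reshape followed by a matrix product, assuming post_transform='NONE'. The function name is illustrative, not the mlprodict API.

```python
import numpy as np

def linear_regressor(X, coefficients, intercepts=None, targets=1):
    """NumPy sketch of ai.onnx.ml LinearRegressor with post_transform='NONE'.

    coefficients holds `targets` contiguous coefficient rows; the output
    is X @ W.T + b with shape [N, targets].
    """
    X = np.atleast_2d(np.asarray(X, dtype=np.float32))
    W = np.asarray(coefficients, dtype=np.float32).reshape(targets, -1)
    if intercepts is None:
        b = np.zeros(targets, dtype=np.float32)
    else:
        b = np.asarray(intercepts, dtype=np.float32)
    return X @ W.T + b

# Univariate regression (targets=1) over two examples with two features.
Y = linear_regressor([[1.0, 2.0], [3.0, 4.0]],
                     coefficients=[0.5, 0.5], intercepts=[1.0])
```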
OnnxAiOnnxMlLinearRegressor_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlLinearRegressor_1(*args, **kwargs)#
Version
name: LinearRegressor (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Generalized linear regression evaluation.
If targets is set to 1 (default) then univariate regression is performed.
If targets is set to M then M sets of coefficients must be passed in as a sequence and M results will be output for each input n in N.
The coefficients array is of length n, and the coefficients for each target are contiguous. Intercepts are optional but if provided must match the number of targets.
Attributes
coefficients: Weights of the model(s).
intercepts: Weights of the intercepts, if used.
post_transform: Indicates the transform to apply to the regression output vector. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
targets: The total number of regression targets, 1 if not defined. Default value is 1.
Inputs
X (heterogeneous) - T: Data to be regressed.
Outputs
Y (heterogeneous) - tensor(float): Regression outputs (one per target, per example).
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
OnnxAiOnnxMlNormalizer#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlNormalizer(*args, **kwargs)#
Version
name: Normalizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Normalize the input. There are three normalization modes, which have the corresponding formulas, defined using element-wise infix operators ‘/’ and ‘^’ and tensor-wide functions ‘max’ and ‘sum’:
Max: Y = X / max(X)
L1: Y = X / sum(X)
L2: Y = sqrt(X^2 / sum(X^2))
In all modes, if the divisor is zero, Y == X.
For batches, that is, [N,C] tensors, normalization is done along the C axis. In other words, each row of the batch is normalized independently.
Attributes
norm: One of ‘MAX,’ ‘L1,’ ‘L2’. Default value is 'MAX'.
Inputs
X (heterogeneous) - T: Data to be encoded, a tensor of shape [N,C] or [C]
Outputs
Y (heterogeneous) - tensor(float): Encoded output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
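The three modes and the zero-divisor rule can be sketched in NumPy, reading the formulas above literally (some runtimes normalize by absolute values for MAX/L1; treat this as an illustration of the documented formulas, not the normative implementation).

```python
import numpy as np

def normalizer(X, norm="MAX"):
    """NumPy sketch of ai.onnx.ml Normalizer: each row of an [N,C] input
    is normalized independently; where the divisor is zero, Y == X."""
    X = np.atleast_2d(np.asarray(X, dtype=np.float32))
    if norm == "MAX":
        d = X.max(axis=1, keepdims=True)
    elif norm == "L1":
        d = X.sum(axis=1, keepdims=True)
    elif norm == "L2":
        d = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    else:
        raise ValueError(f"unknown norm {norm!r}")
    safe = np.where(d == 0, 1.0, d)       # avoid division by zero
    return np.where(d == 0, X, X / safe)  # zero divisor leaves the row as-is

Y = normalizer([[3.0, 4.0]], norm="L2")
```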
OnnxAiOnnxMlNormalizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlNormalizer_1(*args, **kwargs)#
Version
name: Normalizer (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Normalize the input. There are three normalization modes, which have the corresponding formulas, defined using element-wise infix operators ‘/’ and ‘^’ and tensor-wide functions ‘max’ and ‘sum’:
Max: Y = X / max(X)
L1: Y = X / sum(X)
L2: Y = sqrt(X^2 / sum(X^2))
In all modes, if the divisor is zero, Y == X.
For batches, that is, [N,C] tensors, normalization is done along the C axis. In other words, each row of the batch is normalized independently.
Attributes
norm: One of ‘MAX,’ ‘L1,’ ‘L2’. Default value is 'MAX'.
Inputs
X (heterogeneous) - T: Data to be encoded, a tensor of shape [N,C] or [C]
Outputs
Y (heterogeneous) - tensor(float): Encoded output data
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
OnnxAiOnnxMlOneHotEncoder#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlOneHotEncoder(*args, **kwargs)#
Version
name: OneHotEncoder (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Replace each input element with an array of ones and zeros, where a single one is placed at the index of the category that was passed in. The total category count will determine the size of the extra dimension of the output array Y.
For example, if we pass a tensor with a single value of 4, and a category count of 8, the output will be a tensor with [0,0,0,0,1,0,0,0].
This operator assumes every input feature is from the same set of categories.
If the input is a tensor of float, int32, or double, the data will be cast to integers and the cats_int64s category list will be used for the lookups.
Attributes
cats_int64s: List of categories, ints. One and only one of the ‘cats_*’ attributes must be defined.
cats_strings: List of categories, strings. One and only one of the ‘cats_*’ attributes must be defined.
zeros: If true and a category is not present, the operator will return all zeros; if false and a category is not found, the operator will fail. Default value is 1.
Inputs
X (heterogeneous) - T: Data to be encoded.
Outputs
Y (heterogeneous) - tensor(float): Encoded output data, having one more dimension than X.
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string) ): The input must be a tensor of a numeric type.
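The extra-dimension behavior described above (including the 'zeros' attribute) can be sketched in NumPy; the function name is illustrative, and only the integer-categories path is shown.

```python
import numpy as np

def one_hot_encoder(X, cats_int64s, zeros=1):
    """NumPy sketch of ai.onnx.ml OneHotEncoder with integer categories.

    Each element becomes a float vector with a 1 at its category's index,
    adding one trailing dimension of size len(cats_int64s). With zeros=0,
    an unknown category raises instead of producing all zeros.
    """
    cats = list(cats_int64s)
    X = np.asarray(X)
    Y = np.zeros(X.shape + (len(cats),), dtype=np.float32)
    for idx, v in np.ndenumerate(X):
        if int(v) in cats:
            Y[idx + (cats.index(int(v)),)] = 1.0
        elif not zeros:
            raise ValueError(f"category {v} not found")
    return Y

# The summary's example: value 4 with a category count of 8.
Y = one_hot_encoder([4], cats_int64s=list(range(8)))
```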
OnnxAiOnnxMlOneHotEncoder_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlOneHotEncoder_1(*args, **kwargs)#
Version
name: OneHotEncoder (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Replace each input element with an array of ones and zeros, where a single one is placed at the index of the category that was passed in. The total category count will determine the size of the extra dimension of the output array Y.
For example, if we pass a tensor with a single value of 4, and a category count of 8, the output will be a tensor with [0,0,0,0,1,0,0,0].
This operator assumes every input feature is from the same set of categories.
If the input is a tensor of float, int32, or double, the data will be cast to integers and the cats_int64s category list will be used for the lookups.
Attributes
cats_int64s: List of categories, ints. One and only one of the ‘cats_*’ attributes must be defined.
cats_strings: List of categories, strings. One and only one of the ‘cats_*’ attributes must be defined.
zeros: If true and a category is not present, the operator will return all zeros; if false and a category is not found, the operator will fail. Default value is 1.
Inputs
X (heterogeneous) - T: Data to be encoded.
Outputs
Y (heterogeneous) - tensor(float): Encoded output data, having one more dimension than X.
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64), tensor(string) ): The input must be a tensor of a numeric type.
OnnxAiOnnxMlSVMClassifier#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlSVMClassifier(*args, **kwargs)#
Version
name: SVMClassifier (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Support Vector Machine classifier
Attributes
classlabels_ints: Class labels if using integer labels. One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: Class labels if using string labels. One and only one of the ‘classlabels_*’ attributes must be defined.
coefficients:
kernel_params: List of 3 elements containing gamma, coef0, and degree, in that order. Zero if unused for the kernel.
kernel_type: The kernel type, one of ‘LINEAR,’ ‘POLY,’ ‘RBF,’ ‘SIGMOID’. Default value is 'LINEAR'.
post_transform: Indicates the transform to apply to the score. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
prob_a: First set of probability coefficients.
prob_b: Second set of probability coefficients. This array must be same size as prob_a. If these are provided then output Z are probability estimates, otherwise they are raw scores.
rho:
support_vectors:
vectors_per_class:
Inputs
X (heterogeneous) - T1: Data to be classified.
Outputs
Y (heterogeneous) - T2: Classification outputs (one class per example).
Z (heterogeneous) - tensor(float): Class scores (one per class per example), if prob_a and prob_b are provided they are probabilities for each class, otherwise they are raw scores.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type, either [C] or [N,C].
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, depending on which of the classlabels_* attributes is used. Its size will match the batch size of the input.
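As a rough intuition for the score computation, the binary, no-probability case reduces to a kernel sum over the support vectors. This is a heavily simplified sketch: the full operator handles one-vs-one voting via vectors_per_class and several kernels, and the sign convention here is an assumption, not the normative spec.

```python
import numpy as np

def svm_binary_score(x, support_vectors, coefficients, rho, kernel=np.dot):
    """Simplified decision value for a binary SVC without probabilities:
    score = sum_i a_i * K(sv_i, x) + rho[0]."""
    x = np.asarray(x, dtype=np.float64)
    svs = np.asarray(support_vectors, dtype=np.float64)
    a = np.asarray(coefficients, dtype=np.float64)
    return float(sum(ai * kernel(sv, x) for ai, sv in zip(a, svs)) + rho[0])

# Two support vectors on the x-axis with opposite dual coefficients.
score = svm_binary_score([3.0, 1.0],
                         support_vectors=[[1.0, 0.0], [-1.0, 0.0]],
                         coefficients=[1.0, -1.0], rho=[0.0])
```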
OnnxAiOnnxMlSVMClassifier_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlSVMClassifier_1(*args, **kwargs)#
Version
name: SVMClassifier (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Support Vector Machine classifier
Attributes
classlabels_ints: Class labels if using integer labels. One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: Class labels if using string labels. One and only one of the ‘classlabels_*’ attributes must be defined.
coefficients:
kernel_params: List of 3 elements containing gamma, coef0, and degree, in that order. Zero if unused for the kernel.
kernel_type: The kernel type, one of ‘LINEAR,’ ‘POLY,’ ‘RBF,’ ‘SIGMOID’. Default value is 'LINEAR'.
post_transform: Indicates the transform to apply to the score. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT’. Default value is 'NONE'.
prob_a: First set of probability coefficients.
prob_b: Second set of probability coefficients. This array must be same size as prob_a. If these are provided then output Z are probability estimates, otherwise they are raw scores.
rho:
support_vectors:
vectors_per_class:
Inputs
X (heterogeneous) - T1: Data to be classified.
Outputs
Y (heterogeneous) - T2: Classification outputs (one class per example).
Z (heterogeneous) - tensor(float): Class scores (one per class per example), if prob_a and prob_b are provided they are probabilities for each class, otherwise they are raw scores.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type, either [C] or [N,C].
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, depending on which of the classlabels_* attributes is used. Its size will match the batch size of the input.
OnnxAiOnnxMlSVMRegressor#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlSVMRegressor(*args, **kwargs)#
Version
name: SVMRegressor (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Support Vector Machine regression prediction and one-class SVM anomaly detection.
Attributes
coefficients: Support vector coefficients.
kernel_params: List of 3 elements containing gamma, coef0, and degree, in that order. Zero if unused for the kernel.
kernel_type: The kernel type, one of ‘LINEAR,’ ‘POLY,’ ‘RBF,’ ‘SIGMOID’. Default value is 'LINEAR'.
n_supports: The number of support vectors. Default value is 0.
one_class: Flag indicating whether the regression is a one-class SVM or not. Default value is 0.
post_transform: Indicates the transform to apply to the score. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT.’ Default value is 'NONE'.
rho:
support_vectors: Chosen support vectors.
Inputs
X (heterogeneous) - T: Data to be regressed.
Outputs
Y (heterogeneous) - tensor(float): Regression outputs (one score per target per example).
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type, either [C] or [N,C].
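The prediction follows the usual SVR kernel-sum form. A sketch with an RBF kernel and post_transform='NONE', offered as an illustration of that form rather than the normative runtime; the function name is invented for the example.

```python
import numpy as np

def svm_regressor(x, support_vectors, coefficients, rho, gamma=1.0):
    """NumPy sketch of an SVM regressor with an RBF kernel:
    y = sum_i c_i * exp(-gamma * ||sv_i - x||^2) + rho[0]."""
    x = np.asarray(x, dtype=np.float64)
    svs = np.asarray(support_vectors, dtype=np.float64)
    c = np.asarray(coefficients, dtype=np.float64)
    k = np.exp(-gamma * ((svs - x) ** 2).sum(axis=1))  # kernel values
    return float(c @ k + rho[0])

y = svm_regressor([1.0, 0.0],
                  support_vectors=[[1.0, 0.0], [0.0, 0.0]],
                  coefficients=[2.0, 1.0], rho=[0.5], gamma=1.0)
```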
OnnxAiOnnxMlSVMRegressor_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlSVMRegressor_1(*args, **kwargs)#
Version
name: SVMRegressor (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Support Vector Machine regression prediction and one-class SVM anomaly detection.
Attributes
coefficients: Support vector coefficients.
kernel_params: List of 3 elements containing gamma, coef0, and degree, in that order. Zero if unused for the kernel.
kernel_type: The kernel type, one of ‘LINEAR,’ ‘POLY,’ ‘RBF,’ ‘SIGMOID’. Default value is 'LINEAR'.
n_supports: The number of support vectors. Default value is 0.
one_class: Flag indicating whether the regression is a one-class SVM or not. Default value is 0.
post_transform: Indicates the transform to apply to the score. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT.’ Default value is 'NONE'.
rho:
support_vectors: Chosen support vectors.
Inputs
X (heterogeneous) - T: Data to be regressed.
Outputs
Y (heterogeneous) - tensor(float): Regression outputs (one score per target per example).
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type, either [C] or [N,C].
OnnxAiOnnxMlScaler#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlScaler(*args, **kwargs)#
Version
name: Scaler (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Rescale input data, for example to standardize features by removing the mean and scaling to unit variance.
Attributes
offset: First, offset by this. Can be length of features in an [N,F] tensor or length 1, in which case it applies to all features, regardless of dimension count.
scale: Second, multiply by this. Can be length of features in an [N,F] tensor or length 1, in which case it applies to all features, regardless of dimension count. Must be same length as ‘offset’.
Inputs
X (heterogeneous) - T: Data to be scaled.
Outputs
Y (heterogeneous) - tensor(float): Scaled output data.
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
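The offset-then-scale order stated in the attributes, Y = (X - offset) * scale, is a one-liner in NumPy. The function name is illustrative; per-feature standardization uses offset = mean and scale = 1/std.

```python
import numpy as np

def scaler(X, offset, scale):
    """NumPy sketch of ai.onnx.ml Scaler: subtract offset first, then
    multiply by scale; both are length 1 or one value per feature."""
    X = np.asarray(X, dtype=np.float32)
    return (X - np.asarray(offset, dtype=np.float32)) \
        * np.asarray(scale, dtype=np.float32)

# Standardize two features: offsets are the means, scales are 1/std.
Y = scaler([[2.0, 10.0]], offset=[1.0, 8.0], scale=[0.5, 0.25])
```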
OnnxAiOnnxMlScaler_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlScaler_1(*args, **kwargs)#
Version
name: Scaler (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Rescale input data, for example to standardize features by removing the mean and scaling to unit variance.
Attributes
offset: First, offset by this. Can be length of features in an [N,F] tensor or length 1, in which case it applies to all features, regardless of dimension count.
scale: Second, multiply by this. Can be length of features in an [N,F] tensor or length 1, in which case it applies to all features, regardless of dimension count. Must be same length as ‘offset’.
Inputs
X (heterogeneous) - T: Data to be scaled.
Outputs
Y (heterogeneous) - tensor(float): Scaled output data.
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input must be a tensor of a numeric type.
OnnxAiOnnxMlTreeEnsembleClassifier#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleClassifier(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 3
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 3 of domain ai.onnx.ml.
Summary
Tree Ensemble classifier. Returns the top class for each of N inputs.
The attributes named ‘nodes_X’ form a sequence of tuples, associated by index into the sequences, which must all be of equal length. These tuples define the nodes.
Similarly, all fields prefixed with ‘class_’ are tuples of votes at the leaves. A leaf may have multiple votes, where each vote is weighted by the associated class_weights index.
One and only one of classlabels_strings or classlabels_int64s will be defined. The class_ids are indices into this list. All fields ending with _as_tensor can be used instead of the same parameter without the suffix if the element type is double and not float.
Attributes
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
base_values_as_tensor: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
class_ids: The index of the class list that each weight is for.
class_nodeids: node id that this weight is for.
class_treeids: The id of the tree that this node is in.
class_weights: The weight for the class in class_id.
class_weights_as_tensor: The weight for the class in class_id.
classlabels_int64s: Class labels if using integer labels. One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: Class labels if using string labels. One and only one of the ‘classlabels_*’ attributes must be defined.
nodes_falsenodeids: Child node if expression is false.
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_hitrates_as_tensor: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a missing value: if a value is missing (NaN), use the ‘true’ or ‘false’ branch based on the value in this array. This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node. One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Ids may restart at zero for each tree, but this is not required.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true.
nodes_values: Thresholds to do the splitting on for each node.
nodes_values_as_tensor: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score. One of ‘NONE,’ ‘SOFTMAX,’ ‘LOGISTIC,’ ‘SOFTMAX_ZERO,’ or ‘PROBIT.’ Default value is 'NONE'.
Inputs
X (heterogeneous) - T1: Input of shape [N,F]
Outputs
Y (heterogeneous) - T2: N, Top class for each point
Z (heterogeneous) - tensor(float): The class score for each class, for each point, a tensor of shape [N,E].
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, depending on which of the classlabels_* attributes is used.
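The parallel 'nodes_*' tuples above describe a flat encoding of trees; evaluating one input means walking each tree from its root and accumulating the leaf's 'class_*' votes. A minimal sketch handling only BRANCH_LEQ and LEAF modes, with root node id assumed to be 0 per tree; attribute names mirror the spec, everything else is illustrative.

```python
import numpy as np

def tree_vote(x, nodes_treeids, nodes_nodeids, nodes_featureids, nodes_modes,
              nodes_values, nodes_truenodeids, nodes_falsenodeids,
              class_treeids, class_nodeids, class_ids, class_weights, n_classes):
    """Evaluate one input row through a flat tree-ensemble encoding.

    Walks each tree following BRANCH_LEQ comparisons until a LEAF, then
    adds that leaf's class_weights votes into the per-class scores."""
    scores = np.zeros(n_classes, dtype=np.float32)
    # Map (tree id, node id) -> position in the parallel node arrays.
    nodes = {(t, n): i for i, (t, n) in enumerate(zip(nodes_treeids, nodes_nodeids))}
    for tree in sorted(set(nodes_treeids)):
        i = nodes[(tree, 0)]                     # assume root node id is 0
        while nodes_modes[i] != "LEAF":
            go_true = x[nodes_featureids[i]] <= nodes_values[i]
            nxt = nodes_truenodeids[i] if go_true else nodes_falsenodeids[i]
            i = nodes[(tree, nxt)]
        for t, n, c, w in zip(class_treeids, class_nodeids, class_ids, class_weights):
            if t == tree and n == nodes_nodeids[i]:
                scores[c] += w
    return scores

# One tree: root splits on feature 0 at 0.5; each leaf votes for one class.
scores = tree_vote([0.9],
                   nodes_treeids=[0, 0, 0], nodes_nodeids=[0, 1, 2],
                   nodes_featureids=[0, 0, 0],
                   nodes_modes=["BRANCH_LEQ", "LEAF", "LEAF"],
                   nodes_values=[0.5, 0.0, 0.0],
                   nodes_truenodeids=[1, 0, 0], nodes_falsenodeids=[2, 0, 0],
                   class_treeids=[0, 0], class_nodeids=[1, 2],
                   class_ids=[0, 1], class_weights=[1.0, 1.0], n_classes=2)
```

The input 0.9 fails the <= 0.5 test, takes the false branch to node 2, and that leaf casts its vote for class 1.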
OnnxAiOnnxMlTreeEnsembleClassifier_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleClassifier_1(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Tree Ensemble classifier. Returns the top class for each of N inputs.
The attributes named ‘nodes_X’ form a sequence of tuples, associated by index into the sequences, which must all be of equal length. These tuples define the nodes.
Similarly, all fields prefixed with ‘class_’ are tuples of votes at the leaves. A leaf may have multiple votes, where each vote is weighted by the associated class_weights index.
One and only one of classlabels_strings or classlabels_int64s will be defined. The class_ids are indices into this list.
Attributes
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
class_ids: The index of the class list that each weight is for.
class_nodeids: node id that this weight is for.
class_treeids: The id of the tree that this node is in.
class_weights: The weight for the class in class_id.
classlabels_int64s: Class labels if using integer labels.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: Class labels if using string labels.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
nodes_falsenodeids: Child node if expression is false.
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a missing value: if a value is missing (NaN), use the ‘true’ or ‘false’ branch based on the value in this array.<br>This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node.<br>One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Ids may restart at zero for each tree, but this is not required.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true.
nodes_values: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score.<br>One of ‘NONE’, ‘SOFTMAX’, ‘LOGISTIC’, ‘SOFTMAX_ZERO’, or ‘PROBIT’. Default value is 'NONE'.
Inputs
X (heterogeneous) - T1: Input of shape [N,F]
Outputs
Y (heterogeneous) - T2: N, Top class for each point
Z (heterogeneous) - tensor(float): The class score for each class, for each point, a tensor of shape [N,E].
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, depending on which of the classlabels_* attributes is used.
OnnxAiOnnxMlTreeEnsembleClassifier_3#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleClassifier_3(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 3
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 3 of domain ai.onnx.ml.
Summary
Tree Ensemble classifier. Returns the top class for each of N inputs.
The attributes named ‘nodes_X’ form a sequence of tuples, associated by index into the sequences, which must all be of equal length. These tuples define the nodes.
Similarly, all fields prefixed with ‘class_’ are tuples of votes at the leaves. A leaf may have multiple votes, where each vote is weighted by the associated class_weights index.
One and only one of classlabels_strings or classlabels_int64s will be defined. The class_ids are indices into this list. All fields ending with _as_tensor can be used instead of the same parameter without the suffix if the element type is double and not float.
Attributes
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
base_values_as_tensor: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
class_ids: The index of the class list that each weight is for.
class_nodeids: node id that this weight is for.
class_treeids: The id of the tree that this node is in.
class_weights: The weight for the class in class_id.
class_weights_as_tensor: The weight for the class in class_id.
classlabels_int64s: Class labels if using integer labels.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: Class labels if using string labels.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
nodes_falsenodeids: Child node if expression is false.
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_hitrates_as_tensor: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a missing value: if a value is missing (NaN), use the ‘true’ or ‘false’ branch based on the value in this array.<br>This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node.<br>One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Ids may restart at zero for each tree, but this is not required.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true.
nodes_values: Thresholds to do the splitting on for each node.
nodes_values_as_tensor: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score.<br>One of ‘NONE’, ‘SOFTMAX’, ‘LOGISTIC’, ‘SOFTMAX_ZERO’, or ‘PROBIT’. Default value is 'NONE'.
Inputs
X (heterogeneous) - T1: Input of shape [N,F]
Outputs
Y (heterogeneous) - T2: N, Top class for each point
Z (heterogeneous) - tensor(float): The class score for each class, for each point, a tensor of shape [N,E].
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
T2 in ( tensor(int64), tensor(string) ): The output type will be a tensor of strings or integers, depending on which of the classlabels_* attributes is used.
OnnxAiOnnxMlTreeEnsembleRegressor#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleRegressor(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 3
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 3 of domain ai.onnx.ml.
Summary
Tree Ensemble regressor. Returns the regressed values for each input in N.
All args with nodes_ are fields of a tuple of tree nodes, and it is assumed they are the same length, and an index i will decode the tuple across these inputs. Each node id can appear only once for each tree id.
All fields prefixed with target_ are tuples of votes at the leaves.
A leaf may have multiple votes, where each vote is weighted by the associated target_weights index.
All fields ending with _as_tensor can be used instead of the same parameter without the suffix if the element type is double and not float. All trees must have their node ids start at 0 and increment by 1.
Mode enum is BRANCH_LEQ, BRANCH_LT, BRANCH_GTE, BRANCH_GT, BRANCH_EQ, BRANCH_NEQ, LEAF
Attributes
aggregate_function: Defines how to aggregate leaf values within a target.<br>One of ‘AVERAGE’, ‘SUM’, ‘MIN’, ‘MAX’. Default value is 'SUM'.
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
base_values_as_tensor: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
n_targets: The total number of targets.
nodes_falsenodeids: Child node if expression is false
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_hitrates_as_tensor: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a NaN: use the ‘true’ (if the attribute value is 1) or ‘false’ (if the attribute value is 0) branch based on the value in this array.<br>This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node.<br>One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Node ids must restart at zero for each tree and increase sequentially.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true
nodes_values: Thresholds to do the splitting on for each node.
nodes_values_as_tensor: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score.<br>One of ‘NONE’, ‘SOFTMAX’, ‘LOGISTIC’, ‘SOFTMAX_ZERO’, or ‘PROBIT’. Default value is 'NONE'.
target_ids: The index of the target that each weight is for.
target_nodeids: The node id of each weight.
target_treeids: The id of the tree that each node is in.
target_weights: The weight for each target.
target_weights_as_tensor: The weight for each target.
Inputs
X (heterogeneous) - T: Input of shape [N,F]
Outputs
Y (heterogeneous) - tensor(float): N classes
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
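The regressor walks trees the same way as the classifier but combines leaf votes per target through aggregate_function. A minimal sketch of that last step, assuming post_transform 'NONE' (function name and signature are invented for illustration):

```python
# Hedged sketch of per-target leaf aggregation in TreeEnsembleRegressor.
# leaf_values are the target_weights collected from the leaves reached by
# one input row; base_value mirrors the base_values attribute (assumed 0).

def aggregate(leaf_values, aggregate_function="SUM", base_value=0.0):
    if aggregate_function == "SUM":
        return base_value + sum(leaf_values)
    if aggregate_function == "AVERAGE":
        return base_value + sum(leaf_values) / len(leaf_values)
    if aggregate_function == "MIN":
        return base_value + min(leaf_values)
    if aggregate_function == "MAX":
        return base_value + max(leaf_values)
    raise ValueError(f"unknown aggregate_function: {aggregate_function}")

print(aggregate([1.0, 2.0, 3.0]))             # 6.0 (default SUM)
print(aggregate([1.0, 2.0, 3.0], "AVERAGE"))  # 2.0
```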
OnnxAiOnnxMlTreeEnsembleRegressor_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleRegressor_1(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Tree Ensemble regressor. Returns the regressed values for each input in N.
All args with nodes_ are fields of a tuple of tree nodes, and it is assumed they are the same length, and an index i will decode the tuple across these inputs. Each node id can appear only once for each tree id.
All fields prefixed with target_ are tuples of votes at the leaves.
A leaf may have multiple votes, where each vote is weighted by the associated target_weights index.
All trees must have their node ids start at 0 and increment by 1.
Mode enum is BRANCH_LEQ, BRANCH_LT, BRANCH_GTE, BRANCH_GT, BRANCH_EQ, BRANCH_NEQ, LEAF
Attributes
aggregate_function: Defines how to aggregate leaf values within a target.<br>One of ‘AVERAGE’, ‘SUM’, ‘MIN’, ‘MAX’. Default value is 'SUM'.
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
n_targets: The total number of targets.
nodes_falsenodeids: Child node if expression is false
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a NaN: use the ‘true’ (if the attribute value is 1) or ‘false’ (if the attribute value is 0) branch based on the value in this array.<br>This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node.<br>One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Node ids must restart at zero for each tree and increase sequentially.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true
nodes_values: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score.<br>One of ‘NONE’, ‘SOFTMAX’, ‘LOGISTIC’, ‘SOFTMAX_ZERO’, or ‘PROBIT’. Default value is 'NONE'.
target_ids: The index of the target that each weight is for.
target_nodeids: The node id of each weight.
target_treeids: The id of the tree that each node is in.
target_weights: The weight for each target.
Inputs
X (heterogeneous) - T: Input of shape [N,F]
Outputs
Y (heterogeneous) - tensor(float): N classes
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
OnnxAiOnnxMlTreeEnsembleRegressor_3#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlTreeEnsembleRegressor_3(*args, **kwargs)#
Version
domain: ai.onnx.ml
since_version: 3
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 3 of domain ai.onnx.ml.
Summary
Tree Ensemble regressor. Returns the regressed values for each input in N.
All args with nodes_ are fields of a tuple of tree nodes, and it is assumed they are the same length, and an index i will decode the tuple across these inputs. Each node id can appear only once for each tree id.
All fields prefixed with target_ are tuples of votes at the leaves.
A leaf may have multiple votes, where each vote is weighted by the associated target_weights index.
All fields ending with _as_tensor can be used instead of the same parameter without the suffix if the element type is double and not float. All trees must have their node ids start at 0 and increment by 1.
Mode enum is BRANCH_LEQ, BRANCH_LT, BRANCH_GTE, BRANCH_GT, BRANCH_EQ, BRANCH_NEQ, LEAF
Attributes
aggregate_function: Defines how to aggregate leaf values within a target.<br>One of ‘AVERAGE’, ‘SUM’, ‘MIN’, ‘MAX’. Default value is 'SUM'.
base_values: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
base_values_as_tensor: Base values for classification, added to final class score; the size must be the same as the classes or can be left unassigned (assumed 0)
n_targets: The total number of targets.
nodes_falsenodeids: Child node if expression is false
nodes_featureids: Feature id for each node.
nodes_hitrates: Popularity of each node, used for performance and may be omitted.
nodes_hitrates_as_tensor: Popularity of each node, used for performance and may be omitted.
nodes_missing_value_tracks_true: For each node, define what to do in the presence of a NaN: use the ‘true’ (if the attribute value is 1) or ‘false’ (if the attribute value is 0) branch based on the value in this array.<br>This attribute may be left undefined, and the default value is false (0) for all nodes.
nodes_modes: The node kind, that is, the comparison to make at the node. There is no comparison to make at a leaf node.<br>One of ‘BRANCH_LEQ’, ‘BRANCH_LT’, ‘BRANCH_GTE’, ‘BRANCH_GT’, ‘BRANCH_EQ’, ‘BRANCH_NEQ’, ‘LEAF’
nodes_nodeids: Node id for each node. Node ids must restart at zero for each tree and increase sequentially.
nodes_treeids: Tree id for each node.
nodes_truenodeids: Child node if expression is true
nodes_values: Thresholds to do the splitting on for each node.
nodes_values_as_tensor: Thresholds to do the splitting on for each node.
post_transform: Indicates the transform to apply to the score.<br>One of ‘NONE’, ‘SOFTMAX’, ‘LOGISTIC’, ‘SOFTMAX_ZERO’, or ‘PROBIT’. Default value is 'NONE'.
target_ids: The index of the target that each weight is for.
target_nodeids: The node id of each weight.
target_treeids: The id of the tree that each node is in.
target_weights: The weight for each target.
target_weights_as_tensor: The weight for each target.
Inputs
X (heterogeneous) - T: Input of shape [N,F]
Outputs
Y (heterogeneous) - tensor(float): N classes
Type Constraints
T in ( tensor(double), tensor(float), tensor(int32), tensor(int64) ): The input type must be a tensor of a numeric type.
OnnxAiOnnxMlZipMap#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlZipMap(*args, **kwargs)#
Version
name: ZipMap (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Creates a map from the input and the attributes.
The values are provided by the input tensor, while the keys are specified by the attributes. Must provide keys in either classlabels_strings or classlabels_int64s (but not both).
The columns of the tensor correspond one-by-one to the keys specified by the attributes. There must be as many columns as keys.
Attributes
classlabels_int64s: The keys when using int keys.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: The keys when using string keys.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
Inputs
X (heterogeneous) - tensor(float): The input values
Outputs
Z (heterogeneous) - T: The output map
Type Constraints
T in ( seq(map(int64, float)), seq(map(string, float)) ): The output will be a sequence of string or integer maps to float.
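The semantics are simple enough to state as a short sketch: zip each row of the [N, C] score tensor with the C keys from classlabels_* to produce one map per input row. The function name below is invented for illustration.

```python
# Hedged sketch of ZipMap: pair score columns with the attribute keys.

def zipmap(scores, keys):
    """scores: N rows of C floats; keys: C string or integer labels."""
    if any(len(row) != len(keys) for row in scores):
        raise ValueError("there must be as many columns as keys")
    return [dict(zip(keys, row)) for row in scores]

print(zipmap([[0.1, 0.9]], ["cat", "dog"]))  # [{'cat': 0.1, 'dog': 0.9}]
```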
OnnxAiOnnxMlZipMap_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxMlZipMap_1(*args, **kwargs)#
Version
name: ZipMap (GitHub)
domain: ai.onnx.ml
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.ml.
Summary
Creates a map from the input and the attributes.
The values are provided by the input tensor, while the keys are specified by the attributes. Must provide keys in either classlabels_strings or classlabels_int64s (but not both).
The columns of the tensor correspond one-by-one to the keys specified by the attributes. There must be as many columns as keys.
Attributes
classlabels_int64s: The keys when using int keys.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
classlabels_strings: The keys when using string keys.<br>One and only one of the ‘classlabels_*’ attributes must be defined.
Inputs
X (heterogeneous) - tensor(float): The input values
Outputs
Z (heterogeneous) - T: The output map
Type Constraints
T in ( seq(map(int64, float)), seq(map(string, float)) ): The output will be a sequence of string or integer maps to float.
OnnxAiOnnxPreviewTrainingAdagrad#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingAdagrad(*args, **kwargs)#
Version
name: Adagrad (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of ADAGRAD, a stochastic gradient based optimization algorithm. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. As you can imagine, ADAGRAD requires some parameters:
The initial learning-rate “R”.
The update count “T”. That is, the number of training iterations conducted.
A L2-norm regularization coefficient “norm_coefficient”.
A learning-rate decay factor “decay_factor”.
A small constant “epsilon” to avoid dividing-by-zero.
At each ADAGRAD iteration, the optimized tensors are moved along a direction computed based on their estimated gradient and accumulated squared gradient. Assume that only a single tensor “X” is updated by this operator. We need the value of “X”, its gradient “G”, and its accumulated squared gradient “H”. Therefore, variables in this operator’s input list are sequentially “R”, “T”, “X”, “G”, and “H”. Other parameters are given as attributes because they are usually constants. Also, the corresponding output tensors are the new value of “X” (called “X_new”), and then the new accumulated squared gradient (called “H_new”). Those outputs are computed from the given inputs following the pseudo code below.
Let “+”, “-”, “*”, and “/” be element-wise arithmetic operations with numpy-style broadcasting support. The pseudo code to compute those outputs is:
// Compute a scalar learning-rate factor. At the first update of X, T is generally
// 0 (0-based update index) or 1 (1-based update index).
r = R / (1 + T * decay_factor);
// Add gradient of 0.5 * norm_coefficient * ||X||_2^2, where ||X||_2 is the 2-norm.
G_regularized = norm_coefficient * X + G;
// Compute new accumulated squared gradient.
H_new = H + G_regularized * G_regularized;
// Compute the adaptive part of the per-coordinate learning rate. Note that Sqrt(…)
// computes the element-wise square-root.
H_adaptive = Sqrt(H_new) + epsilon;
// Compute the new value of "X".
X_new = X - r * G_regularized / H_adaptive;
If this operator is used to optimize multiple inputs, for example “X_1” and “X_2”, the same pseudo code can be extended to handle all tensors jointly. More specifically, we can view “X” as a concatenation of “X_1” and “X_2” (of course, their gradients and accumulated squared gradients should be concatenated too) and then reuse the entire pseudo code.
Note that ADAGRAD was first proposed in http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf. In that reference paper, this operator is a special case of the composite mirror descent update in Figure 1.
Attributes
decay_factor: The decay factor of the learning rate after one update. The effective learning rate is computed by r = R / (1 + T * decay_factor). Defaults to 0 so that increasing update counts does not reduce the learning rate. Default value is 0.0.
epsilon: Small scalar to avoid dividing by zero. Default value is 9.999999974752427e-07.
norm_coefficient: Regularization coefficient in 0.5 * norm_coefficient * ||X||_2^2. Defaults to 0, which means no regularization. Default value is 0.0.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
inputs (variadic) - T3: The current values of optimized tensors, followed by their respective gradients, followed by their respective accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, accumulated squared gradient of “X_1”, accumulated squared gradient of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: Updated values of optimized tensors, followed by the updated values of their accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the output list would be [new value of “X_1”, new value of “X_2”, new accumulated squared gradient of “X_1”, new accumulated squared gradient of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input and output types to float tensors.
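The pseudo code above transcribes directly to Python for a single scalar parameter. This is a hedged sketch: the function name is invented, and epsilon is rounded to 1e-6 for readability rather than the operator's exact default.

```python
import math

# One ADAGRAD update step for a scalar parameter, following the pseudo code:
# inputs R, T, X, G, H mirror the operator's input list; the keyword
# arguments mirror its attributes.
def adagrad_step(R, T, X, G, H, decay_factor=0.0, epsilon=1e-6,
                 norm_coefficient=0.0):
    r = R / (1 + T * decay_factor)              # decayed scalar learning rate
    G_regularized = norm_coefficient * X + G    # add the L2 gradient term
    H_new = H + G_regularized * G_regularized   # accumulate squared gradient
    H_adaptive = math.sqrt(H_new) + epsilon     # per-coordinate scale
    X_new = X - r * G_regularized / H_adaptive
    return X_new, H_new

X_new, H_new = adagrad_step(R=0.1, T=0, X=1.0, G=2.0, H=0.0)
# H_new is 4.0 and X_new is approximately 0.9, i.e. 1.0 - 0.1 * 2.0 / (2.0 + 1e-6)
```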
OnnxAiOnnxPreviewTrainingAdagrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingAdagrad_1(*args, **kwargs)#
Version
name: Adagrad (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of ADAGRAD, a stochastic gradient based optimization algorithm. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. As you can imagine, ADAGRAD requires some parameters:
The initial learning-rate “R”.
The update count “T”. That is, the number of training iterations conducted.
A L2-norm regularization coefficient “norm_coefficient”.
A learning-rate decay factor “decay_factor”.
A small constant “epsilon” to avoid dividing-by-zero.
At each ADAGRAD iteration, the optimized tensors are moved along a direction computed based on their estimated gradient and accumulated squared gradient. Assume that only a single tensor “X” is updated by this operator. We need the value of “X”, its gradient “G”, and its accumulated squared gradient “H”. Therefore, variables in this operator’s input list are sequentially “R”, “T”, “X”, “G”, and “H”. Other parameters are given as attributes because they are usually constants. Also, the corresponding output tensors are the new value of “X” (called “X_new”), and then the new accumulated squared gradient (called “H_new”). Those outputs are computed from the given inputs following the pseudo code below.
Let “+”, “-”, “*”, and “/” be element-wise arithmetic operations with numpy-style broadcasting support. The pseudo code to compute those outputs is:
// Compute a scalar learning-rate factor. At the first update of X, T is generally
// 0 (0-based update index) or 1 (1-based update index).
r = R / (1 + T * decay_factor);
// Add gradient of 0.5 * norm_coefficient * ||X||_2^2, where ||X||_2 is the 2-norm.
G_regularized = norm_coefficient * X + G;
// Compute new accumulated squared gradient.
H_new = H + G_regularized * G_regularized;
// Compute the adaptive part of the per-coordinate learning rate. Note that Sqrt(…)
// computes the element-wise square-root.
H_adaptive = Sqrt(H_new) + epsilon;
// Compute the new value of "X".
X_new = X - r * G_regularized / H_adaptive;
If this operator is used to optimize multiple inputs, for example “X_1” and “X_2”, the same pseudo code can be extended to handle all tensors jointly. More specifically, we can view “X” as a concatenation of “X_1” and “X_2” (of course, their gradients and accumulated squared gradients should be concatenated too) and then reuse the entire pseudo code.
Note that ADAGRAD was first proposed in http://jmlr.org/papers/volume12/duchi11a/duchi11a.pdf. In that reference paper, this operator is a special case of the composite mirror descent update in Figure 1.
Attributes
decay_factor: The decay factor of the learning rate after one update. The effective learning rate is computed by r = R / (1 + T * decay_factor). Defaults to 0 so that increasing update counts does not reduce the learning rate. Default value is 0.0.
epsilon: Small scalar to avoid dividing by zero. Default value is 9.999999974752427e-07.
norm_coefficient: Regularization coefficient in 0.5 * norm_coefficient * ||X||_2^2. Defaults to 0, which means no regularization. Default value is 0.0.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
inputs (variadic) - T3: The current values of optimized tensors, followed by their respective gradients, followed by their respective accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, accumulated squared gradient of “X_1”, accumulated squared gradient of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: Updated values of optimized tensors, followed by the updated values of their accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the output list would be [new value of “X_1”, new value of “X_2”, new accumulated squared gradient of “X_1”, new accumulated squared gradient of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input and output types to float tensors.
OnnxAiOnnxPreviewTrainingAdam#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingAdam(*args, **kwargs)#
Version
name: Adam (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of Adam, a stochastic gradient based optimization algorithm. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. First of all, Adam requires some parameters:
The learning-rate “R”.
The update count “T”. That is, the number of training iterations conducted.
A L2-norm regularization coefficient “norm_coefficient”.
A small constant “epsilon” to avoid dividing-by-zero.
Two coefficients, “alpha” and “beta”.
At each Adam iteration, the optimized tensors are moved along a direction computed based on their exponentially-averaged historical gradient and exponentially-averaged historical squared gradient. Assume that only a tensor “X” is being optimized. The rest of the required information is
the value of “X”,
the gradient of “X” (denoted by “G”),
the exponentially-averaged historical gradient of “X” (denoted by “V”), and
the exponentially-averaged historical squared gradient of “X” (denoted by “H”).
Some of those parameters are passed into this operator as input tensors and others are stored as this operator’s attributes. Specifically, this operator’s input tensor list is [“R”, “T”, “X”, “G”, “V”, “H”]. That is, “R” is the first input, “T” is the second input, and so on. Other parameters are given as attributes because they are constants. Moreover, the corresponding output tensors are
the new value of “X” (called “X_new”),
the new exponentially-averaged historical gradient (denoted by “V_new”), and
the new exponentially-averaged historical squared gradient (denoted by “H_new”).
Those outputs are computed following the pseudo code below.
Let “+”, “-”, “*”, and “/” be element-wise arithmetic operations with numpy-style broadcasting support. The pseudo code to compute those outputs is:
// Add gradient of 0.5 * norm_coefficient * ||X||_2^2, where ||X||_2 is the 2-norm.
G_regularized = norm_coefficient * X + G
// Update exponentially-averaged historical gradient.
V_new = alpha * V + (1 - alpha) * G_regularized
// Update exponentially-averaged historical squared gradient.
H_new = beta * H + (1 - beta) * G_regularized * G_regularized
// Compute the element-wise square-root of H_new. V_new will be element-wise
// divided by H_sqrt for a better update direction.
H_sqrt = Sqrt(H_new) + epsilon
// Compute the learning rate. Note that "alpha**T"/"beta**T" is alpha's/beta's T-th power.
R_adjusted = T > 0 ? R * Sqrt(1 - beta**T) / (1 - alpha**T) : R
// Compute the new value of "X".
X_new = X - R_adjusted * V_new / H_sqrt
// Post-update regularization.
X_final = (1 - norm_coefficient_post) * X_new
If there are multiple inputs to be optimized, the pseudo code will be applied independently to each of them.
Attributes
alpha: Coefficient of previously accumulated gradient in the running average. Defaults to 0.9. Default value is 0.8999999761581421.
beta: Coefficient of previously accumulated squared gradient in the running average. Defaults to 0.999. Default value is 0.9990000128746033.
epsilon: Small scalar to avoid dividing by zero. Default value is 9.999999974752427e-07.
norm_coefficient: Regularization coefficient of 0.5 * norm_coefficient * ||X||_2^2. Defaults to 0, which means no regularization. Default value is 0.0.
norm_coefficient_post: Regularization coefficient of 0.5 * norm_coefficient * ||X||_2^2. Defaults to 0, which means no regularization. Default value is 0.0.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
inputs (variadic) - T3: The tensors to be optimized, followed by their respective gradients, followed by their respective accumulated gradients (aka momentum), followed by their respective accumulated squared gradients. For example, to optimize tensors “X_1” and “X_2”, the input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, accumulated gradient of “X_1”, accumulated gradient of “X_2”, accumulated squared gradient of “X_1”, accumulated squared gradient of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: New values of optimized tensors, followed by their respective new accumulated gradients, followed by their respective new accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the outputs list would be [new value of “X_1”, new value of “X_2”, new accumulated gradient of “X_1”, new accumulated gradient of “X_2”, new accumulated squared gradient of “X_1”, new accumulated squared gradient of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input and output types to float tensors.
OnnxAiOnnxPreviewTrainingAdam_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingAdam_1(*args, **kwargs)#
Version
name: Adam (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of Adam, a stochastic gradient based optimization algorithm. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. First of all, Adam requires some parameters:
The learning-rate “R”.
The update count “T”. That is, the number of training iterations conducted.
A L2-norm regularization coefficient “norm_coefficient”.
A small constant “epsilon” to avoid dividing-by-zero.
Two coefficients, “alpha” and “beta”.
At each Adam iteration, the optimized tensors are moved along a direction computed based on their exponentially-averaged historical gradient and exponentially-averaged historical squared gradient. Assume that only a tensor “X” is being optimized. The rest of required information is
the value of “X”,
“X“‘s gradient (denoted by “G”),
“X“‘s exponentially-averaged historical gradient (denoted by “V”), and
“X“‘s exponentially-averaged historical squared gradient (denoted by “H”).
Some of those parameters are passed into this operator as input tensors and others are stored as this operator’s attributes. Specifically, this operator’s input tensor list is [“R”, “T”, “X”, “G”, “V”, “H”]. That is, “R” is the first input, “T” is the second input, and so on. Other parameters are given as attributes because they are constants. Moreover, the corresponding output tensors are
the new value of “X” (called “X_new”),
the new exponentially-averaged historical gradient (denoted by “V_new”), and
the new exponentially-averaged historical squared gradient (denoted by “H_new”).
Those outputs are computed following the pseudo code below.
Let “+”, “-”, “*”, and “/” be element-wise arithmetic operations with numpy-style broadcasting support. The pseudo code to compute those outputs is:
// Add the gradient of 0.5 * norm_coefficient * ||X||_2^2, where ||X||_2 is the 2-norm.
G_regularized = norm_coefficient * X + G
// Update exponentially-averaged historical gradient.
V_new = alpha * V + (1 - alpha) * G_regularized
// Update exponentially-averaged historical squared gradient.
H_new = beta * H + (1 - beta) * G_regularized * G_regularized
// Compute the element-wise square root of H_new. V_new will be element-wise
// divided by H_sqrt for a better update direction.
H_sqrt = Sqrt(H_new) + epsilon
// Compute the adjusted learning rate. Note that "alpha**T"/"beta**T" is alpha's/beta's T-th power.
R_adjusted = T > 0 ? R * Sqrt(1 - beta**T) / (1 - alpha**T) : R
// Compute the new value of "X".
X_new = X - R_adjusted * V_new / H_sqrt
// Post-update regularization.
X_final = (1 - norm_coefficient_post) * X_new
If there are multiple inputs to be optimized, the pseudo code will be applied independently to each of them.
Attributes
alpha: Coefficient of previously accumulated gradient in running average. Default to 0.9. Default value is 0.8999999761581421.
beta: Coefficient of previously accumulated squared gradient in running average. Default to 0.999. Default value is 0.9990000128746033.
epsilon: Small scalar to avoid dividing by zero. Default value is 9.999999974752427e-07.
norm_coefficient: Regularization coefficient of 0.5 * norm_coefficient * ||X||_2^2. Default to 0, which means no regularization. Default value is 0.0.
norm_coefficient_post: Post-update regularization coefficient of 0.5 * norm_coefficient_post * ||X||_2^2. Default to 0, which means no regularization. Default value is 0.0.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
inputs (variadic) - T3: The tensors to be optimized, followed by their respective gradients, followed by their respective accumulated gradients (aka momentum), followed by their respective accumulated squared gradients. For example, to optimize tensors “X_1” and “X_2”, the input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, accumulated gradient of “X_1”, accumulated gradient of “X_2”, accumulated squared gradient of “X_1”, accumulated squared gradient of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: New values of optimized tensors, followed by their respective new accumulated gradients, followed by their respective new accumulated squared gradients. For example, if two tensors “X_1” and “X_2” are optimized, the outputs list would be [new value of “X_1”, new value of “X_2”, new accumulated gradient of “X_1”, new accumulated gradient of “X_2”, new accumulated squared gradient of “X_1”, new accumulated squared gradient of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input and output types to float tensors.
OnnxAiOnnxPreviewTrainingGradient#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingGradient(*args, **kwargs)#
Version
name: Gradient (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Gradient operator computes the partial derivatives of a specific tensor w.r.t. some other tensors. This operator is widely used in gradient-based training algorithms. To illustrate its use, let’s consider a computation graph,
X -----.
       |
       v
W --> Conv --> H --> Gemm --> Y
                      ^
                      |
                      Z
where W and Z are trainable tensors. Note that operators’ attributes are omitted for the sake of simplicity. Let dY/dW (dY/dZ) be the gradient of Y with respect to W (Z). The user can compute the gradients by inserting a Gradient operator to form the graph shown below.
W --> Conv --> H --> Gemm --> Y
|      ^              ^
|      |              |
|      X              Z
|      |              |
|      |   .----------'
|      |   |  (W/Z/X is the 1st/2nd/3rd input of Gradient as shown in
|      |   |   "xs" followed by "zs")
|      v   v
'---> Gradient(xs=["W", "Z"], zs=["X"], y="Y")
       |   |
       |   '-----------------------------------> dY/dW (1st output of Gradient)
       |
       '---------------------------------------> dY/dZ (2nd output of Gradient)
By definition, the tensor “y” is a function of independent variables in “xs” and “zs”. Since we only compute the gradient of “y” w.r.t. the differentiable variables in “xs”, this Gradient only outputs dY/dW and dY/dZ. Note that “H” cannot appear in “xs” and “zs”. The reason is that “H” can be determined by tensors “W” and “X” and therefore “H” is not an independent variable.
All outputs are optional. If needed, for example, user can assign an empty string to the 1st output name of that Gradient to skip the generation of dY/dW. Note that the concept of optional outputs can also be found in ONNX’s RNN, GRU, and LSTM.
The Gradient operator can compute derivatives with respect to intermediate tensors. For example, the gradient of Y with respect to H can be computed via
W --> Conv --> H --> Gemm --> Y
       ^       |      ^
       |       |      |
       X       |      Z
       .-------'      |
       |   .----------'
       |   |  (H/Z is the 1st/2nd input of Gradient as shown in "xs")
       v   v
      Gradient(xs=["H", "Z"], y="Y")
       |   |
       |   '-----------------------------------> dY/dH (1st output of Gradient)
       |
       '---------------------------------------> dY/dZ (2nd output of Gradient)
It is possible to represent high-order differentiation using Gradient operators. For example, given the following linear model:
W --> Gemm --> Y --> Loss --> O
       ^              ^
       |              |
       X              L
To compute the 2nd order derivative of O with respect to W (denoted by d^2O/dW^2), one can do
W --> Gemm --> Y --> Loss --> O
|      ^              ^
|      |              |
|      X .------------L
|      | |            |
|      | |            v
+------+-+> Gradient(xs=["X", "W"], zs=["L"], y="O") ---> dO/dX (1st output of Gradient)
|      | |    |
|      | |    '---> dO/dW (2nd output of Gradient)
|      v v
'---> Gradient(xs=["X", "W"], zs=["L"], y="dO/dW") ---> d(dO/dW)dX (1st output of
|                                                       Gradient)
|
'---> d^2O/dW^2 (2nd output of Gradient)
The tensors named in attributes “xs”, “zs”, and “y” define the differentiated computation graph, and the inputs to Gradient node define the values at which the gradient is computed. We can feed different tensors to the identified graph. For example, one can compute the gradient of Y with respect to H at a specific value of H, H_1, by providing that value as an input to the Gradient node.
W --> Conv --> H --> Gemm --> Y
       ^              ^
       |              |
       X              Z

          Z_1 (2nd input of Gradient)
           |
           v
H_1 --> Gradient(xs=["H", "Z"], y="Y") ---> dY/dH when H = H_1 and Y = Y_1.
           |
           '------------------------------> dY/dZ (2nd output of Gradient)
When the inputs of Gradient are the tensors named in “xs” and “zs”, the computation can be optimized. More specifically, intermediate variables in forward pass can be reused if the gradient is computed via reverse-mode auto-differentiation.
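To make the semantics concrete, the sketch below approximates the kind of partial derivatives a Gradient node would output, using central finite differences on a toy function Y = W * X + Z. The helper and all names here are hypothetical, and no ONNX runtime is involved:

```python
def numerical_gradient(f, args, i, h=1e-6):
    # Central finite difference of f with respect to its i-th scalar argument.
    up = list(args); up[i] += h
    dn = list(args); dn[i] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

# A toy differentiable graph, Y = W * X + Z, loosely mirroring the diagrams above.
f = lambda W, X, Z: W * X + Z
W, X, Z = 2.0, 3.0, 5.0

# Analogue of Gradient(xs=["W", "Z"], zs=["X"], y="Y"): derivatives
# are taken only with respect to the names in "xs".
dY_dW = numerical_gradient(f, (W, X, Z), 0)  # ≈ X = 3
dY_dZ = numerical_gradient(f, (W, X, Z), 2)  # ≈ 1
```

As with the operator, feeding different values for W, X, and Z evaluates the same symbolic gradients at a different point.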
Attributes
xs (required): Input tensor names of the differentiated sub-graph. It contains only the necessary differentiated inputs of a (sub-)graph. Variables (usually called intermediate variables) that can be generated from inputs cannot be included in this attribute.
y (required): The targeted tensor. It can be viewed as the output of the differentiated function. The attribute “xs” and attribute “zs” are the minimal independent variable set that determines the value of “y”.
zs: Input tensor names of the differentiated sub-graph. It contains only the necessary non-differentiated inputs of a (sub-)graph. Variables (usually called intermediate variables) that can be generated from inputs cannot be included in this attribute.
Inputs
Between 1 and 2147483647 inputs.
Inputs (variadic) - T1: The values fed into graph identified by the attributes. The i-th input is the value of the i-th tensor specified in the concatenated list of the attribute “xs” and the attribute “zs”. For example, if xs=[“A”, “B”] and zs=[“C”], the first input is used as the value of symbol “A” and the 3rd input is substituted for all the occurrences of “C”.
Outputs
Between 1 and 2147483647 outputs.
Outputs (variadic) - T2: The gradient of the tensor specified by the attribute “y” with respect to each of tensors specified in the attribute “xs”. The i-th output is the gradient of “y” with respect to the i-th tensor specified in the attribute “xs”.
Type Constraints
T1 in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Allow inputs to be any kind of tensor.
T2 in ( tensor(double), tensor(float), tensor(float16) ): Allow outputs to be any kind of floating-point tensor.
OnnxAiOnnxPreviewTrainingGradient_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingGradient_1(*args, **kwargs)#
Version
name: Gradient (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Gradient operator computes the partial derivatives of a specific tensor w.r.t. some other tensors. This operator is widely used in gradient-based training algorithms. To illustrate its use, let’s consider a computation graph,
X -----.
       |
       v
W --> Conv --> H --> Gemm --> Y
                      ^
                      |
                      Z
where W and Z are trainable tensors. Note that operators’ attributes are omitted for the sake of simplicity. Let dY/dW (dY/dZ) be the gradient of Y with respect to W (Z). The user can compute the gradients by inserting a Gradient operator to form the graph shown below.
W --> Conv --> H --> Gemm --> Y
|      ^              ^
|      |              |
|      X              Z
|      |              |
|      |   .----------'
|      |   |  (W/Z/X is the 1st/2nd/3rd input of Gradient as shown in
|      |   |   "xs" followed by "zs")
|      v   v
'---> Gradient(xs=["W", "Z"], zs=["X"], y="Y")
       |   |
       |   '-----------------------------------> dY/dW (1st output of Gradient)
       |
       '---------------------------------------> dY/dZ (2nd output of Gradient)
By definition, the tensor “y” is a function of independent variables in “xs” and “zs”. Since we only compute the gradient of “y” w.r.t. the differentiable variables in “xs”, this Gradient only outputs dY/dW and dY/dZ. Note that “H” cannot appear in “xs” and “zs”. The reason is that “H” can be determined by tensors “W” and “X” and therefore “H” is not an independent variable.
All outputs are optional. If needed, for example, user can assign an empty string to the 1st output name of that Gradient to skip the generation of dY/dW. Note that the concept of optional outputs can also be found in ONNX’s RNN, GRU, and LSTM.
The Gradient operator can compute derivatives with respect to intermediate tensors. For example, the gradient of Y with respect to H can be computed via
W --> Conv --> H --> Gemm --> Y
       ^       |      ^
       |       |      |
       X       |      Z
       .-------'      |
       |   .----------'
       |   |  (H/Z is the 1st/2nd input of Gradient as shown in "xs")
       v   v
      Gradient(xs=["H", "Z"], y="Y")
       |   |
       |   '-----------------------------------> dY/dH (1st output of Gradient)
       |
       '---------------------------------------> dY/dZ (2nd output of Gradient)
It is possible to represent high-order differentiation using Gradient operators. For example, given the following linear model:
W --> Gemm --> Y --> Loss --> O
       ^              ^
       |              |
       X              L
To compute the 2nd order derivative of O with respect to W (denoted by d^2O/dW^2), one can do
W --> Gemm --> Y --> Loss --> O
|      ^              ^
|      |              |
|      X .------------L
|      | |            |
|      | |            v
+------+-+> Gradient(xs=["X", "W"], zs=["L"], y="O") ---> dO/dX (1st output of Gradient)
|      | |    |
|      | |    '---> dO/dW (2nd output of Gradient)
|      v v
'---> Gradient(xs=["X", "W"], zs=["L"], y="dO/dW") ---> d(dO/dW)dX (1st output of
|                                                       Gradient)
|
'---> d^2O/dW^2 (2nd output of Gradient)
The tensors named in attributes “xs”, “zs”, and “y” define the differentiated computation graph, and the inputs to Gradient node define the values at which the gradient is computed. We can feed different tensors to the identified graph. For example, one can compute the gradient of Y with respect to H at a specific value of H, H_1, by providing that value as an input to the Gradient node.
W --> Conv --> H --> Gemm --> Y
       ^              ^
       |              |
       X              Z

          Z_1 (2nd input of Gradient)
           |
           v
H_1 --> Gradient(xs=["H", "Z"], y="Y") ---> dY/dH when H = H_1 and Y = Y_1.
           |
           '------------------------------> dY/dZ (2nd output of Gradient)
When the inputs of Gradient are the tensors named in “xs” and “zs”, the computation can be optimized. More specifically, intermediate variables in forward pass can be reused if the gradient is computed via reverse-mode auto-differentiation.
Attributes
xs (required): Input tensor names of the differentiated sub-graph. It contains only the necessary differentiated inputs of a (sub-)graph. Variables (usually called intermediate variables) that can be generated from inputs cannot be included in this attribute.
y (required): The targeted tensor. It can be viewed as the output of the differentiated function. The attribute “xs” and attribute “zs” are the minimal independent variable set that determines the value of “y”.
zs: Input tensor names of the differentiated sub-graph. It contains only the necessary non-differentiated inputs of a (sub-)graph. Variables (usually called intermediate variables) that can be generated from inputs cannot be included in this attribute.
Inputs
Between 1 and 2147483647 inputs.
Inputs (variadic) - T1: The values fed into graph identified by the attributes. The i-th input is the value of the i-th tensor specified in the concatenated list of the attribute “xs” and the attribute “zs”. For example, if xs=[“A”, “B”] and zs=[“C”], the first input is used as the value of symbol “A” and the 3rd input is substituted for all the occurrences of “C”.
Outputs
Between 1 and 2147483647 outputs.
Outputs (variadic) - T2: The gradient of the tensor specified by the attribute “y” with respect to each of tensors specified in the attribute “xs”. The i-th output is the gradient of “y” with respect to the i-th tensor specified in the attribute “xs”.
Type Constraints
T1 in ( tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Allow inputs to be any kind of tensor.
T2 in ( tensor(double), tensor(float), tensor(float16) ): Allow outputs to be any kind of floating-point tensor.
OnnxAiOnnxPreviewTrainingMomentum#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingMomentum(*args, **kwargs)#
Version
name: Momentum (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of stochastic gradient update with momentum. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. As you can imagine, SG with momentum requires several parameters:
The learning-rate “R”.
The update count “T”. That is, the number of conducted training iterations. It should be zero in the first training iteration.
A L2-norm regularization coefficient “norm_coefficient”.
A decay coefficient of previous accumulated gradient (i.e., momentum) “alpha”.
The scaling coefficient of current gradient “beta”.
An attribute “mode” that selects either standard momentum or Nesterov’s momentum.
For the sake of simplicity, assume that there is only one tensor (called “X”) to be optimized. Other necessary inputs are “X“‘s gradient (called “G”) and “X“‘s momentum (called “V”). This Momentum operator maps all these inputs to the new value of “X” (called “X_new”) and its new momentum (called “V_new”).
This operator supports two different momentum algorithms. Set the attribute “mode” to “nesterov” if Nesterov’s momentum is desired; otherwise, set it to “standard” to use standard momentum. Computation details are described subsequently.
Let “+”, “-”, “*”, and “/” be element-wise operations with numpy-style broadcasting.
Pseudo code for SG with standard momentum:
// Add the gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of
// squared values of all elements in X.
G_regularized = norm_coefficient * X + G
// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1
// Compute the current momentum based on the previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized
// Update X.
X_new = X - R * V_new
Pseudo code for SG with Nesterov’s momentum:
// Add the gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of
// squared values of all elements in X.
G_regularized = norm_coefficient * X + G
// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1
// Compute the current momentum based on the previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized
// Compute the final update direction and then update X.
X_new = X - R * (G_regularized + alpha * V_new)
If this operator is used to optimize multiple inputs, for example “X_1” and “X_2”, the same pseudo code extends to handle all tensors jointly. More specifically, “X” can be viewed as a concatenation of “X_1” and “X_2” (their gradients and accumulated gradients are concatenated likewise), and then the pseudo code above applies.
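Both momentum variants above can be sketched in plain Python. The helper below is illustrative only; its name and signature are assumptions mirroring the operator's inputs and attributes:

```python
def momentum_step(R, T, X, G, V, alpha, beta, norm_coefficient, mode="standard"):
    # Gradient of the L2 regularization term added to the raw gradient.
    G_regularized = norm_coefficient * X + G
    # beta is forced to 1 in the first training iteration (T == 0).
    beta_adjusted = beta if T > 0 else 1.0
    # New momentum from the previous momentum and the current gradient.
    V_new = alpha * V + beta_adjusted * G_regularized
    if mode == "nesterov":
        # Nesterov's momentum looks ahead along the new momentum direction.
        X_new = X - R * (G_regularized + alpha * V_new)
    else:  # standard momentum
        X_new = X - R * V_new
    return X_new, V_new
```

The same function works unchanged on numpy arrays, since all operations broadcast element-wise.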
Attributes
alpha (required): The decay factor of momentum. It should be a scalar.
beta (required): The coefficient of gradient in computing new momentum. It should be a scalar.
mode (required): Its value should be either “nesterov” or “standard”. The value “nesterov” leads to the use of Nesterov’s momentum while “standard” invokes stochastic gradient method using standard momentum
norm_coefficient (required): Coefficient of 0.5 * norm_coefficient * ||X||^2.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The learning rate.
T (heterogeneous) - T2: Update count of “X”. It should be a scalar.
inputs (variadic) - T3: It sequentially contains the current values of optimized tensors, then their gradient tensors, and finally their momentum tensors. For example, if two tensors “X_1” and “X_2” are optimized, The expected input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, momentum of “X_1”, momentum of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: It sequentially contains the new values of optimized tensors and then the new values of their momentum tensors. For example, if two tensors “X_1” and “X_2” are optimized, the output list would be [new value of “X_1”, new value of “X_2”, new momentum of “X_1”, new momentum of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input types to float tensors.
OnnxAiOnnxPreviewTrainingMomentum_1#
- class mlprodict.npy.xop_auto_import_.OnnxAiOnnxPreviewTrainingMomentum_1(*args, **kwargs)#
Version
name: Momentum (GitHub)
domain: ai.onnx.preview.training
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1 of domain ai.onnx.preview.training.
Summary
Compute one iteration of stochastic gradient update with momentum. This operator can conduct the optimization of multiple tensor variables.
Let’s define the behavior of this operator. As you can imagine, SG with momentum requires several parameters:
The learning-rate “R”.
The update count “T”. That is, the number of conducted training iterations. It should be zero in the first training iteration.
A L2-norm regularization coefficient “norm_coefficient”.
A decay coefficient of previous accumulated gradient (i.e., momentum) “alpha”.
The scaling coefficient of current gradient “beta”.
An attribute “mode” that selects either standard momentum or Nesterov’s momentum.
For the sake of simplicity, assume that there is only one tensor (called “X”) to be optimized. Other necessary inputs are “X“‘s gradient (called “G”) and “X“‘s momentum (called “V”). This Momentum operator maps all these inputs to the new value of “X” (called “X_new”) and its new momentum (called “V_new”).
This operator supports two different momentum algorithms. Set the attribute “mode” to “nesterov” if Nesterov’s momentum is desired; otherwise, set it to “standard” to use standard momentum. Computation details are described subsequently.
Let “+”, “-”, “*”, and “/” be element-wise operations with numpy-style broadcasting.
Pseudo code for SG with standard momentum:
// Add the gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of
// squared values of all elements in X.
G_regularized = norm_coefficient * X + G
// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1
// Compute the current momentum based on the previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized
// Update X.
X_new = X - R * V_new
Pseudo code for SG with Nesterov’s momentum:
// Add the gradient of 0.5 * norm_coefficient * ||X||^2, where ||X||^2 is the sum of
// squared values of all elements in X.
G_regularized = norm_coefficient * X + G
// In the first training iteration, beta should always be 1.
beta_adjusted = T > 0 ? beta : 1
// Compute the current momentum based on the previous momentum and the current gradient.
V_new = alpha * V + beta_adjusted * G_regularized
// Compute the final update direction and then update X.
X_new = X - R * (G_regularized + alpha * V_new)
If this operator is used to optimize multiple inputs, for example “X_1” and “X_2”, the same pseudo code extends to handle all tensors jointly. More specifically, “X” can be viewed as a concatenation of “X_1” and “X_2” (their gradients and accumulated gradients are concatenated likewise), and then the pseudo code above applies.
Attributes
alpha (required): The decay factor of momentum. It should be a scalar.
beta (required): The coefficient of gradient in computing new momentum. It should be a scalar.
mode (required): Its value should be either “nesterov” or “standard”. The value “nesterov” leads to the use of Nesterov’s momentum while “standard” invokes stochastic gradient method using standard momentum
norm_coefficient (required): Coefficient of 0.5 * norm_coefficient * ||X||^2.
Inputs
Between 3 and 2147483647 inputs.
R (heterogeneous) - T1: The learning rate.
T (heterogeneous) - T2: Update count of “X”. It should be a scalar.
inputs (variadic) - T3: It sequentially contains the current values of optimized tensors, then their gradient tensors, and finally their momentum tensors. For example, if two tensors “X_1” and “X_2” are optimized, The expected input list would be [“X_1”, “X_2”, gradient of “X_1”, gradient of “X_2”, momentum of “X_1”, momentum of “X_2”].
Outputs
Between 1 and 2147483647 outputs.
outputs (variadic) - T3: It sequentially contains the new values of optimized tensors and then the new values of their momentum tensors. For example, if two tensors “X_1” and “X_2” are optimized, the output list would be [new value of “X_1”, new value of “X_2”, new momentum of “X_1”, new momentum of “X_2”].
Type Constraints
T1 in ( tensor(double), tensor(float) ): Constrain input types to float scalars.
T2 in ( tensor(int64) ): Constrain input types to 64-bit integer scalars.
T3 in ( tensor(double), tensor(float) ): Constrain input types to float tensors.
OnnxAnd#
- class mlprodict.npy.xop_auto_import_.OnnxAnd(*args, **kwargs)#
Version
name: And (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Returns the tensor resulting from performing the logical and operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the logical operator.
B (heterogeneous) - T: Second input operand for the logical operator.
Outputs
C (heterogeneous) - T1: Result tensor.
Type Constraints
T in ( tensor(bool) ): Constrain input to boolean tensor.
T1 in ( tensor(bool) ): Constrain output to boolean tensor.
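A small numpy sketch of the semantics above; np.logical_and follows the same multidirectional broadcasting rules the operator describes, so B's shape (2,) broadcasts against A's shape (2, 2):

```python
import numpy as np

A = np.array([[True, False], [True, True]])
B = np.array([True, False])  # broadcast across each row of A
C = np.logical_and(A, B)
# C == [[True, False], [True, False]]
```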
OnnxAnd_1#
- class mlprodict.npy.xop_auto_import_.OnnxAnd_1(*args, **kwargs)#
Version
name: And (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
Returns the tensor resulting from performing the logical and operation elementwise on the input tensors A and B.
If broadcasting is enabled, the right-hand-side argument will be broadcasted to match the shape of left-hand-side argument. See the doc of Add for a detailed description of the broadcasting rules.
Attributes
axis: If set, defines the broadcast dimensions.
broadcast: Enable broadcasting. Default value is 0.
Inputs
A (heterogeneous) - T: Left input tensor for the logical operator.
B (heterogeneous) - T: Right input tensor for the logical operator.
Outputs
C (heterogeneous) - T1: Result tensor.
Type Constraints
T in ( tensor(bool) ): Constrain input to boolean tensor.
T1 in ( tensor(bool) ): Constrain output to boolean tensor.
OnnxAnd_7#
- class mlprodict.npy.xop_auto_import_.OnnxAnd_7(*args, **kwargs)#
Version
name: And (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Returns the tensor resulting from performing the logical and operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the logical operator.
B (heterogeneous) - T: Second input operand for the logical operator.
Outputs
C (heterogeneous) - T1: Result tensor.
Type Constraints
T in ( tensor(bool) ): Constrain input to boolean tensor.
T1 in ( tensor(bool) ): Constrain output to boolean tensor.
OnnxArgMax#
- class mlprodict.npy.xop_auto_import_.OnnxArgMax(*args, **kwargs)#
Version
name: ArgMax (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the max is selected if the max appears more than once in the input; otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the max appears at multiple indices; default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
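A short numpy sketch of the semantics above; note that np.argmax returns the first occurrence of the maximum, so select_last_index=1 has to be emulated, here by scanning the reversed axis (an illustrative trick, not the runtime's implementation):

```python
import numpy as np

data = np.array([[2, 9, 9],
                 [3, 1, 3]])
# First occurrence of the max along axis=1 (the default, select_last_index=0).
first = np.argmax(data, axis=1)
# Last occurrence (select_last_index=1): reverse the axis, take the first
# occurrence there, and map the index back to the original orientation.
last = data.shape[1] - 1 - np.argmax(data[:, ::-1], axis=1)
# first == [1, 0], last == [2, 2]
```

With keepdims=1 the reduced axis would be kept as a size-1 dimension, e.g. via np.expand_dims(first, axis=1).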
OnnxArgMax_1#
- class mlprodict.npy.xop_auto_import_.OnnxArgMax_1(*args, **kwargs)#
Version
name: ArgMax (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimension pruned. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Default value is 0.
keepdims: Keep the reduced dimension or not; default 1 means keep the reduced dimension. Default value is 1.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMax_11#
- class mlprodict.npy.xop_auto_import_.OnnxArgMax_11(*args, **kwargs)#
Version
name: ArgMax (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMax_12#
- class mlprodict.npy.xop_auto_import_.OnnxArgMax_12(*args, **kwargs)#
Version
name: ArgMax (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the max is selected if the max appears more than once in the input. Otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the max appears at multiple indices; the default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMax_13#
- class mlprodict.npy.xop_auto_import_.OnnxArgMax_13(*args, **kwargs)#
Version
name: ArgMax (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Computes the indices of the max elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the max is selected if the max appears more than once in the input. Otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the max appears at multiple indices; the default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMin#
- class mlprodict.npy.xop_auto_import_.OnnxArgMin(*args, **kwargs)#
Version
name: ArgMin (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the min is selected if the min appears more than once in the input. Otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the min appears at multiple indices; the default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
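Two details of ArgMin-13 are easy to check in plain NumPy (again only an illustration of the documented semantics, not the operator): negative axes in the accepted range [-r, r-1] are valid, and keepdims controls whether the reduced dimension is kept with size 1 or pruned.

```python
import numpy as np

# Sketch of ArgMin-13 keepdims behaviour with a negative axis.
x = np.array([[3.0, 1.0, 2.0],
              [4.0, 4.0, 0.5]])

# keepdims=1: reduced axis kept with size 1, result has the same rank as x.
kept = np.expand_dims(np.argmin(x, axis=-1), axis=-1).astype(np.int64)
# keepdims=0: reduced axis pruned.
pruned = np.argmin(x, axis=-1).astype(np.int64)

print(kept.shape, pruned.shape)  # (2, 1) vs (2,)
```

Note that `np.argmin` keeps the first occurrence on ties, matching the operator's default select_last_index=0.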
OnnxArgMin_1#
- class mlprodict.npy.xop_auto_import_.OnnxArgMin_1(*args, **kwargs)#
Version
name: ArgMin (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMin_11#
- class mlprodict.npy.xop_auto_import_.OnnxArgMin_11(*args, **kwargs)#
Version
name: ArgMin (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMin_12#
- class mlprodict.npy.xop_auto_import_.OnnxArgMin_12(*args, **kwargs)#
Version
name: ArgMin (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the min is selected if the min appears more than once in the input. Otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the min appears at multiple indices; the default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxArgMin_13#
- class mlprodict.npy.xop_auto_import_.OnnxArgMin_13(*args, **kwargs)#
Version
name: ArgMin (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Computes the indices of the min elements of the input tensor along the provided axis. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, then the resulting tensor has the reduced dimension pruned. If select_last_index is True (default False), the index of the last occurrence of the min is selected if the min appears more than once in the input. Otherwise the index of the first occurrence is selected. The type of the output tensor is integer.
Attributes
axis: The axis in which to compute the arg indices. Accepted range is [-r, r-1] where r = rank(data). Default value is 0.
keepdims: Keep the reduced dimension or not; the default 1 means keep the reduced dimension. Default value is 1.
select_last_index: Whether to select the last index or the first index if the min appears at multiple indices; the default is False (first index). Default value is 0.
Inputs
data (heterogeneous) - T: An input tensor.
Outputs
reduced (heterogeneous) - tensor(int64): Reduced output tensor with integer data type.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxAsin#
- class mlprodict.npy.xop_auto_import_.OnnxAsin(*args, **kwargs)#
Version
name: Asin (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arcsine (inverse of sine) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arcsine of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
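The element-wise semantics match NumPy's `np.arcsin`, which makes a quick sanity check easy (an illustration of the math only, not the ONNX operator itself). Inputs outside [-1, 1] fall outside the arcsine's real domain and yield NaN.

```python
import numpy as np

# Element-wise arcsine, as Asin computes it; values are in radians.
x = np.array([-1.0, 0.0, 0.5, 1.0])
y = np.arcsin(x)
print(y)  # -pi/2, 0, pi/6, pi/2
```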
OnnxAsin_7#
- class mlprodict.npy.xop_auto_import_.OnnxAsin_7(*args, **kwargs)#
Version
name: Asin (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arcsine (inverse of sine) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arcsine of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAsinh#
- class mlprodict.npy.xop_auto_import_.OnnxAsinh(*args, **kwargs)#
Version
name: Asinh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arcsine of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arcsine values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAsinh_9#
- class mlprodict.npy.xop_auto_import_.OnnxAsinh_9(*args, **kwargs)#
Version
name: Asinh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arcsine of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arcsine values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAtan#
- class mlprodict.npy.xop_auto_import_.OnnxAtan(*args, **kwargs)#
Version
name: Atan (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arctangent (inverse of tangent) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arctangent of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAtan_7#
- class mlprodict.npy.xop_auto_import_.OnnxAtan_7(*args, **kwargs)#
Version
name: Atan (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Calculates the arctangent (inverse of tangent) of the given input tensor, element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The arctangent of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAtanh#
- class mlprodict.npy.xop_auto_import_.OnnxAtanh(*args, **kwargs)#
Version
name: Atanh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arctangent of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arctangent values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAtanh_9#
- class mlprodict.npy.xop_auto_import_.OnnxAtanh_9(*args, **kwargs)#
Version
name: Atanh (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Calculates the hyperbolic arctangent of the given input tensor element-wise.
Inputs
input (heterogeneous) - T: Input tensor
Outputs
output (heterogeneous) - T: The hyperbolic arctangent values of the input tensor computed element-wise
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAveragePool#
- class mlprodict.npy.xop_auto_import_.OnnxAveragePool(*args, **kwargs)#
Version
name: AveragePool (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is computed as:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
or, if ceil_mode is enabled:
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of pads along axis i.
auto_pad is a DEPRECATED attribute. If you are still using it, the output spatial shape is computed as:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i]) SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
And pad shape will be following if SAME_UPPER or SAME_LOWER:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is divided by the number of elements (excluding padding when attribute count_include_pad is zero).
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
count_include_pad: Whether to include pad pixels when calculating values for the edges. Default is 0, which does not include pad pixels. Default value is 0.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
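The spatial-shape formulas quoted above are simple enough to evaluate directly. The sketch below applies them for one axis; `avgpool_out_dim` is a hypothetical helper name, and `pad_total` stands for pad_shape[i], the sum of the begin and end pads along that axis.

```python
import math

def avgpool_out_dim(in_dim, kernel, stride=1, pad_total=0, ceil_mode=0):
    """Evaluate the AveragePool output spatial shape formula for one axis:
    floor((in + pads - kernel) / stride + 1), or ceil(...) if ceil_mode."""
    v = (in_dim + pad_total - kernel) / stride + 1
    return math.ceil(v) if ceil_mode else math.floor(v)

# A 32-pixel axis, kernel 3, stride 2, no padding:
print(avgpool_out_dim(32, 3, stride=2))               # floor(15.5) = 15
print(avgpool_out_dim(32, 3, stride=2, ceil_mode=1))  # ceil(15.5) = 16
```

With pads of 1 on each side, `pad_total` would be 2, e.g. `avgpool_out_dim(5, 2, pad_total=2)` covers the padded 5-pixel axis.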
OnnxAveragePool_1#
- class mlprodict.npy.xop_auto_import_.OnnxAveragePool_1(*args, **kwargs)#
Version
name: AveragePool (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is computed as:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of pads along axis i.
auto_pad is a DEPRECATED attribute. If you are still using it, the output spatial shape is computed as:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i]) SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
And pad shape will be following if SAME_UPPER or SAME_LOWER:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is divided by the number of elements, excluding padding.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output spatial size matches the input. In case of an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAveragePool_10#
- class mlprodict.npy.xop_auto_import_.OnnxAveragePool_10(*args, **kwargs)#
Version
name: AveragePool (GitHub)
domain: main
since_version: 10
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 10.
Summary
AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is computed as:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
or, if ceil_mode is enabled:
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of pads along axis i.
auto_pad is a DEPRECATED attribute. If you are still using it, the output spatial shape is computed as:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i]) SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
And pad shape will be following if SAME_UPPER or SAME_LOWER:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is divided by the number of elements (excluding padding when attribute count_include_pad is zero).
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output spatial size matches the input. In case of an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
count_include_pad: Whether to include pad pixels when calculating values for the edges. Default is 0, which does not include pad pixels. Default value is 0.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAveragePool_11#
- class mlprodict.npy.xop_auto_import_.OnnxAveragePool_11(*args, **kwargs)#
Version
name: AveragePool (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is computed as:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
or, if ceil_mode is enabled:
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of pads along axis i.
auto_pad is a DEPRECATED attribute. If you are still using it, the output spatial shape is computed as:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i]) SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
And pad shape will be following if SAME_UPPER or SAME_LOWER:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is divided by the number of elements (excluding padding when attribute count_include_pad is zero).
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
count_include_pad: Whether to include pad pixels when calculating values for the edges. Default is 0, which does not include pad pixels. Default value is 0.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The values represent the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxAveragePool_7#
- class mlprodict.npy.xop_auto_import_.OnnxAveragePool_7(*args, **kwargs)#
Version
name: AveragePool (GitHub)
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
AveragePool consumes an input tensor X and applies average pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Average pooling consists of computing the average of all values in a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape will be the following:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of pads along axis i.
auto_pad is a DEPRECATED attribute. If you are still using it, the output spatial shape will be the following:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
And the pad shape will be the following if SAME_UPPER or SAME_LOWER:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is divided by the number of elements (excluding pad when the attribute count_include_pad is zero).
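The explicit-pads formula above can be sketched in Python. `avgpool_output_shape` is a hypothetical helper written for illustration, not part of mlprodict; the `ceil_mode` switch reflects later opsets, since AveragePool-7 always uses floor:

```python
import math

def avgpool_output_shape(input_shape, kernel, strides, pads, ceil_mode=False):
    """Output spatial shape of AveragePool with explicit padding.

    pads uses the ONNX layout [x1_begin, x2_begin, ..., x1_end, x2_end, ...].
    """
    n = len(input_shape)
    rounding = math.ceil if ceil_mode else math.floor
    out = []
    for i in range(n):
        total_pad = pads[i] + pads[i + n]  # pad_shape[i]: sum of pads along axis i
        out.append(rounding((input_shape[i] + total_pad - kernel[i]) / strides[i] + 1))
    return out

# A 32x32 feature map, 2x2 kernel, stride 2, no padding: each side halves.
print(avgpool_output_shape([32, 32], [2, 2], [2, 2], [0, 0, 0, 0]))  # [16, 16]
```

With ceil_mode, a window that only partially overlaps the input still produces an output element, which is why ceil can yield one extra position per axis.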
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output spatial size matches the input. In case of an odd number, add the extra padding at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is
'NOTSET'
.count_include_pad: Whether to include pad pixels when calculating values for the edges. Default is 0 (do not include pad). Default value is
0
.kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxBatchNormalization#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization(*args, **kwargs)#
Version
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. There are five required inputs: ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). Depending on the mode, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, running_mean, running_var (training_mode=True) Output case #2: Y (training_mode=False)
When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:
running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)
Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B
where:
current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var = ReduceVar(X, axis=all_except_channel_index)
Notice that ReduceVar refers to the population variance, and it equals sum(sqrd(x_i - x_avg)) / N, where N is the population size (this formula does not use the sample size N - 1).
The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
When training_mode=False:
Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization op. This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also simply be omitted.
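The formulas above can be sketched with NumPy. This is a hedged illustration only: the function name and the (N, C, …) layout assumption are not part of mlprodict's API.

```python
import numpy as np

def batch_norm(X, scale, B, input_mean, input_var,
               epsilon=1e-5, momentum=0.9, training_mode=False):
    """Sketch of BatchNormalization; X has layout (N, C, ...), stats are per channel."""
    axes = tuple(i for i in range(X.ndim) if i != 1)   # all axes except channel
    shape = [1, -1] + [1] * (X.ndim - 2)               # broadcast (C,) over channel axis
    if training_mode:
        current_mean = X.mean(axis=axes)
        current_var = X.var(axis=axes)                 # population variance (divides by N)
        running_mean = input_mean * momentum + current_mean * (1 - momentum)
        running_var = input_var * momentum + current_var * (1 - momentum)
        Y = ((X - current_mean.reshape(shape))
             / np.sqrt(current_var.reshape(shape) + epsilon)
             * scale.reshape(shape) + B.reshape(shape))
        return Y, running_mean, running_var
    # training_mode=False: normalize with the estimated statistics.
    Y = ((X - input_mean.reshape(shape))
         / np.sqrt(input_var.reshape(shape) + epsilon)
         * scale.reshape(shape) + B.reshape(shape))
    return Y
```

In training mode the returned Y is normalized with the current batch statistics, so each channel of Y has (near-)zero mean when scale is 1 and B is 0.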
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is
9.999999747378752e-06
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is
0.8999999761581421
.training_mode: If set to true, it indicates BatchNormalization is being used for training, and outputs 1, 2, 3, and 4 would be populated. Default value is
0
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1
scale (heterogeneous) - T1: Scale tensor of shape (C).
B (heterogeneous) - T1: Bias tensor of shape (C).
input_mean (heterogeneous) - T2: running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) - T2: running (training) or estimated (testing) variance tensor of shape (C).
Outputs
Between 1 and 3 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
running_mean (optional, heterogeneous) - T2: The running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) - T2: The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
T1 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain scale and bias types to float tensors.
T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain mean and variance types to float tensors.
OnnxBatchNormalization_1#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_1(*args, **kwargs)#
Version
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)
Attributes
consumed_inputs (required): legacy optimization attribute.
epsilon: The epsilon value to use to avoid division by zero, default is 1e-5f. Default value is
9.999999747378752e-06
.is_test: If set to nonzero, run spatial batch normalization in test mode, default is 0. Default value is
0
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum), default is 0.9f. Default value is
0.8999999761581421
.spatial: If true, compute the mean and variance across all spatial elements. If false, compute the mean and variance per feature. Default is 1. Default value is
1
.
Inputs
X (heterogeneous) - T: The input 4-dimensional tensor of shape NCHW.
scale (heterogeneous) - T: The scale as a 1-dimensional tensor of size C to be applied to the output.
B (heterogeneous) - T: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean (heterogeneous) - T: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var (heterogeneous) - T: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output 4-dimensional tensor of the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation. Should not be used for testing.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxBatchNormalization_14#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_14(*args, **kwargs)#
Version
domain: main
since_version: 14
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 14.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. There are five required inputs: ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). Depending on the mode, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, running_mean, running_var (training_mode=True) Output case #2: Y (training_mode=False)
When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:
running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)
Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B
where:
current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var = ReduceVar(X, axis=all_except_channel_index)
Notice that ReduceVar refers to the population variance, and it equals sum(sqrd(x_i - x_avg)) / N, where N is the population size (this formula does not use the sample size N - 1).
When training_mode=False:
Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization op. This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also simply be omitted.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is
9.999999747378752e-06
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is
0.8999999761581421
.training_mode: If set to true, it indicates BatchNormalization is being used for training, and outputs 1, 2, 3, and 4 would be populated. Default value is
0
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1
scale (heterogeneous) - T: Scale tensor of shape (C).
B (heterogeneous) - T: Bias tensor of shape (C).
input_mean (heterogeneous) - U: running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) - U: running (training) or estimated (testing) variance tensor of shape (C).
Outputs
Between 1 and 3 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
running_mean (optional, heterogeneous) - U: The running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) - U: The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
U in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain mean and variance types to float tensors. It allows all float type for U.
OnnxBatchNormalization_15#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_15(*args, **kwargs)#
Version
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. There are five required inputs: ‘X’, ‘scale’, ‘B’, ‘input_mean’ and ‘input_var’. Note that ‘input_mean’ and ‘input_var’ are expected to be the estimated statistics in inference mode (training_mode=False, default), and the running statistics in training mode (training_mode=True). Depending on the mode, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, running_mean, running_var (training_mode=True) Output case #2: Y (training_mode=False)
When training_mode=False, extra outputs are invalid. The outputs are updated as follows when training_mode=True:
running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)
Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B
where:
current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var = ReduceVar(X, axis=all_except_channel_index)
Notice that ReduceVar refers to the population variance, and it equals sum(sqrd(x_i - x_avg)) / N, where N is the population size (this formula does not use the sample size N - 1).
The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
When training_mode=False:
Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization op. This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also simply be omitted.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is
9.999999747378752e-06
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is
0.8999999761581421
.training_mode: If set to true, it indicates BatchNormalization is being used for training, and outputs 1, 2, 3, and 4 would be populated. Default value is
0
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1
scale (heterogeneous) - T1: Scale tensor of shape (C).
B (heterogeneous) - T1: Bias tensor of shape (C).
input_mean (heterogeneous) - T2: running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) - T2: running (training) or estimated (testing) variance tensor of shape (C).
Outputs
Between 1 and 3 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
running_mean (optional, heterogeneous) - T2: The running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) - T2: The running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
T1 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain scale and bias types to float tensors.
T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain mean and variance types to float tensors.
OnnxBatchNormalization_6#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_6(*args, **kwargs)#
Version
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)
Attributes
epsilon: The epsilon value to use to avoid division by zero, default is 1e-5f. Default value is
9.999999747378752e-06
.is_test: If set to nonzero, run spatial batch normalization in test mode, default is 0. Default value is
0
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum), default is 0.9f. Default value is
0.8999999761581421
.spatial: If true, compute the mean and variance across all spatial elements. If false, compute the mean and variance per feature. Default is 1. Default value is
1
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.
scale (heterogeneous) - T: The scale as a 1-dimensional tensor of size C to be applied to the output.
B (heterogeneous) - T: The bias as a 1-dimensional tensor of size C to be applied to the output.
mean (heterogeneous) - T: The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
var (heterogeneous) - T: The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X.
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator. Must be in-place with the input mean. Should not be used for testing.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator. Must be in-place with the input var. Should not be used for testing.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation. Should not be used for testing.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation. Should not be used for testing.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxBatchNormalization_7#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_7(*args, **kwargs)#
Version
domain: main
since_version: 7
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 7.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)
This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in the place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also be simply omitted.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is
9.999999747378752e-06
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is
0.8999999761581421
.spatial: If true, compute the mean and variance per activation. If false, compute the mean and variance per feature over each mini-batch. Default value is
1
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size.
scale (heterogeneous) - T: If spatial is true, the dimension of scale is (C). If spatial is false, the dimensions of scale are (C x D1 x … x Dn)
B (heterogeneous) - T: If spatial is true, the dimension of bias is (C). If spatial is false, the dimensions of bias are (C x D1 x … x Dn)
mean (heterogeneous) - T: If spatial is true, the dimension of the running mean (training) or the estimated mean (testing) is (C). If spatial is false, the dimensions of the running mean (training) or the estimated mean (testing) are (C x D1 x … x Dn).
var (heterogeneous) - T: If spatial is true, the dimension of the running variance(training) or the estimated variance (testing) is (C). If spatial is false, the dimensions of the running variance(training) or the estimated variance (testing) are (C x D1 x … x Dn).
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxBatchNormalization_9#
- class mlprodict.npy.xop_auto_import_.OnnxBatchNormalization_9(*args, **kwargs)#
Version
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167. Depending on the mode it is being run, there are multiple cases for the number of outputs, which we list below:
Output case #1: Y, mean, var, saved_mean, saved_var (training mode) Output case #2: Y (test mode)
For previous (deprecated) non-spatial cases, implementors are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before a BatchNormalization op. This operator has optional inputs/outputs. See ONNX for more details about the representation of optional arguments. An empty string may be used in place of an actual argument’s name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also simply be omitted.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is
9.999999747378752e-06
.momentum: Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum). Default value is
0.8999999761581421
.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size, C is the number of channels. Statistics are computed for every channel of C over N and D1 to Dn dimensions. For image data, input dimensions become (N x C x H x W). The op also accepts single dimension input of size N in which case C is assumed to be 1
scale (heterogeneous) - T: Scale tensor of shape (C).
B (heterogeneous) - T: Bias tensor of shape (C).
mean (heterogeneous) - T: running (training) or estimated (testing) mean tensor of shape (C).
var (heterogeneous) - T: running (training) or estimated (testing) variance tensor of shape (C).
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
mean (optional, heterogeneous) - T: The running mean after the BatchNormalization operator.
var (optional, heterogeneous) - T: The running variance after the BatchNormalization operator.
saved_mean (optional, heterogeneous) - T: Saved mean used during training to speed up gradient computation.
saved_var (optional, heterogeneous) - T: Saved variance used during training to speed up gradient computation.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxBernoulli#
- class mlprodict.npy.xop_auto_import_.OnnxBernoulli(*args, **kwargs)#
Version
name: Bernoulli (GitHub)
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
Draws binary random numbers (0 or 1) from a Bernoulli distribution. The input tensor should be a tensor containing probabilities p (a value in the range [0,1]) to be used for drawing the binary random number, where an output of 1 is produced with probability p and an output of 0 is produced with probability (1-p).
This operator is non-deterministic and may not produce the same values in different implementations (even if a seed is specified).
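The sampling semantics can be sketched with NumPy. This is a minimal illustration, not mlprodict's implementation; as the summary notes, even a fixed seed only pins down one particular implementation:

```python
import numpy as np

def bernoulli(p, seed=None):
    """Each output element is 1 with the probability given by the
    corresponding input element, and 0 otherwise."""
    rng = np.random.default_rng(seed)
    return (rng.random(np.shape(p)) < np.asarray(p)).astype(np.float64)

samples = bernoulli([0.0, 0.25, 0.5, 1.0], seed=0)
```

A probability of 0 always yields 0 and a probability of 1 always yields 1; intermediate probabilities yield either value.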
Attributes
dtype: The data type for the elements of the output tensor. If not specified, the data type of the input tensor is used.
seed: (Optional) Seed to the random generator; if not specified, one is auto-generated.
Inputs
input (heterogeneous) - T1: All values in input have to be in the range [0, 1].
Outputs
output (heterogeneous) - T2: The returned output tensor only has values 0 or 1, same shape as input tensor.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(float16) ): Constrain input types to float tensors.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types to all numeric tensors and bool tensors.
OnnxBernoulli_15#
- class mlprodict.npy.xop_auto_import_.OnnxBernoulli_15(*args, **kwargs)#
Version
name: Bernoulli (GitHub)
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
Draws binary random numbers (0 or 1) from a Bernoulli distribution. The input tensor should be a tensor containing probabilities p (a value in the range [0,1]) to be used for drawing the binary random number, where an output of 1 is produced with probability p and an output of 0 is produced with probability (1-p).
This operator is non-deterministic and may not produce the same values in different implementations (even if a seed is specified).
Attributes
dtype: The data type for the elements of the output tensor. If not specified, the data type of the input tensor is used.
seed: (Optional) Seed to the random generator; if not specified, one is auto-generated.
Inputs
input (heterogeneous) - T1: All values in input have to be in the range [0, 1].
Outputs
output (heterogeneous) - T2: The returned output tensor only has values 0 or 1, same shape as input tensor.
Type Constraints
T1 in ( tensor(double), tensor(float), tensor(float16) ): Constrain input types to float tensors.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types to all numeric tensors and bool tensors.
OnnxBitShift#
- class mlprodict.npy.xop_auto_import_.OnnxBitShift(*args, **kwargs)#
Version
name: BitShift (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
Bitwise shift operator performs an element-wise operation. For each input element, if the attribute “direction” is “RIGHT”, this operator moves its binary representation toward the right side so that the input value is effectively decreased. If the attribute “direction” is “LEFT”, the bits of the binary representation move toward the left side, which results in an increase of its actual value. The input X is the tensor to be shifted and the other input Y specifies the amounts of shifting. For example, if “direction” is “RIGHT”, X is [1, 4], and Y is [1, 1], the corresponding output Z would be [0, 2]. If “direction” is “LEFT” with X=[1, 2] and Y=[1, 2], the corresponding output Z would be [2, 8].
Because this operator supports Numpy-style broadcasting, X’s and Y’s shapes are not necessarily identical.
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
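The two shift examples from the summary can be reproduced with NumPy's elementwise shifts, a rough stand-in for the operator's semantics on unsigned integer tensors (with Numpy-style broadcasting):

```python
import numpy as np

# direction="RIGHT": each element of X is shifted right by the matching
# element of Y, so 1 >> 1 = 0 and 4 >> 1 = 2.
X = np.array([1, 4], dtype=np.uint8)
Y = np.array([1, 1], dtype=np.uint8)
print(np.right_shift(X, Y))  # [0 2]

# direction="LEFT": 1 << 1 = 2 and 2 << 2 = 8.
X2 = np.array([1, 2], dtype=np.uint8)
Y2 = np.array([1, 2], dtype=np.uint8)
print(np.left_shift(X2, Y2))  # [2 8]
```

Because broadcasting applies, Y may also be a scalar shift amount applied to every element of X.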
Attributes
direction (required): Direction of moving bits. It can be either “RIGHT” (for right shift) or “LEFT” (for left shift).
Inputs
X (heterogeneous) - T: First operand, input to be shifted.
Y (heterogeneous) - T: Second operand, amounts of shift.
Outputs
Z (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to integer tensors.
OnnxBitShift_11#
- class mlprodict.npy.xop_auto_import_.OnnxBitShift_11(*args, **kwargs)#
Version
name: BitShift (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
The bitwise shift operator performs an element-wise operation. For each input element, if the attribute “direction” is “RIGHT”, this operator moves its binary representation toward the right side so that the input value is effectively decreased. If the attribute “direction” is “LEFT”, the bits of its binary representation move toward the left side, which results in an increase of its actual value. The input X is the tensor to be shifted and the other input Y specifies the amounts of shifting. For example, if “direction” is “RIGHT”, X is [1, 4], and Y is [1, 1], the corresponding output Z would be [0, 2]. If “direction” is “LEFT” with X=[1, 2] and Y=[1, 2], the corresponding output Z would be [2, 8].
Because this operator supports Numpy-style broadcasting, X’s and Y’s shapes are not necessarily identical.
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Attributes
direction (required): Direction of moving bits. It can be either “RIGHT” (for right shift) or “LEFT” (for left shift).
Inputs
X (heterogeneous) - T: First operand, input to be shifted.
Y (heterogeneous) - T: Second operand, amounts of shift.
Outputs
Z (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to integer tensors.
OnnxBitwiseAnd#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseAnd(*args, **kwargs)#
Version
name: BitwiseAnd (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise and operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
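The operation matches NumPy's elementwise bitwise AND (a sketch of the semantics, not the runtime):

```python
import numpy as np

# Elementwise AND of the binary representations, with broadcasting support.
a = np.array([0b1100, 0b1010], dtype=np.int32)
b = np.array([0b1010, 0b0110], dtype=np.int32)
c = np.bitwise_and(a, b)   # [0b1000, 0b0010] = [8, 2]
```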
OnnxBitwiseAnd_18#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseAnd_18(*args, **kwargs)#
Version
name: BitwiseAnd (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise and operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
OnnxBitwiseNot#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseNot(*args, **kwargs)#
Version
name: BitwiseNot (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the bitwise not of the input tensor element-wise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input/output to integer tensors.
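The semantics correspond to NumPy's elementwise bitwise inversion (an illustrative sketch):

```python
import numpy as np

# Every bit of each element is flipped; for uint8, y = 255 - x.
x = np.array([0, 1, 255], dtype=np.uint8)
y = np.invert(x)   # [255, 254, 0]
```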
OnnxBitwiseNot_18#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseNot_18(*args, **kwargs)#
Version
name: BitwiseNot (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the bitwise not of the input tensor element-wise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input/output to integer tensors.
OnnxBitwiseOr#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseOr(*args, **kwargs)#
Version
name: BitwiseOr (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise or operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
OnnxBitwiseOr_18#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseOr_18(*args, **kwargs)#
Version
name: BitwiseOr (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise or operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
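The operation matches NumPy's elementwise bitwise OR (a sketch of the semantics, not the runtime):

```python
import numpy as np

# Elementwise OR of the binary representations.
a = np.array([0b1100], dtype=np.uint8)
b = np.array([0b1010], dtype=np.uint8)
c = np.bitwise_or(a, b)   # [0b1110] = [14]
```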
OnnxBitwiseXor#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseXor(*args, **kwargs)#
Version
name: BitwiseXor (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise xor operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
OnnxBitwiseXor_18#
- class mlprodict.npy.xop_auto_import_.OnnxBitwiseXor_18(*args, **kwargs)#
Version
name: BitwiseXor (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Returns the tensor resulting from performing the bitwise xor operation elementwise on the input tensors A and B (with Numpy-style broadcasting support).
This operator supports multidirectional (i.e., Numpy-style) broadcasting; for more details please check Broadcasting in ONNX.
Inputs
A (heterogeneous) - T: First input operand for the bitwise operator.
B (heterogeneous) - T: Second input operand for the bitwise operator.
Outputs
C (heterogeneous) - T: Result tensor.
Type Constraints
T in ( tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input to integer tensors.
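The operation matches NumPy's elementwise bitwise XOR; the sketch below also checks it against the identity a ^ b == (a | b) & ~(a & b):

```python
import numpy as np

a = np.array([0b1100], dtype=np.uint8)
b = np.array([0b1010], dtype=np.uint8)
c = np.bitwise_xor(a, b)   # [0b0110] = [6]

# Sanity check: XOR equals "or" minus the common bits.
d = np.bitwise_and(np.bitwise_or(a, b), np.invert(np.bitwise_and(a, b)))
```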
OnnxBlackmanWindow#
- class mlprodict.npy.xop_auto_import_.OnnxBlackmanWindow(*args, **kwargs)#
Version
name: BlackmanWindow (GitHub)
domain: main
since_version: 17
function: True
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 17.
Summary
Generates a Blackman window as described in the paper https://ieeexplore.ieee.org/document/1455106.
Attributes
output_datatype: The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T2. Default value is 1 (FLOAT).
periodic: If 1, returns a window to be used as a periodic function. If 0, returns a symmetric window. When ‘periodic’ is 1, the window is computed with length size + 1 and the first size points are returned. Default value is 1.
Inputs
size (heterogeneous) - T1: A scalar value indicating the length of the window.
Outputs
output (heterogeneous) - T2: A Blackman window with length: size. The output has the shape: [size].
Type Constraints
T1 in ( tensor(int32), tensor(int64) ): Constrain the input size to int32 or int64 tensors.
T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types to numeric tensors.
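The window can be sketched in NumPy with the standard Blackman coefficients (0.42, 0.5, 0.08); this is an illustrative reimplementation of the semantics described above, where the periodic variant computes a window of length size + 1 and keeps the first size points:

```python
import numpy as np

def blackman_window(size, periodic=True):
    # Periodic: compute a symmetric window of length size + 1 and keep
    # the first `size` points; symmetric: use the length as-is.
    n = size + 1 if periodic else size
    k = np.arange(size, dtype=np.float64)
    return (0.42
            - 0.5 * np.cos(2.0 * np.pi * k / (n - 1))
            + 0.08 * np.cos(4.0 * np.pi * k / (n - 1)))
```

For the symmetric case this agrees with numpy.blackman.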
OnnxBlackmanWindow_17#
- class mlprodict.npy.xop_auto_import_.OnnxBlackmanWindow_17(*args, **kwargs)#
Version
name: BlackmanWindow (GitHub)
domain: main
since_version: 17
function: True
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 17.
Summary
Generates a Blackman window as described in the paper https://ieeexplore.ieee.org/document/1455106.
Attributes
output_datatype: The data type of the output tensor. Strictly must be one of the values from DataType enum in TensorProto whose values correspond to T2. Default value is 1 (FLOAT).
periodic: If 1, returns a window to be used as a periodic function. If 0, returns a symmetric window. When ‘periodic’ is 1, the window is computed with length size + 1 and the first size points are returned. Default value is 1.
Inputs
size (heterogeneous) - T1: A scalar value indicating the length of the window.
Outputs
output (heterogeneous) - T2: A Blackman window with length: size. The output has the shape: [size].
Type Constraints
T1 in ( tensor(int32), tensor(int64) ): Constrain the input size to int32 or int64 tensors.
T2 in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types to numeric tensors.
OnnxCast#
- class mlprodict.npy.xop_auto_import_.OnnxCast(*args, **kwargs)#
Version
name: Cast (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message.
Casting from a string tensor in plain (e.g., “3.14” and “1000”) and scientific numeric representations (e.g., “1e-5” and “1E8”) to float types is supported. For example, converting the string “100.5” to an integer may yield 100. Some string literals are reserved for special floating-point values; “+INF” (and “INF”), “-INF”, and “NaN” are positive infinity, negative infinity, and not-a-number, respectively. Any string that matches “+INF” in a case-insensitive way is mapped to positive infinity; the same case-insensitive rule applies to “INF” and “NaN”. When casting from numeric tensors to string tensors, a plain floating-point representation (such as “314.15926”) is used. Converting a non-numerical-literal string such as “Hello World!” is undefined behavior, as is converting a string that represents a floating-point value, such as “2.718”, to INT.
Conversion from a numerical type to any numerical type is always allowed. Users must be aware of precision loss and value changes caused by the range difference between the two types. For example, a 64-bit float 3.1415926459 may be rounded to a 32-bit float 3.141592. Similarly, converting an integer 36 to Boolean may produce 1 because bits that cannot be stored in the targeted type are truncated.
In more detail, the conversion among numerical types should follow these rules:
Casting from floating point to:
- floating point: +/- infinity if OOR (out of range).
- fixed point: undefined if OOR.
- bool: +/- 0.0 to False; all else to True.
Casting from fixed point to:
- floating point: +/- infinity if OOR (+ infinity in the case of uint).
- fixed point: when OOR, discard higher bits and reinterpret (with respect to two’s complement representation for signed types). For example, 200 (int16) -> -56 (int8).
- bool: zero to False; nonzero to True.
Casting from bool to:
- floating point: {1.0, 0.0}.
- fixed point: {1, 0}.
- bool: no change.
Attributes
to (required): The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
Outputs
output (heterogeneous) - T2: Output tensor with the same shape as input with type specified by the ‘to’ argument
Type Constraints
T1 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from complex is not supported.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to complex is not supported.
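Several of the rules above have close NumPy analogues, which the sketch below uses for illustration (this is not the ONNX runtime, only a demonstration of the in-range cases):

```python
import numpy as np

# string -> float: plain and scientific notation are parsed.
s = np.array(["3.14", "1e-5", "1000"])
f = s.astype(np.float32)

# float -> int: the fractional part is discarded (truncation toward zero),
# which is why casting 100.5 to an integer may yield 100.
i = np.array([100.5, -7.9]).astype(np.int32)   # [100, -7]

# int -> bool: zero maps to False, any nonzero value to True.
b = np.array([0, 36], dtype=np.int64).astype(bool)  # [False, True]
```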
OnnxCastLike#
- class mlprodict.npy.xop_auto_import_.OnnxCastLike(*args, **kwargs)#
Version
name: CastLike (GitHub)
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
The operator casts the elements of a given input tensor (the first input) to the same data type as the elements of the second input tensor. See documentation of the Cast operator for further details.
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
target_type (heterogeneous) - T2: The (first) input tensor will be cast to produce a tensor of the same type as this (second input) tensor.
Outputs
output (heterogeneous) - T2: Output tensor produced by casting the first input tensor to have the same type as the second input tensor.
Type Constraints
T1 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from complex is not supported.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to complex is not supported.
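In NumPy terms, CastLike amounts to casting the first tensor to the dtype of the second (an illustrative sketch, not the runtime):

```python
import numpy as np

x = np.array([1.7, -2.3], dtype=np.float64)
target_type = np.array([0], dtype=np.int32)   # only its dtype matters

# Cast x to the target tensor's element type.
y = x.astype(target_type.dtype)   # [1, -2], dtype int32
```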
OnnxCastLike_15#
- class mlprodict.npy.xop_auto_import_.OnnxCastLike_15(*args, **kwargs)#
Version
name: CastLike (GitHub)
domain: main
since_version: 15
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 15.
Summary
The operator casts the elements of a given input tensor (the first input) to the same data type as the elements of the second input tensor. See documentation of the Cast operator for further details.
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
target_type (heterogeneous) - T2: The (first) input tensor will be cast to produce a tensor of the same type as this (second input) tensor.
Outputs
output (heterogeneous) - T2: Output tensor produced by casting the first input tensor to have the same type as the second input tensor.
Type Constraints
T1 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from complex is not supported.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to complex is not supported.
OnnxCast_1#
- class mlprodict.npy.xop_auto_import_.OnnxCast_1(*args, **kwargs)#
Version
name: Cast (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message. NOTE: Casting to and from strings is not supported yet.
Attributes
to (required): The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
Outputs
output (heterogeneous) - T2: Output tensor with the same shape as input with type specified by the ‘to’ argument
Type Constraints
T1 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from strings and complex are not supported.
T2 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to strings and complex are not supported.
OnnxCast_13#
- class mlprodict.npy.xop_auto_import_.OnnxCast_13(*args, **kwargs)#
Version
name: Cast (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message.
Casting from a string tensor in plain (e.g., “3.14” and “1000”) and scientific numeric representations (e.g., “1e-5” and “1E8”) to float types is supported. For example, converting the string “100.5” to an integer may yield 100. Some string literals are reserved for special floating-point values; “+INF” (and “INF”), “-INF”, and “NaN” are positive infinity, negative infinity, and not-a-number, respectively. Any string that matches “+INF” in a case-insensitive way is mapped to positive infinity; the same case-insensitive rule applies to “INF” and “NaN”. When casting from numeric tensors to string tensors, a plain floating-point representation (such as “314.15926”) is used. Converting a non-numerical-literal string such as “Hello World!” is undefined behavior, as is converting a string that represents a floating-point value, such as “2.718”, to INT.
Conversion from a numerical type to any numerical type is always allowed. Users must be aware of precision loss and value changes caused by the range difference between the two types. For example, a 64-bit float 3.1415926459 may be rounded to a 32-bit float 3.141592. Similarly, converting an integer 36 to Boolean may produce 1 because bits that cannot be stored in the targeted type are truncated.
In more detail, the conversion among numerical types should follow these rules:
Casting from floating point to:
- floating point: +/- infinity if OOR (out of range).
- fixed point: undefined if OOR.
- bool: +/- 0.0 to False; all else to True.
Casting from fixed point to:
- floating point: +/- infinity if OOR (+ infinity in the case of uint).
- fixed point: when OOR, discard higher bits and reinterpret (with respect to two’s complement representation for signed types). For example, 200 (int16) -> -56 (int8).
- bool: zero to False; nonzero to True.
Casting from bool to:
- floating point: {1.0, 0.0}.
- fixed point: {1, 0}.
- bool: no change.
Attributes
to (required): The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
Outputs
output (heterogeneous) - T2: Output tensor with the same shape as input with type specified by the ‘to’ argument
Type Constraints
T1 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from complex is not supported.
T2 in ( tensor(bfloat16), tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to complex is not supported.
OnnxCast_6#
- class mlprodict.npy.xop_auto_import_.OnnxCast_6(*args, **kwargs)#
Version
name: Cast (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message. NOTE: Casting to and from strings is not supported yet.
Attributes
to (required): The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
Outputs
output (heterogeneous) - T2: Output tensor with the same shape as input with type specified by the ‘to’ argument
Type Constraints
T1 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from strings and complex are not supported.
T2 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to strings and complex are not supported.
OnnxCast_9#
- class mlprodict.npy.xop_auto_import_.OnnxCast_9(*args, **kwargs)#
Version
name: Cast (GitHub)
domain: main
since_version: 9
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 9.
Summary
The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message.
Casting from a string tensor in plain (e.g., “3.14” and “1000”) and scientific numeric representations (e.g., “1e-5” and “1E8”) to float types is supported. For example, converting the string “100.5” to an integer may yield 100. Some string literals are reserved for special floating-point values; “+INF” (and “INF”), “-INF”, and “NaN” are positive infinity, negative infinity, and not-a-number, respectively. Any string that matches “+INF” in a case-insensitive way is mapped to positive infinity; the same case-insensitive rule applies to “INF” and “NaN”. When casting from numeric tensors to string tensors, a plain floating-point representation (such as “314.15926”) is used. Converting a non-numerical-literal string such as “Hello World!” is undefined behavior, as is converting a string that represents a floating-point value, such as “2.718”, to INT.
Conversion from a numerical type to any numerical type is always allowed. Users must be aware of precision loss and value changes caused by the range difference between the two types. For example, a 64-bit float 3.1415926459 may be rounded to a 32-bit float 3.141592. Similarly, converting an integer 36 to Boolean may produce 1 because bits that cannot be stored in the targeted type are truncated.
Attributes
to (required): The data type to which the elements of the input tensor are cast. Strictly must be one of the types from DataType enum in TensorProto
Inputs
input (heterogeneous) - T1: Input tensor to be cast.
Outputs
output (heterogeneous) - T2: Output tensor with the same shape as input with type specified by the ‘to’ argument
Type Constraints
T1 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input types. Casting from complex is not supported.
T2 in ( tensor(bool), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain output types. Casting to complex is not supported.
OnnxCeil#
- class mlprodict.npy.xop_auto_import_.OnnxCeil(*args, **kwargs)#
Version
name: Ceil (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Ceil takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the ceiling function, y = ceil(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
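The elementwise behavior matches NumPy's ceiling function (an illustrative sketch):

```python
import numpy as np

# Each element is rounded up to the nearest integer value.
x = np.array([-1.5, 1.2, 2.0], dtype=np.float32)
y = np.ceil(x)   # [-1.0, 2.0, 2.0]
```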
OnnxCeil_1#
- class mlprodict.npy.xop_auto_import_.OnnxCeil_1(*args, **kwargs)#
Version
name: Ceil (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
Ceil takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the ceiling function, y = ceil(x), is applied to the tensor elementwise.
Attributes
consumed_inputs: legacy optimization attribute.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxCeil_13#
- class mlprodict.npy.xop_auto_import_.OnnxCeil_13(*args, **kwargs)#
Version
name: Ceil (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Ceil takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the ceiling function, y = ceil(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxCeil_6#
- class mlprodict.npy.xop_auto_import_.OnnxCeil_6(*args, **kwargs)#
Version
name: Ceil (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
Ceil takes one input data (Tensor<T>) and produces one output data (Tensor<T>) where the ceiling function, y = ceil(x), is applied to the tensor elementwise.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxCelu#
- class mlprodict.npy.xop_auto_import_.OnnxCelu(*args, **kwargs)#
Version
name: Celu (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Continuously Differentiable Exponential Linear Units: Performs the linear unit element-wise on the input tensor X using the formula:
max(0,x) + min(0,alpha*(exp(x/alpha)-1))
Attributes
alpha: The Alpha value in the Celu formula, which controls the shape of the unit. Default value is 1.0.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(float) ): Constrain input and output types to float32 tensors.
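The formula above translates directly to NumPy (a minimal sketch of the elementwise computation):

```python
import numpy as np

def celu(x, alpha=1.0):
    # max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
    return np.maximum(0.0, x) + np.minimum(0.0, alpha * (np.exp(x / alpha) - 1.0))

# Positive inputs pass through; negative inputs saturate toward -alpha.
y = celu(np.array([-1.0, 0.0, 2.0], dtype=np.float32))
```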
OnnxCelu_12#
- class mlprodict.npy.xop_auto_import_.OnnxCelu_12(*args, **kwargs)#
Version
name: Celu (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Continuously Differentiable Exponential Linear Units: Performs the linear unit element-wise on the input tensor X using the formula:
max(0,x) + min(0,alpha*(exp(x/alpha)-1))
Attributes
alpha: The Alpha value in the Celu formula, which controls the shape of the unit. Default value is 1.0.
Inputs
X (heterogeneous) - T: Input tensor
Outputs
Y (heterogeneous) - T: Output tensor
Type Constraints
T in ( tensor(float) ): Constrain input and output types to float32 tensors.
OnnxCenterCropPad#
- class mlprodict.npy.xop_auto_import_.OnnxCenterCropPad(*args, **kwargs)#
Version
name: CenterCropPad (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Center crop or pad an input to given dimensions.
The crop/pad dimensions can be specified for a subset of the axes. Non-specified dimensions will not be cropped or padded.
If the input dimensions are bigger than the crop shape, a centered cropping window is extracted from the input. If the input dimensions are smaller than the crop shape, the input is padded on each side equally, so that the input is centered in the output.
Attributes
axes: If provided, it specifies a subset of axes that ‘shape’ refers to. If not provided, all axes are assumed [0, 1, …, r-1], where r = rank(data). Negative values mean counting dimensions from the back. The accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
Inputs
input_data (heterogeneous) - T: Input to extract the centered crop from.
shape (heterogeneous) - Tind: 1-D tensor representing the cropping window dimensions.
Outputs
output_data (heterogeneous) - T: Output data.
Type Constraints
T in ( tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all tensor types.
Tind in ( tensor(int32), tensor(int64) ): Constrain indices to integer types
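The crop/pad rule above can be sketched per axis in NumPy. This is a hypothetical helper, and the placement of the extra row when a size difference is odd is an assumption here, not taken from the spec:

```python
import numpy as np

def center_crop_pad(data, shape):
    """Center-crop or zero-pad each axis of `data` to the target `shape`."""
    out = np.asarray(data)
    for axis, target in enumerate(shape):
        size = out.shape[axis]
        if size > target:                 # extract a centered cropping window
            start = (size - target) // 2
            out = np.take(out, range(start, start + target), axis=axis)
        elif size < target:               # pad equally on each side (extra at the end)
            before = (target - size) // 2
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, target - size - before)
            out = np.pad(out, pad)
    return out
```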
OnnxCenterCropPad_18#
- class mlprodict.npy.xop_auto_import_.OnnxCenterCropPad_18(*args, **kwargs)#
Version
name: CenterCropPad (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
Center crop or pad an input to given dimensions.
The crop/pad dimensions can be specified for a subset of the axes. Non-specified dimensions will not be cropped or padded.
If the input dimensions are bigger than the crop shape, a centered cropping window is extracted from the input. If the input dimensions are smaller than the crop shape, the input is padded on each side equally, so that the input is centered in the output.
Attributes
axes: If provided, it specifies a subset of axes that ‘shape’ refers to. If not provided, all axes are assumed [0, 1, …, r-1], where r = rank(data). Negative values mean counting dimensions from the back. The accepted range is [-r, r-1], where r = rank(data). Behavior is undefined if an axis is repeated.
Inputs
input_data (heterogeneous) - T: Input to extract the centered crop from.
shape (heterogeneous) - Tind: 1-D tensor representing the cropping window dimensions.
Outputs
output_data (heterogeneous) - T: Output data.
Type Constraints
T in ( tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all tensor types.
Tind in ( tensor(int32), tensor(int64) ): Constrain indices to integer types
OnnxClip#
- class mlprodict.npy.xop_auto_import_.OnnxClip(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Clip operator limits the given input within an interval. The interval is specified by the inputs ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max(), respectively.
Inputs
Between 1 and 3 inputs.
input (heterogeneous) - T: Input tensor whose elements are to be clipped
min (optional, heterogeneous) - T: Minimum value, under which an element is replaced by min. It must be a scalar (tensor of empty shape).
max (optional, heterogeneous) - T: Maximum value, above which an element is replaced by max. It must be a scalar (tensor of empty shape).
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
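The optional min/max inputs behave like the bounds of numpy.clip; an omitted bound falls back to the type's limit. A minimal NumPy equivalent:

```python
import numpy as np

# NumPy equivalent of ONNX Clip: 'min' and 'max' are optional scalar bounds.
x = np.array([-5.0, -1.0, 0.0, 2.0, 7.0], dtype=np.float32)
clipped = np.clip(x, -2.0, 3.0)   # both bounds supplied
only_min = np.clip(x, 0.0, None)  # omitted 'max' behaves like numeric_limits::max()
print(clipped, only_min)
```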
OnnxClip_1#
- class mlprodict.npy.xop_auto_import_.OnnxClip_1(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: False
This version of the operator has been available since version 1.
Summary
Clip operator limits the given input within an interval. The interval is specified with arguments ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max() respectively.
Attributes
consumed_inputs: legacy optimization attribute.
max: Maximum value, above which element is replaced by max
min: Minimum value, under which element is replaced by min
Inputs
input (heterogeneous) - T: Input tensor whose elements are to be clipped
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxClip_11#
- class mlprodict.npy.xop_auto_import_.OnnxClip_11(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
Clip operator limits the given input within an interval. The interval is specified by the inputs ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max(), respectively.
Inputs
Between 1 and 3 inputs.
input (heterogeneous) - T: Input tensor whose elements are to be clipped
min (optional, heterogeneous) - T: Minimum value, under which an element is replaced by min. It must be a scalar (tensor of empty shape).
max (optional, heterogeneous) - T: Maximum value, above which an element is replaced by max. It must be a scalar (tensor of empty shape).
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxClip_12#
- class mlprodict.npy.xop_auto_import_.OnnxClip_12(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
Clip operator limits the given input within an interval. The interval is specified by the inputs ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max(), respectively.
Inputs
Between 1 and 3 inputs.
input (heterogeneous) - T: Input tensor whose elements are to be clipped
min (optional, heterogeneous) - T: Minimum value, under which an element is replaced by min. It must be a scalar (tensor of empty shape).
max (optional, heterogeneous) - T: Maximum value, above which an element is replaced by max. It must be a scalar (tensor of empty shape).
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxClip_13#
- class mlprodict.npy.xop_auto_import_.OnnxClip_13(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 13
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 13.
Summary
Clip operator limits the given input within an interval. The interval is specified by the inputs ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max(), respectively.
Inputs
Between 1 and 3 inputs.
input (heterogeneous) - T: Input tensor whose elements are to be clipped
min (optional, heterogeneous) - T: Minimum value, under which an element is replaced by min. It must be a scalar (tensor of empty shape).
max (optional, heterogeneous) - T: Maximum value, above which an element is replaced by max. It must be a scalar (tensor of empty shape).
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensors.
OnnxClip_6#
- class mlprodict.npy.xop_auto_import_.OnnxClip_6(*args, **kwargs)#
Version
name: Clip (GitHub)
domain: main
since_version: 6
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 6.
Summary
Clip operator limits the given input within an interval. The interval is specified with arguments ‘min’ and ‘max’. They default to numeric_limits::lowest() and numeric_limits::max() respectively.
Attributes
max: Maximum value, above which element is replaced by max. Default value is 3.4028234663852886e+38.
min: Minimum value, under which element is replaced by min. Default value is -3.4028234663852886e+38.
Inputs
input (heterogeneous) - T: Input tensor whose elements are to be clipped
Outputs
output (heterogeneous) - T: Output tensor with clipped input elements
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
OnnxCol2Im#
- class mlprodict.npy.xop_auto_import_.OnnxCol2Im(*args, **kwargs)#
Version
name: Col2Im (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
The operator rearranges column blocks back into a multidimensional image.
Col2Im behaves similarly to PyTorch’s Fold (https://pytorch.org/docs/stable/generated/torch.nn.Fold.html), but it only supports batched multi-dimensional image tensors. Another implementation in Python with N-dimension support can be found at f-dangel/unfoldNd.
- NOTE: Although specifying image_shape looks redundant because it could be calculated from convolution formulas, it is required as input for more advanced scenarios, as explained in PyTorch’s implementation (pytorch/pytorch).
Attributes
dilations: 1-dimensional tensor with dilation value along each spatial axis of the image. If not present, the dilation defaults to 1 along each spatial axis of the image.
pads: 1-dimensional tensor with the padding value for the beginning and end of each spatial axis; it can take any value greater than or equal to 0. Each value represents the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: 1-dimensional tensor with stride value along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
input (heterogeneous) - T: Input data tensor to be rearranged from column blocks back into an image. This is a 3-dimensional tensor containing [N, C * n-ary product(block_shape), L], where N is the batch dimension, C is the image channel dimension, and L is the number of blocks. The blocks are enumerated in increasing lexicographic order of their indices. For example, with an image size of 10*20 and a block size of 9*18, there would be 2*3 blocks, enumerated in the order block(0, 0), block(0, 1), block(0, 2), block(1, 0), block(1, 1), block(1, 2).
image_shape (heterogeneous) - tensor(int64): The shape of the spatial dimensions of the image after rearranging the column blocks. This is a 1-dimensional tensor of size at least 2, containing the value [H_img, W_img] for a 2-D image or [dim_i1, dim_i2, …, dim_iN] for an N-D image.
block_shape (heterogeneous) - tensor(int64): The shape of the block to apply on the input. This is a 1-dimensional tensor of size at least 2, containing the value [H_block, W_block] for a 2-D image or [dim_b1, dim_b2, …, dim_bN] for an N-D block. This is the block shape before dilation is applied to it.
Outputs
output (heterogeneous) - T: Output tensor produced by rearranging blocks into an image.
Type Constraints
T in ( tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensor types.
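The block-to-image mapping can be illustrated with a minimal 2-D NumPy sketch. This hypothetical helper assumes unit dilation and no padding; overlapping blocks are summed into the output, matching fold-style semantics:

```python
import numpy as np

def col2im_2d(cols, image_shape, block_shape, strides=(1, 1)):
    """Rearrange column blocks (N, C*Hb*Wb, L) back into an image (N, C, H, W)."""
    n = cols.shape[0]
    hb, wb = block_shape
    c = cols.shape[1] // (hb * wb)
    h, w = image_shape
    sh, sw = strides
    nh = (h - hb) // sh + 1           # number of block positions per axis
    nw = (w - wb) // sw + 1
    assert cols.shape[2] == nh * nw   # L blocks, lexicographic order over (i, j)
    blocks = cols.reshape(n, c, hb, wb, nh, nw)
    out = np.zeros((n, c, h, w), dtype=cols.dtype)
    for i in range(nh):
        for j in range(nw):           # overlapping contributions accumulate
            out[:, :, i*sh:i*sh+hb, j*sw:j*sw+wb] += blocks[:, :, :, :, i, j]
    return out
```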
OnnxCol2Im_18#
- class mlprodict.npy.xop_auto_import_.OnnxCol2Im_18(*args, **kwargs)#
Version
name: Col2Im (GitHub)
domain: main
since_version: 18
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 18.
Summary
The operator rearranges column blocks back into a multidimensional image.
Col2Im behaves similarly to PyTorch’s Fold (https://pytorch.org/docs/stable/generated/torch.nn.Fold.html), but it only supports batched multi-dimensional image tensors. Another implementation in Python with N-dimension support can be found at f-dangel/unfoldNd.
- NOTE: Although specifying image_shape looks redundant because it could be calculated from convolution formulas, it is required as input for more advanced scenarios, as explained in PyTorch’s implementation (pytorch/pytorch).
Attributes
dilations: 1-dimensional tensor with dilation value along each spatial axis of the image. If not present, the dilation defaults to 1 along each spatial axis of the image.
pads: 1-dimensional tensor with the padding value for the beginning and end of each spatial axis; it can take any value greater than or equal to 0. Each value represents the number of pixels added to the beginning and end of the corresponding axis. The pads format should be as follows: [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: 1-dimensional tensor with stride value along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
input (heterogeneous) - T: Input data tensor to be rearranged from column blocks back into an image. This is a 3-dimensional tensor containing [N, C * n-ary product(block_shape), L], where N is the batch dimension, C is the image channel dimension, and L is the number of blocks. The blocks are enumerated in increasing lexicographic order of their indices. For example, with an image size of 10*20 and a block size of 9*18, there would be 2*3 blocks, enumerated in the order block(0, 0), block(0, 1), block(0, 2), block(1, 0), block(1, 1), block(1, 2).
image_shape (heterogeneous) - tensor(int64): The shape of the spatial dimensions of the image after rearranging the column blocks. This is a 1-dimensional tensor of size at least 2, containing the value [H_img, W_img] for a 2-D image or [dim_i1, dim_i2, …, dim_iN] for an N-D image.
block_shape (heterogeneous) - tensor(int64): The shape of the block to apply on the input. This is a 1-dimensional tensor of size at least 2, containing the value [H_block, W_block] for a 2-D image or [dim_b1, dim_b2, …, dim_bN] for an N-D block. This is the block shape before dilation is applied to it.
Outputs
output (heterogeneous) - T: Output tensor produced by rearranging blocks into an image.
Type Constraints
T in ( tensor(bfloat16), tensor(bool), tensor(complex128), tensor(complex64), tensor(double), tensor(float), tensor(float16), tensor(int16), tensor(int32), tensor(int64), tensor(int8), tensor(string), tensor(uint16), tensor(uint32), tensor(uint64), tensor(uint8) ): Constrain input and output types to all numeric tensor types.
OnnxComMicrosoftAdamOptimizer#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdamOptimizer(*args, **kwargs)#
Version
name: AdamOptimizer (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
alpha: Coefficient of previous gradient in running average. Default value is ?.
beta: Coefficient of previous squared gradient in running average. The effective learning rate is computed by r = R / (1 + T * decay_factor). Default to 0 so that increasing update counts doesn’t reduce the learning rate. Default value is ?.
do_bias_correction: Compute unbiased 1st and 2nd momentums. Default value is ?.
epsilon: Small scalar to avoid dividing by zero. Default value is ?.
lambda: Regularization coefficient of 0.5 * lambda * ||X||_2^2. Default to 0, which means no regularization. Default value is ?.
max_norm_clip: Clip threshold of gradients. Default value is ?.
weight_decay_mode: Modes for applying weight decay; 0 means applying decay before weight update, 1 means applying decay after weight update. Default value is ?.
Inputs
Between 6 and 10 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
weights (heterogeneous) - T3: weights to optimize.
gradients (heterogeneous) - T_GRAD: gradients computed in this iteration.
moment_1 (heterogeneous) - T4: exponentially averaged historical gradients.
moment_2 (heterogeneous) - T4: exponentially averaged historical squared gradients.
mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: FP16 or BFloat16 weights to optimize.
loss_scale (optional, heterogeneous) - T3: loss scale for mixed precision training
global_gradient_norm (optional, heterogeneous) - T_GRAD_NORM: Global gradient norm.
update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight tensors should be updated.
Outputs
Between 3 and 6 outputs.
new_T (heterogeneous) - T2: New update count.
new_moment_1 (heterogeneous) - T4: New averaged gradients.
new_moment_2 (heterogeneous) - T4: New averaged squared gradients.
new_weights (optional, heterogeneous) - T3: New weights.
new_gradients (optional, heterogeneous) - T_GRAD: New gradients.
new_mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: New FP16 or BFloat16 weights
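The moment updates these inputs and outputs describe follow the standard Adam step. A minimal NumPy sketch with bias correction enabled and no weight decay, gradient clipping, or mixed precision (all names here are hypothetical, not the operator's API):

```python
import numpy as np

def adam_step(w, g, m1, m2, lr, t, alpha=0.9, beta=0.999, eps=1e-8):
    """One Adam update: returns new weights and the updated running moments."""
    m1 = alpha * m1 + (1.0 - alpha) * g        # exponentially averaged gradients
    m2 = beta * m2 + (1.0 - beta) * g * g      # exponentially averaged squared gradients
    m1_hat = m1 / (1.0 - alpha ** t)           # bias correction (do_bias_correction = 1)
    m2_hat = m2 / (1.0 - beta ** t)
    w = w - lr * m1_hat / (np.sqrt(m2_hat) + eps)
    return w, m1, m2
```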
OnnxComMicrosoftAdamOptimizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdamOptimizer_1(*args, **kwargs)#
Version
name: AdamOptimizer (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
alpha: Coefficient of previous gradient in running average. Default value is ?.
beta: Coefficient of previous squared gradient in running average. The effective learning rate is computed by r = R / (1 + T * decay_factor). Default to 0 so that increasing update counts doesn’t reduce the learning rate. Default value is ?.
do_bias_correction: Compute unbiased 1st and 2nd momentums. Default value is ?.
epsilon: Small scalar to avoid dividing by zero. Default value is ?.
lambda: Regularization coefficient of 0.5 * lambda * ||X||_2^2. Default to 0, which means no regularization. Default value is ?.
max_norm_clip: Clip threshold of gradients. Default value is ?.
weight_decay_mode: Modes for applying weight decay; 0 means applying decay before weight update, 1 means applying decay after weight update. Default value is ?.
Inputs
Between 6 and 10 inputs.
R (heterogeneous) - T1: The initial learning rate.
T (heterogeneous) - T2: The update count of “X”. It should be a scalar.
weights (heterogeneous) - T3: weights to optimize.
gradients (heterogeneous) - T_GRAD: gradients computed in this iteration.
moment_1 (heterogeneous) - T4: exponentially averaged historical gradients.
moment_2 (heterogeneous) - T4: exponentially averaged historical squared gradients.
mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: FP16 or BFloat16 weights to optimize.
loss_scale (optional, heterogeneous) - T3: loss scale for mixed precision training
global_gradient_norm (optional, heterogeneous) - T_GRAD_NORM: Global gradient norm.
update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight tensors should be updated.
Outputs
Between 3 and 6 outputs.
new_T (heterogeneous) - T2: New update count.
new_moment_1 (heterogeneous) - T4: New averaged gradients.
new_moment_2 (heterogeneous) - T4: New averaged squared gradients.
new_weights (optional, heterogeneous) - T3: New weights.
new_gradients (optional, heterogeneous) - T_GRAD: New gradients.
new_mixed_precision_weights (optional, heterogeneous) - T_MIXED_PRECISION_FP: New FP16 or BFloat16 weights
OnnxComMicrosoftAdamWOptimizer#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdamWOptimizer(*args, **kwargs)#
Version
name: AdamWOptimizer (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
adam_mode: Modes for applying bias correction and weight decay (default 0). 0: Weight decay is applied before weight is updated. Computation aligned with Torch AdamW. In this mode, correct_bias should be 1 to keep aligned with PyTorch. 1: Weight decay is applied after weight is updated. Computation is aligned with Huggingface AdamW. Default value is ?.
alpha: Coefficient of previously accumulated gradient in running average. Default value is ?.
beta: Coefficient of previously accumulated squared-gradient in running average. Default value is ?.
correct_bias: Whether or not to correct bias, enabled by default. Default value is ?.
epsilon: Small scalar to avoid dividing by zero. Default value is ?.
weight_decay: Weight decay coefficient. Default value is ?.
Inputs
Between 6 and 7 inputs.
lr (heterogeneous) - T1: The learning rate.
step (heterogeneous) - T2: The update count of weights. It should be a scalar.
weights (heterogeneous) - S_WEIGHT: Sequence of weights to optimize.
gradients (heterogeneous) - S_GRAD: Sequence of gradients computed in this iteration.
momentums_1 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical gradients.
momentums_2 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical squared gradients.
update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight updates are skipped, applicable to gradient infinity check in mixed precision training.
Outputs
Between 1 and 4 outputs.
updated_flag (heterogeneous) - T2: Whether gradient is applied or not.
updated_weights (optional, heterogeneous) - S_WEIGHT: Sequence of weights after optimize.
updated_momentums_1 (optional, heterogeneous) - S_MOMENT: Sequence of momentum_1 after optimize.
updated_momentums_2 (optional, heterogeneous) - S_MOMENT: Sequence of momentum_2 after optimize.
OnnxComMicrosoftAdamWOptimizer_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdamWOptimizer_1(*args, **kwargs)#
Version
name: AdamWOptimizer (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
adam_mode: Modes for applying bias correction and weight decay (default 0). 0: Weight decay is applied before weight is updated. Computation aligned with Torch AdamW. In this mode, correct_bias should be 1 to keep aligned with PyTorch. 1: Weight decay is applied after weight is updated. Computation is aligned with Huggingface AdamW. Default value is ?.
alpha: Coefficient of previously accumulated gradient in running average. Default value is ?.
beta: Coefficient of previously accumulated squared-gradient in running average. Default value is ?.
correct_bias: Whether or not to correct bias, enabled by default. Default value is ?.
epsilon: Small scalar to avoid dividing by zero. Default value is ?.
weight_decay: Weight decay coefficient. Default value is ?.
Inputs
Between 6 and 7 inputs.
lr (heterogeneous) - T1: The learning rate.
step (heterogeneous) - T2: The update count of weights. It should be a scalar.
weights (heterogeneous) - S_WEIGHT: Sequence of weights to optimize.
gradients (heterogeneous) - S_GRAD: Sequence of gradients computed in this iteration.
momentums_1 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical gradients.
momentums_2 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical squared gradients.
update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight updates are skipped, applicable to gradient infinity check in mixed precision training.
Outputs
Between 1 and 4 outputs.
updated_flag (heterogeneous) - T2: Whether gradient is applied or not.
updated_weights (optional, heterogeneous) - S_WEIGHT: Sequence of weights after optimize.
updated_momentums_1 (optional, heterogeneous) - S_MOMENT: Sequence of momentum_1 after optimize.
updated_momentums_2 (optional, heterogeneous) - S_MOMENT: Sequence of momentum_2 after optimize.
OnnxComMicrosoftAdasumAllReduce#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdasumAllReduce(*args, **kwargs)#
Version
name: AdasumAllReduce (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
reduce_algo: Algorithms for Adasum. Valid values are: CpuReduction(1) or GpuHierarchicalReduction(2). Default value is ?.
Inputs
Between 1 and 2147483647 inputs.
input (variadic, heterogeneous) - T: tensors to be reduced
Outputs
Between 1 and 2147483647 outputs.
output (variadic, heterogeneous) - T: reduced tensors
OnnxComMicrosoftAdasumAllReduce_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAdasumAllReduce_1(*args, **kwargs)#
Version
name: AdasumAllReduce (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
reduce_algo: Algorithms for Adasum. Valid values are: CpuReduction(1) or GpuHierarchicalReduction(2). Default value is ?.
Inputs
Between 1 and 2147483647 inputs.
input (variadic, heterogeneous) - T: tensors to be reduced
Outputs
Between 1 and 2147483647 outputs.
output (variadic, heterogeneous) - T: reduced tensors
OnnxComMicrosoftAll#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAll(*args, **kwargs)#
Version
name: All (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Return true if all elements are true and false otherwise.
Inputs
X (heterogeneous) - T: input
Outputs
Y (heterogeneous) - T: output.
OnnxComMicrosoftAll_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAll_1(*args, **kwargs)#
Version
name: All (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Return true if all elements are true and false otherwise.
Inputs
X (heterogeneous) - T: input
Outputs
Y (heterogeneous) - T: output.
OnnxComMicrosoftAttention#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAttention(*args, **kwargs)#
Version
name: Attention (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Multi-Head Self-Attention that can be either unidirectional (like GPT-2) or bidirectional (like BERT). The mask_index input is optional. Besides a raw attention mask with shape (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), with value 0 for masked and 1 otherwise, two other formats are supported: when the input has right-side padding, mask_index is one-dimensional with shape (batch_size), where the value of each element is the end position, or valid length, of the actual sequence excluding padding; when the input has left-side padding, mask_index has shape (2 * batch_size), where the values are the exclusive end positions followed by the inclusive start positions. When unidirectional is 1, each token can only attend to previous tokens. For GPT-2, both past and present states are optional. The present state can appear in the output even when the past state is not in the input.
Attributes
num_heads (required): Number of attention heads. Default value is ?.
qkv_hidden_sizes: Hidden layer sizes of Q, K, V paths in Attention. Default value is ?.
unidirectional: Whether every token can only attend to previous tokens. Default value is 0.
Inputs
Between 3 and 6 inputs.
input (heterogeneous) - T: 3D input tensor with shape (batch_size, sequence_length, input_hidden_size)
weight (heterogeneous) - T: 2D input tensor with shape (input_hidden_size, 3 * hidden_size), where hidden_size = num_heads * head_size
bias (heterogeneous) - T: 1D input tensor with shape (3 * hidden_size)
mask_index (optional, heterogeneous) - M: Attention mask with shape (batch_size, 1, max_sequence_length, max_sequence_length), (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), or index with shape (batch_size) or (2 * batch_size).
past (optional, heterogeneous) - T: past state for key and value with shape (2, batch_size, num_heads, past_sequence_length, head_size).
extra_add (optional, heterogeneous) - T: additional add to QxK’ with shape (batch_size, num_heads, sequence_length, sequence_length).
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: 3D output tensor with shape (batch_size, sequence_length, hidden_size)
present (optional, heterogeneous) - T: present state for key and value with shape (2, batch_size, num_heads, past_sequence_length + sequence_length, head_size)
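The mask formats described above all reduce to an additive bias on the attention scores. A minimal single-head NumPy sketch of the core computation (a hypothetical illustration that ignores multiple heads, past state, and the index-style mask formats):

```python
import numpy as np

def self_attention(x, w_qkv, b_qkv, mask=None):
    """Single-head attention: project x to Q, K, V; mask; softmax; weight V."""
    q, k, v = np.split(x @ w_qkv + b_qkv, 3, axis=-1)   # (batch, seq, head_size) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    if mask is not None:                                # raw mask: 1 = keep, 0 = masked
        scores = scores + (1.0 - mask) * -1e9           # masked positions get ~zero weight
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights = weights / weights.sum(-1, keepdims=True)  # numerically stable softmax
    return weights @ v
```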
OnnxComMicrosoftAttention_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAttention_1(*args, **kwargs)#
Version
name: Attention (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Multi-Head Self-Attention that can be either unidirectional (like GPT-2) or bidirectional (like BERT). The mask_index input is optional. Besides a raw attention mask with shape (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), with value 0 for masked and 1 otherwise, two other formats are supported: when the input has right-side padding, mask_index is one-dimensional with shape (batch_size), where the value of each element is the end position, or valid length, of the actual sequence excluding padding; when the input has left-side padding, mask_index has shape (2 * batch_size), where the values are the exclusive end positions followed by the inclusive start positions. When unidirectional is 1, each token can only attend to previous tokens. For GPT-2, both past and present states are optional. The present state can appear in the output even when the past state is not in the input.
Attributes
num_heads (required): Number of attention heads. Default value is ?.
qkv_hidden_sizes: Hidden layer sizes of Q, K, V paths in Attention. Default value is ?.
unidirectional: Whether every token can only attend to previous tokens. Default value is 0.
Inputs
Between 3 and 6 inputs.
input (heterogeneous) - T: 3D input tensor with shape (batch_size, sequence_length, input_hidden_size)
weight (heterogeneous) - T: 2D input tensor with shape (input_hidden_size, 3 * hidden_size), where hidden_size = num_heads * head_size
bias (heterogeneous) - T: 1D input tensor with shape (3 * hidden_size)
mask_index (optional, heterogeneous) - M: Attention mask with shape (batch_size, 1, max_sequence_length, max_sequence_length), (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), or index with shape (batch_size) or (2 * batch_size).
past (optional, heterogeneous) - T: past state for key and value with shape (2, batch_size, num_heads, past_sequence_length, head_size).
extra_add (optional, heterogeneous) - T: additional add to QxK' with shape (batch_size, num_heads, sequence_length, sequence_length).
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: 3D output tensor with shape (batch_size, sequence_length, hidden_size)
present (optional, heterogeneous) - T: present state for key and value with shape (2, batch_size, num_heads, past_sequence_length + sequence_length, head_size)
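The 1-D mask_index format for right-side padding can be illustrated with a small numpy sketch (the helper name is ours, not part of the operator): a vector of valid lengths expands into the raw 0/1 attention mask the summary describes.

```python
import numpy as np

def mask_from_valid_lengths(mask_index, sequence_length):
    # mask_index holds the valid length of each sequence (right-side padding);
    # positions before that length get 1, padded positions get 0.
    positions = np.arange(sequence_length)
    return (positions[None, :] < mask_index[:, None]).astype(np.int32)

# batch of 2 sequences, padded to length 4, with valid lengths 3 and 1
mask = mask_from_valid_lengths(np.array([3, 1]), 4)
```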
OnnxComMicrosoftAttnLSTM#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAttnLSTM(*args, **kwargs)#
Version
name: AttnLSTM (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes a one-layer RNN whose cell is an AttentionWrapper around an LSTM cell. The RNN layer contains the following basic components: LSTM cell, Bahdanau attention mechanism, and AttentionWrapper.
Activation functions:
Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})
(NOTE: Below are optional)
Affine(x) - alpha*x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha*Tanh(beta*x)
HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha*(e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
Softmax(x) - exp(x) / sum(exp(x))
- Bahdanau Attention Mechanism:
M - Memory tensor.
VALUES - masked Memory by its real sequence length.
MW - Memory layer weight.
- KEYS - Processed memory tensor by the memory layer.
KEYS = M * MW
Query - Query tensor, normally at specific time step in sequence.
QW - Query layer weight in the attention mechanism
PQ - processed query, = Query * QW
V - attention vector
- ALIGN - calculated alignment based on Query and KEYS
ALIGN = softmax(reduce_sum(V * Tanh(KEYS + PQ)))
- CONTEXT - context based on ALIGN and VALUES
CONTEXT = ALIGN * VALUES
- LSTM Cell:
X - input tensor concat with attention state in the attention wrapper
i - input gate
o - output gate
f - forget gate
c - cell gate
t - time step (t-1 means previous time step)
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):
it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)
ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)
ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
Ct = ft (.) Ct-1 + it (.) ct
ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)
Ht = ot (.) h(Ct)
- AttentionWrapper notations:
- lstm() - wrapped inner cell.
Ht, Ct = lstm(concat(Xt, ATTNt-1), Ct-1)
- am() - attention mechanism the wrapper used.
CONTEXTt, ALIGNt = am(Ht, ALIGNt-1)
AW - attention layer weights, optional.
- ATTN - attention state, initial is zero. If AW provided, it is the output of the attention layer,
ATTNt = concat(Ht, CONTEXTt) * AW
- otherwise,
ATTNt = CONTEXTt
- RNN layer output:
Y - if needed is the sequence of Ht from lstm cell.
Y_h - is the last valid H from lstm cell.
Y_c - is the last valid C from lstm cell.
Attributes
activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators; for example with LeakyRelu, the default alpha is 0.01. Default value is ?.
activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. Default value is ?.
activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: see the equations for defaults if not specified. Default value is ?.
clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range [-threshold, +threshold] and is applied to the input of activations. No clip if not specified. Default value is ?.
direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is ?.
hidden_size: Number of neurons in the hidden layer. Default value is ?.
input_forget: Couple the input and forget gates if 1, default 0. Default value is ?.
Inputs
Between 3 and 14 inputs.
X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size]
W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].
R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].
B (optional, heterogeneous) - T: The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size]
initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (optional, heterogeneous) - T: Optional initial value of the cell. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
QW (optional, heterogeneous) - T: The weight tensor of the query layer in the attention mechanism. Should be of shape [num_directions, am_query_depth(hidden_size of lstm), am_attn_size]
MW (optional, heterogeneous) - T: The weight tensor of the memory layer in the attention mechanism. Should be of shape [num_directions, memory_depth, am_attn_size]
V (optional, heterogeneous) - T: The attention_v tensor in the attention mechanism. Should be of shape [num_directions, am_attn_size]
M (optional, heterogeneous) - T: The sequence of the memory (input) for attention mechanism. Should be of [batch_size, max_memory_step, memory_depth]
memory_seq_lens (optional, heterogeneous) - T1: The sequence length of the input memory for the attention mechanism. Should be of [batch_size]
AW (optional, heterogeneous) - T: The weights of the attention layer in the attention wrapper. If present, should be of shape [num_directions, memory_depth + hidden_size, aw_attn_size]. Note that the attention context depth equals memory_depth in the attention mechanism.
Outputs
Between 0 and 3 outputs.
Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]
Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
Y_c (optional, heterogeneous) - T: The last output value of the cell. It has shape [num_directions, batch_size, hidden_size].
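The LSTM cell equations above can be sketched in numpy as a single forward step. This is a simplified illustration using the default activations (f = Sigmoid, g = h = Tanh), without peepholes or clipping; the function name is ours, not part of the operator.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(Xt, Ht_1, Ct_1, W, R, Wb, Rb):
    # Gate pre-activations; W is [4*hidden_size, input_size],
    # R is [4*hidden_size, hidden_size], biases are [4*hidden_size] each.
    gates = Xt @ W.T + Ht_1 @ R.T + Wb + Rb
    i, o, f, c = np.split(gates, 4, axis=-1)   # iofc gate order
    it, ot, ft = sigmoid(i), sigmoid(o), sigmoid(f)
    ct = np.tanh(c)
    Ct = ft * Ct_1 + it * ct                   # Ct = ft (.) Ct-1 + it (.) ct
    Ht = ot * np.tanh(Ct)                      # Ht = ot (.) h(Ct)
    return Ht, Ct
```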
OnnxComMicrosoftAttnLSTM_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftAttnLSTM_1(*args, **kwargs)#
Version
name: AttnLSTM (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes a one-layer RNN whose cell is an AttentionWrapper around an LSTM cell. The RNN layer contains the following basic components: LSTM cell, Bahdanau attention mechanism, and AttentionWrapper.
Activation functions:
Relu(x) - max(0, x)
Tanh(x) - (1 - e^{-2x})/(1 + e^{-2x})
Sigmoid(x) - 1/(1 + e^{-x})
(NOTE: Below are optional)
Affine(x) - alpha*x + beta
LeakyRelu(x) - x if x >= 0 else alpha * x
ThresholdedRelu(x) - x if x >= alpha else 0
ScaledTanh(x) - alpha*Tanh(beta*x)
HardSigmoid(x) - min(max(alpha*x + beta, 0), 1)
Elu(x) - x if x >= 0 else alpha*(e^x - 1)
Softsign(x) - x/(1 + |x|)
Softplus(x) - log(1 + e^x)
Softmax(x) - exp(x) / sum(exp(x))
- Bahdanau Attention Mechanism:
M - Memory tensor.
VALUES - masked Memory by its real sequence length.
MW - Memory layer weight.
- KEYS - Processed memory tensor by the memory layer.
KEYS = M * MW
Query - Query tensor, normally at specific time step in sequence.
QW - Query layer weight in the attention mechanism
PQ - processed query, = Query * QW
V - attention vector
- ALIGN - calculated alignment based on Query and KEYS
ALIGN = softmax(reduce_sum(V * Tanh(KEYS + PQ)))
- CONTEXT - context based on ALIGN and VALUES
CONTEXT = ALIGN * VALUES
- LSTM Cell:
X - input tensor concat with attention state in the attention wrapper
i - input gate
o - output gate
f - forget gate
c - cell gate
t - time step (t-1 means previous time step)
W[iofc] - W parameter weight matrix for input, output, forget, and cell gates
R[iofc] - R recurrence weight matrix for input, output, forget, and cell gates
Wb[iofc] - W bias vectors for input, output, forget, and cell gates
Rb[iofc] - R bias vectors for input, output, forget, and cell gates
P[iof] - P peephole weight vector for input, output, and forget gates
WB[iofc] - W parameter weight matrix for backward input, output, forget, and cell gates
RB[iofc] - R recurrence weight matrix for backward input, output, forget, and cell gates
WBb[iofc] - W bias vectors for backward input, output, forget, and cell gates
RBb[iofc] - R bias vectors for backward input, output, forget, and cell gates
PB[iof] - P peephole weight vector for backward input, output, and forget gates
H - Hidden state
num_directions - 2 if direction == bidirectional else 1
Equations (Default: f=Sigmoid, g=Tanh, h=Tanh):
it = f(Xt*(Wi^T) + Ht-1*(Ri^T) + Pi (.) Ct-1 + Wbi + Rbi)
ft = f(Xt*(Wf^T) + Ht-1*(Rf^T) + Pf (.) Ct-1 + Wbf + Rbf)
ct = g(Xt*(Wc^T) + Ht-1*(Rc^T) + Wbc + Rbc)
Ct = ft (.) Ct-1 + it (.) ct
ot = f(Xt*(Wo^T) + Ht-1*(Ro^T) + Po (.) Ct + Wbo + Rbo)
Ht = ot (.) h(Ct)
- AttentionWrapper notations:
- lstm() - wrapped inner cell.
Ht, Ct = lstm(concat(Xt, ATTNt-1), Ct-1)
- am() - attention mechanism the wrapper used.
CONTEXTt, ALIGNt = am(Ht, ALIGNt-1)
AW - attention layer weights, optional.
- ATTN - attention state, initial is zero. If AW provided, it is the output of the attention layer,
ATTNt = concat(Ht, CONTEXTt) * AW
- otherwise,
ATTNt = CONTEXTt
- RNN layer output:
Y - if needed is the sequence of Ht from lstm cell.
Y_h - is the last valid H from lstm cell.
Y_c - is the last valid C from lstm cell.
Attributes
activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators; for example with LeakyRelu, the default alpha is 0.01. Default value is ?.
activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as for the corresponding ONNX operators. Default value is ?.
activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: see the equations for defaults if not specified. Default value is ?.
clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range [-threshold, +threshold] and is applied to the input of activations. No clip if not specified. Default value is ?.
direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is ?.
hidden_size: Number of neurons in the hidden layer. Default value is ?.
input_forget: Couple the input and forget gates if 1, default 0. Default value is ?.
Inputs
Between 3 and 14 inputs.
X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size]
W (heterogeneous) - T: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, 4*hidden_size, input_size].
R (heterogeneous) - T: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 4*hidden_size, hidden_size].
B (optional, heterogeneous) - T: The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size]
initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (optional, heterogeneous) - T: Optional initial value of the cell. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
QW (optional, heterogeneous) - T: The weight tensor of the query layer in the attention mechanism. Should be of shape [num_directions, am_query_depth(hidden_size of lstm), am_attn_size]
MW (optional, heterogeneous) - T: The weight tensor of the memory layer in the attention mechanism. Should be of shape [num_directions, memory_depth, am_attn_size]
V (optional, heterogeneous) - T: The attention_v tensor in the attention mechanism. Should be of shape [num_directions, am_attn_size]
M (optional, heterogeneous) - T: The sequence of the memory (input) for attention mechanism. Should be of [batch_size, max_memory_step, memory_depth]
memory_seq_lens (optional, heterogeneous) - T1: The sequence length of the input memory for the attention mechanism. Should be of [batch_size]
AW (optional, heterogeneous) - T: The weights of the attention layer in the attention wrapper. If present, should be of shape [num_directions, memory_depth + hidden_size, aw_attn_size]. Note that the attention context depth equals memory_depth in the attention mechanism.
Outputs
Between 0 and 3 outputs.
Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size]
Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
Y_c (optional, heterogeneous) - T: The last output value of the cell. It has shape [num_directions, batch_size, hidden_size].
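The Bahdanau attention equations above (KEYS = M * MW, PQ = Query * QW, ALIGN = softmax(reduce_sum(V * Tanh(KEYS + PQ))), CONTEXT = ALIGN * VALUES) can be sketched in numpy for a single query step. This is a simplified illustration that treats the unmasked memory as VALUES and omits sequence-length masking; names are ours.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_step(query, M, QW, MW, V):
    keys = M @ MW                                    # KEYS = M * MW
    pq = query @ QW                                  # PQ = Query * QW
    scores = (V * np.tanh(keys + pq)).sum(axis=-1)   # reduce over attn_size
    align = softmax(scores)                          # ALIGN over memory steps
    context = align @ M                              # CONTEXT (VALUES = M here)
    return context, align
```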
OnnxComMicrosoftBatchNormInternal#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBatchNormInternal(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Variant of BatchNormalization with additional output for saved_mean/inv_std_dev.
Attributes
epsilon: epsilon value. Default value is ?.
momentum: momentum value. Default value is ?.
training_mode: true if training. Default value is ?.
Inputs
X (heterogeneous) - T: Input tensor.
scale (heterogeneous) - T1: Scale tensor of shape (C).
B (heterogeneous) - T1: Bias tensor of shape (C).
input_mean (heterogeneous) - T2: running mean tensor of shape (C).
input_var (heterogeneous) - T2: running variance tensor of shape (C).
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
running_mean (optional, heterogeneous) - T2: The running mean after BN.
running_var (optional, heterogeneous) - T2: Running var after BN
saved_mean (optional, heterogeneous) - T2: Mean of the batch
saved_inv_std (optional, heterogeneous) - T2: Inverse standard deviation for the batch
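A numpy sketch of the training-mode computation, assuming NCHW layout and the conventional running-statistics update (an illustration; the kernel's exact update rule may differ):

```python
import numpy as np

def batch_norm_internal(X, scale, B, input_mean, input_var,
                        epsilon=1e-5, momentum=0.9):
    axes = (0, 2, 3)                          # reduce over all but channels
    ch = (None, slice(None), None, None)      # broadcast (C,) over NCHW
    saved_mean = X.mean(axis=axes)
    batch_var = X.var(axis=axes)
    saved_inv_std = 1.0 / np.sqrt(batch_var + epsilon)
    Y = scale[ch] * (X - saved_mean[ch]) * saved_inv_std[ch] + B[ch]
    # conventional running-statistics update with the given momentum
    running_mean = momentum * input_mean + (1 - momentum) * saved_mean
    running_var = momentum * input_var + (1 - momentum) * batch_var
    return Y, running_mean, running_var, saved_mean, saved_inv_std
```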
OnnxComMicrosoftBatchNormInternal_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBatchNormInternal_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Variant of BatchNormalization with additional output for saved_mean/inv_std_dev.
Attributes
epsilon: epsilon value. Default value is ?.
momentum: momentum value. Default value is ?.
training_mode: true if training. Default value is ?.
Inputs
X (heterogeneous) - T: Input tensor.
scale (heterogeneous) - T1: Scale tensor of shape (C).
B (heterogeneous) - T1: Bias tensor of shape (C).
input_mean (heterogeneous) - T2: running mean tensor of shape (C).
input_var (heterogeneous) - T2: running variance tensor of shape (C).
Outputs
Between 1 and 5 outputs.
Y (heterogeneous) - T: The output tensor of the same shape as X
running_mean (optional, heterogeneous) - T2: The running mean after BN.
running_var (optional, heterogeneous) - T2: Running var after BN
saved_mean (optional, heterogeneous) - T2: Mean of the batch
saved_inv_std (optional, heterogeneous) - T2: Inverse standard deviation for the batch
OnnxComMicrosoftBatchNormalizationGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBatchNormalizationGrad(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BatchNormalizationGrad
Attributes
epsilon (required): epsilon value. Default value is ?.
Inputs
dY (heterogeneous) - T: Gradient output from previous node
X (heterogeneous) - T: Input
scale (heterogeneous) - T1: Scale tensor
mean (heterogeneous) - T2: Mean of X
variance (heterogeneous) - T2: Variance of X
Outputs
X_grad (heterogeneous) - T: Gradient of the input
scale_grad (heterogeneous) - T1: Gradient of the scale
bias_grad (heterogeneous) - T1: Gradient of the bias
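The backward pass can be sketched with the standard batch-normalization gradient formulas, assuming NCHW layout and that mean/variance are the batch statistics saved in the forward pass (an illustration, not the kernel's exact implementation):

```python
import numpy as np

def batch_norm_grad(dY, X, scale, mean, variance, epsilon=1e-5):
    axes = (0, 2, 3)
    ch = (None, slice(None), None, None)       # broadcast (C,) over NCHW
    m = X.size // X.shape[1]                   # elements per channel
    inv_std = 1.0 / np.sqrt(variance + epsilon)
    x_hat = (X - mean[ch]) * inv_std[ch]       # normalized input
    bias_grad = dY.sum(axis=axes)              # dL/dB
    scale_grad = (dY * x_hat).sum(axis=axes)   # dL/dscale
    dX = (scale * inv_std)[ch] * (dY - bias_grad[ch] / m
                                  - x_hat * scale_grad[ch] / m)
    return dX, scale_grad, bias_grad
```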
OnnxComMicrosoftBatchNormalizationGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBatchNormalizationGrad_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BatchNormalizationGrad
Attributes
epsilon (required): epsilon value. Default value is ?.
Inputs
dY (heterogeneous) - T: Gradient output from previous node
X (heterogeneous) - T: Input
scale (heterogeneous) - T1: Scale tensor
mean (heterogeneous) - T2: Mean of X
variance (heterogeneous) - T2: Variance of X
Outputs
X_grad (heterogeneous) - T: Gradient of the input
scale_grad (heterogeneous) - T1: Gradient of the scale
bias_grad (heterogeneous) - T1: Gradient of the bias
OnnxComMicrosoftBeamSearch#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBeamSearch(*args, **kwargs)#
Version
name: BeamSearch (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Beam Search for text generation. Supports GPT-2 decoder.
Attributes
decoder (required): Decoder subgraph to execute in a loop. Default value is ?.
early_stopping: early stop or not. Default value is ?.
encoder_decoder_init: subgraph for initialization of encoder and decoder. It will be called once before the decoder subgraph. Default value is ?.
eos_token_id (required): The id of the end-of-sequence token. Default value is ?.
model_type: model type: 0 for GPT-2; 1 for encoder-decoder like T5. Default value is ?.
no_repeat_ngram_size: no-repeat n-gram size. Default value is ?.
pad_token_id (required): The id of the padding token. Default value is ?.
Inputs
Between 6 and 10 inputs.
input_ids (heterogeneous) - I: The sequence used as a prompt for the generation. Shape is (batch_size, sequence_length)
max_length (heterogeneous) - I: The maximum length of the sequence to be generated. Shape is (1)
min_length (optional, heterogeneous) - I: The minimum length below which the score of eos_token_id is set to -Inf. Shape is (1)
num_beams (heterogeneous) - I: Number of beams for beam search. 1 means no beam search. Shape is (1)
num_return_sequences (heterogeneous) - I: The number of returned sequences in the batch. Shape is (1)
temperature (heterogeneous) - T: The value used to modulate the next token probabilities. Accepts value > 0.0. Shape is (1)
length_penalty (optional, heterogeneous) - T: Exponential penalty to the length. Default value 1.0 means no penalty. Values > 1.0 encourage longer sequences, while values < 1.0 produce shorter sequences. Shape is (1)
repetition_penalty (optional, heterogeneous) - T: The parameter for repetition penalty. Default value 1.0 means no penalty. Accepts value > 0.0. Shape is (1)
vocab_mask (optional, heterogeneous) - M: Mask of vocabulary. Words masked with 0 are not allowed to be generated, while 1 means allowed. Shape is (vocab_size)
prefix_vocab_mask (optional, heterogeneous) - M: Mask of vocabulary for the first step. Words masked with 0 are not allowed to be generated, while 1 means allowed. Shape is (batch_size, vocab_size)
Outputs
Between 1 and 3 outputs.
sequences (heterogeneous) - I: Word IDs of generated sequences. Shape is (batch_size, num_return_sequences, max_sequence_length)
sequences_scores (optional, heterogeneous) - T: Final beam score of the generated sequences. Shape is (batch_size, num_return_sequences)
scores (optional, heterogeneous) - T: Processed beam scores for each vocabulary token at each generation step. Beam scores consist of the log softmax score for each vocabulary token plus the sum of log softmax scores of previously generated tokens in the beam. Shape is (max_length - sequence_length, batch_size, num_beams, vocab_size)
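One expansion step of beam search can be sketched in numpy: each candidate's score is its beam's accumulated log-softmax score plus the next token's log-softmax score, and the best num_beams candidates survive. This is a toy illustration; the operator's decoder subgraph, length penalty, and stopping criteria are omitted.

```python
import numpy as np

def beam_step(beam_scores, next_token_logprobs, num_beams):
    # beam_scores: (num_beams,); next_token_logprobs: (num_beams, vocab_size)
    total = beam_scores[:, None] + next_token_logprobs
    flat = total.ravel()
    top = np.argsort(flat)[::-1][:num_beams]   # best num_beams candidates
    vocab_size = next_token_logprobs.shape[1]
    # surviving scores, which beam each came from, and the chosen token ids
    return flat[top], top // vocab_size, top % vocab_size
```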
OnnxComMicrosoftBeamSearch_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBeamSearch_1(*args, **kwargs)#
Version
name: BeamSearch (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Beam Search for text generation. Supports GPT-2 decoder.
Attributes
decoder (required): Decoder subgraph to execute in a loop. Default value is ?.
early_stopping: early stop or not. Default value is ?.
encoder_decoder_init: subgraph for initialization of encoder and decoder. It will be called once before the decoder subgraph. Default value is ?.
eos_token_id (required): The id of the end-of-sequence token. Default value is ?.
model_type: model type: 0 for GPT-2; 1 for encoder-decoder like T5. Default value is ?.
no_repeat_ngram_size: no-repeat n-gram size. Default value is ?.
pad_token_id (required): The id of the padding token. Default value is ?.
Inputs
Between 6 and 10 inputs.
input_ids (heterogeneous) - I: The sequence used as a prompt for the generation. Shape is (batch_size, sequence_length)
max_length (heterogeneous) - I: The maximum length of the sequence to be generated. Shape is (1)
min_length (optional, heterogeneous) - I: The minimum length below which the score of eos_token_id is set to -Inf. Shape is (1)
num_beams (heterogeneous) - I: Number of beams for beam search. 1 means no beam search. Shape is (1)
num_return_sequences (heterogeneous) - I: The number of returned sequences in the batch. Shape is (1)
temperature (heterogeneous) - T: The value used to modulate the next token probabilities. Accepts value > 0.0. Shape is (1)
length_penalty (optional, heterogeneous) - T: Exponential penalty to the length. Default value 1.0 means no penalty. Values > 1.0 encourage longer sequences, while values < 1.0 produce shorter sequences. Shape is (1)
repetition_penalty (optional, heterogeneous) - T: The parameter for repetition penalty. Default value 1.0 means no penalty. Accepts value > 0.0. Shape is (1)
vocab_mask (optional, heterogeneous) - M: Mask of vocabulary. Words masked with 0 are not allowed to be generated, while 1 means allowed. Shape is (vocab_size)
prefix_vocab_mask (optional, heterogeneous) - M: Mask of vocabulary for the first step. Words masked with 0 are not allowed to be generated, while 1 means allowed. Shape is (batch_size, vocab_size)
Outputs
Between 1 and 3 outputs.
sequences (heterogeneous) - I: Word IDs of generated sequences. Shape is (batch_size, num_return_sequences, max_sequence_length)
sequences_scores (optional, heterogeneous) - T: Final beam score of the generated sequences. Shape is (batch_size, num_return_sequences)
scores (optional, heterogeneous) - T: Processed beam scores for each vocabulary token at each generation step. Beam scores consist of the log softmax score for each vocabulary token plus the sum of log softmax scores of previously generated tokens in the beam. Shape is (max_length - sequence_length, batch_size, num_beams, vocab_size)
OnnxComMicrosoftBiasDropout#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasDropout(*args, **kwargs)#
Version
name: BiasDropout (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
output, dropout_mask = Dropout(data + bias, ratio); output is then added to residual. Intended to specialize the dropout pattern commonly found in transformer models.
Attributes
seed: (Optional) Seed to the random generator; if not specified, one is auto-generated. Default value is ?.
Inputs
Between 2 and 5 inputs.
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias input, a vector with the same shape as last dim of data OR same shape with data
residual (optional, heterogeneous) - T: The residual input, must have the same shape as data
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true then it indicates dropout is being used for training. It is an optional value hence unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode where nothing will be dropped from the input data and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T2: The output mask of dropout.
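The fused computation can be sketched in numpy (an illustration using inverted-dropout scaling; the operator's random generator and exact semantics may differ):

```python
import numpy as np

def bias_dropout(data, bias, residual=None, ratio=0.5, training=True, seed=None):
    x = data + bias
    if not training or ratio == 0.0:
        mask = np.ones_like(x, dtype=bool)     # inference: nothing is dropped
        out = x
    else:
        rng = np.random.default_rng(seed)
        mask = rng.random(x.shape) >= ratio    # keep with probability 1 - ratio
        out = x * mask / (1.0 - ratio)         # inverted-dropout scaling
    if residual is not None:
        out = out + residual
    return out, mask
```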
OnnxComMicrosoftBiasDropout_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasDropout_1(*args, **kwargs)#
Version
name: BiasDropout (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
output, dropout_mask = Dropout(data + bias, ratio); output is then added to residual. Intended to specialize the dropout pattern commonly found in transformer models.
Attributes
seed: (Optional) Seed to the random generator; if not specified, one is auto-generated. Default value is ?.
Inputs
Between 2 and 5 inputs.
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias input, a vector with the same shape as last dim of data OR same shape with data
residual (optional, heterogeneous) - T: The residual input, must have the same shape as data
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true then it indicates dropout is being used for training. It is an optional value hence unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode where nothing will be dropped from the input data and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T2: The output mask of dropout.
OnnxComMicrosoftBiasFastGeluGrad_dX#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasFastGeluGrad_dX(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes dX for FastGeluGrad with bias
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
B (heterogeneous) - T: The bias tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBiasFastGeluGrad_dX_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasFastGeluGrad_dX_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes dX for FastGeluGrad with bias
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
B (heterogeneous) - T: The bias tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBiasGelu#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasGelu(*args, **kwargs)#
Version
name: BiasGelu (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Bias Gelu. It is an extension of Gelu: it takes the sum of input A and bias input B as the input to the Gelu activation.
Inputs
A (heterogeneous) - T: The normal input data.
B (heterogeneous) - T: The bias input data that is a 1D tensor.
Outputs
C (heterogeneous) - T: The output.
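The semantics can be sketched in numpy using the exact erf-based Gelu (an illustration; the function name is ours):

```python
import math
import numpy as np

erf = np.vectorize(math.erf)  # elementwise erf without external dependencies

def bias_gelu(A, B):
    # Gelu applied to A + B; the 1-D bias B broadcasts over A's last dimension.
    x = A + B
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))
```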
OnnxComMicrosoftBiasGeluGrad_dX#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasGeluGrad_dX(*args, **kwargs)#
Version
name: BiasGeluGrad_dX (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes dX for BiasGeluGrad
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
B (heterogeneous) - T: The bias tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBiasGeluGrad_dX_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasGeluGrad_dX_1(*args, **kwargs)#
Version
name: BiasGeluGrad_dX (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Computes dX for BiasGeluGrad
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
B (heterogeneous) - T: The bias tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBiasGelu_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasGelu_1(*args, **kwargs)#
Version
name: BiasGelu (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Bias Gelu. It is an extension of Gelu: it takes the sum of input A and bias input B as the input of the Gelu activation.
Inputs
A (heterogeneous) - T: The normal input data.
B (heterogeneous) - T: The bias input data that is a 1D tensor.
Outputs
C (heterogeneous) - T: The output.
OnnxComMicrosoftBiasSoftmax#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasSoftmax(*args, **kwargs)#
Version
name: BiasSoftmax (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Y = softmax(scores + bias) with simple broadcast on bias. Intended to specialize softmax(scores + additive_mask) commonly found in transformer models.
Attributes
broadcast_axis: broadcast bias across input for dimensions broadcast_axis to softmax_axis-1. Default value is ?.
softmax_axis: apply softmax to elements for dimensions softmax_axis or higher. Default value is ?.
Inputs
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias (or mask) as Tensor.
Outputs
output (heterogeneous) - T: The output.
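The summary reduces to a broadcast add followed by a softmax. A NumPy sketch (the helper name and the default axis are illustrative assumptions):

```python
import numpy as np

def bias_softmax(data, bias, softmax_axis=-1):
    # Y = softmax(data + bias); bias broadcasts against data, e.g. an additive mask
    z = data + bias
    z = z - z.max(axis=softmax_axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=softmax_axis, keepdims=True)
```

Because softmax is invariant to a constant shift along the softmax axis, a constant bias leaves the output unchanged.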
OnnxComMicrosoftBiasSoftmax_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBiasSoftmax_1(*args, **kwargs)#
Version
name: BiasSoftmax (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Y = softmax(scores + bias) with simple broadcast on bias. Intended to specialize softmax(scores + additive_mask) commonly found in transformer models.
Attributes
broadcast_axis: broadcast bias across input for dimensions broadcast_axis to softmax_axis-1. Default value is ?.
softmax_axis: apply softmax to elements for dimensions softmax_axis or higher. Default value is ?.
Inputs
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias (or mask) as Tensor.
Outputs
output (heterogeneous) - T: The output.
OnnxComMicrosoftBifurcationDetector#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBifurcationDetector(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Component for aggressive decoding. Finds the bifurcation index between the predicted tokens and the source tokens, starting from the previous suffix match index. Predicted tokens from the bifurcation index onward are concatenated to the back of the current tokens; this forms the output tokens. The suffix match index in the source tokens is then detected by searching the source tokens for the last n-gram of the output tokens. A match is found only if the source tokens contain exactly one occurrence of that n-gram, in which case the index of its start in the source tokens is returned; if the source tokens contain zero or multiple matching n-grams, -1 is returned.
Attributes
max_ngram_size: The maximum NGram size for suffix matching. Default value is ?.
min_ngram_size: The minimum NGram size for suffix matching. Default value is ?.
Inputs
Between 3 and 4 inputs.
src_tokens (heterogeneous) - T: Encoder input ids.
cur_tokens (heterogeneous) - T: Decoder input ids.
prev_suffix_match_idx (heterogeneous) - T: Previous suffix match index
pred_tokens (optional, heterogeneous) - T: Predicted token ids from aggressive decoding
Outputs
tokens (heterogeneous) - T: Decoder input ids after merging predicted tokens
suffix_match_idx (heterogeneous) - T: new suffix match index
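The suffix-match step described above can be sketched as follows. This is a simplified reading of the summary in plain Python (the fallback to shorter n-grams when a longer one is ambiguous is one plausible interpretation), not the ONNX Runtime kernel:

```python
def find_suffix_match(src_tokens, out_tokens, min_ngram=1, max_ngram=3):
    # Search src_tokens for the last n-gram of out_tokens, preferring longer
    # n-grams. Return the start index only when exactly one occurrence exists;
    # otherwise fall back to a shorter n-gram, and finally signal no match
    # with -1 (zero or multiple occurrences).
    for n in range(max_ngram, min_ngram - 1, -1):
        if len(out_tokens) < n:
            continue
        gram = out_tokens[-n:]
        hits = [i for i in range(len(src_tokens) - n + 1)
                if src_tokens[i:i + n] == gram]
        if len(hits) == 1:
            return hits[0]
    return -1
```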
OnnxComMicrosoftBifurcationDetector_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBifurcationDetector_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Component for aggressive decoding. Finds the bifurcation index between the predicted tokens and the source tokens, starting from the previous suffix match index. Predicted tokens from the bifurcation index onward are concatenated to the back of the current tokens; this forms the output tokens. The suffix match index in the source tokens is then detected by searching the source tokens for the last n-gram of the output tokens. A match is found only if the source tokens contain exactly one occurrence of that n-gram, in which case the index of its start in the source tokens is returned; if the source tokens contain zero or multiple matching n-grams, -1 is returned.
Attributes
max_ngram_size: The maximum NGram size for suffix matching. Default value is ?.
min_ngram_size: The minimum NGram size for suffix matching. Default value is ?.
Inputs
Between 3 and 4 inputs.
src_tokens (heterogeneous) - T: Encoder input ids.
cur_tokens (heterogeneous) - T: Decoder input ids.
prev_suffix_match_idx (heterogeneous) - T: Previous suffix match index
pred_tokens (optional, heterogeneous) - T: Predicted token ids from aggressive decoding
Outputs
tokens (heterogeneous) - T: Decoder input ids after merging predicted tokens
suffix_match_idx (heterogeneous) - T: new suffix match index
OnnxComMicrosoftBitmaskBiasDropout#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskBiasDropout(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
output, dropout_bitmask = Dropout(data + bias, ratio) + residual. Intended to specialize the dropout pattern commonly found in transformer models.
Attributes
seed: (Optional) Seed for the random generator; if not specified, one is generated automatically. Default value is ?.
Inputs
Between 2 and 5 inputs.
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias input, a vector with the same shape as last dim of data OR same shape with data
residual (optional, heterogeneous) - T: The residual input, must have the same shape as data
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true, it indicates dropout is being used for training. It is an optional value; unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode, where nothing is dropped from the input data, and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T3: The output mask of dropout.
OnnxComMicrosoftBitmaskBiasDropout_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskBiasDropout_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
output, dropout_bitmask = Dropout(data + bias, ratio) + residual. Intended to specialize the dropout pattern commonly found in transformer models.
Attributes
seed: (Optional) Seed for the random generator; if not specified, one is generated automatically. Default value is ?.
Inputs
Between 2 and 5 inputs.
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias input, a vector with the same shape as last dim of data OR same shape with data
residual (optional, heterogeneous) - T: The residual input, must have the same shape as data
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true, it indicates dropout is being used for training. It is an optional value; unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode, where nothing is dropped from the input data, and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T3: The output mask of dropout.
OnnxComMicrosoftBitmaskDropout#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskDropout(*args, **kwargs)#
Version
name: BitmaskDropout (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BitmaskDropout takes an input floating-point tensor, an optional input ratio (floating-point scalar) and an optional input training_mode (boolean scalar). It produces two tensor outputs: output (floating-point tensor) and mask (optional Tensor<uint32>). If training_mode is true then the output Y will be a random dropout. Note that this Dropout scales the masked input data by the following equation, so to convert the trained model into inference mode, the user can simply not pass training_mode input or set it to false.
output = scale * data * mask,
where
scale = 1. / (1. - ratio).
This op functions in much the same way as Dropout-11 and Dropout-13, except that the mask is output as a bit-packed uint32 tensor instead of a boolean tensor.
Attributes
seed: (Optional) Seed for the random generator; if not specified, one is generated automatically. Default value is ?.
Inputs
Between 1 and 3 inputs.
data (heterogeneous) - T: The input data as Tensor.
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true, it indicates dropout is being used for training. It is an optional value; unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode, where nothing is dropped from the input data, and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T3: The bit-packed output mask.
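The scaling formula and the bit-packed mask can be illustrated in NumPy. The bit layout assumed here (element i of the flattened input maps to bit i of little-endian uint32 words) and the helper name are illustrative, not the kernel's contract:

```python
import numpy as np

def bitmask_dropout(data, ratio=0.5, training=True, seed=None):
    n_words = -(-data.size // 32)  # ceil(size / 32): 32 mask bits per uint32
    if not training or ratio == 0.0:
        # inference mode: copy input, mask is all ones
        return data.copy(), np.full(n_words, 0xFFFFFFFF, dtype=np.uint32)
    rng = np.random.default_rng(seed)
    keep = rng.random(data.shape) >= ratio
    out = np.where(keep, data / (1.0 - ratio), 0.0)  # scale = 1 / (1 - ratio)
    bits = np.packbits(keep.reshape(-1), bitorder="little")  # uint8 bytes
    pad = (-bits.size) % 4
    bits = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    return out, bits.view(np.dtype("<u4"))  # reinterpret as little-endian uint32
```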
OnnxComMicrosoftBitmaskDropoutGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskDropoutGrad(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BitmaskDropoutGrad
Inputs
Between 2 and 4 inputs.
dy (heterogeneous) - T: The gradient tensor from output.
mask (heterogeneous) - T3: The mask output of the dropout.
ratio (optional, heterogeneous) - T1: Same value as the ratio input supplied to the dropout op with value in [0, 1). If this input is not specified, a default value of 0.5 is used.
training_mode (optional, heterogeneous) - T2: Same value as the training_mode input supplied to the dropout op. If this input is not specified, a default value of false is used.
Outputs
dx (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBitmaskDropoutGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskDropoutGrad_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BitmaskDropoutGrad
Inputs
Between 2 and 4 inputs.
dy (heterogeneous) - T: The gradient tensor from output.
mask (heterogeneous) - T3: The mask output of the dropout.
ratio (optional, heterogeneous) - T1: Same value as the ratio input supplied to the dropout op with value in [0, 1). If this input is not specified, a default value of 0.5 is used.
training_mode (optional, heterogeneous) - T2: Same value as the training_mode input supplied to the dropout op. If this input is not specified, a default value of false is used.
Outputs
dx (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftBitmaskDropout_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBitmaskDropout_1(*args, **kwargs)#
Version
name: BitmaskDropout (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
BitmaskDropout takes an input floating-point tensor, an optional input ratio (floating-point scalar) and an optional input training_mode (boolean scalar). It produces two tensor outputs: output (floating-point tensor) and mask (optional Tensor<uint32>). If training_mode is true then the output Y will be a random dropout. Note that this Dropout scales the masked input data by the following equation, so to convert the trained model into inference mode, the user can simply not pass training_mode input or set it to false.
output = scale * data * mask,
where
scale = 1. / (1. - ratio).
This op functions in much the same way as Dropout-11 and Dropout-13, except that the mask is output as a bit-packed uint32 tensor instead of a boolean tensor.
Attributes
seed: (Optional) Seed for the random generator; if not specified, one is generated automatically. Default value is ?.
Inputs
Between 1 and 3 inputs.
data (heterogeneous) - T: The input data as Tensor.
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input was not set, or if it was set to 0, the output would be a simple copy of the input. If it’s non-zero, output will be a random dropout of the scaled input, which is typically the case during training. It is an optional value, if not specified it will default to 0.5.
training_mode (optional, heterogeneous) - T2: If set to true, it indicates dropout is being used for training. It is an optional value; unless specified explicitly, it is false. If it is false, ratio is ignored and the operation mimics inference mode, where nothing is dropped from the input data, and if mask is requested as output it will contain all ones.
Outputs
Between 1 and 2 outputs.
output (heterogeneous) - T: The output.
mask (optional, heterogeneous) - T3: The bit-packed output mask.
OnnxComMicrosoftBroadcastGradientArgs#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBroadcastGradientArgs(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Returns the reduction axes for computing gradients of s0 op s1 with broadcast. The output axes are deterministic, ordered from last to first. The output is an empty vector when no reduction is necessary for the corresponding input.
Inputs
a_shape (heterogeneous) - T: The 1st input shape as Tensor.
b_shape (heterogeneous) - T: The 2nd input shape as Tensor.
Outputs
Between 0 and 2 outputs.
a_axes (optional, heterogeneous) - T: The reduction axes for 1st input, last to first.
b_axes (optional, heterogeneous) - T: The reduction axes for 2nd input, last to first.
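The axis computation follows directly from broadcasting rules and can be sketched in plain Python (the helper name is mine):

```python
def broadcast_gradient_args(a_shape, b_shape):
    # For gradients of `a op b` with broadcasting: an input needs its gradient
    # reduced over every axis where its (left-padded) dimension is 1 but the
    # other input's is not. Axes are emitted last to first, as the summary says.
    ndim = max(len(a_shape), len(b_shape))
    a = [1] * (ndim - len(a_shape)) + list(a_shape)
    b = [1] * (ndim - len(b_shape)) + list(b_shape)
    a_axes, b_axes = [], []
    for axis in range(ndim - 1, -1, -1):
        if a[axis] == 1 and b[axis] != 1:
            a_axes.append(axis)
        elif b[axis] == 1 and a[axis] != 1:
            b_axes.append(axis)
    return a_axes, b_axes
```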
OnnxComMicrosoftBroadcastGradientArgs_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftBroadcastGradientArgs_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Returns the reduction axes for computing gradients of s0 op s1 with broadcast. The output axes are deterministic, ordered from last to first. The output is an empty vector when no reduction is necessary for the corresponding input.
Inputs
a_shape (heterogeneous) - T: The 1st input shape as Tensor.
b_shape (heterogeneous) - T: The 2nd input shape as Tensor.
Outputs
Between 0 and 2 outputs.
a_axes (optional, heterogeneous) - T: The reduction axes for 1st input, last to first.
b_axes (optional, heterogeneous) - T: The reduction axes for 2nd input, last to first.
OnnxComMicrosoftCDist#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftCDist(*args, **kwargs)#
Version
name: CDist (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
metric: The distance metric to use. If a string, the distance function can be “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jaccard”, “jensenshannon”, “kulsinski”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, “seuclidean”, “sokalmichener”, “sokalsneath”, “sqeuclidean”, “wminkowski”, “yule”. Default value is ?.
Inputs
A (heterogeneous) - T: 2D matrix with shape (M,N)
B (heterogeneous) - T: 2D matrix with shape (K,N)
Outputs
C (heterogeneous) - T: A 2D Matrix that represents the distance between each pair of the two collections of inputs.
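The metric names above match those of scipy.spatial.distance.cdist. As a minimal illustration of the shape contract, here is a NumPy sketch of the 'sqeuclidean' metric only (helper name is mine):

```python
import numpy as np

def cdist_sqeuclidean(A, B):
    # C[i, j] = ||A[i] - B[j]||^2 for A of shape (M, N) and B of shape (K, N),
    # giving C of shape (M, K), one entry per pair of rows.
    diff = A[:, None, :] - B[None, :, :]
    return (diff * diff).sum(axis=-1)
```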
OnnxComMicrosoftCDist_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftCDist_1(*args, **kwargs)#
Version
name: CDist (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
metric: The distance metric to use. If a string, the distance function can be “braycurtis”, “canberra”, “chebyshev”, “cityblock”, “correlation”, “cosine”, “dice”, “euclidean”, “hamming”, “jaccard”, “jensenshannon”, “kulsinski”, “mahalanobis”, “matching”, “minkowski”, “rogerstanimoto”, “russellrao”, “seuclidean”, “sokalmichener”, “sokalsneath”, “sqeuclidean”, “wminkowski”, “yule”. Default value is ?.
Inputs
A (heterogeneous) - T: 2D matrix with shape (M,N)
B (heterogeneous) - T: 2D matrix with shape (K,N)
Outputs
C (heterogeneous) - T: A 2D Matrix that represents the distance between each pair of the two collections of inputs.
OnnxComMicrosoftComplexMul#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftComplexMul(*args, **kwargs)#
Version
name: ComplexMul (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
A (heterogeneous) - T: input_0
B (heterogeneous) - T: input_1
Outputs
C (heterogeneous) - T: output tensor
OnnxComMicrosoftComplexMulConj#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftComplexMulConj(*args, **kwargs)#
Version
name: ComplexMulConj (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
A (heterogeneous) - T: input_0
B (heterogeneous) - T: input_1
Outputs
C (heterogeneous) - T: output tensor
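The summaries of ComplexMul and ComplexMulConj are empty. Assuming complex tensors are stored with a trailing dimension of size 2 holding (real, imaginary) parts, which is a common layout but an assumption here, one NumPy sketch covers both operators:

```python
import numpy as np

def complex_mul(A, B, conj_b=False):
    # Assumed layout: last dimension of size 2 holds (real, imag) parts.
    # conj_b=True conjugates B first, i.e. the ComplexMulConj variant.
    ar, ai = A[..., 0], A[..., 1]
    br, bi = B[..., 0], B[..., 1]
    if conj_b:
        bi = -bi
    return np.stack([ar * br - ai * bi, ar * bi + ai * br], axis=-1)
```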
OnnxComMicrosoftComplexMulConj_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftComplexMulConj_1(*args, **kwargs)#
Version
name: ComplexMulConj (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
A (heterogeneous) - T: input_0
B (heterogeneous) - T: input_1
Outputs
C (heterogeneous) - T: output tensor
OnnxComMicrosoftComplexMul_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftComplexMul_1(*args, **kwargs)#
Version
name: ComplexMul (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
A (heterogeneous) - T: input_0
B (heterogeneous) - T: input_1
Outputs
C (heterogeneous) - T: output tensor
OnnxComMicrosoftConcatTraining#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConcatTraining(*args, **kwargs)#
Version
name: ConcatTraining (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Concatenate a list of tensors into a single tensor
Attributes
axis (required): Which axis to concat on. Default value is ?.
Inputs
Between 1 and 2147483647 inputs.
inputs (variadic, heterogeneous) - T: List of tensors for concatenation
Outputs
Between 1 and 2 outputs.
concat_result (heterogeneous) - T: Concatenated tensor
per_input_length (optional, heterogeneous) - Tint: Vector of length of each concatenated input along the ‘axis’ dimension
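A NumPy sketch of the forward computation (the helper name is mine). The optional second output records each input's extent along `axis`, which the training backward pass can use to split the gradient:

```python
import numpy as np

def concat_training(inputs, axis=0):
    # Concatenate the tensors and record each one's length along `axis`.
    per_input_length = np.array([t.shape[axis] for t in inputs], dtype=np.int64)
    return np.concatenate(inputs, axis=axis), per_input_length
```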
OnnxComMicrosoftConcatTraining_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConcatTraining_1(*args, **kwargs)#
Version
name: ConcatTraining (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Concatenate a list of tensors into a single tensor
Attributes
axis (required): Which axis to concat on. Default value is ?.
Inputs
Between 1 and 2147483647 inputs.
inputs (variadic, heterogeneous) - T: List of tensors for concatenation
Outputs
Between 1 and 2 outputs.
concat_result (heterogeneous) - T: Concatenated tensor
per_input_length (optional, heterogeneous) - Tint: Vector of length of each concatenated input along the ‘axis’ dimension
OnnxComMicrosoftConvGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConvGrad(*args, **kwargs)#
Version
name: ConvGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
dY (heterogeneous) - T: Gradient of output Y
X (heterogeneous) - T: Input tensor
W (heterogeneous) - T: Weight tensor
Outputs
Between 0 and 3 outputs.
dX (optional, heterogeneous) - T: Gradient of X
dW (optional, heterogeneous) - T: Gradient of W
dB (optional, heterogeneous) - T: Gradient of B
OnnxComMicrosoftConvGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConvGrad_1(*args, **kwargs)#
Version
name: ConvGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
dY (heterogeneous) - T: Gradient of output Y
X (heterogeneous) - T: Input tensor
W (heterogeneous) - T: Weight tensor
Outputs
Between 0 and 3 outputs.
dX (optional, heterogeneous) - T: Gradient of X
dW (optional, heterogeneous) - T: Gradient of W
dB (optional, heterogeneous) - T: Gradient of B
OnnxComMicrosoftConvTransposeWithDynamicPads#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConvTransposeWithDynamicPads(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
auto_pad: Default value is ?.
dilations: Default value is ?.
group: Default value is ?.
kernel_shape: Default value is ?.
output_padding: Default value is ?.
strides: Default value is ?.
Inputs
Between 2 and 4 inputs.
X (heterogeneous) - T:
W (heterogeneous) - T:
Pads (optional, heterogeneous) - tensor(int64):
B (optional, heterogeneous) - T:
Outputs
Y (heterogeneous) - T:
OnnxComMicrosoftConvTransposeWithDynamicPads_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftConvTransposeWithDynamicPads_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
auto_pad: Default value is ?.
dilations: Default value is ?.
group: Default value is ?.
kernel_shape: Default value is ?.
output_padding: Default value is ?.
strides: Default value is ?.
Inputs
Between 2 and 4 inputs.
X (heterogeneous) - T:
W (heterogeneous) - T:
Pads (optional, heterogeneous) - tensor(int64):
B (optional, heterogeneous) - T:
Outputs
Y (heterogeneous) - T:
OnnxComMicrosoftCropAndResize#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftCropAndResize(*args, **kwargs)#
Version
name: CropAndResize (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Extracts crops from the input image tensor and resizes them using bilinear sampling or nearest neighbor sampling (possibly with aspect ratio change) to a common output size specified by crop_height and crop_width. Returns a tensor with crops from the input image at positions defined at the bounding box locations in boxes. The cropped boxes are all resized (with bilinear or nearest neighbor interpolation) to a fixed size = [crop_height, crop_width]. The result is a 4-D tensor [num_boxes, crop_height, crop_width, depth]. The resizing is corner aligned.
Attributes
extrapolation_value: Value used for extrapolation, when applicable. Default is 0.0f. Default value is ?.
mode: The pooling method. Two modes are supported: ‘bilinear’ and ‘nearest’. Default is ‘bilinear’. Default value is ?.
Inputs
X (heterogeneous) - T1: Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (heterogeneous) - T1: RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[y1, x1, y2, x2], …]. The RoIs’ coordinates are normalized in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.
batch_indices (heterogeneous) - T2: 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.
crop_size (heterogeneous) - T2: 1-D tensor of 2 elements: [crop_height, crop_width]. All cropped image patches are resized to this size. Both crop_height and crop_width need to be positive.
Outputs
Y (heterogeneous) - T1: RoI pooled output, 4-D tensor of shape (num_rois, C, crop_height, crop_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].
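A NumPy sketch of the 'nearest' mode only, with corner-aligned sampling and no extrapolation handling; the helper name and the exact rounding rule are illustrative assumptions, not the kernel:

```python
import numpy as np

def crop_and_resize_nearest(X, rois, batch_indices, crop_size):
    # X: (N, C, H, W); rois: normalized [y1, x1, y2, x2] per box;
    # batch_indices: image index per box; crop_size: (crop_height, crop_width).
    N, C, H, W = X.shape
    ch, cw = int(crop_size[0]), int(crop_size[1])
    out = np.empty((len(rois), C, ch, cw), dtype=X.dtype)
    for r, (y1, x1, y2, x2) in enumerate(rois):
        img = X[batch_indices[r]]
        # corner-aligned sampling grid in normalized image coordinates
        ys = y1 + (y2 - y1) * np.arange(ch) / max(ch - 1, 1)
        xs = x1 + (x2 - x1) * np.arange(cw) / max(cw - 1, 1)
        iy = np.clip(np.rint(ys * (H - 1)).astype(int), 0, H - 1)
        ix = np.clip(np.rint(xs * (W - 1)).astype(int), 0, W - 1)
        out[r] = img[:, iy[:, None], ix[None, :]]
    return out
```

With corner alignment, the identity RoI [0, 0, 1, 1] at the input resolution reproduces the image exactly.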
OnnxComMicrosoftCropAndResize_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftCropAndResize_1(*args, **kwargs)#
Version
name: CropAndResize (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Extracts crops from the input image tensor and resizes them using bilinear sampling or nearest neighbor sampling (possibly with aspect ratio change) to a common output size specified by crop_height and crop_width. Returns a tensor with crops from the input image at positions defined at the bounding box locations in boxes. The cropped boxes are all resized (with bilinear or nearest neighbor interpolation) to a fixed size = [crop_height, crop_width]. The result is a 4-D tensor [num_boxes, crop_height, crop_width, depth]. The resizing is corner aligned.
Attributes
extrapolation_value: Value used for extrapolation, when applicable. Default is 0.0f. Default value is ?.
mode: The pooling method. Two modes are supported: ‘bilinear’ and ‘nearest’. Default is ‘bilinear’. Default value is ?.
Inputs
X (heterogeneous) - T1: Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (heterogeneous) - T1: RoIs (Regions of Interest) to pool over; rois is 2-D input of shape (num_rois, 4) given as [[y1, x1, y2, x2], …]. The RoIs’ coordinates are normalized in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.
batch_indices (heterogeneous) - T2: 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.
crop_size (heterogeneous) - T2: 1-D tensor of 2 elements: [crop_height, crop_width]. All cropped image patches are resized to this size. Both crop_height and crop_width need to be positive.
Outputs
Y (heterogeneous) - T1: RoI pooled output, 4-D tensor of shape (num_rois, C, crop_height, crop_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI X[r-1].
OnnxComMicrosoftDecoderAttention#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDecoderAttention(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
This DecoderAttention supports self attention and cross attention, key and value caches, and key_padding_mask. The attention mask is not supported at the moment. Some boolean parameters are passed as runtime inputs for generality.
Attributes
num_heads (required): Number of attention heads. Default value is ?.
Inputs
query (heterogeneous) - T: 3D input tensor with shape (sequence_length, batch_size, hidden_size), hidden_size = num_heads * head_size
key (heterogeneous) - T: 3D input tensor with shape (total_sequence_length, batch_size, hidden_size)
q_weight (heterogeneous) - T: 2D input tensor with shape (hidden_size, hidden_size)
kv_weight (heterogeneous) - T: 2D input tensor with shape (hidden_size, 2 * hidden_size)
bias (heterogeneous) - T: 1D input tensor with shape (3 * hidden_size)
key_padding_mask (optional, heterogeneous) - B: 2D input tensor with shape (batch_size, total_sequence_length)
key_cache (optional, heterogeneous) - T: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size)
value_cache (optional, heterogeneous) - T: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size)
static_kv (heterogeneous) - B: If static_kv = true, cross-attention; else self-attention
use_past (heterogeneous) - B: If use_past = true, use cache; else no cache
has_layer_state (heterogeneous) - B: If has_layer_state = true, layer_state = {} or [a,b]; else layer_state = None
has_key_padding_mask (heterogeneous) - B: has_key_padding_mask or not
Outputs
Between 1 and 3 outputs.
output (heterogeneous) - T: 3D output tensor with shape (sequence_length, batch_size, hidden_size)
new_key_cache (optional, heterogeneous) - T: output tensor with shape (batch_size, num_heads, new sequence_length, head_size)
new_value_cache (optional, heterogeneous) - T: output tensor with shape (batch_size, num_heads, new sequence_length, head_size)
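The documented shapes can be traced with a minimal NumPy sketch of the core attention path (a reference sketch only, not the onnxruntime kernel: caches, key_padding_mask, and the boolean runtime switches are omitted, and the head layout is an assumption consistent with the shapes above):

```python
import numpy as np

def decoder_attention_ref(query, key, q_weight, kv_weight, bias, num_heads):
    # Minimal attention sketch following the documented shapes:
    # query (S, B, H), key (T, B, H), q_weight (H, H),
    # kv_weight (H, 2H), bias (3H); output (S, B, H).
    S, B, H = query.shape
    d = H // num_heads
    q = query @ q_weight + bias[:H]
    # key projects to concatenated K and V; split the 2H channels
    k, v = np.split(key @ kv_weight + bias[H:], 2, axis=-1)

    def heads(x):  # (L, B, H) -> (B, num_heads, L, d)
        return x.reshape(x.shape[0], B, num_heads, d).transpose(1, 2, 0, 3)

    q, k, v = heads(q), heads(k), heads(v)
    w = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # scaled dot product
    w = np.exp(w - w.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    out = w @ v                                    # (B, num_heads, S, d)
    return out.transpose(2, 0, 1, 3).reshape(S, B, H)
```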
OnnxComMicrosoftDecoderAttention_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDecoderAttention_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
This DecoderAttention supports self attention and cross attention, key and value cache, and key_padding_mask. The attention mask is not supported at the moment. Some boolean parameters are passed as runtime inputs for generic purposes.
Attributes
num_heads (required): Number of attention heads. Default value is ?.
Inputs
query (heterogeneous) - T: 3D input tensor with shape (sequence_length, batch_size, hidden_size), hidden_size = num_heads * head_size
key (heterogeneous) - T: 3D input tensor with shape (total_sequence_length, batch_size, hidden_size)
q_weight (heterogeneous) - T: 2D input tensor with shape (hidden_size, hidden_size)
kv_weight (heterogeneous) - T: 2D input tensor with shape (hidden_size, 2 * hidden_size)
bias (heterogeneous) - T: 1D input tensor with shape (3 * hidden_size)
key_padding_mask (optional, heterogeneous) - B: 2D input tensor with shape (batch_size, total_sequence_length)
key_cache (optional, heterogeneous) - T: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size)
value_cache (optional, heterogeneous) - T: input tensor with shape (batch_size, num_heads, sequence_length or total_sequence_length, head_size)
static_kv (heterogeneous) - B: If static_kv = true, cross-attention; else self-attention
use_past (heterogeneous) - B: If use_past = true, use cache; else no cache
has_layer_state (heterogeneous) - B: If has_layer_state = true, layer_state = {} or [a,b]; else layer_state = None
has_key_padding_mask (heterogeneous) - B: has_key_padding_mask or not
Outputs
Between 1 and 3 outputs.
output (heterogeneous) - T: 3D output tensor with shape (sequence_length, batch_size, hidden_size)
new_key_cache (optional, heterogeneous) - T: output tensor with shape (batch_size, num_heads, new sequence_length, head_size)
new_value_cache (optional, heterogeneous) - T: output tensor with shape (batch_size, num_heads, new sequence_length, head_size)
OnnxComMicrosoftDequantizeLinear#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDequantizeLinear(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The linear dequantization operator. It consumes quantized data, a scale, and a zero point, and computes the full-precision data. The dequantization formula is y = (x - x_zero_point) * x_scale. Scale and zero point must have the same shape. They must be either scalars (per-tensor) or 1-D tensors (per ‘axis’).
Attributes
axis: The axis along which the same quantization parameters are applied. It’s optional. If it’s not specified, it means per-tensor quantization, and inputs ‘x_scale’ and ‘x_zero_point’ must be scalars. If it’s specified, it means per-‘axis’ quantization, and inputs ‘x_scale’ and ‘x_zero_point’ must be 1-D tensors. Default value is ?.
Inputs
x (heterogeneous) - T1: N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - T2: Scale for input ‘x’. It could be a scalar or a 1-D tensor, which means per-tensor or per-axis quantization. If it’s a 1-D tensor, its number of elements should equal the dimension value of the ‘axis’ dimension of input ‘x’.
x_zero_point (heterogeneous) - T1: Zero point for input ‘x’. It could be a scalar or a 1-D tensor, which means per-tensor or per-axis quantization. If it’s a 1-D tensor, its number of elements should equal the dimension value of the ‘axis’ dimension of input ‘x’.
Outputs
y (heterogeneous) - T2: N-D full precision output tensor. It has same shape as input ‘x’.
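The dequantization formula above can be sketched in NumPy (a reference sketch, not the onnxruntime kernel; the per-axis reshape is an assumption about how the 1-D parameters broadcast along ‘axis’):

```python
import numpy as np

def dequantize_linear(x, x_scale, x_zero_point, axis=None):
    # y = (x - x_zero_point) * x_scale, computed in float32.
    # axis=None means per-tensor (scalar scale/zero point);
    # otherwise the 1-D parameters broadcast along `axis`.
    x = x.astype(np.float32)
    scale = np.asarray(x_scale, dtype=np.float32)
    zp = np.asarray(x_zero_point, dtype=np.float32)
    if axis is not None:
        shape = [1] * x.ndim
        shape[axis] = -1
        scale = scale.reshape(shape)
        zp = zp.reshape(shape)
    return (x - zp) * scale

x = np.array([[0, 128, 255]], dtype=np.uint8)
y = dequantize_linear(x, np.float32(0.1), np.uint8(128))
# y is approximately [[-12.8, 0.0, 12.7]]
```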
OnnxComMicrosoftDequantizeLinear_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDequantizeLinear_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The linear dequantization operator. It consumes quantized data, a scale, and a zero point, and computes the full-precision data. The dequantization formula is y = (x - x_zero_point) * x_scale. Scale and zero point must have the same shape. They must be either scalars (per-tensor) or 1-D tensors (per ‘axis’).
Attributes
axis: The axis along which the same quantization parameters are applied. It’s optional. If it’s not specified, it means per-tensor quantization, and inputs ‘x_scale’ and ‘x_zero_point’ must be scalars. If it’s specified, it means per-‘axis’ quantization, and inputs ‘x_scale’ and ‘x_zero_point’ must be 1-D tensors. Default value is ?.
Inputs
x (heterogeneous) - T1: N-D quantized input tensor to be de-quantized.
x_scale (heterogeneous) - T2: Scale for input ‘x’. It could be a scalar or a 1-D tensor, which means per-tensor or per-axis quantization. If it’s a 1-D tensor, its number of elements should equal the dimension value of the ‘axis’ dimension of input ‘x’.
x_zero_point (heterogeneous) - T1: Zero point for input ‘x’. It could be a scalar or a 1-D tensor, which means per-tensor or per-axis quantization. If it’s a 1-D tensor, its number of elements should equal the dimension value of the ‘axis’ dimension of input ‘x’.
Outputs
y (heterogeneous) - T2: N-D full precision output tensor. It has same shape as input ‘x’.
OnnxComMicrosoftDivGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDivGrad(*args, **kwargs)#
Version
name: DivGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
dY (heterogeneous) - T: Gradient of output
A (heterogeneous) - T: dividend
B (heterogeneous) - T: divisor
Outputs
Between 0 and 2 outputs.
dA (optional, heterogeneous) - T: Gradient of dividend
dB (optional, heterogeneous) - T: Gradient of divisor
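The schema omits a summary, but the gradients follow from the quotient rule for Y = A / B: dA = dY / B and dB = -dY * A / B². A NumPy sketch (reduction of the gradients over broadcast dimensions is omitted here):

```python
import numpy as np

def div_grad(dY, A, B):
    # Quotient-rule gradients for Y = A / B:
    #   dA = dY / B,  dB = -dY * A / B^2
    dA = dY / B
    dB = -dY * A / (B * B)
    return dA, dB
```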
OnnxComMicrosoftDivGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDivGrad_1(*args, **kwargs)#
Version
name: DivGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
dY (heterogeneous) - T: Gradient of output
A (heterogeneous) - T: dividend
B (heterogeneous) - T: divisor
Outputs
Between 0 and 2 outputs.
dA (optional, heterogeneous) - T: Gradient of dividend
dB (optional, heterogeneous) - T: Gradient of divisor
OnnxComMicrosoftDropoutGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDropoutGrad(*args, **kwargs)#
Version
name: DropoutGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
DropoutGrad
Inputs
Between 2 and 4 inputs.
dy (heterogeneous) - T: The gradient tensor from output.
mask (heterogeneous) - T2: The mask output of the dropout.
ratio (optional, heterogeneous) - T1: Same value as the ratio input supplied to the dropout op with value in [0, 1). If this input is not specified, a default value of 0.5 is used.
training_mode (optional, heterogeneous) - T2: Same value as the training_mode input supplied to the dropout op. If this input is not specified, a default value of false is used.
Outputs
dx (heterogeneous) - T: Gradient of the input.
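Assuming the standard inverted-dropout convention (the forward pass scales surviving activations by 1/(1 - ratio)), the backward pass masks and rescales the incoming gradient; in inference mode the dropout was an identity. A hedged NumPy sketch:

```python
import numpy as np

def dropout_grad(dy, mask, ratio=0.5, training_mode=True):
    # Inverted-dropout backward pass (assumption: matches the
    # forward convention where kept values are scaled by 1/(1-ratio)).
    if not training_mode:
        return dy          # dropout was a no-op in inference mode
    return dy * mask.astype(dy.dtype) / (1.0 - ratio)
```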
OnnxComMicrosoftDropoutGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDropoutGrad_1(*args, **kwargs)#
Version
name: DropoutGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
DropoutGrad
Inputs
Between 2 and 4 inputs.
dy (heterogeneous) - T: The gradient tensor from output.
mask (heterogeneous) - T2: The mask output of the dropout.
ratio (optional, heterogeneous) - T1: Same value as the ratio input supplied to the dropout op with value in [0, 1). If this input is not specified, a default value of 0.5 is used.
training_mode (optional, heterogeneous) - T2: Same value as the training_mode input supplied to the dropout op. If this input is not specified, a default value of false is used.
Outputs
dx (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftDynamicQuantizeLSTM#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDynamicQuantizeLSTM(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01. Default value is ?.
activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. Default value is ?.
activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: see the equations for the default if not specified. Default value is ?.
clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range [-threshold, +threshold] and is applied to the input of activations. No clip if not specified. Default value is ?.
direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is ?.
hidden_size: Number of neurons in the hidden layer. Default value is ?.
input_forget: Couple the input and forget gates if 1. Default value is ?.
Inputs
X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (heterogeneous) - T2: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, input_size, 4*hidden_size].
R (heterogeneous) - T2: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, hidden_size, 4*hidden_size].
B (optional, heterogeneous) - T: The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].
initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (optional, heterogeneous) - T: Optional initial value of the cell. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
W_scale (heterogeneous) - T: W’s scale. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
W_zero_point (heterogeneous) - T2: W’s zero point. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
R_scale (heterogeneous) - T: R’s scale. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
R_zero_point (heterogeneous) - T2: R’s zero point. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
Outputs
Between 0 and 3 outputs.
Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size].
Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
Y_c (optional, heterogeneous) - T: The last output value of the cell. It has shape [num_directions, batch_size, hidden_size].
OnnxComMicrosoftDynamicQuantizeLSTM_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDynamicQuantizeLSTM_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
activation_alpha: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. For example, with LeakyRelu the default alpha is 0.01. Default value is ?.
activation_beta: Optional scaling values used by some activation functions. The values are consumed in the order of activation functions, for example (f, g, h) in LSTM. Default values are the same as those of the corresponding ONNX operators. Default value is ?.
activations: A list of 3 (or 6 if bidirectional) activation functions for input, output, forget, cell, and hidden. The activation functions must be one of the activation functions specified above. Optional: see the equations for the default if not specified. Default value is ?.
clip: Cell clip threshold. Clipping bounds the elements of a tensor in the range [-threshold, +threshold] and is applied to the input of activations. No clip if not specified. Default value is ?.
direction: Specify if the RNN is forward, reverse, or bidirectional. Must be one of forward (default), reverse, or bidirectional. Default value is ?.
hidden_size: Number of neurons in the hidden layer. Default value is ?.
input_forget: Couple the input and forget gates if 1. Default value is ?.
Inputs
X (heterogeneous) - T: The input sequences packed (and potentially padded) into one 3-D tensor with the shape of [seq_length, batch_size, input_size].
W (heterogeneous) - T2: The weight tensor for the gates. Concatenation of W[iofc] and WB[iofc] (if bidirectional) along dimension 0. The tensor has shape [num_directions, input_size, 4*hidden_size].
R (heterogeneous) - T2: The recurrence weight tensor. Concatenation of R[iofc] and RB[iofc] (if bidirectional) along dimension 0. This tensor has shape [num_directions, hidden_size, 4*hidden_size].
B (optional, heterogeneous) - T: The bias tensor for input gate. Concatenation of [Wb[iofc], Rb[iofc]], and [WBb[iofc], RBb[iofc]] (if bidirectional) along dimension 0. This tensor has shape [num_directions, 8*hidden_size]. Optional: If not specified - assumed to be 0.
sequence_lens (optional, heterogeneous) - T1: Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length seq_length. It has shape [batch_size].
initial_h (optional, heterogeneous) - T: Optional initial value of the hidden. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
initial_c (optional, heterogeneous) - T: Optional initial value of the cell. If not specified - assumed to be 0. It has shape [num_directions, batch_size, hidden_size].
P (optional, heterogeneous) - T: The weight tensor for peepholes. Concatenation of P[iof] and PB[iof] (if bidirectional) along dimension 0. It has shape [num_directions, 3*hidden_size]. Optional: If not specified - assumed to be 0.
W_scale (heterogeneous) - T: W’s scale. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
W_zero_point (heterogeneous) - T2: W’s zero point. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
R_scale (heterogeneous) - T: R’s scale. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
R_zero_point (heterogeneous) - T2: R’s zero point. Its size is [num_directions] for per-tensor/layer quantization, or [num_directions, 4*hidden_size] for per-channel quantization on the axis input_size.
Outputs
Between 0 and 3 outputs.
Y (optional, heterogeneous) - T: A tensor that concats all the intermediate output values of the hidden. It has shape [seq_length, num_directions, batch_size, hidden_size].
Y_h (optional, heterogeneous) - T: The last output value of the hidden. It has shape [num_directions, batch_size, hidden_size].
Y_c (optional, heterogeneous) - T: The last output value of the cell. It has shape [num_directions, batch_size, hidden_size].
OnnxComMicrosoftDynamicQuantizeMatMul#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDynamicQuantizeMatMul(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
Between 3 and 5 inputs.
A (heterogeneous) - T1: N-dimensional matrix A
B (heterogeneous) - T2: N-dimensional matrix B
b_scale (heterogeneous) - T1: Scale of quantized input ‘B’. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
b_zero_point (optional, heterogeneous) - T2: Zero point tensor for input ‘B’. It’s optional and default value is 0. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
bias (optional, heterogeneous) - T1: 1D input tensor whose dimension is the same as B’s last dimension
Outputs
Y (heterogeneous) - T1: Matrix multiply results from A * B
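Numerically, the fused operator approximates a float matmul against the dequantized B; the kernel dynamically quantizes A at runtime for speed. A float reference sketch of that semantics (an approximation, not the quantized kernel itself):

```python
import numpy as np

def dynamic_quantize_matmul_ref(A, B_q, b_scale, b_zero_point=0, bias=None):
    # Dequantize B (per-tensor scalar or per-column 1-D parameters
    # broadcast over B's columns), then perform a float matmul.
    B = (B_q.astype(np.float32) - np.float32(b_zero_point)) * np.float32(b_scale)
    Y = A.astype(np.float32) @ B
    if bias is not None:
        Y = Y + bias
    return Y
```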
OnnxComMicrosoftDynamicQuantizeMatMul_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftDynamicQuantizeMatMul_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Inputs
Between 3 and 5 inputs.
A (heterogeneous) - T1: N-dimensional matrix A
B (heterogeneous) - T2: N-dimensional matrix B
b_scale (heterogeneous) - T1: Scale of quantized input ‘B’. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
b_zero_point (optional, heterogeneous) - T2: Zero point tensor for input ‘B’. It’s optional and default value is 0. It could be a scalar or a 1-D tensor, which means a per-tensor or per-column quantization. If it’s a 1-D tensor, its number of elements should be equal to the number of columns of input ‘B’.
bias (optional, heterogeneous) - T1: 1D input tensor whose dimension is the same as B’s last dimension
Outputs
Y (heterogeneous) - T1: Matrix multiply results from A * B
OnnxComMicrosoftEmbedLayerNormalization#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftEmbedLayerNormalization(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
EmbedLayerNormalization is the fusion of the embedding layer in the BERT model, with optional mask processing. The embedding layer takes input_ids (word IDs) and segment_ids (sentence IDs) to look up word_embedding, position_embedding, and segment_embedding; the embeddings are summed, and layer normalization is then applied using the gamma and beta tensors. The last input, mask, is optional. If mask is provided, the mask index (that is, the position of the first 0 in the mask, or the number of words) will be calculated.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is ?.
Inputs
Between 7 and 9 inputs.
input_ids (heterogeneous) - T1: 2D words IDs with shape (batch_size, sequence_length)
segment_ids (optional, heterogeneous) - T1: 2D segment IDs with shape (batch_size, sequence_length)
word_embedding (heterogeneous) - T: 2D with shape (, hidden_size)
position_embedding (heterogeneous) - T: 2D with shape (, hidden_size)
segment_embedding (optional, heterogeneous) - T: 2D with shape (, hidden_size)
gamma (heterogeneous) - T: 1D gamma tensor for layer normalization with shape (hidden_size)
beta (heterogeneous) - T: 1D beta tensor for layer normalization with shape (hidden_size)
mask (optional, heterogeneous) - T1: 2D attention mask with shape (batch_size, sequence_length)
position_ids (optional, heterogeneous) - T1: 2D position ids with shape (batch_size, sequence_length)
Outputs
Between 2 and 3 outputs.
output (heterogeneous) - T: 3D output tensor with shape (batch_size, sequence_length, hidden_size)
mask_index (heterogeneous) - T1: 1D mask_index tensor with shape (batch_size)
embedding_sum (optional, heterogeneous) - T: sum of word_embedding and position_embedding without layer normalization
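The fusion described above can be sketched in NumPy (a reference sketch; the epsilon default and the mask-index convention of "count of 1s per row" are assumptions, and the embedding tables are looked up by plain indexing):

```python
import numpy as np

def embed_layer_norm(input_ids, word_emb, pos_emb, gamma, beta,
                     segment_ids=None, seg_emb=None, mask=None, epsilon=1e-12):
    # Gather word/position(/segment) embeddings, sum them, then
    # layer-normalize over the hidden dimension with gamma and beta.
    batch, seq = input_ids.shape
    x = word_emb[input_ids] + pos_emb[np.arange(seq)]
    if segment_ids is not None and seg_emb is not None:
        x = x + seg_emb[segment_ids]
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    out = (x - mean) / np.sqrt(var + epsilon) * gamma + beta
    # mask_index: number of 1s per row, i.e. position of the first 0.
    mask_index = (mask.sum(axis=1).astype(np.int32)
                  if mask is not None else np.full(batch, seq, np.int32))
    return out, mask_index
```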
OnnxComMicrosoftEmbedLayerNormalization_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftEmbedLayerNormalization_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
EmbedLayerNormalization is the fusion of the embedding layer in the BERT model, with optional mask processing. The embedding layer takes input_ids (word IDs) and segment_ids (sentence IDs) to look up word_embedding, position_embedding, and segment_embedding; the embeddings are summed, and layer normalization is then applied using the gamma and beta tensors. The last input, mask, is optional. If mask is provided, the mask index (that is, the position of the first 0 in the mask, or the number of words) will be calculated.
Attributes
epsilon: The epsilon value to use to avoid division by zero. Default value is ?.
Inputs
Between 7 and 9 inputs.
input_ids (heterogeneous) - T1: 2D words IDs with shape (batch_size, sequence_length)
segment_ids (optional, heterogeneous) - T1: 2D segment IDs with shape (batch_size, sequence_length)
word_embedding (heterogeneous) - T: 2D with shape (, hidden_size)
position_embedding (heterogeneous) - T: 2D with shape (, hidden_size)
segment_embedding (optional, heterogeneous) - T: 2D with shape (, hidden_size)
gamma (heterogeneous) - T: 1D gamma tensor for layer normalization with shape (hidden_size)
beta (heterogeneous) - T: 1D beta tensor for layer normalization with shape (hidden_size)
mask (optional, heterogeneous) - T1: 2D attention mask with shape (batch_size, sequence_length)
position_ids (optional, heterogeneous) - T1: 2D position ids with shape (batch_size, sequence_length)
Outputs
Between 2 and 3 outputs.
output (heterogeneous) - T: 3D output tensor with shape (batch_size, sequence_length, hidden_size)
mask_index (heterogeneous) - T1: 1D mask_index tensor with shape (batch_size)
embedding_sum (optional, heterogeneous) - T: sum of word_embedding and position_embedding without layer normalization
OnnxComMicrosoftExpandDims#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftExpandDims(*args, **kwargs)#
Version
name: ExpandDims (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
ExpandDims echo operator.
Inputs
X (heterogeneous) - T: input
axis (heterogeneous) - tensor(int32): Specified axis to insert a dimension
Outputs
Y (heterogeneous) - T: output
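The behavior matches numpy.expand_dims, except that the axis arrives as a tensor(int32) input rather than an attribute:

```python
import numpy as np

x = np.zeros((3, 4), dtype=np.float32)
axis = np.int32(1)               # axis is passed as a tensor(int32) input
y = np.expand_dims(x, int(axis)) # insert a dimension of size 1 at `axis`
# y.shape == (3, 1, 4)
```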
OnnxComMicrosoftExpandDims_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftExpandDims_1(*args, **kwargs)#
Version
name: ExpandDims (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
ExpandDims echo operator.
Inputs
X (heterogeneous) - T: input
axis (heterogeneous) - tensor(int32): Specified axis to insert a dimension
Outputs
Y (heterogeneous) - T: output
OnnxComMicrosoftFastGelu#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFastGelu(*args, **kwargs)#
Version
name: FastGelu (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
GELU (Gaussian Error Linear Unit) approximation: Y=0.5*X*(1+tanh(0.797885*X+0.035677*X*X*X)) with an optional input of bias that will be added to X before GELU.
Inputs
Between 1 and 2 inputs.
X (heterogeneous) - T: input tensor
bias (optional, heterogeneous) - T: bias tensor
Outputs
Y (heterogeneous) - T: output tensor
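The tanh approximation in the summary translates directly to NumPy (a reference sketch with the summary's constants, not the fused kernel):

```python
import numpy as np

def fast_gelu(x, bias=None):
    # Y = 0.5*X*(1 + tanh(0.797885*X + 0.035677*X^3)),
    # with the optional bias added to X first.
    if bias is not None:
        x = x + bias
    return 0.5 * x * (1.0 + np.tanh(0.797885 * x + 0.035677 * x ** 3))
```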
OnnxComMicrosoftFastGeluGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFastGeluGrad(*args, **kwargs)#
Version
name: FastGeluGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
FastGeluGrad
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
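Given the FastGelu approximation Y = 0.5·X·(1 + tanh(a·X + b·X³)) with a = 0.797885 and b = 0.035677, the gradient follows from the product and chain rules (using sech² = 1 - tanh²). A NumPy sketch of that analytic derivative:

```python
import numpy as np

def fast_gelu_grad(dY, X, a=0.797885, b=0.035677):
    # d/dX [0.5*X*(1 + tanh(u))] with u = a*X + b*X^3:
    #   0.5*(1 + tanh u) + 0.5*X*(1 - tanh^2 u)*(a + 3*b*X^2)
    u = a * X + b * X ** 3
    t = np.tanh(u)
    dydx = 0.5 * (1.0 + t) + 0.5 * X * (1.0 - t * t) * (a + 3.0 * b * X ** 2)
    return dY * dydx
```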
OnnxComMicrosoftFastGeluGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFastGeluGrad_1(*args, **kwargs)#
Version
name: FastGeluGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
FastGeluGrad
Inputs
dY (heterogeneous) - T: The gradient tensor from output.
X (heterogeneous) - T: The input tensor.
Outputs
dX (heterogeneous) - T: Gradient of the input.
OnnxComMicrosoftFastGelu_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFastGelu_1(*args, **kwargs)#
Version
name: FastGelu (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
GELU (Gaussian Error Linear Unit) approximation: Y=0.5*X*(1+tanh(0.797885*X+0.035677*X*X*X)) with an optional input of bias that will be added to X before GELU.
Inputs
Between 1 and 2 inputs.
X (heterogeneous) - T: input tensor
bias (optional, heterogeneous) - T: bias tensor
Outputs
Y (heterogeneous) - T: output tensor
OnnxComMicrosoftFusedConv#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedConv(*args, **kwargs)#
Version
name: FusedConv (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The fused convolution operator schema is the same as Conv, except that it includes an activation attribute.
Attributes
activation: Default value is ?.
activation_params: Default value is ?.
auto_pad: Default value is ?.
dilations: Default value is ?.
group: Default value is ?.
kernel_shape: Default value is ?.
pads: Default value is ?.
strides: Default value is ?.
Inputs
Between 2 and 4 inputs.
X (heterogeneous) - T:
W (heterogeneous) - T:
B (optional, heterogeneous) - T:
Z (optional, heterogeneous) - T:
Outputs
Y (heterogeneous) - T:
OnnxComMicrosoftFusedConv_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedConv_1(*args, **kwargs)#
Version
name: FusedConv (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The fused convolution operator schema is the same as Conv, except that it includes an activation attribute.
Attributes
activation: Default value is ?.
activation_params: Default value is ?.
auto_pad: Default value is ?.
dilations: Default value is ?.
group: Default value is ?.
kernel_shape: Default value is ?.
pads: Default value is ?.
strides: Default value is ?.
Inputs
Between 2 and 4 inputs.
X (heterogeneous) - T:
W (heterogeneous) - T:
B (optional, heterogeneous) - T:
Z (optional, heterogeneous) - T:
Outputs
Y (heterogeneous) - T:
OnnxComMicrosoftFusedGemm#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedGemm(*args, **kwargs)#
Version
name: FusedGemm (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The FusedGemm operator schema is the same as Gemm, except that it includes the attributes activation and leaky_relu_alpha.
Attributes
activation: Default value is ?.
activation_alpha: Default value is ?.
activation_beta: Default value is ?.
activation_gamma: Default value is ?.
alpha: Scalar multiplier for the product of input tensors A * B. Default value is ?.
beta: Scalar multiplier for input tensor C. Default value is ?.
transA: Whether A should be transposed. Default value is ?.
transB: Whether B should be transposed. Default value is ?.
Inputs
A (heterogeneous) - T: Input tensor A. The shape of A should be (M, K) if transA is 0, or (K, M) if transA is non-zero.
B (heterogeneous) - T: Input tensor B. The shape of B should be (K, N) if transB is 0, or (N, K) if transB is non-zero.
C (heterogeneous) - T: Input tensor C. The shape of C should be unidirectional broadcastable to (M, N).
Outputs
Y (heterogeneous) - T: Output tensor of shape (M, N).
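The Gemm-plus-activation fusion can be sketched as follows (a sketch, not the onnxruntime kernel; the activation names and the parameter handling shown here are assumptions, with only Relu illustrated):

```python
import numpy as np

def fused_gemm(A, B, C=0.0, alpha=1.0, beta=1.0,
               transA=0, transB=0, activation=None):
    # Y = activation(alpha * A' @ B' + beta * C), where A'/B' are the
    # optionally transposed inputs per transA/transB.
    if transA:
        A = A.T
    if transB:
        B = B.T
    Y = alpha * (A @ B) + beta * C
    if activation == "Relu":
        Y = np.maximum(Y, 0.0)
    return Y
```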
OnnxComMicrosoftFusedGemm_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedGemm_1(*args, **kwargs)#
Version
name: FusedGemm (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
The FusedGemm operator schema is the same as Gemm, except that it includes the attributes activation and leaky_relu_alpha.
Attributes
activation: Default value is ?.
activation_alpha: Default value is ?.
activation_beta: Default value is ?.
activation_gamma: Default value is ?.
alpha: Scalar multiplier for the product of input tensors A * B. Default value is ?.
beta: Scalar multiplier for input tensor C. Default value is ?.
transA: Whether A should be transposed. Default value is ?.
transB: Whether B should be transposed. Default value is ?.
Inputs
A (heterogeneous) - T: Input tensor A. The shape of A should be (M, K) if transA is 0, or (K, M) if transA is non-zero.
B (heterogeneous) - T: Input tensor B. The shape of B should be (K, N) if transB is 0, or (N, K) if transB is non-zero.
C (heterogeneous) - T: Input tensor C. The shape of C should be unidirectional broadcastable to (M, N).
Outputs
Y (heterogeneous) - T: Output tensor of shape (M, N).
OnnxComMicrosoftFusedMatMul#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedMatMul(*args, **kwargs)#
Version
name: FusedMatMul (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html
Attributes
alpha: Scalar multiplier for the product of the input tensors. Default value is ?.
transA: Whether A should be transposed on the last two dimensions before doing multiplication. Default value is ?.
transB: Whether B should be transposed on the last two dimensions before doing multiplication. Default value is ?.
transBatchA: Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication. Default value is ?.
transBatchB: Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication. Default value is ?.
Inputs
A (heterogeneous) - T: N-dimensional matrix A
B (heterogeneous) - T: N-dimensional matrix B
Outputs
Y (heterogeneous) - T: Matrix multiply results
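A minimal NumPy sketch of the transA/transB behavior described above (transBatchA/transBatchB are omitted for brevity; the helper name `fused_matmul_ref` is hypothetical):

```python
import numpy as np

def fused_matmul_ref(A, B, alpha=1.0, transA=0, transB=0):
    # Transpose the last two dimensions where requested, then scale the product.
    if transA:
        A = np.swapaxes(A, -1, -2)
    if transB:
        B = np.swapaxes(B, -1, -2)
    return alpha * np.matmul(A, B)

A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(6, dtype=np.float64).reshape(2, 3)
Y = fused_matmul_ref(A, B, alpha=0.5, transB=1)  # (2, 3) @ (3, 2) -> (2, 2)
```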
OnnxComMicrosoftFusedMatMul_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftFusedMatMul_1(*args, **kwargs)#
Version
name: FusedMatMul (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html
Attributes
alpha: Scalar multiplier for the product of the input tensors. Default value is ?.
transA: Whether A should be transposed on the last two dimensions before doing multiplication. Default value is ?.
transB: Whether B should be transposed on the last two dimensions before doing multiplication. Default value is ?.
transBatchA: Whether A should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication. Default value is ?.
transBatchB: Whether B should be transposed on the 1st dimension and batch dimensions (dim-1 to dim-rank-2) before doing multiplication. Default value is ?.
Inputs
A (heterogeneous) - T: N-dimensional matrix A
B (heterogeneous) - T: N-dimensional matrix B
Outputs
Y (heterogeneous) - T: Matrix multiply results
OnnxComMicrosoftGatherElementsGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherElementsGrad(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
GatherElementsGrad
Attributes
axis: Which axis to scatter on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data). Default value is ?.
Inputs
dY (heterogeneous) - T: Tensor of rank r >= 1 (same rank and shape as indices)
shape (heterogeneous) - I: Shape of the GatherElements input data.
indices (heterogeneous) - Tind: Tensor of int32/int64 indices, of rank r >= 1 (same rank as input). All index values are expected to be within bounds [-s, s-1] along axis of size s. It is an error if any of the index values are out of bounds.
Outputs
dX (heterogeneous) - T: Tensor of rank r >= 1 (same rank as input).
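Since GatherElements (for axis=0) reads data[indices[i][j]][j], its gradient scatter-adds each dY element back to the position it was read from, accumulating duplicates. A NumPy sketch, with a hypothetical helper name:

```python
import numpy as np

def gather_elements_grad_ref(dY, shape, indices, axis=0):
    # dX starts at zero; each dY element is accumulated at the position
    # that GatherElements read it from (duplicate indices must accumulate).
    dX = np.zeros(shape, dtype=dY.dtype)
    idx = np.indices(indices.shape)
    idx[axis] = indices  # replace the coordinate along `axis` with the gathered index
    np.add.at(dX, tuple(idx), dY)
    return dX

dY = np.array([[1.0, 2.0], [3.0, 4.0]])
indices = np.array([[0, 1], [1, 0]])
dX = gather_elements_grad_ref(dY, (2, 2), indices, axis=0)
```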
OnnxComMicrosoftGatherElementsGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherElementsGrad_1(*args, **kwargs)#
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
GatherElementsGrad
Attributes
axis: Which axis to scatter on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1] where r = rank(data). Default value is ?.
Inputs
dY (heterogeneous) - T: Tensor of rank r >= 1 (same rank and shape as indices)
shape (heterogeneous) - I: Shape of the GatherElements input data.
indices (heterogeneous) - Tind: Tensor of int32/int64 indices, of rank r >= 1 (same rank as input). All index values are expected to be within bounds [-s, s-1] along axis of size s. It is an error if any of the index values are out of bounds.
Outputs
dX (heterogeneous) - T: Tensor of rank r >= 1 (same rank as input).
OnnxComMicrosoftGatherGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherGrad(*args, **kwargs)#
Version
name: GatherGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
axis: Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1]. Default value is ?.
Inputs
shape (heterogeneous) - I: Shape of the Gather input X.
indices (heterogeneous) - Tind: Tensor of int32/int64 indices, of any rank q.
dY (heterogeneous) - T: Gradient of output
Outputs
dX (heterogeneous) - T: Gradient of input
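Gather's gradient scatter-adds each dY slice back to the row it was gathered from. A minimal NumPy sketch for the axis=0 case only (the helper name is hypothetical):

```python
import numpy as np

def gather_grad_ref(shape, indices, dY):
    # axis=0 case: dY[pos] is accumulated into dX[indices[pos]];
    # repeated indices must sum their contributions, hence np.add.at.
    dX = np.zeros(shape, dtype=dY.dtype)
    np.add.at(dX, indices, dY)
    return dX

indices = np.array([0, 2, 0])
dY = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
dX = gather_grad_ref((3, 2), indices, dY)  # row 0 receives two contributions
```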
OnnxComMicrosoftGatherGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherGrad_1(*args, **kwargs)#
Version
name: GatherGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
axis: Which axis to gather on. Negative value means counting dimensions from the back. Accepted range is [-r, r-1]. Default value is ?.
Inputs
shape (heterogeneous) - I: Shape of the Gather input X.
indices (heterogeneous) - Tind: Tensor of int32/int64 indices, of any rank q.
dY (heterogeneous) - T: Gradient of output
Outputs
dX (heterogeneous) - T: Gradient of input
OnnxComMicrosoftGatherND#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherND(*args, **kwargs)#
Version
name: GatherND (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Given a data tensor of rank r >= 1 and an indices tensor of rank q >= 1, gather slices of data into an output tensor of rank q - 1 + r - indices[-1].
Example 1:
data = [[0,1],[2,3]] indices = [[0,0],[1,1]] output = [0,3]
Example 2:
data = [[0,1],[2,3]] indices = [[1],[0]] output = [[2,3],[0,1]]
Example 3:
data = [[[0,1],[2,3]],[[4,5],[6,7]]] indices = [[0,1],[1,0]] output = [[2,3],[4,5]]
Example 4:
data = [[[0,1],[2,3]],[[4,5],[6,7]]] indices = [[[0,1]],[[1,0]]] output = [[[2,3]],[[4,5]]]
Inputs
data (heterogeneous) - T: Tensor of rank r >= 1.
indices (heterogeneous) - Tind: Tensor of rank q >= 1.
Outputs
output (heterogeneous) - T: Tensor of rank q-1+r-indices[-1].
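The four examples in the summary can be reproduced with a small NumPy sketch of GatherND (batch_dims=0; the helper name is hypothetical): the last axis of indices supplies coordinates into the leading dimensions of data.

```python
import numpy as np

def gather_nd_ref(data, indices):
    data = np.asarray(data)
    indices = np.asarray(indices)
    k = indices.shape[-1]  # number of coordinates per lookup
    # Each row of the flattened indices picks one slice of data.
    slices = [data[tuple(idx)] for idx in indices.reshape(-1, k)]
    return np.array(slices).reshape(indices.shape[:-1] + data.shape[k:])

out = gather_nd_ref([[0, 1], [2, 3]], [[0, 0], [1, 1]])  # Example 1
```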
OnnxComMicrosoftGatherNDGrad#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherNDGrad(*args, **kwargs)#
Version
name: GatherNDGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
batch_dims: The number of batch dimensions; the gather indexing starts from dimension data[batch_dims+1:]. Default value is ?.
Inputs
shape (heterogeneous) - T1: The shape of source data input of GatherND.
indices (heterogeneous) - Tind: Tensor of rank q >= 1.
update (heterogeneous) - T: The gradient of the output.
Outputs
output (heterogeneous) - T: Tensor gradient of the input.
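The gradient is the scatter counterpart of GatherND: each update slice is accumulated into a zero tensor of the original data shape. A NumPy sketch (batch_dims=0; the helper name is hypothetical):

```python
import numpy as np

def gather_nd_grad_ref(shape, indices, update):
    dX = np.zeros(shape, dtype=update.dtype)
    k = indices.shape[-1]
    flat_idx = indices.reshape(-1, k)
    flat_upd = update.reshape((-1,) + tuple(shape[k:]))
    for idx, u in zip(flat_idx, flat_upd):
        dX[tuple(idx)] += u  # duplicate indices accumulate, as a gradient must
    return dX

indices = np.array([[0, 0], [1, 1], [0, 0]])
update = np.array([1.0, 2.0, 3.0])
dX = gather_nd_grad_ref((2, 2), indices, update)
```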
OnnxComMicrosoftGatherNDGrad_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherNDGrad_1(*args, **kwargs)#
Version
name: GatherNDGrad (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Attributes
batch_dims: The number of batch dimensions; the gather indexing starts from dimension data[batch_dims+1:]. Default value is ?.
Inputs
shape (heterogeneous) - T1: The shape of source data input of GatherND.
indices (heterogeneous) - Tind: Tensor of rank q >= 1.
update (heterogeneous) - T: The gradient of the output.
Outputs
output (heterogeneous) - T: Tensor gradient of the input.
OnnxComMicrosoftGatherND_1#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGatherND_1(*args, **kwargs)#
Version
name: GatherND (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
Given a data tensor of rank r >= 1 and an indices tensor of rank q >= 1, gather slices of data into an output tensor of rank q - 1 + r - indices[-1].
Example 1:
data = [[0,1],[2,3]] indices = [[0,0],[1,1]] output = [0,3]
Example 2:
data = [[0,1],[2,3]] indices = [[1],[0]] output = [[2,3],[0,1]]
Example 3:
data = [[[0,1],[2,3]],[[4,5],[6,7]]] indices = [[0,1],[1,0]] output = [[2,3],[4,5]]
Example 4:
data = [[[0,1],[2,3]],[[4,5],[6,7]]] indices = [[[0,1]],[[1,0]]] output = [[[2,3]],[[4,5]]]
Inputs
data (heterogeneous) - T: Tensor of rank r >= 1.
indices (heterogeneous) - Tind: Tensor of rank q >= 1.
Outputs
output (heterogeneous) - T: Tensor of rank q-1+r-indices[-1].
OnnxComMicrosoftGelu#
- class mlprodict.npy.xop_auto_import_.OnnxComMicrosoftGelu(*args, **kwargs)#
Version
name: Gelu (GitHub)
domain: com.microsoft
since_version: 1
func