Table 5 Configuration parameters for training the MarianCG model

From: MarianCG: a code generation transformer model inspired by machine translation

| Parameter | Value |
| --- | --- |
| Optimizer | Adam |
| Learning rate | \(5 \times 10^{-5}\) |
| Weight decay | 0.01 |
| Maximum position embeddings | 512 |
| Number of hidden layers | 6 |
| Scale embedding | True |
| Activation function | swish |
| Learning rate scheduler | Linear |
| Warmup ratio | 0.05 |
| Length penalty | 0.9 |
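
To make the interaction between the learning rate, the linear scheduler, and the warmup ratio concrete, the following is a minimal sketch and not the authors' training code. The field names in `TrainingConfig`, the helper `linear_lr`, and the total of 1000 training steps are all hypothetical. The sketch also assumes that the warmup length is the warmup ratio times the total number of steps, rounded up, and that "Linear" means linear warmup followed by linear decay to zero; the table itself does not state either detail.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from Table 5; the field names are hypothetical."""
    optimizer: str = "adam"
    learning_rate: float = 5e-5
    weight_decay: float = 0.01
    max_position_embeddings: int = 512
    num_hidden_layers: int = 6
    scale_embedding: bool = True
    activation_function: str = "swish"
    lr_scheduler: str = "linear"
    warmup_ratio: float = 0.05
    length_penalty: float = 0.9


def linear_lr(step: int, total_steps: int, cfg: TrainingConfig) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    Assumption: warmup lasts ceil(warmup_ratio * total_steps) steps.
    """
    warmup_steps = math.ceil(cfg.warmup_ratio * total_steps)
    if step < warmup_steps:
        # Ramp up from 0 to the peak learning rate.
        scale = step / max(1, warmup_steps)
    else:
        # Decay from the peak learning rate down to 0 at the last step.
        remaining = total_steps - step
        scale = max(0.0, remaining / max(1, total_steps - warmup_steps))
    return cfg.learning_rate * scale


cfg = TrainingConfig()
total_steps = 1000  # hypothetical; the table does not give a step count

for step in (0, 25, 50, 525, 1000):
    print(f"step {step:4d}: lr = {linear_lr(step, total_steps, cfg):.2e}")
```

Under these assumptions the learning rate peaks at the value in the table once warmup ends (step 50 of 1000) and falls back to zero at the final step. The length penalty is listed in the config only for completeness; it affects beam-search decoding, not the schedule shown here.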