MarianCG: a code generation transformer model inspired by machine translation
| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning rate | \(5\times 10^{-5}\) |
| Weight decay | 0.01 |
| Maximum position embeddings | 512 |
| Number of hidden layers | 6 |
| Scale embedding | True |
| Activation function | Swish |
| Learning rate scheduler | Linear |
| Warmup ratio | 0.05 |
| Length penalty | 0.9 |
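To make the scheduler settings concrete, the sketch below shows how a linear schedule with a 0.05 warmup ratio shapes the learning rate over training. This is an illustrative assumption of the standard linear warmup-then-decay behavior (as implemented, for example, by Hugging Face's `get_linear_schedule_with_warmup`), not code from the paper; the function name and structure are hypothetical.

```python
# Illustrative sketch (not from the paper): linear warmup for the first
# 5% of steps, then linear decay to zero, starting from the base LR 5e-5.
BASE_LR = 5e-5
WARMUP_RATIO = 0.05

def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup + linear decay."""
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        # Warmup phase: LR ramps linearly from 0 up to BASE_LR.
        return BASE_LR * step / max(1, warmup_steps)
    # Decay phase: LR falls linearly from BASE_LR down to 0.
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

For 1,000 total steps, warmup lasts 50 steps: the rate rises to \(5\times 10^{-5}\) at step 50, then decays linearly to zero by the final step.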