Table 4 Datasets in each experiment and distribution of the data

From: MarianCG: a code generation transformer model inspired by machine translation

| Experiment | Dataset | Dataset size | Train split | Validation split | Test split |
|---|---|---|---|---|---|
| Experiment 1 | CoNaLa | 13K | 11125 | 1237 | 500 |
| Experiment 2 | DJANGO | 19K | 16000 | 1000 | 1805 |
| Experiment 3 | CoNaLa | 26K | 24687 | 1237 | 500 |
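
As a quick sanity check on the split sizes listed in Table 4, the following minimal sketch sums the train, validation, and test counts for each experiment and reports each split's share of the total. The names `splits` and `summarize` are illustrative and do not come from the paper; only the counts are taken from the table.

```python
# Split sizes copied from Table 4 as (train, validation, test).
# The dictionary keys are labels chosen for this sketch only.
splits = {
    "Experiment 1 (CoNaLa, 13K)": (11125, 1237, 500),
    "Experiment 2 (DJANGO, 19K)": (16000, 1000, 1805),
    "Experiment 3 (CoNaLa, 26K)": (24687, 1237, 500),
}


def summarize(train, validation, test):
    """Return the total example count and each split's percentage share."""
    total = train + validation + test
    shares = tuple(round(100 * n / total, 1) for n in (train, validation, test))
    return total, shares


for name, sizes in splits.items():
    total, shares = summarize(*sizes)
    print(f"{name}: total={total}, train/validation/test %={shares}")
```

The totals come to 12862, 18805, and 26424 examples, which agree with the rounded dataset sizes of 13K, 19K, and 26K reported in the table.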