Table 5 The number of files and pairs of SOCO-TEST after removing corrupted files

From: Classification feature sets for source code plagiarism detection in Java

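| Category | Number of files | Number of pairs | Number of plagiarized pairs |
|----------|-----------------|-----------------|-----------------------------|
| A1       | 3240            | 5,247,180       | 54                          |
| A2       | 3092            | 4,778,686       | 46                          |
| B1       | 3268            | 5,338,278       | 73                          |
| B2       | 2266            | 2,566,245       | 34                          |
| C1       | 124             | 7626            | 0                           |
| C2       | 88              | 3828            | 14                          |
| SUM      | 12,078          | 17,941,843      | 221                         |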
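
The pair counts in the table above match the number of unordered pairs of distinct files within each category, n(n-1)/2 for n files. The short sketch below recomputes them from the file counts as a sanity check. It assumes that within-category pairing is how the figures were produced, which the table itself does not state; the names `files` and `pairs` are illustrative only.

```python
from math import comb

# File counts per category, copied from Table 5.
files = {"A1": 3240, "A2": 3092, "B1": 3268, "B2": 2266, "C1": 124, "C2": 88}

# Assumption: a "pair" is an unordered pair of two distinct files
# drawn from the same category, i.e. C(n, 2) = n * (n - 1) / 2.
pairs = {category: comb(n, 2) for category, n in files.items()}

for category, count in pairs.items():
    print(f"{category}: {count:,} pairs")

print(f"SUM: {sum(files.values()):,} files, {sum(pairs.values()):,} pairs")
```

Under that assumption every row and the SUM row are reproduced exactly, which also shows how sparse the positive class is: 221 plagiarized pairs out of roughly 17.9 million candidates.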