Table 1 The extraction steps of the lexical per-class features

From: Classification feature sets for source code plagiarism detection in Java

Input:

A file pair: File1, File2

Output:

The Lexical Per-Class Features: LexicalPerClassFeatures

Procedure:

1. Extract the list of class codes of File1: ClassList1

2. Extract the list of class codes of File2: ClassList2

3. Build a lexical per-class similarity matrix, SimMatrix, where:

   SimMatrix[I][J] = LexicalSim(ClassList1[I], ClassList2[J])

4. Extract the candidate list of SimMatrix: CandidateList

5. Calculate the Histogram Extreme Ranges of CandidateList: ExtRanges

6. Add the Average of CandidateList into LexicalPerClassFeatures (1 feature)

7. Add ExtRanges into LexicalPerClassFeatures (2 features)

8. Return LexicalPerClassFeatures (total 3 features)
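
The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the table does not define LexicalSim, the candidate-list extraction, or the Histogram Extreme Ranges, so the sketch substitutes a token-set Jaccard similarity for LexicalSim, takes the best match per File1 class as the candidate list, and treats the extreme ranges as the fractions of candidates falling in the lowest and highest bins of a 10-bin histogram over [0, 1]. Steps 1 and 2 (class extraction) are assumed to have been done already, so the function takes the two class lists as input. All function names are hypothetical.

```python
import re
from typing import List


def lexical_sim(code_a: str, code_b: str) -> float:
    """Assumed stand-in for LexicalSim: Jaccard similarity of token sets."""
    tokens_a = set(re.findall(r"\w+", code_a))
    tokens_b = set(re.findall(r"\w+", code_b))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty classes are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def candidate_list(sim_matrix: List[List[float]]) -> List[float]:
    """Assumed rule: keep the best File2 match for each File1 class."""
    return [max(row) for row in sim_matrix if row]


def extreme_ranges(values: List[float], bins: int = 10) -> List[float]:
    """Assumed definition: fraction of values in the lowest and highest bins."""
    if not values:
        return [0.0, 0.0]
    # Map each value in [0, 1] to a bin index; 1.0 goes into the last bin.
    indices = [min(int(v * bins), bins - 1) for v in values]
    low = sum(1 for i in indices if i == 0) / len(values)
    high = sum(1 for i in indices if i == bins - 1) / len(values)
    return [low, high]


def lexical_per_class_features(class_list1: List[str],
                               class_list2: List[str]) -> List[float]:
    # Step 3: per-class similarity matrix.
    sim_matrix = [[lexical_sim(c1, c2) for c2 in class_list2]
                  for c1 in class_list1]
    # Step 4: candidate list.
    candidates = candidate_list(sim_matrix)
    # Step 5: histogram extreme ranges.
    ext_ranges = extreme_ranges(candidates)
    # Steps 6-8: average (1 feature) plus extreme ranges (2 features).
    average = sum(candidates) / len(candidates) if candidates else 0.0
    return [average] + ext_ranges


features = lexical_per_class_features(
    ["class A { int x; }", "class B { void run() {} }"],
    ["class A { int x; }"],
)
```

Whatever definitions are substituted for the three assumed helpers, the result keeps the shape the table specifies: one average followed by two extreme-range values, three features in total.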