From: Classification feature sets for source code plagiarism detection in Java
Input: | A file pair: File1, File2 |
---|---|
Output: | The Lexical Per-Class Features: LexicalPerClassFeatures |
Procedure: | 1. Extract the list of class codes of File1: ClassList1 |
 | 2. Extract the list of class codes of File2: ClassList2 |
 | 3. Build a lexical per-class similarity matrix, SimMatrix, where: |
 | SimMatrix[I][J] = LexicalSim(ClassList1[I], ClassList2[J]) |
 | 4. Extract the candidate list of SimMatrix: CandidateList |
 | 5. Calculate the Histogram Extreme Ranges of CandidateList: ExtRanges |
 | 6. Add the average of CandidateList to LexicalPerClassFeatures (1 feature) |
 | 7. Add ExtRanges to LexicalPerClassFeatures (2 features) |
 | 8. Return LexicalPerClassFeatures (total 3 features) |
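
The procedure above can be sketched in Java as follows. This is only an illustrative sketch under several assumptions, since the table does not define its helper operations: `lexicalSim` is a hypothetical stand-in that uses Jaccard similarity over token sets, the candidate list is assumed to hold the best match of each File1 class, and the Histogram Extreme Ranges are assumed to be the fractions of candidates falling in the lowest and highest histogram bins (bin width 0.1). Class extraction (steps 1 and 2) is assumed to have been done already, so the method takes the two class-code lists directly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LexicalPerClassSketch {

    // ASSUMPTION: splits code into a set of word-like tokens; a real lexer may differ.
    static Set<String> tokens(String code) {
        Set<String> out = new HashSet<>();
        for (String t : code.split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // ASSUMPTION: stand-in for LexicalSim, Jaccard similarity of the token sets.
    static double lexicalSim(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) inter.size() / union.size();
    }

    // Step 3: SimMatrix[I][J] = LexicalSim(ClassList1[I], ClassList2[J]).
    static double[][] simMatrix(List<String> classList1, List<String> classList2) {
        double[][] m = new double[classList1.size()][classList2.size()];
        for (int i = 0; i < classList1.size(); i++) {
            for (int j = 0; j < classList2.size(); j++) {
                m[i][j] = lexicalSim(classList1.get(i), classList2.get(j));
            }
        }
        return m;
    }

    // Step 4. ASSUMPTION: one candidate per row, the highest similarity in that row.
    static double[] candidateList(double[][] m) {
        double[] candidates = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            double best = 0.0;
            for (double v : m[i]) best = Math.max(best, v);
            candidates[i] = best;
        }
        return candidates;
    }

    // Step 5. ASSUMPTION: fraction of candidates in the lowest and highest bins.
    static double[] extremeRanges(double[] candidates, double binWidth) {
        if (candidates.length == 0) return new double[] {0.0, 0.0};
        int low = 0;
        int high = 0;
        for (double c : candidates) {
            if (c < binWidth) low++;
            if (c >= 1.0 - binWidth) high++;
        }
        double n = candidates.length;
        return new double[] {low / n, high / n};
    }

    // Steps 6 to 8: average (1 feature) plus the extreme ranges (2 features).
    static double[] features(List<String> classList1, List<String> classList2) {
        double[] candidates = candidateList(simMatrix(classList1, classList2));
        double[] ext = extremeRanges(candidates, 0.1);
        double avg = Arrays.stream(candidates).average().orElse(0.0);
        return new double[] {avg, ext[0], ext[1]};
    }

    public static void main(String[] args) {
        double[] f = features(
                List.of("class A { int x; }", "class B { void run() {} }"),
                List.of("class A { int x; }", "class C { String s; }"));
        System.out.println(Arrays.toString(f));
    }
}
```

Returning a fixed-length array keeps the three features in the order the procedure adds them: the average first, then the two extreme-range values. Any of the assumed helpers can be swapped for the definitions used in the original work without changing the overall flow.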