Classification feature sets for source code plagiarism detection in Java

Journal of Engineering and Applied Science

Table 6 Parameter values used in the “init.properties” file [23]

Parameter	Description	Value
field	True if the per-field representation is used	True
toptermquery	True if only the top “num_q_terms” terms of each field (not all terms) are considered, after sorting the terms by TF-IDF score	True
num_q_terms	The number of top terms kept for each field when “toptermquery” is true	20
lambda	The weight (from 0 to 1) of TF with respect to IDF	0.4
minShingleSize	The minimum size of the word n-grams (shingles) built from terms; unigrams are included by default	2
maxShingleSize	The maximum size of the word n-grams (shingles) built from terms	3
num_wanted	For each query document, the number of top-ranked hit documents (sorted by relevance score) written to the output file	20
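Assuming “init.properties” follows the standard Java properties format (one key=value pair per line), the settings in Table 6 can be read with `java.util.Properties`. The following is a minimal sketch, not the tool's own loader: the class `InitProperties` and its field names are hypothetical, and the only checks it performs are those implied by the table (lambda lies between 0 and 1, and the maximum shingle size is not below the minimum).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical holder for the Table 6 settings; keys match the property names. */
public class InitProperties {
    final boolean field;
    final boolean topTermQuery;
    final int numQTerms;
    final double lambda;
    final int minShingleSize;
    final int maxShingleSize;
    final int numWanted;

    private InitProperties(Properties p) {
        field = Boolean.parseBoolean(require(p, "field")); // case-insensitive: "True" -> true
        topTermQuery = Boolean.parseBoolean(require(p, "toptermquery"));
        numQTerms = Integer.parseInt(require(p, "num_q_terms"));
        lambda = Double.parseDouble(require(p, "lambda"));
        minShingleSize = Integer.parseInt(require(p, "minShingleSize"));
        maxShingleSize = Integer.parseInt(require(p, "maxShingleSize"));
        numWanted = Integer.parseInt(require(p, "num_wanted"));
        // Sanity checks implied by Table 6 (assumed, not taken from the tool itself).
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must lie in [0, 1]: " + lambda);
        }
        if (maxShingleSize < minShingleSize) {
            throw new IllegalArgumentException("maxShingleSize < minShingleSize");
        }
    }

    /** Returns the trimmed value for key, failing loudly if the key is absent. */
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    /** Parses key=value lines; IOException is rethrown unchecked to keep call sites simple. */
    static InitProperties load(Reader reader) {
        Properties p = new Properties();
        try {
            p.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new InitProperties(p);
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "field=True", "toptermquery=True", "num_q_terms=20", "lambda=0.4",
                "minShingleSize=2", "maxShingleSize=3", "num_wanted=20");
        InitProperties cfg = load(new StringReader(text));
        System.out.println(cfg.numQTerms + " terms per field, lambda=" + cfg.lambda);
    }
}
```

`Boolean.parseBoolean` ignores case, so the value “True” listed in Table 6 is read as `true`.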
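The shingle parameters in Table 6 can be illustrated with a short sketch that assumes whitespace-separated tokens: it keeps every unigram and adds each contiguous word n-gram whose length lies between minShingleSize and maxShingleSize. The class and method names are hypothetical, and the output order (all unigrams first, then n-grams by start position) is an arbitrary choice that need not match the tool's own tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of shingling: unigrams plus word n-grams of size minSize..maxSize. */
public class Shingles {
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        // Unigrams are added separately, so n-grams start at size 2.
        if (minSize < 2 || maxSize < minSize) {
            throw new IllegalArgumentException("require 2 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>(tokens); // unigrams, included by default
        for (int start = 0; start < tokens.size(); start++) {
            // Stop growing the n-gram once it would run past the last token.
            for (int n = minSize; n <= maxSize && start + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens of the Java statement "int i = 0", split on whitespace.
        List<String> tokens = List.of("int", "i", "=", "0");
        System.out.println(shingles(tokens, 2, 3));
    }
}
```

With minShingleSize = 2 and maxShingleSize = 3, as in Table 6, the four tokens above yield 4 unigrams, 3 bigrams and 2 trigrams.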
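The effect of “toptermquery” and “num_q_terms” can be sketched for a single field: each term is scored, the terms are sorted by descending score, and only the first num_q_terms are kept. Table 6 does not define how lambda combines TF and IDF, so this sketch substitutes plain tf × ln(N / df); the class name, method names, the default df of 1 for unseen terms and the alphabetical tie-break are all assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical top-k term selection for one field, using plain TF-IDF (not the lambda-weighted score). */
public class TopTerms {
    /** tf * ln(numDocs / df); a term with no df entry is treated as occurring in one document. */
    static double tfIdf(String term, Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        return tf.get(term) * Math.log((double) numDocs / df.getOrDefault(term, 1));
    }

    /** Returns at most k terms, highest score first; ties are broken alphabetically. */
    static List<String> topTerms(Map<String, Integer> tf, Map<String, Integer> df, int numDocs, int k) {
        Comparator<String> byScoreDesc =
                Comparator.comparingDouble((String t) -> tfIdf(t, tf, df, numDocs)).reversed();
        return tf.keySet().stream()
                .sorted(byScoreDesc.thenComparing(Comparator.<String>naturalOrder()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Term frequencies in one field of a query document, and document frequencies in a 10-document corpus.
        Map<String, Integer> tf = Map.of("return", 3, "hashMap", 1, "swap", 2);
        Map<String, Integer> df = Map.of("return", 10, "hashMap", 1, "swap", 1);
        System.out.println(topTerms(tf, df, 10, 2));
    }
}
```

Calling `topTerms` once per field with k = 20 would correspond to the setting num_q_terms = 20 in Table 6; a term that occurs in every document (here "return") scores zero and drops out first.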