Table 6 Values of the parameters used in the “init.properties” file [23]

From: Classification feature sets for source code plagiarism detection in Java

| Parameter | Description | Value |
|---|---|---|
| field | True if the per-field representation is used | True |
| toptermquery | True if only the top “num_q_terms” terms of each field, ranked by TF-IDF score, are considered (rather than all terms) | True |
| num_q_terms | Number of top terms kept per field when “toptermquery” is true | 20 |
| lambda | Weight of TF relative to IDF, between 0 and 1 | 0.4 |
| minShingleSize | Minimum size of the word n-grams (shingles) built from terms; unigrams are included by default | 2 |
| maxShingleSize | Maximum size of the word n-grams (shingles) built from terms | 3 |
| num_wanted | Number of top hit documents, ranked by relevance score, written to the output file for each query document | 20 |
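As an illustration, the values in Table 6 could be stored as a Java-style .properties file and loaded into typed settings. The sketch below assumes a `key=value` layout with lowercase booleans; the source gives only parameter names and values, so the file layout and the helper names `parse_properties`, `Config` and `load_config` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical init.properties content built from Table 6; the key=value
# layout and lowercase booleans are assumptions, not taken from the source.
INIT_PROPERTIES = """\
# Values from Table 6
field=true
toptermquery=true
num_q_terms=20
lambda=0.4
minShingleSize=2
maxShingleSize=3
num_wanted=20
"""


def parse_properties(text: str) -> dict[str, str]:
    """Parse a small subset of .properties syntax: 'key=value' or 'key:value'
    lines, with '#' or '!' comments. Escapes and continuations are ignored."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank and comment lines
        for i, ch in enumerate(line):
            if ch in "=:":  # split on the first separator only
                props[line[:i].strip()] = line[i + 1:].strip()
                break
    return props


@dataclass(frozen=True)
class Config:
    field: bool
    toptermquery: bool
    num_q_terms: int
    lam: float  # 'lambda' is a Python keyword, so the attribute is renamed
    min_shingle_size: int
    max_shingle_size: int
    num_wanted: int


def load_config(props: dict[str, str]) -> Config:
    """Convert raw strings to typed values and check their ranges."""
    cfg = Config(
        field=props["field"].lower() == "true",
        toptermquery=props["toptermquery"].lower() == "true",
        num_q_terms=int(props["num_q_terms"]),
        lam=float(props["lambda"]),
        min_shingle_size=int(props["minShingleSize"]),
        max_shingle_size=int(props["maxShingleSize"]),
        num_wanted=int(props["num_wanted"]),
    )
    if not 0.0 <= cfg.lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    # Assumed constraint: unigrams are added separately, so shingles start at 2.
    if not 2 <= cfg.min_shingle_size <= cfg.max_shingle_size:
        raise ValueError("need 2 <= minShingleSize <= maxShingleSize")
    return cfg
```

Calling `load_config(parse_properties(INIT_PROPERTIES))` yields the settings in Table 6. In a Java tool, `java.util.Properties.load` would read such a file directly. The hand-written parser here only keeps the sketch free of dependencies.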
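With minShingleSize = 2 and maxShingleSize = 3, each term position contributes its unigram, which is included by default, plus the bigram and trigram that start at that position. The hypothetical `shingles` helper below sketches this expansion. It assumes space-joined word n-grams emitted position by position. The source does not describe the tool's tokenizer or output order.

```python
def shingles(tokens: list[str], min_size: int = 2, max_size: int = 3,
             include_unigrams: bool = True) -> list[str]:
    """Return word n-grams of length min_size..max_size, position by position,
    preceded at each position by the unigram when include_unigrams is True."""
    if not 2 <= min_size <= max_size:
        raise ValueError("need 2 <= min_size <= max_size")
    out: list[str] = []
    for i in range(len(tokens)):
        if include_unigrams:
            out.append(tokens[i])  # unigrams are included by default
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):  # skip n-grams running past the end
                out.append(" ".join(tokens[i:i + n]))
    return out
```

For the tokens `public static void main`, this yields nine terms: four unigrams, three bigrams and two trigrams.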
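The table says only that lambda sets the weight of TF relative to IDF. It does not give the formula. The sketch below assumes a geometric interpolation, weight = tf^λ · idf^(1−λ), with a smoothed IDF. Under this assumption, λ = 1 ranks terms by TF alone and λ = 0 by IDF alone. The sketch then keeps the top num_q_terms terms when toptermquery is true, and truncates a query's ranked hits to num_wanted. The function names and the weighting formula are hypothetical.

```python
import math
from collections import Counter


def term_weights(doc_terms: list[str], corpus: list[set[str]],
                 lam: float) -> dict[str, float]:
    """Hypothetical lambda-weighted TF-IDF: tf**lam * idf**(1 - lam)."""
    n_docs = len(corpus)
    weights: dict[str, float] = {}
    for term, tf in Counter(doc_terms).items():
        df = sum(1 for doc in corpus if term in doc)  # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed, always >= 1
        weights[term] = (tf ** lam) * (idf ** (1.0 - lam))
    return weights


def top_terms(doc_terms: list[str], corpus: list[set[str]], lam: float = 0.4,
              num_q_terms: int = 20, toptermquery: bool = True) -> list[str]:
    """Rank terms by weight (ties broken alphabetically); keep the top
    num_q_terms only when toptermquery is True."""
    w = term_weights(doc_terms, corpus, lam)
    ranked = sorted(w, key=lambda t: (-w[t], t))
    return ranked[:num_q_terms] if toptermquery else ranked


def top_hits(scores: dict[str, float],
             num_wanted: int = 20) -> list[tuple[str, float]]:
    """Keep the num_wanted highest-scoring (doc_id, score) pairs for one query."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:num_wanted]
```

When `field` is true, `top_terms` would be applied to each field separately, using that field's terms and that field's document collection.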