The eight algorithms in RankLib [62], {MART, RankNet, RankBoost, AdaRank, Coordinate Ascent, LambdaMART, ListNet, Random Forests}, are tested, training each to optimize one of the evaluation metrics {NDCG@10, ERR@10, MASP}.
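For concreteness, these RankLib runs can be scripted; the following is a minimal sketch in Python, assuming a local copy of RankLib.jar and training data in the LETOR text format that RankLib reads (the file names are placeholders). MASP is not a built-in RankLib metric, so it has to be computed separately from the saved rankings.

```python
import subprocess

# RankLib 2.x ranker ids for the eight algorithms tested here.
RANKERS = {"MART": 0, "RankNet": 1, "RankBoost": 2, "AdaRank": 3,
           "Coordinate Ascent": 4, "LambdaMART": 6, "ListNet": 7,
           "Random Forests": 8}

def train_ranklib(ranker, train_file="train.txt", metric="NDCG@10",
                  model_out="model.txt"):
    """Train one RankLib model optimizing the given metric.

    -metric2t accepts NDCG@k, ERR@k, MAP, etc.; the five-fold setup in
    the text corresponds to RankLib's -kcv 5 option.
    """
    subprocess.run(["java", "-jar", "RankLib.jar",
                    "-train", train_file,
                    "-ranker", str(RANKERS[ranker]),
                    "-metric2t", metric,
                    "-save", model_out],
                   check=True)

train_ranklib("LambdaMART", metric="ERR@10")
```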
Table 5.2 shows that all models built from the learning to rank algorithms outperformed the baseline unsupervised BM25 score ranking in terms of the evaluation measures averaged over the five folds: NDCG@10 (Equation (2.17)), ERR@10 (Equation (2.18)) and MASP@10 (Equation (2.20a)). An average improvement of 25.6% is achieved in NDCG@10 and 13.6% in ERR@10 over the baseline BM25. The results recorded in Table 5.2 also show that Coordinate Ascent and the tree-based algorithms (MART, LambdaMART and Random Forests) perform best among the eight algorithms.
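For reference, the sketch below computes NDCG@10 and ERR@10 for a single ranked list of graded relevance labels. It uses the common exponential-gain form of DCG and the ERR definition of Chapelle et al., which may differ in detail from Equations (2.17) and (2.18); the maximum grade g_max = 4 is an assumption.

```python
import math

def dcg_at_k(rels, k):
    # Exponential-gain DCG: (2^rel - 1) discounted by log2(rank + 1).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    # Normalize by the DCG of the ideal (descending-grade) ordering.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def err_at_k(rels, k, g_max=4):
    # ERR: probability that the user stops at rank i, weighted by 1/i,
    # with per-document stop probability R(g) = (2^g - 1) / 2^g_max.
    err, p_continue = 0.0, 1.0
    for i, r in enumerate(rels[:k], start=1):
        stop = (2 ** r - 1) / (2 ** g_max)
        err += p_continue * stop / i
        p_continue *= 1 - stop
    return err
```

For example, ndcg_at_k([3, 2, 0, 1], 10) and err_at_k([3, 2, 0, 1], 10) score a ranking whose top four documents carry grades 3, 2, 0 and 1.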
From Table 5.4 and Figure 5.5, the improvement in ERR@10 gained by adding the transcription-related features ("WITH TRANS") over using the feature vectors of length 50 ("WITHOUT TRANS") is clear. The table also shows that with fewer features ("WITHOUT TRANS"), the tree-based and other algorithms outperform Coordinate Ascent in terms of ERR@10 as well. Comparing Table 5.3 and Table 5.4, the Random Forests algorithm performs fairly well both with and without the transcription features; however, it may be affected if irrelevant features are present in the training data, as noted in [68].
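The "WITHOUT TRANS" condition amounts to dropping the transcription-related dimensions from the feature vectors. The helper below is a hypothetical sketch for LETOR-format data, assuming the non-transcription features occupy the first 50 dimensions, as the "feature vectors of length 50" above suggests.

```python
def strip_trans_features(line, keep=50):
    # A LETOR line looks like: "<label> qid:<id> 1:v1 2:v2 ... #comment".
    body, _, comment = line.partition("#")
    tokens = body.split()
    head = tokens[:2]  # relevance label and qid
    feats = [t for t in tokens[2:] if int(t.split(":")[0]) <= keep]
    out = " ".join(head + feats)
    return out + (" #" + comment if comment else "")
```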
Our proposed algorithms combine feature reduction and bagging techniques to simplify the learning to rank model and decrease the training time, as in Informed Forest (Algorithm 4.1) and PCA Reduced Forest (Algorithm 4.2), and to select the best subset of features for a better search time, as in ReducedForest (Algorithm 4.3). These algorithms are presented in Chapter 4.
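To illustrate the general pattern behind these algorithms, rather than Algorithms 4.1-4.3 themselves, the sketch below chains PCA-based feature reduction with a bagged ensemble of pointwise regression trees in scikit-learn; the data shape, the number of components and the tree parameters are all placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for LETOR feature vectors and graded relevance labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 85))   # e.g. 50 base + 35 transcription features
y = rng.integers(0, 4, size=1000).astype(float)

# Step 1: reduce the feature vectors to a few principal components.
pca = PCA(n_components=20).fit(X)
X_red = pca.transform(X)

# Step 2: bag pointwise regression trees over the reduced features;
# at query time documents are ranked by the ensemble's predicted score.
# (Use base_estimator= on scikit-learn versions before 1.2.)
forest = BaggingRegressor(estimator=DecisionTreeRegressor(max_depth=6),
                          n_estimators=50, random_state=0).fit(X_red, y)
scores = forest.predict(pca.transform(X[:10]))
```

Reducing the dimensionality before bagging shrinks each tree's candidate split set, which is where the training-time savings in this style of model come from.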