Suppose two programs (NegCue and FastContext) are available and ready to run. I will install them on the same server and write a script that runs them together and records their processing times, so that the processing times of the two programs are paired for each run. By the end of the experiment, I will have 200 pairs of recorded processing times.
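As an illustration, the timing script could look roughly like the following R sketch. The shell commands "negcue input.txt" and "fastcontext input.txt" are placeholders for however the two programs are actually invoked; the two tools are run back-to-back within each trial so that the measurements stay paired per run.

n_runs <- 200
negcue <- numeric(n_runs)
fastcontext <- numeric(n_runs)
for (i in seq_len(n_runs)) {
  # system.time() reports elapsed wall-clock time in seconds;
  # multiply by 1000 to record milliseconds
  negcue[i] <- system.time(system("negcue input.txt"))[["elapsed"]] * 1000
  fastcontext[i] <- system.time(system("fastcontext input.txt"))[["elapsed"]] * 1000
}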
Next, I will use a paired t-test to examine the hypothesis proposed above. Specifically, for example, let's assume that I have recorded the processing times (in milliseconds) of NegCue as follows (simulated in R):
 [1] 1218 1184  656  977  443  592 1215 1001  745  820  397  437 1010  973  833  581  755  658  896 1153  840 1051  859
[24] 1084  932  927  616  932  849  433 1001  769 …
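For reference, data of this shape could be simulated along the following lines; the means and standard deviations here are illustrative placeholders, not the values actually used to produce the listing above.

set.seed(1)  # make the simulation reproducible
# hypothetical distribution parameters, in milliseconds, for illustration only
negcue <- round(rnorm(200, mean = 850, sd = 230))
fastcontext <- round(rnorm(200, mean = 385, sd = 120))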
With the corresponding FastContext times recorded for the same runs, the measurements can be paired by run (e.g., the first records, 1218 ms and 495 ms, form a pair), and then I can run a paired t-test to test the hypothesis:
> t.test(negcue, fastcontext, paired = TRUE)

        Paired t-test

data:  negcue and fastcontext
t = 38.318, df = 199, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 442.4404 490.4496
sample estimates:
mean of the differences
                466.445
Because the p-value is less than 0.01, the simulated test rejects the null hypothesis. In other words, the results demonstrate a statistically significant difference between the processing times of NegCue and FastContext.
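The same decision can also be read programmatically from the object returned by t.test(), which stores the p-value and the confidence interval:

res <- t.test(negcue, fastcontext, paired = TRUE)
res$p.value < 0.01   # TRUE here, so reject the null at the 1% significance level
res$conf.int         # 95 percent confidence interval for the mean difference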
Side note: I have contacted the author of NegCue several times. It seems that the currently released version of NegCue does not work properly. The author told me that he would try to fix it when he becomes available, but I have not heard any update from him yet. If I cannot obtain a working NegCue in time, I may need to skip this experiment and instead compare accuracy against the previously reported NegCue results.