Psychometrics: Validity and Reliability

Introduction
The selection of employees is one of the most significant tasks a human resources practitioner faces, because it affects the flow of employees entering and exiting the firm. Many issues may arise if the recruiting process is not in accordance with South African legislation, namely the Employment Equity Act and the Labour Relations Act, which govern the reliability, validity, bias and fairness of psychometric assessment measures. This legislation was introduced to protect employees against the discrimination and unfair practices experienced under previous dispensations. In terms of Section 8 of the Employment Equity Act (55 of 1998), “Psychological testing and other similar assessments of an employee are prohibited unless the test or assessment being used (a) has been scientifically shown to be valid and reliable; (b) can be applied fairly to all employees; and (c) is not biased against any employee or group”. The psychological measuring instrument we have chosen in accordance with the HPCSA is the Ability, Processing of Information and Learning Battery (APIL-B), which will be critically evaluated in this essay. This psychometric assessment battery can assist recruiters in identifying employees who have the potential to grow and learn within organisations. Furthermore, the APIL-B is a cognitive measure and is used not only for recruitment and selection in organisations, but also for selection into schools, universities and other areas. This essay therefore reports on the APIL-B under the following headings: Evaluating the APIL-B, Composition of the APIL-B, Validity, Reliability, Bias and Limitations.
Evaluating the APIL-B
According to Foxcroft and Roodt (2013), it is an assessment practitioner’s duty to evaluate the information offered about a measure and determine whether it is valid and reliable for its intended purpose. Foxcroft and Roodt (2013) further state that, when evaluating a measure, an assessment practitioner should consider how long ago it was developed, the quality of the manual’s contents, the clarity of the instructions, and cultural appropriateness.
First conceptualised in 1994 by T.R. Taylor, the APIL-B (Ability, Processing of Information and Learning Battery) (Taylor, n.d.) was designed as a set of tests for assessing one’s vital cognitive capabilities. To be most effective, the assessment should be administered to individuals with a minimum of twelve years of education (Taylor, n.d.). The APIL-B is suited to identifying those who are likely to master new, cognitively challenging content in a training context, and to establishing ability levels so that people can be placed in the correct positions. Taylor (n.d.) identifies three norm scales that the APIL-B makes use of: stanines (scale of 1–9), stens (scale of 1–10) and percentiles (scale of 1–99). Taylor (n.d.) further states that stens are used in the Flexibility-Accuracy-Speed Tests (FAST); stanines are used in the concept formation test, the memory test and the knowledge transfer test; and percentiles are used in the curve of learning test.
According to Taylor (n.d.), the APIL-B comprises five test booklets and two ancillary booklets, which together yield eight scores: abstract thinking; speed of information processing; accuracy of information processing; cognitive flexibility; performance gain in a learning task; final level of proficiency; memory and understanding; and transfer of knowledge. The battery takes approximately three hours and forty-five minutes to administer.
Composition of the APIL-B
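As a rough illustration of how the three norm scales relate to one another, the sketch below converts a standardised (z) score into a stanine, a sten and a percentile rank. It assumes a normally distributed norm group and uses the conventional definitions (stanines with mean 5 and standard deviation 2, stens with mean 5.5 and standard deviation 2). The function names are hypothetical, and the norm tables in the APIL-B manual may differ.

```python
from math import floor
from statistics import NormalDist


def to_stanine(z: float) -> int:
    """Convert a z-score to a stanine (1-9; mean 5, SD 2)."""
    return max(1, min(9, floor(2 * z + 5.5)))


def to_sten(z: float) -> int:
    """Convert a z-score to a sten (1-10; mean 5.5, SD 2)."""
    return max(1, min(10, floor(2 * z + 6)))


def to_percentile(z: float) -> int:
    """Convert a z-score to a percentile rank, limited to 1-99."""
    rank = round(NormalDist().cdf(z) * 100)
    return max(1, min(99, rank))


# Hypothetical test taker scoring half a standard deviation above the norm mean.
z_score = 0.5
print(to_stanine(z_score), to_sten(z_score), to_percentile(z_score))
```

Because all three scales are derived from the same underlying position in the norm group, they differ mainly in how finely they divide the distribution.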
Concept formation test
This test was designed to assess one’s ability to “think abstractly and conceptually: to form abstract concepts, reason hypothetically, theorise, build scenarios (and) trace causes” (Taylor, n.d., p. 4). The test comprises thirty questions, each consisting of six depictions of a similar nature. The test taker must identify the one depiction that lacks a characteristic shared by the rest (Taylor, n.d.).
Flexibility-Accuracy-Speed Tests (FAST)
Taylor (n.d.) suggests that “this battery within a battery measures speed (quickness) and accuracy of information processing, and cognitive flexibility”. The FAST is made up of four individual assessments, namely the Series, Mirror Image, Transformations and Combined tests. All four assessments are timed and have been designed in such a way that it is very rare for a test taker to complete an entire assessment. The tests use shapes of different sizes, which may contain either a dot or a line in the centre. The basic idea of the tests is to identify a pattern and find the omitted depiction.
Curve of learning
According to Taylor (n.d.), this test focuses on learning potential: it aims to assess one’s capacity to master new skills. It looks at future achievement potential rather than the abilities that the person already has. The test is split into four timed sessions, which require the test taker to decode a series of paired images into another set of images and then decode those images into a set of words. Images are decoded with the aid of the first ancillary booklet, the dictionary.
Memory test
Directly after the test taker has completed the curve of learning test, the memory test is administered. It follows the same concept as the curve of learning test, in that test takers are required to decode images into words; however, the dictionary is now taken away. Performance on this test reflects the extent to which the test taker has understood the logical relation between the symbols and the words.
Knowledge transfer test
According to Ferguson (1956, as cited in Taylor, n.d.), transferring knowledge and skills to similar areas or situations is a vital process of cognitive development. The knowledge transfer test, as the name suggests, measures this ability. The test consists of a series of connected depictions referred to as “pieces of equipment” (Taylor, n.d., p. 19), each of which has a specific feature in addition to a basic shape. The test taker is required to categorise them under symbols. Test takers are also given the second ancillary booklet.
Validity
Before a test can be used on test takers, its validity needs to be established to ensure that it is valid for the purpose for which it is to be used. Foxcroft and Roodt (2013) state that “the validity of a measure concerns what the test measures and how well it does so”. The studies consulted provide evidence of both construct and criterion validity for the APIL-B. The construct validity of a measure is the extent to which it measures the theoretical construct or trait that it is supposed to measure (Foxcroft & Roodt, 2013). Criterion validity, the second type, was defined by Phelan and Wren (2005), who stated that “Criterion-Related Validity is used to predict future or current performance”. The method used to determine criterion-related validity is predictive validity, which Murphy and Davidshofer (2005) define as a method of determining criterion validity. It is also used to determine the correlation between a test taker’s test score and their criterion-related scores.
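The correlation between test scores and criterion scores described above can be sketched as a Pearson correlation coefficient. The example below is a minimal illustration only: the function name and the data are hypothetical and are not taken from any of the studies cited in this essay.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: test scores at selection and later performance ratings.
test_scores = [4, 6, 5, 8, 7, 3]
criterion_scores = [52, 61, 58, 75, 66, 49]
print(round(pearson_r(test_scores, criterion_scores), 2))
```

A coefficient close to 1 would suggest that the test predicts the criterion well, while a coefficient close to 0 would suggest little predictive value.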

Taylor (1995) investigated the validity of the Concept Formation Test (CFT) by giving the measure to 33 first-year university students who had been accepted into the university on merits other than their grade twelve results. Taylor correlated their CFT marks with their marks in a course intended to improve their logical thinking and reasoning skills; the correlation was 0.44 (p = 0.012). In another study, Taylor (1995, as cited in Strachan, 2008) investigated the validity of the Curve of Learning and Memory and Understanding tests using a sample of 110 workers from a beverage manufacturing firm. The criteria for evaluating workers included facets such as their capacity to learn new procedures and concepts, to understand why things happen in the firm as a whole, and to plan and organise. The correlations averaged 0.35. This low correlation can be attributed to the fact that a diverse sample was not used. In a further study, Taylor (1995) obtained criterion scores for 43 employees who were enrolled in a course designed to prepare them for promotion to junior management positions. The correlations here were reported to be 0.67 and 0.79, which can be interpreted as indicating that the battery is an accurate predictor of performance.

In an additional study, Lopes, Roodt and Mauer (2001) assessed the predictive validity of the APIL-B in a financial institution in order to identify learning potential. A sample of 235 successful job applicants completed the test battery, and its predictive validity was assessed using a canonical discriminant analysis procedure. This procedure was adopted in view of the nominal strength of the managers’ ratings and, owing to the limited sample size, the 5-point rating scale was eventually collapsed into a 2-point classification.
Reliability
An assessment is reliable if it measures the same construct in a consistent and precise manner over time. Foxcroft and Roodt (2009) define the reliability of a measure as “the consistency with which it measures whatever it measures”. Split-half reliability was the main form of reliability evidence reported in the majority of the literature we consulted.

Split Half Reliability
Taylor (1995) explains that split-half reliability was used to investigate whether or not the APIL-B is reliable. Foxcroft and Roodt (2013, p. 47) define split-half reliability as “obtained by splitting the measure into two equivalents (after a single administration of the test) and computing the correlation coefficient between these two sets of scores”.
During his investigations into the reliability of the APIL-B, Taylor (1995) used six sample groups to estimate the reliability coefficients of the Flexibility-Accuracy-Speed Tests and the knowledge transfer test. The reliability coefficients ranged from 0.70 to 0.86 and from 0.71 to 0.84 respectively.
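The split-half procedure defined above can be sketched as follows. This is a minimal illustration under stated assumptions: items are split into odd and even positions, the response data are invented, and the Spearman-Brown correction (a standard adjustment for the halved test length) is added here although the sources cited do not say whether Taylor applied it. All function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half: float) -> float:
    """Estimate full-length reliability from the half-test correlation."""
    if r_half <= -1:
        raise ValueError("half-test correlation must be greater than -1")
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per test taker, one column per item."""
    # Total score on items in positions 1, 3, 5, ... and 2, 4, 6, ...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical right/wrong (1/0) responses of five test takers to six items.
responses = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

An odd-even split is only one of several ways of forming two equivalent halves; a different split of the same items can give a somewhat different coefficient.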
A study conducted by the defence force over a period of three years with new recruits aimed to determine whether psychometric evaluation processes can reliably predict the learning potential of first-year recruits at the academy. For the FAST, the study considered the following. Firstly, it investigated whether the abilities measured by the FAST have a positive effect on how quickly recruits learn new abilities. A significant relationship, with a reliability coefficient of r = 0.491, was found between flexibility of information processing and steepness of the learning curve. This is below the accepted reliability coefficient of r = 0.70.
Secondly, a strong relationship, with a reliability coefficient of r = 0.72, was found between speed of information processing and the total amount of work completed by the recruits. Lastly, a small relationship, with a reliability coefficient of r = 0.392, was found between accuracy of information processing and steepness of the learning curve; this too is below the accepted reliability coefficient of r = 0.70. Nevertheless, the study concluded that the three components of the FAST are accurate in predicting how quickly new recruits in the defence force will develop new competencies. The findings further indicated that the accuracy with which information is processed has a minimal influence on the rate at which a recruit develops new competencies (Pretorius, 2010).
The knowledge transfer test was used to investigate whether there was a transfer of knowledge to crystallized abilities, that is, whether recruits transferred what they had learnt and applied it in combat situations. Pretorius (2010) defines crystallized abilities as “specialized insight or understanding and knowledge that emerge via transfer from existing knowledge and that is subsequently, successfully stored in memory”. The Memory and Understanding sub-test of the APIL-B was used to measure the crystallized ability of recruits. A positive directional relationship was found between the transfer of knowledge and crystallized abilities. A substantial relationship, with a reliability coefficient of r = 0.515, was found between memory and understanding and crystallized abilities, which suggests a moderate correlation.

In terms of the curve of learning, prior learning was found to have a positive directional effect on learning performance; the results indicate a substantial relationship and a moderate correlation, with a reliability coefficient of r = 0.431. In concluding with this study, it can be said that the defence force’s use of the APIL-B was not fair and efficient, as it is biased against historically disadvantaged groups (Pretorius, 2010).

A study by De Goede and Theron (2010) concurred with Pretorius (2010). It used a non-probability sample of 434 new recruits from the South African Police Service Training College in Philippi, Cape Town. Even though the sample size is quite acceptable, the use of non-probability sampling means that caution should be taken when generalising to the target population. De Goede and Theron (2010) found a reliability score of r = 0.45. This suggests that a question mark hangs over the success with which at least some of the latent variables underlying the learning potential results of the police recruits were measured.
Standard Error of Measurement
Foxcroft and Roodt (2013, p. 249) explain that the standard error of measurement indicates the band of error around each obtained score, and that examiners should be aware of the standard error of measurement for each subtest before interpreting the test taker’s score. Assessors must therefore be cognisant of the test taker’s history and current circumstances. Factors such as culture, transient conditions, prior learning and test-wiseness can affect the variance between the true score (obtained under perfect conditions) and the obtained score. Pretorius (2010) outlines that an individual’s prior learning and familiarity with taking assessments have a significant impact on their ability to perform in test conditions, while Doosi (2000) was of the view that a testee’s culture and environmental factors also affect the scores of historically disadvantaged people in South Africa.
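The band of error described above is conventionally computed from the standard deviation of the scale and its reliability coefficient, as SEM = SD × √(1 − reliability). The sketch below applies this textbook formula; the function names, the reliability value of 0.75 and the obtained score are hypothetical and are not taken from the APIL-B manual.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1 - reliability)


def score_band(obtained: float, sd: float, reliability: float, z: float = 1.96):
    """Band of error around an obtained score (z = 1.96 for roughly 95%)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return obtained - margin, obtained + margin


# Hypothetical example: a stanine of 5 (stanine SD = 2) on a subtest
# with an assumed reliability coefficient of 0.75.
low, high = score_band(obtained=5, sd=2.0, reliability=0.75)
print(f"SEM band: {low:.2f} to {high:.2f}")
```

The lower the reliability of a subtest, the wider this band becomes, which is why an obtained score should be interpreted as a range and not as an exact value.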
Bias
Piro (2011) explains that bias “implies that test scores obtained for various subgroups of a given population cannot be interpreted in the same way across the groups”. Taylor (1995) suggests that the APIL-B was designed as a learning potential test and therefore limits any bias based on cultural differences. This is because the test is non-verbal, except for the instructions, and consists mainly of geometric depictions, so language does not become an issue of concern.
Strachan (2008) concurs with Taylor (1995). In a study with a sample of 400 individuals, 66 testees had African surnames while the large majority could be classified as white. The results for the two race groups were highly correlated, indicating that there is no potential for bias. However, it should be kept in mind that this was not a representative sample.
Further studies were consulted to investigate potential bias in the APIL-B. Doosi (2000) asked a sample of 20 psychological professionals from various fields to evaluate the cultural bias of the APIL-B and found that 6 of the 20 felt that the test was biased. Thus, it can be stated that there is potential for bias based on one’s culture. Similarly, Pretorius (2010) found that the APIL-B was accused of being biased and of under-representing the cognitive capacity of individuals from historically disadvantaged backgrounds. To bring recruitment practices in line with the Employment Equity Act, the tests were subsequently replaced with a selection battery thought to be less susceptible to culture, race and gender bias, and the measure was removed from use in the defence force.
Limitations of the APIL-B
According to De Goede and Theron (2010), their sample was not diverse enough to represent the target population. The same limitation applies to Strachan (2008), who also did not make use of a diverse sample. Based on the literature from these authors, it is therefore evident that accurate conclusions cannot be drawn, which indicates limitations in the above studies.

Conclusion
In the end, the results show that the APIL-B is able to predict the performance of individuals at an accurate level, not only in certain institutions but in selection more generally, which makes the battery a vital instrument. The APIL-B is a somewhat outdated measure, but it still proves to be valid and reliable in measuring cognitive abilities today. However, caution should be taken when administering the APIL-B, as some authors have found bias against historically disadvantaged groups. This essay has reported on the APIL-B through an evaluation of the APIL-B, the composition of the APIL-B, and its validity, reliability, bias and limitations.

Recommendations
Firstly, it should be noted that the APIL-B is an outdated selection battery. For organisations to make fair decisions in line with the Employment Equity Act, a more relevant battery needs to be considered. Secondly, the APIL-B should not be used on its own in the recruitment and selection process; it is advisable to use it together with other valid information, such as candidates’ curricula vitae and other test results. Thirdly, the use of the APIL-B can be considered biased in instances where people from different cultures and race groups are affected. In addition, Strachan (2008) and De Goede and Theron (2010) should have made use of more representative samples in order to draw conclusions about reliability from their studies. Lastly, we propose that the measures within the battery should not require such strict prior learning criteria, as these have been shown to disadvantage historically disadvantaged individuals who have not had exposure to prior learning.

Reference List
Doosi, M. (2000). An investigation into the attitudes, opinions, and feelings of psychometric test administrators toward the APIL B as a culture fair assessment with special reference to the employment equity act. Unpublished master’s thesis, University of Natal, Durban.

Employment Equity Act, No. 55 of 1998.

Foxcroft, C., & Roodt, G. (2013). Introduction to psychological assessment in the South African context (4th ed.). Cape Town: Oxford University Press.

De Goede, J., & Theron, C. (2010). An investigation into the internal structure of the learning potential construct of the APIL B test battery. Management Dynamics, 19(4), 30-55.

Lopes, A., Roodt, G., & Mauer, R. (2001). The predictive validity of the APIL-B in a financial institution. Journal of Industrial Psychology, 27(1), 61-69.

Murphy, K.R., & Davidshofer, C.O. (2005). Psychological testing: Principles and applications (6th ed.). New Jersey: Pearson Education International.

Phelan, C., & Wren, J. (2005). Exploring reliability in academic assessment. Retrieved March 3, 2014, from http://www.uni.edu/chfasoa/reliabilityandvalidity.htm

Piro, K. (2011). Investigating the impact of a psychometric assessment technique in the South African automotive industry. Unpublished thesis, Nelson Mandela Metropolitan University, Port Elizabeth.

Pretorius, M. (2010). Validation of a selection battery used by the South African Military Academy. Unpublished master’s thesis, University of Stellenbosch, Stellenbosch.

Strachan, E. J. (2008). APIL-B as a predictor of job performance in a South African financial consulting firm. Unpublished master’s thesis, University of Cape Town, Cape Town.

Taylor, T.R. (1995). APIL. Johannesburg: Aprolab.

Taylor, T. (n.d.). Administrator’s manual for APIL Battery. Johannesburg: Aprolab.
