Reliability and Validity Matrix


For each of the tests of reliability and validity listed on the matrix, prepare a 50-100-word description of the type of reliability/validity, its purpose, and the conditions under which it would be used as well as when it would be inappropriate. Then, prepare a 50-100-word description of each test’s strengths and a 50-100-word description of each test’s weaknesses.

|Test of Reliability |Description, Purpose, Application and Appropriateness |Strengths |Weaknesses |
|Inter-item consistency |Inter-item consistency is the correlation among all items on a scale, calculated from a single administration of a test. It assesses whether different questions that test the same idea give consistent results from the test-taker. It is appropriate when a set of items is meant to measure one idea, for example a set of items rating employee performance that an employer uses to determine whether an employee is eligible for a raise or promotion. It is inappropriate when the items are heterogeneous. |Scores are more accurate and easier to interpret when the items are highly consistent. Inter-item consistency is well suited to judging whether a test of a given length is reliable, using only one administration. Items that contribute error can be identified and removed, and new items can be added, until the desired level of reliability is reached. |Inter-item consistency is poorly suited to tests of broad ideas such as intelligence or personality, which draw on heterogeneous items. If the items are not homogeneous and similar in difficulty, the estimate of internal consistency is misleading, and even the Spearman-Brown formula does not correct for this. Inter-item consistency works best on long, whole tests rather than on half tests or short tests. |
|Split-half |Split-half reliability divides all items that are meant to measure the same idea into two sets and correlates the scores on the two sets. It is suitable when it is difficult to measure reliability with two tests or to give the same test twice. It is appropriate when the items can be split by random assignment or by an odd-even split. It can also be used to create a short parallel form of the same test. |Split-half reliability is efficient and less tedious for test-takers than parallel forms. It measures internal consistency well. Because both halves of the test are taken in a single sitting, it also limits intervening variables that could introduce error into the analysis. |Dividing a test straight down the middle is unwise because the content and difficulty of the questions will not be distributed evenly. Such a split introduces intervening variables, such as fatigue during the second half of the test, and differences in the difficulty and subject matter of the items in the first half compared with the second half. |
|Test/retest |Test-retest reliability gives the same test to the same people at two different times to measure how stable an idea is over time. If the idea being measured is expected to change over the period, the scores will vary. It is inappropriate, for example, for measuring the computer skills of college students: if a series of computer lessons falls between the first and second test, the scores will differ because of the instruction given to all test-takers. |Test-retest is a strong measure of reliability for traits such as an individual's reaction time and perceptual judgment. Such traits are stable, change little over time, and are not sensitive to many intervening variables. |Test-retest reliability is weak when the idea being tested changes over time. The change makes the reliability estimate appear lower than the true reliability of the measure. For example, a college student may show excellent skills when assessed on an HP computer but fail when assessed on a Mac, or falter when assessed on a computer from 15 years ago. |
|Parallel and alternate forms |Parallel and alternate forms reliability uses different versions of the same test items, given at two separate times to the same test-takers. It is appropriate for measuring traits that are stable over a long period of time and is not effective for measuring short-lived emotions or anxiety levels. Parallel forms can be combined with another method such as split-half. |It helps determine which questions are best to ask. It measures the central idea through different variations of the same test item. Confidence in a test's reliability increases when test-takers earn similar scores on equivalent questions across forms. |Parallel and alternate forms are very time-consuming, expensive, and tiring for the test-taker, who must answer many versions of the same test questions. These forms are not dependable for measuring an idea that can change over time. The tests may be taken months or even years apart, which allows intervening variables to affect the scores and create error variance. |
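
The reliability tests above all reduce to simple calculations on item scores. The following is a minimal sketch, not a definitive implementation. It assumes Cronbach's alpha as the inter-item consistency statistic, an odd-even split for the split-half method (stepped up with the Spearman-Brown formula), and a Pearson correlation for test-retest. The function names and all scores are hypothetical and invented only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Inter-item consistency; rows holds one list of item scores per test-taker."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def spearman_brown(r_half):
    """Estimate whole-test reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half(rows):
    """Odd-even split-half reliability, corrected with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in rows]   # items 1, 3, ...
    even = [sum(row[1::2]) for row in rows]  # items 2, 4, ...
    return spearman_brown(pearson(odd, even))


# Hypothetical data: 5 test-takers, 4 items scored 1-5.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [3, 3, 4, 4],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
totals = [sum(row) for row in scores]
retest_totals = [17, 11, 14, 18, 7]  # hypothetical second administration

print("Cronbach's alpha:", round(cronbach_alpha(scores), 3))
print("Split-half (Spearman-Brown):", round(split_half(scores), 3))
print("Test-retest r:", round(pearson(totals, retest_totals), 3))
```

The odd-even split is used here instead of a first-half/second-half split so that fatigue and changes in item difficulty late in the test affect both halves about equally.
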
|Test of Validity |Description, Application and Appropriateness |Strengths |Weaknesses |
|Face validity |Face validity describes the test-taker's view of a test's validity. It concerns not how valid the test actually is but whether the test-taker perceives it as valid, that is, whether the test appears to measure what it is supposed to measure. It is appropriate when gauging the confidence of a test-taker in the test. |The strength of face validity is that a test-taker who has confidence in the validity of the test is more comfortable taking it, and an administrator is more comfortable handing it out. Without that confidence, the test may be dismissed as invalid. |A weakness of face validity is that it does not measure actual validity. A test may look valid yet lack sound content, allow too little time, or be taken in a poor environment. |
|Content validity |Content validity is useful to test designers who need to create test questions that match the material being tested. It is appropriate for a college professor writing a final exam. It is ineffective for a test designer who wants new hires to have the same strengths as current employees. |A strength of content validity is that it can work backward from job responsibilities to what the job requires. First, the questions must cover the duties that need to be performed on the job; then a process is needed to evaluate what an employee contributes to the position. |A pitfall of content validity is that the material is vulnerable to cultural change and to change over time. The questions can have different answers in different parts of the world at different times, so the items on the test have to be accurate in every context. |
|Criterion-related |Criterion-related validity is a very strong way of confirming validity. It checks whether scores on a test correspond to an outside criterion, that is, to what is really true of the test-takers. For example, a group of people who have lost everything they owned in a natural disaster such as a tornado may all be diagnosed as depressed. If they are all tested using new questions and all score high for depression, the test has shown validity. |A strength of criterion-related validity is that it can validate a test score by using measures outside the test to show that the test covers the subject matter it is supposed to cover. It is more objective and verifiable than the previous methods and is widely favored. |A weakness of criterion-related validity is that the results can be contaminated. For example, when a test is used to measure and diagnose a disorder such as schizophrenia, a panel of psychiatrists who also rely on that test in forming the criterion diagnosis would contaminate the validity measure. |
|Construct |Construct validity is an umbrella under which the other, narrower types of validity fall. It is appropriate when a test needs to measure an idea such as intelligence or anxiety. It is ineffective when the idea is unclear or covers too broad a spectrum. |A strength of construct validity is that the steps used to verify an idea follow the scientific method. First a hypothesis is created, then a prediction is made, and then the results are measured. The predictions are based on facts, and the test is used to see whether the prediction is true. If it is not true, the test questions or the idea may have to be reviewed. |A weakness of construct validity appears when there is no single idea being measured or the idea is too vague. The results of the test then cannot be measured accurately, and the validity of the test with respect to the idea has no substance or definition. |
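
Criterion-related validity is usually summarized as a validity coefficient: the correlation between scores on the test and scores on the outside criterion. The sketch below assumes a Pearson correlation for that coefficient. The function name `validity_coefficient` and both score lists are hypothetical, loosely following the depression example in the matrix (new questionnaire scores against clinician ratings).

```python
from math import sqrt
from statistics import mean


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and an external criterion."""
    mt, mc = mean(test_scores), mean(criterion_scores)
    cov = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion_scores))
    st = sqrt(sum((t - mt) ** 2 for t in test_scores))
    sc = sqrt(sum((c - mc) ** 2 for c in criterion_scores))
    return cov / (st * sc)


# Hypothetical data for six test-takers.
questionnaire = [10, 14, 18, 22, 25, 30]  # scores on the new depression questions
clinician = [2, 3, 3, 5, 6, 7]            # independent clinician severity ratings

r = validity_coefficient(questionnaire, clinician)
print("Validity coefficient:", round(r, 3))
```

The clinician ratings must be made without knowledge of the questionnaire scores; otherwise the criterion is contaminated and the coefficient overstates the test's validity.
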

