
- Validity and reliability
- What is Reliability?
- What is Validity?
- Conclusion
- What is External Validity?
- Psychology and External Validity - The Battle Lines are Drawn
- Randomization in External Validity and Internal Validity
- Work Cited
- What is Internal Validity?
- Internal Validity vs Construct Validity
- How to Maintain High Confidence in Internal Validity?
- Temporal Precedence
- Establishing Causality through a Process of Elimination
- Internal Validity - the Final Word
- How is Content Validity Measured?
- An Example of Low Content Validity
- Face Validity - Some Examples
- If Face Validity is so Weak, Why is it Used?
- Bibliography
- What is Construct Validity?
- How to Measure Construct Validity?
- Threats to Construct Validity
- Hypothesis Guessing
- Evaluation Apprehension
- Researcher Expectancies and Bias
- Poor Construct Definition
- Construct Confounding
- Interaction of Different Treatments
- Unreliable Scores
- Mono-Operation Bias
- Mono-Method Bias
- Don't Panic
- Bibliography
- Criterion Validity
- Content Validity
- Construct Validity
- Tradition and Test Validity
- Which Measure of Test Validity Should I Use?
- Works Cited
- An Example of Criterion Validity in Action
- Criterion Validity in Real Life - The Million Dollar Question
- Coca-Cola - The Cost of Neglecting Criterion Validity
- Concurrent Validity - a Question of Timing
- An Example of Concurrent Validity
- The Weaknesses of Concurrent Validity
- Bibliography
- Predictive Validity and University Selection
- Weaknesses of Predictive Validity
- Reliability and Science
- Reliability and Cold Fusion
- Reliability and Statistics
- The Definition of Reliability Vs. Validity
- The Definition of Reliability - An Example
- Testing Reliability for Social Sciences and Education
- Test-Retest Method
- Internal Consistency
- Reliability - One of the Foundations of Science
- Test-Retest Reliability and the Ravages of Time
- Inter-rater Reliability
- Inter-rater Reliability and the Olympics
- An Example From Experience
- Qualitative Assessments and Inter-rater Reliability
- Guidelines and Experience
- Bibliography
- Internal Consistency Reliability
- Split-Halves Test
- Kuder-Richardson Test
- Cronbach's Alpha Test
- Summary
- Instrument Reliability
- Instruments in Research
- Test of Stability
- Test of Equivalence
- Test of Internal Consistency
- Reproducibility vs. Repeatability
- The Process of Replicating Research
- Reproducibility and Generalization - a Cautious Approach
- Reproducibility is not Essential
- Reproducibility - An Impossible Ideal?
- Reproducibility and Specificity - a Geological Example
- Reproducibility and Archaeology - The Absurdity of Creationism
- Bibliography
- Type I Error
- Type II Error
- Hypothesis Testing
- Reason for Errors
- Type I Error - Type II Error
- How Does This Translate to Science?
- Type I Error
- Type II Error
- Replication
- Type III Errors
- Conclusion
- Examples of the Null Hypothesis
- Significance Tests
- Perceived Problems With the Null
- Development of the Null
Test of Equivalence
Testing equivalence involves checking that the same test administered to two people, or two equivalent versions of a test administered at the same time, produce similar results.
Split-testing is one way of ensuring this, especially for tests or observations whose results are expected to change over time. In a school exam, for example, administering the same test to the same subjects will generally produce better results the second time around, so testing for stability is not practical.
Checking that two researchers observe similar results also falls within the remit of the test of equivalence.
Test of Internal Consistency
The test of internal consistency involves ensuring that each part of the test generates similar results, and that each part of the test measures the correct construct.
For example, a test of IQ should measure IQ and nothing else, and every single question must contribute to that measurement. One way of checking this is with variations on the split-half test, where the test is divided into two sections that are checked against each other. Odd-even reliability is a similar method used to check internal consistency.
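The split-half idea can be sketched in a few lines of code. Everything below is hypothetical: the item scores are invented, and the odd-even split plus the Spearman-Brown correction (a standard adjustment that estimates full-test reliability from the correlation between two half-tests) are one common way of doing it, not the only one.

```python
# Split-half reliability sketch (odd-even split + Spearman-Brown correction).
# The item scores are invented purely for illustration.

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of per-item scores per person."""
    odd = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)  # Spearman-Brown: step up to full-test length

# Six people, eight test items each (0 = wrong, 1 = right) - invented data.
scores = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]
print(round(split_half_reliability(scores), 3))  # prints 0.824
```

A value near 1 suggests the two halves of the test measure the same thing; a low value suggests some items are measuring a different construct.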
The physical sciences often use tests of internal consistency; this is why sports drug testers take two samples, each measured independently by a different laboratory, to ensure that experimental or human error did not skew the results.
Statistical Reliability
Statistical reliability is needed in order to ensure the validity and precision of the statistical analysis.
It refers to the ability to reproduce the results again and again, as required. This is essential because it builds trust in the statistical analysis and in the results obtained.
For example, suppose you are studying the effect of a new drug on blood pressure in mice. You would want to run a number of tests, and if the drug proves effective in controlling blood pressure, you might want to try it in humans too.
Statistical reliability is said to be low if you measure a certain level of control at one point and a significantly different value when you repeat the experiment at another time.
If the reliability is low, the experiment is difficult to reproduce with similar results, and the validity of the experiment decreases. This means that people will not trust the drug's abilities based on the statistical results you have obtained.
In many cases, you can improve reliability by increasing the number of tests and subjects. Simply put, reliability is a measure of consistency.
Reliability can be measured and quantified using a number of methods.
Consider the previous example, where a drug is used to lower blood pressure in mice. Depending on various initial conditions, the following table shows the percentage reduction in blood pressure in two tests. (Disclaimer: this is just an illustrative example; no test has actually been conducted.)
Time after injection | Test 1 (% reduction) | Test 2 (% reduction)
1 min | 5.86 | 5.89
2 min | 6.35 | 6.41
3 min | 7.12 | 6.95
4 min | 9.18 | 9.01
5 min | 12.36 | 12.13
6 min | 14.26 | 14.93
7 min | 16.96 | 15.89
Ideally, the two tests should yield the same values, in which case the statistical reliability would be 100%. However, this doesn't happen in practice: plotted against each other, the two series of results scatter around a dotted line marking the ideal case, where the values in Test 1 and Test 2 coincide.
Using the above data, one can quantify reliability by examining the change in the mean, by studying the types of error in the experiment (including Type I and Type II errors), or by using the retest correlation.
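As a sketch, the retest correlation for the illustrative table above can be computed directly. The Pearson correlation is written out in plain Python here rather than taken from a statistics library, so the arithmetic is visible.

```python
# Retest correlation for the illustrative blood-pressure data above.
# A value close to 1 indicates high test-retest reliability.

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Percentage reductions from the table, 1 min through 7 min.
test1 = [5.86, 6.35, 7.12, 9.18, 12.36, 14.26, 16.96]
test2 = [5.89, 6.41, 6.95, 9.01, 12.13, 14.93, 15.89]

r = pearson_r(test1, test2)
print(round(r, 3))  # close to 1: the two test runs agree well
```

Because the two runs track each other closely, the correlation comes out just below 1, which is what a highly reliable measurement should show.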
Statistical reliability is used extensively in psychological studies, and there is therefore a special way to quantify it in such cases: Cronbach's Alpha. This gives a measure of reliability, or consistency.
As the correlation between the items increases, the value of Cronbach's Alpha increases; in psychological tests and psychometric studies, it is therefore used to study the relationships between parameters and to rule out chance processes.
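The standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is straightforward to implement. The respondent data below are invented purely for illustration.

```python
# Cronbach's alpha from its standard formula.
# Rows are respondents, columns are item scores - invented numbers.

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # column-wise item scores
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Five respondents answering four items on a 1-5 scale.
data = [
    [3, 4, 3, 5],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))  # prints 0.958
```

Here the items rise and fall together across respondents, so alpha is high; items that varied independently of one another would drag it down.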
Reproducibility
Reproducibility is regarded as one of the foundations of the entire scientific method, a benchmark upon which the reliability of an experiment can be tested.
The basic principle is that, for any research program, an independent researcher should be able to replicate the experiment, under the same conditions, and achieve the same results.
This gives a good guide to whether there were any inherent flaws in the experiment, and ensures that the researcher exercised due diligence in the process of experimental design.
A replication study ensures that the researcher constructs a valid and reliable methodology and analysis.