
Reliability and Science
Reliability is something that every scientist, especially in social sciences and biology, must be aware of.
In science, the everyday meaning of the word carries over, but it needs a much narrower and more unequivocal definition.
Another way of looking at this is as maximizing the inherent repeatability or consistency in an experiment. For maintaining reliability internally, a researcher will use as many repeat sample groups as possible, to reduce the chance of an abnormal sample group skewing the results.
If you use three replicate samples for each manipulation, and one generates completely different results from the others, then there may be something wrong with the experiment.
For many experiments, results follow a 'normal distribution' and there is always a chance that your sample group produces results lying at one of the extremes. Using multiple sample groups will smooth out these extremes and generate a more accurate spread of results.
If your results continue to be wildly different, then there is likely to be something very wrong with your design; it is unreliable.
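The effect of averaging over replicate sample groups can be sketched with a short simulation. This is a minimal illustration using Python's standard library; the population parameters, group size, and replicate count are invented for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 100, 15   # hypothetical population parameters
GROUP_SIZE = 10

def sample_group_mean():
    """Mean of one replicate sample group drawn from the population."""
    return statistics.mean(
        random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(GROUP_SIZE)
    )

# A single group's mean can land some distance from the true mean by chance...
single = sample_group_mean()

# ...but averaging many replicate groups smooths out the extremes.
replicates = [sample_group_mean() for _ in range(30)]
pooled = statistics.mean(replicates)

print(f"single group mean: {single:.1f}")
print(f"mean of 30 groups: {pooled:.1f}")  # typically much closer to the true mean
```

Each replicate group is a small random sample, so any one of them can stray toward an extreme; the pooled mean is far less likely to.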
Reliability and Cold Fusion
Reliability is also extremely important externally, and another researcher should be able to perform exactly the same experiment, with similar equipment, under similar conditions, and achieve exactly the same results. If they cannot, then the design is unreliable.
A good example of a failure to apply the definition of reliability correctly is provided by the cold fusion case of 1989.
Fleischmann and Pons announced to the world that they had generated fusion heat at normal temperatures, without the huge and expensive tori used in most research into nuclear fusion.
This announcement shook the world, but when researchers at many other institutions across the world attempted to replicate the experiment, none succeeded. Whether the researchers lied or genuinely made a mistake is unclear, but their results were clearly unreliable.
Reliability and Statistics
Physical scientists expect to obtain exactly the same results every single time, due to the relative predictability of the physical realms. If you are a nuclear physicist or an inorganic chemist, repeat experiments should give exactly the same results, time after time.
Ecologists and social scientists, on the other hand, understand fully that achieving exactly the same results is an exercise in futility. Research in these disciplines incorporates random factors and natural fluctuations and, whilst any experimental design must attempt to eliminate confounding variables and natural variations, there will always be some disparities.
The key to performing a good experiment is to make sure that your results are as reliable as is possible; if anybody repeats the experiment, powerful statistical tests will be able to compare the results and the scientist can make a solid estimate of statistical reliability.
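One simple way such a comparison works is a two-sample t statistic: it expresses the gap between an original result set and a replication in units of their combined standard error. The sketch below implements Welch's t statistic by hand in plain Python; the two result sets are invented for illustration.

```python
import math
import statistics

# Hypothetical measurements from an original run and an independent replication.
original    = [24.1, 23.8, 24.5, 24.0, 23.9]
replication = [24.3, 24.0, 23.7, 24.2, 24.1]

def welch_t(a, b):
    """Welch's t statistic: the difference between the two sample means,
    divided by the combined standard error of that difference."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)

t = welch_t(original, replication)
print(f"t = {t:.2f}")  # values near zero suggest the replication agrees
```

In practice a researcher would feed such data to a statistical package rather than compute the statistic by hand, but the principle is the same: the smaller the statistic relative to its reference distribution, the more confident the scientist can be that the two runs agree.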
The Definition of Reliability Vs. Validity
Reliability and validity are often confused, but the terms actually describe two completely different concepts, although they are often closely inter-related. This distinct difference is best summed up with an example:
A researcher devises a new test that measures IQ more quickly than the standard IQ test:
If the new test delivers scores for a candidate of 87, 65, 143 and 102, then the test is neither reliable nor valid, and it is fatally flawed.
If the test consistently delivers a score of 100 when checked, but the candidate's real IQ is 120, then the test is reliable, but not valid.
If the researcher's test delivers a consistent score of 118, then that is pretty close, and the test can be considered both valid and reliable.
Reliability is an essential component of validity but, on its own, is not a sufficient measure of validity. A test can be reliable but not valid, whereas a test cannot be valid yet unreliable.
Reliability, in simple terms, describes the repeatability and consistency of a test. Validity defines the strength of the final results and whether they can be regarded as accurately describing the real world.
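The IQ example above can be expressed as a crude pair of checks: reliability asks whether repeated scores cluster tightly, and validity asks whether a reliable test is also centred on the true value. This is an illustrative sketch, not a real psychometric procedure; the tolerance values are arbitrary assumptions chosen for the example.

```python
import statistics

TRUE_IQ = 120  # the candidate's actual IQ, as in the example above

def assess(scores, true_value, spread_tol=5, bias_tol=5):
    """Crude check: 'reliable' means the scores cluster tightly
    (standard deviation within spread_tol); 'valid' means reliable
    AND centred on the true value (mean within bias_tol).
    Both tolerances are arbitrary, for illustration only."""
    reliable = statistics.stdev(scores) <= spread_tol
    valid = reliable and abs(statistics.mean(scores) - true_value) <= bias_tol
    return reliable, valid

print(assess([87, 65, 143, 102], TRUE_IQ))    # (False, False): neither
print(assess([100, 100, 100, 100], TRUE_IQ))  # (True, False): reliable, not valid
print(assess([118, 118, 118, 118], TRUE_IQ))  # (True, True): both
```

Note that `valid` is defined in terms of `reliable`, mirroring the text: a test can be reliable without being valid, but it cannot be valid without being reliable.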