
- Validity and reliability
- What is Reliability?
- What is Validity?
- Conclusion
- What is External Validity?
- Psychology and External Validity - The Battle Lines are Drawn
- Randomization in External Validity and Internal Validity
- Work Cited
- What is Internal Validity?
- Internal Validity vs Construct Validity
- How to Maintain High Confidence in Internal Validity?
- Temporal Precedence
- Establishing Causality through a Process of Elimination
- Internal Validity - the Final Word
- How is Content Validity Measured?
- An Example of Low Content Validity
- Face Validity - Some Examples
- If Face Validity is so Weak, Why is it Used?
- Bibliography
- What is Construct Validity?
- How to Measure Construct Validity?
- Threats to Construct Validity
- Hypothesis Guessing
- Evaluation Apprehension
- Researcher Expectancies and Bias
- Poor Construct Definition
- Construct Confounding
- Interaction of Different Treatments
- Unreliable Scores
- Mono-Operation Bias
- Mono-Method Bias
- Don't Panic
- Bibliography
- Criterion Validity
- Content Validity
- Construct Validity
- Tradition and Test Validity
- Which Measure of Test Validity Should I Use?
- Works Cited
- An Example of Criterion Validity in Action
- Criterion Validity in Real Life - The Million Dollar Question
- Coca-Cola - The Cost of Neglecting Criterion Validity
- Concurrent Validity - a Question of Timing
- An Example of Concurrent Validity
- The Weaknesses of Concurrent Validity
- Bibliography
- Predictive Validity and University Selection
- Weaknesses of Predictive Validity
- Reliability and Science
- Reliability and Cold Fusion
- Reliability and Statistics
- The Definition of Reliability Vs. Validity
- The Definition of Reliability - An Example
- Testing Reliability for Social Sciences and Education
- Test-Retest Method
- Internal Consistency
- Reliability - One of the Foundations of Science
- Test-Retest Reliability and the Ravages of Time
- Inter-rater Reliability
- Interrater Reliability and the Olympics
- An Example From Experience
- Qualitative Assessments and Interrater Reliability
- Guidelines and Experience
- Bibliography
- Internal Consistency Reliability
- Split-Halves Test
- Kuder-Richardson Test
- Cronbach's Alpha Test
- Summary
- Instrument Reliability
- Instruments in Research
- Test of Stability
- Test of Equivalence
- Test of Internal Consistency
- Reproducibility vs. Repeatability
- The Process of Replicating Research
- Reproducibility and Generalization - a Cautious Approach
- Reproducibility is not Essential
- Reproducibility - An Impossible Ideal?
- Reproducibility and Specificity - a Geological Example
- Reproducibility and Archaeology - The Absurdity of Creationism
- Bibliography
- Type I Error
- Type II Error
- Hypothesis Testing
- Reason for Errors
- Type I Error - Type II Error
- How Does This Translate to Science?
- Type I Error
- Type II Error
- Replication
- Type III Errors
- Conclusion
- Examples of the Null Hypothesis
- Significance Tests
- Perceived Problems With the Null
- Development of the Null
Establishing Causality through a Process of Elimination
Establishing causality through elimination is one of the most straightforward ways of demonstrating that an experiment has high internal validity.
As with the lemming example, there could be many other plausible explanations for the apparent causal link between prey and predator.
Researchers often refer to any such confounding variable as the 'Missing Variable,' an unknown factor that may underpin the apparent relationship.
The problem is, as the name suggests, that the variable is missing, and trying to find it is almost impossible. The only way to nullify it is through strong experimental design, eliminating confounding variables and ensuring that they cannot have any influence.
Randomization, control groups and repeat experiments are the best ways to eliminate these variables and maintain high validity.
In the lemming example, researchers use a whole series of experiments, measuring predation rates, alternative food sources and lemming breeding rates, attempting to establish a baseline.
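As a rough illustration of the randomization step mentioned above, the sketch below (in Python, with hypothetical observation plots for the lemming example) shuffles the experimental units and splits them into a treatment and a control group, so that any unknown 'missing variable' is spread across both groups by chance rather than by the researcher's choice.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the experimental units and split them into two equal-sized
    groups, so unknown confounding variables are distributed by chance."""
    rng = random.Random(seed)
    pool = list(units)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]  # (treatment group, control group)

# Hypothetical example: twenty observation plots in a lemming predation study.
plots = [f"plot_{i:02d}" for i in range(1, 21)]
treatment, control = randomly_assign(plots, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```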
Internal Validity - the Final Word
Just to leave you with an example of how difficult measuring internal validity can be:
In the experiment where researchers compared a computer program for teaching Greek against traditional methods, there are a number of threats to internal validity.
The group with computers feels special, so its members try harder - the Hawthorne Effect.
The group without computers becomes jealous, and its members try harder to prove that they should have been given the chance to use the shiny new technology.
Alternatively, the group without computers is demoralized and their performance suffers.
Parents of the children in the computerless group feel that their children are missing out, and complain that all children should be given the opportunity.
The children talk outside school and compare notes, muddying the water.
The teachers feel sorry for the children without the program and attempt to compensate, helping the children more than normal.
We are not trying to depress you with these complications, only to illustrate how complex internal validity can be.
In fact, perfect internal validity is an unattainable ideal, but any research design must strive towards that perfection.
For those of you wondering whether you picked the right course, don't worry. Designing experiments with good internal validity is a matter of experience, and becomes much easier over time.
For the scientists who think that social sciences are soft - think again!
Content Validity
Content validity, sometimes called logical or rational validity, is the estimate of how much a measure represents every single element of a construct.
For example, an educational test with strong content validity will represent the subjects actually taught to students, rather than asking unrelated questions.
Content validity is often seen as a prerequisite to criterion validity, because it is a good indicator of whether the desired trait is measured. If elements of the test are irrelevant to the main construct, then they are measuring something else completely, creating potential bias.
In contrast to criterion validity, which derives quantitative correlations from test scores, content validity is qualitative in nature, asking whether a specific element enhances or detracts from a test or research program.
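As a rough sketch of what that quantitative side looks like in practice, the example below (Python, with invented scores and a hypothetical later outcome) computes the kind of correlation coefficient a criterion validity study would report; a content validity review, by contrast, relies on expert judgement of each item rather than a calculation.

```python
# Minimal sketch of a criterion validity check: correlate test scores with a
# later criterion measure. All numbers are invented for illustration only.
import statistics

test_scores = [55, 62, 70, 48, 81, 66, 59, 74]            # e.g. an entrance test
later_outcome = [2.1, 2.6, 3.0, 1.9, 3.6, 2.8, 2.3, 3.2]  # e.g. first-year grades

r = statistics.correlation(test_scores, later_outcome)  # Pearson's r (Python 3.10+)
print(f"Criterion validity coefficient: r = {r:.2f}")
```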