- LECTURE 9
- MACHINE LEARNING OVERVIEW
- LEARNING
- MAINTAINING A BALANCE
- A PARTIAL CHARACTERISATION OF LEARNING TASKS
- MAINTAINING A BALANCE
- MAINTAINING A BALANCE
- INDUCTIVE LOGIC PROGRAMMING
- EXAMPLE LEARNED LP
- STOCHASTIC LOGIC PROGRAMS
- AUTOMATED THEORY FORMATION
- OTHER MACHINE LEARNING METHODS
- BIOINFORMATICS OVERVIEW
- FROM SEQUENCE TO STRUCTURE
- PROBLEM NUMBER ONE
- PROBLEM NUMBER TWO
- OTHER AIMS OF BIOINFORMATICS
- SOME CURRENT
- A SUBSTRUCTURE SERVER
- THE SUBSTRUCTURE SERVER
- USING MEDICAL ONTOLOGIES
- GENE ONTOLOGY DISCOVERY
- STUDYING BIOCHEMICAL NETWORKS
- CLOSED LOOP MACHINE LEARNING
- FUTURE DIRECTIONS FOR MACHINE LEARNING IN BIOINFORMATICS
- BIOCHEMICAL PATHWAYS
LECTURE 9
MACHINE LEARNING OVERVIEW
Ultimately about writing programs which improve with experience
Experience through data
Experience through knowledge
Experience through experimentation (active)
Some common tasks:
Concept learning for prediction
Clustering
Association rule mining
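As an illustration of the last task, association rule mining is usually described in terms of two measures: support (how often an itemset occurs) and confidence (how often the consequent holds when the antecedent does). The sketch below is a minimal example under stated assumptions: the transactions, the thresholds and the function names are all hypothetical and do not come from the lecture.

```python
from itertools import combinations

# Hypothetical toy transactions (assumed for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Enumerate single-item rules lhs -> rhs that clear both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

rules = mine_rules(transactions)
```

Here `rules` holds tuples of (antecedent, consequent, support, confidence). Real miners such as Apriori prune the search over larger itemsets, which this sketch does not attempt.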
ML
Machine learning consists of programming computers to optimize a performance criterion using example data or past experience.
The optimized criterion can be the accuracy of a predictive model (in a modelling problem) or the value of a fitness or evaluation function (in an optimization problem).
LEARNING
In a modelling problem, the term 'learning' refers to running a computer program that induces a model from training data or past experience.
Machine learning uses statistical theory when building computational models since the objective is to make inferences from a sample.
The two main steps in this process are inducing the model by processing large amounts of data, and representing the model and making inferences efficiently.
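The two steps above (induce a model from training data, then use it for inference) can be sketched with a very small learner. This is a minimal illustration, assuming a one-feature threshold classifier (a decision stump) and made-up training data; the function names are hypothetical, and training accuracy plays the role of the optimized criterion.

```python
def induce_stump(xs, ys):
    """Induce a model: choose the threshold t that maximises training
    accuracy of the rule 'predict 1 if x > t, else 0'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        acc = correct / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def predict(t, x):
    """Make an inference with the induced model."""
    return int(x > t)

# Hypothetical training sample (assumed for illustration only).
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]

threshold, train_acc = induce_stump(train_x, train_y)
```

Because the model is induced from a sample, accuracy on the training data is only an estimate; in practice it would be checked on held-out data.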
MAINTAINING A BALANCE
Predictive tasks | Supervised learning
Descriptive tasks | Unsupervised learning
Know what you’re looking for
Don’t know what you’re looking for
Don’t know you’re even looking
A PARTIAL CHARACTERISATION OF LEARNING TASKS
Concept learning
Outlier/anomaly detection
Clustering
Concept formation
Conjecture making
Puzzle generation
Theory formation
MAINTAINING A BALANCE
IN PREDICTIVE/DESCRIPTIVE TASKS
Predictive tasks
From accuracy to understanding
Need to show statistical significance
But hypotheses generated often need to be understandable
Difference between the stock market and biology
Descriptive tasks
From pebbles to pearls
Lots of rubbish produced
Cannot rely on statistical significance
Have to worry about notions of
MAINTAINING A BALANCE
IN SCIENTIFIC DISCOVERY TASKS
Machine learning researchers
Are generally not domain scientists as well
Extremely important to collaborate
To provide interesting projects
Remembering that we are scientists not IT consultants
To gain materials
Data, background knowledge, heuristics,
To assess the value of the output
INDUCTIVE LOGIC PROGRAMMING
Concept/rule learning technique (usually)
Hypotheses represented as Logic Programs
Search for LPs
From general to specific or vice-versa
One method is inverse entailment
Use measures to guide the search
Predictive accuracy and compression (info. theory)
Search performed within a language bias
Produces good accuracy and understanding
Logic programs are easier to decipher than
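The search described on this slide can be sketched in a much simplified form. The example below is not inverse entailment and not a real ILP system: it assumes a propositional stand-in in which a "clause" is a set of attribute/value literals, searches greedily from general to specific, and uses a crude compression-style score (positives covered, minus negatives covered, minus clause length). The allowed literals and the maximum clause length act as the language bias. All names and data are hypothetical.

```python
def covers(clause, example):
    """A clause (frozenset of (attribute, value) literals) covers an
    example if every literal holds in that example."""
    return all(example.get(attr) == val for attr, val in clause)

def score(clause, pos, neg):
    """Assumed compression-style measure: reward covered positives,
    penalise covered negatives and longer clauses."""
    p = sum(covers(clause, e) for e in pos)
    n = sum(covers(clause, e) for e in neg)
    return p - n - len(clause)

def general_to_specific(pos, neg, literals, max_len=2):
    """Greedy top-down search: start from the empty (most general)
    clause and add one literal at a time while the score improves."""
    clause = frozenset()
    best = score(clause, pos, neg)
    while len(clause) < max_len:
        refinements = [clause | {lit} for lit in literals if lit not in clause]
        if not refinements:
            break
        candidate = max(refinements, key=lambda c: score(c, pos, neg))
        candidate_score = score(candidate, pos, neg)
        if candidate_score <= best:
            break
        clause, best = candidate, candidate_score
    return clause, best

# Hypothetical examples (assumed for illustration only).
pos = [{"shape": "helix", "length": "hi"}] * 3
neg = [
    {"shape": "helix", "length": "lo"},
    {"shape": "helix", "length": "lo"},
    {"shape": "strand", "length": "hi"},
    {"shape": "strand", "length": "hi"},
    {"shape": "strand", "length": "lo"},
]
literals = [("shape", "helix"), ("shape", "strand"),
            ("length", "hi"), ("length", "lo")]

learned, learned_score = general_to_specific(pos, neg, literals)
```

A specific-to-general search would instead start from a maximally specific clause built from one example and drop literals; the scoring idea stays the same.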
EXAMPLE LEARNED LP
fold('Four-helical up-and-down bundle', P) :-
    helix(P, H1),
    length(H1, hi),
    position(P, H1, Pos),
    interval(1 =< Pos =< 3),
    adjacent(P, H1, H2),
    helix(P, H2).
Predicting protein folds from helices
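To make the reading of the clause above concrete, here is a hedged Python paraphrase. The predicate meanings are assumptions, since the slide does not define them: `helix(P, H)` is taken to mean H is a helix in protein P, `length(H, hi)` that the helix is long, `position(P, H, Pos)` that H is the Pos-th secondary-structure element (1-based), and `adjacent(P, H1, H2)` that H2 immediately follows H1. The data layout and the toy proteins are hypothetical.

```python
def is_four_helical_bundle(protein):
    """Python paraphrase of the learned clause, under assumed predicate meanings."""
    elements = protein["elements"]  # secondary-structure elements in sequence order
    for i, h1 in enumerate(elements):
        pos = i + 1                                  # position(P, H1, Pos), assumed 1-based
        if h1["type"] != "helix":                    # helix(P, H1)
            continue
        if h1["length"] != "hi":                     # length(H1, hi)
            continue
        if not 1 <= pos <= 3:                        # interval(1 =< Pos =< 3)
            continue
        if i + 1 >= len(elements):                   # no element follows H1
            continue
        if elements[i + 1]["type"] == "helix":       # adjacent(P, H1, H2), helix(P, H2)
            return True
    return False

# Hypothetical toy proteins (assumed for illustration only).
bundle = {"elements": [
    {"type": "helix", "length": "hi"},
    {"type": "helix", "length": "hi"},
    {"type": "helix", "length": "lo"},
]}
not_bundle = {"elements": [
    {"type": "strand", "length": "hi"},
    {"type": "helix", "length": "lo"},
    {"type": "strand", "length": "hi"},
]}
```

Read this way, the rule says: a protein is predicted to have this fold if one of its first three elements is a long helix that is immediately followed by another helix.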