CMSB tutorial 9: Machine Learning - WEKA
The aim of this lab is to give you more practical experience in using WEKA
for machine learning applications, following on from the
lecture on machine learning for micro-array classification.
Resources:
Some useful resources about WEKA are at the website
www.cs.waikato.ac.nz/ml/weka
The WEKA datafiles for this tutorial can be found
here.
Exercises:
-
Ensure that you have worked through the previous Weka tutorial.
-
Look at one of the confusion matrices output by Weka, e.g. from
java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff
Compute the values for
- TP (True Positives)
- FP (False Positives)
- TN (True Negatives)
- FN (False Negatives)
Use these values to compute the following measures of performance
- Accuracy
- Positive Predictive Value -- PPV
- Negative Predictive Value -- NPV
- TP-rate (Sensitivity / Recall)
- TN-rate (Specificity)
- F-measure (van Rijsbergen)
- Correlation Coefficient
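The measures listed above can be sketched in a few lines of Python. This is only an illustration and is not part of the tutorial files: the counts are made up, the function name `performance_measures` is hypothetical, and "Correlation Coefficient" is assumed here to mean the Matthews correlation coefficient. Weka prints its confusion matrix with actual classes as rows and predicted classes as columns, so for a two-class problem with the first class taken as positive the matrix reads [[TP, FN], [FP, TN]].

```python
import math


def performance_measures(tp, fp, tn, fn):
    """Return common two-class performance measures as a dict.

    Assumes every denominator is non-zero. "mcc" is the Matthews
    correlation coefficient (an assumed reading of "Correlation Coefficient").
    """
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    sensitivity = tp / (tp + fn)    # TP-rate, recall
    specificity = tn / (tn + fp)    # TN-rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F-measure: harmonic mean of precision and recall
        "f_measure": 2 * ppv * sensitivity / (ppv + sensitivity),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical confusion matrix in Weka's layout (rows = actual class,
# columns = predicted class), with the first class treated as positive.
matrix = [[7, 2],
          [3, 2]]
(tp, fn), (fp, tn) = matrix

measures = performance_measures(tp, fp, tn, fn)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Which class counts as "positive" is a choice; swapping the roles of the two classes exchanges PPV with NPV and sensitivity with specificity, while accuracy and the correlation coefficient stay the same.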
-
Repeat these calculations for confusion matrices generated by other classifiers, to see how
their performance measures differ from those for Id3.
-
Use different numbers of cross-validation folds (set with the -x option), e.g.
java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff -x 5
and then recompute the values for the measures of performance above.
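To see what changing the number of folds does, the Python sketch below splits instance indices into k folds and reports the test-set size of each. It is a simplified illustration under stated assumptions: the helper `k_fold_indices` is hypothetical, 14 is the number of instances in the weather data, and the shuffling and stratification that Weka applies before splitting are omitted.

```python
def k_fold_indices(n_instances, k):
    """Split indices 0..n_instances-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_instances, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


n = 14  # instances in the weather data
for k in (5, 10):
    folds = k_fold_indices(n, k)
    print(f"{k} folds, test-set sizes: {[len(f) for f in folds]}")
    for test in folds:
        # every instance not in the test fold is used for training
        train = [i for i in range(n) if i not in test]
        assert len(train) + len(test) == n
```

With so few instances, each test fold holds only one to three examples, so the confusion matrix (which sums over all folds) can change noticeably as the fold count changes.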
-
Area Under the Curve (AUC) computations:
Download
these files to a workspace on your XP machine.
Make sure that you understand the files
auc_execute.bat,
AUC.pl, and
yeast.txt.
Then run auc_execute by clicking on the icon. Have a look at the results file
auc_results.txt.
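As a cross-check on the idea behind this exercise, the area under the ROC curve can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting one half. The sketch below is in Python, uses made-up scores and labels, and is not a transcription of AUC.pl, whose method and input format are not described here; the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores (higher = more likely positive)
    labels: 1 for positive, 0 for negative
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc(scores, labels):.3f}")
```

An AUC of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to a ranking no better than chance.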