CMSB tutorial 9: Machine Learning - WEKA

David Gilbert and José Antonio Reyes

The aim of this lab is to give you more practical experience in using WEKA for the machine learning applications covered in the lecture on machine learning for microarray classification.

Resources:

Useful resources about WEKA can be found at www.cs.waikato.ac.nz/ml/weka.

The WEKA data files for this tutorial can be found here.

Exercises:

  1. Ensure that you have worked through the previous Weka tutorial.

  2. Look at one of the confusion matrices output by Weka, for example from
    java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff
    Read off the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), and use these values to compute the following measures of performance:
    • Accuracy
    • Positive Predictive Value (PPV)
    • Negative Predictive Value (NPV)
    • TP-rate (Sensitivity / Recall)
    • TN-rate (Specificity)
    • F-measure (van Rijsbergen)
    • Correlation Coefficient

  3. Repeat these calculations for the confusion matrices produced by other classifiers, to see how their performance measures differ from those for Id3.

  4. Use different numbers of cross-validation folds [set with the -x option], e.g.
    java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff -x 5
    and then recompute the values for the measures of performance above.

  5. Area Under the Curve (AUC) computations:
    Download these files to a workspace on your XP machine.
    Make sure that you understand the files auc_execute.bat, AUC.pl and yeast.txt.
    Then run auc_execute by clicking on its icon, and look at the results file auc_results.txt.
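The measures listed in exercise 2 can be checked with a short script. The sketch below is plain Python, not part of WEKA. It assumes WEKA's confusion-matrix layout, where rows are the actual classes and columns are the predicted classes (the "<-- classified as" header), and it assumes that the "Correlation Coefficient" meant here is the Matthews (phi) coefficient. The helpers `confusion_measures` and `from_weka_matrix` are hypothetical names introduced for illustration, and the example counts are made up.

```python
import math


def confusion_measures(tp, fp, tn, fn):
    """Performance measures for a two-class confusion matrix.

    A measure whose denominator is zero is returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),        # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),        # negative predictive value
        "tp_rate": ratio(tp, tp + fn),    # sensitivity / recall
        "tn_rate": ratio(tn, tn + fp),    # specificity
        # F-measure with beta = 1: harmonic mean of precision and recall
        "f_measure": ratio(2 * tp, 2 * tp + fp + fn),
        # Matthews (phi) correlation coefficient (assumed meaning)
        "correlation": ratio(tp * tn - fp * fn,
                             math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def from_weka_matrix(matrix, positive=0):
    """Return (TP, FP, TN, FN) from a 2x2 matrix laid out like WEKA's output:
    rows = actual class, columns = predicted class.
    `positive` is the row/column index of the class treated as positive."""
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    return tp, fp, tn, fn


if __name__ == "__main__":
    # Hypothetical counts, not taken from a real WEKA run.
    counts = from_weka_matrix([[8, 1], [2, 3]])
    for name, value in confusion_measures(*counts).items():
        print(f"{name:12s} {value:.3f}")
```

Passing `positive=1` treats the second class as positive: PPV and NPV then swap values, as do TP-rate and TN-rate, while accuracy and the correlation coefficient stay the same.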
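For exercise 4 it helps to know what the confusion matrix means under cross-validation. The Python sketch below is a simplified illustration, not WEKA's own code. It deals the instances into k stratified folds, "trains" on k-1 folds, predicts the held-out fold, and adds every fold's predictions into one pooled matrix, so each instance is counted exactly once whatever k is. This is why the matrix printed after `-x` should always sum to the number of instances. WEKA also randomises the data before stratifying, so its folds will differ from these. The names `stratified_folds`, `majority_classifier` and `cross_validate`, the majority-class stand-in learner, and the example labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=1):
    """Split instance indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i in members:  # deal each class round-robin across the folds
            folds[slot % k].append(i)
            slot += 1
    return folds


def majority_classifier(labels):
    """Hypothetical stand-in learner: predicts the most common training class."""
    def train_and_predict(train_idx, test_idx):
        counts = Counter(labels[i] for i in train_idx)
        winner = max(sorted(counts), key=counts.__getitem__)
        return [winner] * len(test_idx)
    return train_and_predict


def cross_validate(labels, k, train_and_predict, classes):
    """Pooled k-fold confusion matrix: rows = actual class, columns = predicted."""
    col = {c: j for j, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        for i, pred in zip(test_idx, train_and_predict(train_idx, test_idx)):
            matrix[col[labels[i]]][col[pred]] += 1
    return matrix


if __name__ == "__main__":
    labels = ["yes"] * 9 + ["no"] * 5  # hypothetical class labels
    for k in (2, 5, 14):
        print(k, cross_validate(labels, k, majority_classifier(labels), ["yes", "no"]))
```

With this trivial learner the pooled matrix is the same for every k. With a real learner such as Id3 the individual cells will change with k, but their total stays equal to the number of instances.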
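Exercise 5 relies on the supplied AUC.pl script, whose method and input format are not described here. As an independent check of what AUC measures (not a reimplementation of AUC.pl), the Python sketch below computes the area under the ROC curve in two equivalent ways. The first applies the trapezoidal rule to the ROC points obtained by lowering a decision threshold over the classifier scores. The second is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. It assumes that a higher score means "more likely positive" and that labels are coded 1 for positive and anything else for negative. The function names and example data are hypothetical.

```python
def roc_points(scores, labels):
    """ROC curve as (FP-rate, TP-rate) points, lowering the threshold one
    distinct score at a time, starting from (0, 0)."""
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # instances with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def pairwise_auc(scores, labels):
    """P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores and labels, not taken from yeast.txt.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(trapezoid_auc(roc_points(scores, labels)), pairwise_auc(scores, labels))
```

Both functions should return the same value. Grouping tied scores into a single ROC step is what makes the trapezoidal area agree with the half-credit rule for ties. The pairwise version takes time proportional to P*N and is meant only for small examples.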