So August’s ISACA meeting was on Applied Machine Learning and was presented by Jay Luan, a Data Scientist from Cylance. I was particularly excited about this because Cylance has topped many lists, including best place to work and startup of the year. Furthermore, they have raised close to 200 million dollars in less than five years, so they are doing something right!
I have uploaded the presentation, but here are some of my notes from the meeting:
- Label – pretty self-explanatory; for example, whether an endpoint is active or not active, or whether a file is malicious or not.
- Clustering – a type of unsupervised learning. Unsupervised learning is when the algorithm decides how best to group things together, without being given labels. For example, hosts found by an NMAP scan could be clustered by which ports they have open. One example of a clustering algorithm is k-means. In k-means, the user tells the algorithm how many clusters (k) they want. The algorithm places k centroids, assigns each data point to its closest centroid, and then moves each centroid to the average of the points assigned to it. This process repeats until the centroids settle in the middle of their clusters, at which point the user can stop. A key limitation of k-means is that it has no real way to identify outliers, since every point is forced into some cluster, so it would need to be paired with something like DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Another way to patch this limitation of k-means is interactive clustering, also known as “human in the loop.”
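- Classification – a type of supervised learning, where the algorithm is trained on labeled data (for example, files already marked as malicious or not) and learns to predict the label for new data.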
- Feature – anything that describes what you’re looking at. For example, features of a car would be the number of wheels it has, its color, etc. The more features that can be extracted, the more ways the data can be separated into groups, which can increase the accuracy of the model. Each feature can be treated as an axis (x, y, z, and so on), so every item becomes a point, and the distance between any two points measures how different they are.
- Vector – essentially an array of numbers. Strings or raw bytes cannot be fed into the model directly, so features are converted into numeric vectors first.
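To make the k-means steps from the clustering note concrete, here is a minimal Python sketch. It is not how Cylance or any particular library implements k-means; it assumes 2-D points, picks the initial centroids at random from the data, and stops once the centroids no longer move (or after `max_iters` rounds, a hypothetical safety cap added for this sketch). The sample points are made up.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means on 2-D points (a sketch, not production code)."""
    rng = random.Random(seed)
    # The user chooses k; start from k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: attach each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the average of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        # Stop once the centroids have settled.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # If max_iters is hit first, clusters reflect the previous round's centroids.
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
```

With this toy data the three points near (1, 1) and the three near (8, 8) end up in separate clusters. Note that k must be chosen up front, and every point, including any outlier, is assigned to some cluster.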
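The note about pairing k-means with DBSCAN can be illustrated with a bare-bones DBSCAN sketch. A point with at least `min_pts` neighbors within distance `eps` is a core point that grows a cluster; any point not reachable from a core point is labeled noise (`-1` here). This is how DBSCAN surfaces outliers that k-means would silently absorb into a cluster. The `eps` and `min_pts` values and the sample points are illustrative assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point; -1 marks noise (outliers)."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Every point within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, so keep expanding
        cluster_id += 1
    return labels


points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (8, 8), (8.1, 8.2), (7.9, 8.0), (50, 50)]
print(dbscan(points, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The lone point at (50, 50) is flagged as noise instead of being pulled into a cluster. This sketch compares every pair of points, which is fine for a handful of points but slow for large datasets.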
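For the classification note, a 1-nearest-neighbor classifier is one of the simplest supervised learners: it is trained on labeled examples and predicts the label of whichever example is closest to the new item. The talk did not say which classifier was used; nearest-neighbor is swapped in here only to show the idea, and the feature vectors and labels below are hypothetical, made-up values.

```python
import math


def nearest_neighbor_classify(training, query):
    """1-nearest-neighbor: predict the label of the closest labeled example."""
    # training is a list of (feature_vector, label) pairs, i.e. labeled data.
    _, closest_label = min(training, key=lambda pair: math.dist(pair[0], query))
    return closest_label


# Hypothetical two-feature vectors with made-up labels, for illustration only.
training = [
    ([7.8, 0.0], "malicious"),
    ([7.5, 0.0], "malicious"),
    ([4.2, 1.0], "benign"),
    ([5.0, 1.0], "benign"),
]
print(nearest_neighbor_classify(training, [7.6, 0.0]))  # malicious
```

Unlike clustering, the groups here are not discovered by the algorithm; they come from the labels supplied with the training data.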
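The feature and vector notes fit together: each item's features become an array of numbers, and the distance between two arrays measures how different the items are. The sketch below uses the car example; the `doors` feature and the fixed `COLORS` list are hypothetical additions, and color (a string) is one-hot encoded because the model cannot take strings directly.

```python
import math

# Hypothetical fixed set of colors used for one-hot encoding (an assumption for this sketch).
COLORS = ["red", "blue", "black"]


def car_to_vector(wheels, color, doors):
    """Turn a car's features into a numeric vector the model can consume."""
    # Strings can't be fed to the model, so color becomes one 0/1 slot per known color.
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [float(wheels), float(doors)] + one_hot


def distance(a, b):
    """Euclidean distance between two feature vectors: larger means more different."""
    return math.dist(a, b)


sedan = car_to_vector(wheels=4, color="red", doors=4)   # [4.0, 4.0, 1.0, 0.0, 0.0]
coupe = car_to_vector(wheels=4, color="red", doors=2)
trike = car_to_vector(wheels=3, color="blue", doors=0)
print(distance(sedan, coupe))  # 2.0
```

The sedan is closer to the coupe than to the trike, matching intuition. In practice, features are usually scaled to similar ranges first so that one large-valued feature does not dominate the distance.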
Link to download the presentation: