I have been using Hadoop to parse web logs and extract multiple features from each log entry. The output is comma-separated, so it can be fed straight into Weka for clustering analysis.
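To make the parsing step concrete, here is a minimal sketch of how a log line might be turned into a comma-separated feature row, written in the style of a Hadoop Streaming mapper in Python. The log format (Apache Common Log Format), the choice of features, and the names `parse_line`, `to_csv_row` and `main` are all assumptions for illustration, not the actual job described here.

```python
import csv
import io
import re
import sys
from datetime import datetime

# Assumed input: Apache Common Log Format, for example
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a list of features for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    # A "-" in the size field means no body was sent; treat it as 0 bytes.
    size = 0 if match.group("size") == "-" else int(match.group("size"))
    # Hypothetical feature set: client IP, hour of day, method, path, status, bytes.
    return [
        match.group("ip"),
        timestamp.hour,
        match.group("method"),
        match.group("path"),
        int(match.group("status")),
        size,
    ]


def to_csv_row(features):
    """Format one feature list as a single comma-separated line (no newline)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="").writerow(features)
    return buffer.getvalue()


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming-style mapper loop: read log lines, write CSV rows, skip bad lines."""
    for line in stdin:
        features = parse_line(line.rstrip("\n"))
        if features is not None:
            stdout.write(to_csv_row(features) + "\n")
```

Calling `main()` from a script would let it act as a mapper that reads log lines on standard input. Using the `csv` module rather than joining with commas by hand keeps the output loadable even if a field, such as a request path, happens to contain a comma.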
For now I am using Weka rather than Apache Mahout, for two reasons:
- Weka gives me a visual analysis of results.
- Its GUI makes it easier to identify and understand how one dimension relates to another when the two are plotted in a 2-dimensional space.
I will move on to Apache Mahout soon, once I understand how the features relate to one another.