Gautam's Blog

The technical blog of Gautam!


I have been learning R and plan to use it in a Hadoop environment.

Check out RHIPE: http://www.stat.purdue.edu/~sguha/rhipe/

Increase Weka's JVM heap space from 128 MB to 1024 MB by editing the properties file:

C:\Program Files\Weka-3-6\RunWeka.ini

Change the maxheap entry to:

maxheap=1024m
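
If the change has to be applied on more than one machine, it can be scripted. The following is only a minimal sketch: the function name `set_maxheap` and the sample text are made up for illustration, and it assumes the file contains a single line of the form `maxheap=...` as described above. It works on the file's text rather than on the file itself, so reading and writing RunWeka.ini is left out.

```python
import re

# Matches a "maxheap=<value>" line and captures everything up to the value.
# Assumption: the entry sits on its own line, as in the post above.
MAXHEAP_PATTERN = re.compile(r"^([ \t]*maxheap[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE)


def set_maxheap(ini_text: str, new_value: str = "1024m") -> str:
    """Return ini_text with the maxheap entry set to new_value."""
    updated, count = MAXHEAP_PATTERN.subn(
        lambda m: m.group(1) + new_value, ini_text
    )
    if count == 0:
        raise ValueError("no maxheap entry found")
    return updated


# Hypothetical fragment, not the full contents of the real file.
sample = "# sample fragment\nmaxheap=128m\n"
print(set_maxheap(sample))
```

Back up the original file before overwriting it with the updated text.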

I have been using Hadoop to parse web logs and extract multiple features from each entry. The output is comma-separated, so it can be fed into Weka for clustering analysis.
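
To give a rough idea of the per-line parsing step, here is a small Python sketch of the kind of function a Hadoop Streaming mapper could call. It is not my actual job: it assumes the logs are in Apache Common Log Format, and the chosen features (host, method, path, status, size) and the names `extract_features` and `to_csv_row` are only illustrative.

```python
import csv
import io
import re
from typing import List, Optional

# Assumption: input lines follow the Apache Common Log Format.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def extract_features(line: str) -> Optional[List[str]]:
    """Return a list of features for one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # A "-" in the size field means no body was sent; treat it as 0 bytes.
    size = "0" if match.group("size") == "-" else match.group("size")
    return [
        match.group("host"),
        match.group("method"),
        match.group("path"),
        match.group("status"),
        size,
    ]


def to_csv_row(features: List[str]) -> str:
    """Join features into one comma-separated row, quoting where needed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(features)
    return buffer.getvalue().rstrip("\r\n")


sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
features = extract_features(sample_line)
if features is not None:
    print(to_csv_row(features))
```

In a streaming job the same two functions would be applied to every line read from standard input, and lines that fail to parse would be skipped. Using the csv module rather than a plain join keeps the rows valid when a path contains a comma.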

I have been using Weka rather than Apache Mahout. Reasons:

  • Weka gives me a visual analysis of the results.
  • The GUI helps me identify and understand how one dimension relates to another when the two are plotted in a 2-dimensional space.

I will move on to Apache Mahout soon, once I understand how the features relate to one another.

Weka: http://www.cs.waikato.ac.nz/ml/weka/

R project: http://cran.cnr.berkeley.edu/

© 2012 Gautam's Blog