A fantastic post by Kristóf Kovács: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
I would like to take a look at how to parse libpcap files in Hadoop. One problem is that the files are not easily ‘splittable’. However, PCAP files can be parsed in Java using PcapDumper (sample code is in the distribution’s SVN); the data then needs to be serialized using protocol buffers. Watch for this patch.
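To make the ‘splittable’ problem concrete, here is a minimal Python sketch of reading the classic libpcap file layout: a 24-byte global header followed by records, each with a 16-byte header. It is not the PcapDumper approach mentioned above, only an illustration. The function names `read_pcap_records` and `build_sample_pcap` are my own, and the sketch assumes the classic microsecond format (no pcapng, no nanosecond variant). Note that records carry no sync marker, so a reader dropped into the middle of a file cannot reliably find the next record boundary.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic number (microsecond timestamps)


def read_pcap_records(stream):
    """Yield (ts_sec, ts_usec, orig_len, packet_bytes) from a classic pcap stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number tells us the byte order the file was written in.
    magic = struct.unpack("<I", global_header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            return  # clean end of file (or a truncated trailing record)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        # The only way to find the next record is to skip incl_len bytes
        # from here; this is why an arbitrary byte offset is not a safe
        # place to start reading.
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return
        yield ts_sec, ts_usec, orig_len, data


def build_sample_pcap(packets):
    """Build an in-memory little-endian pcap from (ts_sec, ts_usec, payload) tuples."""
    # magic, version 2.4, thiszone, sigfigs, snaplen, link type 1 (Ethernet)
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    for ts_sec, ts_usec, payload in packets:
        out += struct.pack("<IIII", ts_sec, ts_usec, len(payload), len(payload))
        out += payload
    return out


if __name__ == "__main__":
    sample = build_sample_pcap([(1, 0, b"abc"), (2, 500, b"defgh")])
    for record in read_pcap_records(io.BytesIO(sample)):
        print(record)
```

Because each record can only be located by walking from the start of the file, a simple workaround in Hadoop would be to treat every PCAP file as a single unsplittable input, at the cost of parallelism within large files.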
Eucalyptus is a private cloud-computing platform that implements the Amazon specification for EC2, S3, and EBS.
http://open.eucalyptus.com/downloads
Dr. Rich Wolski’s talk at USENIX LISA 2009. http://www.usenix.org/events/lisa09/stream1/wolski.html
Cassandra version 0.6 supports Hadoop. Check out the documentation here: http://wiki.apache.org/cassandra/HadoopSupport
I have been learning R and plan to use it in a Hadoop environment.
Check out: http://www.stat.purdue.edu/~sguha/rhipe/
A great post on Yahoo’s blog about the Hadoop I/O pipeline: http://developer.yahoo.net/blogs/hadoop/2009/08/the_anatomy_of_hadoop_io_pipel.html
I have yet to check out Hivo, but here are the results otherwise. I am loading into an empty table without many indexes; otherwise, I might need to disable indexing before the load and re-enable it afterwards.
| | Real (sec) | User (sec) | System (sec) |
| --- | --- | --- | --- |
| Writing to HDFS (MR time) | 204.034 | 1.388 | 0.24 |
| SQL Inserts from 4 Reducers (includes MR time) | 2280.26 | 4.68 | 0.828 |
| SQL Inserts from 10 Reducers (includes MR time) | 2640.184 | 4.12 | 0.624 |
| SQL Inserts from Namenode (does not include MR time) | 7835.52 | 61.652 | 29.926 |
| Inload file sequential Inserts (includes MR time) | 290.46 | 13.509 | 2.604 |
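
To compare the approaches at a glance, the wall-clock (“Real”) times can be normalized against the plain HDFS write. This is a small sketch of that arithmetic only; the numbers are copied from the table above, and the helper name `slowdown_vs_baseline` is my own. Keep in mind that the Namenode row does not include MR time, so its ratio is not directly comparable with the others.

```python
# Wall-clock ("Real") seconds copied from the table above.
real_seconds = {
    "Writing to HDFS (MR time)": 204.034,
    "SQL Inserts from 4 Reducers (includes MR time)": 2280.26,
    "SQL Inserts from 10 Reducers (includes MR time)": 2640.184,
    "SQL Inserts from Namenode (does not include MR time)": 7835.52,
    "Inload file sequential Inserts (includes MR time)": 290.46,
}


def slowdown_vs_baseline(timings, baseline_key):
    """Return each timing divided by the baseline timing."""
    baseline = timings[baseline_key]
    return {name: seconds / baseline for name, seconds in timings.items()}


ratios = slowdown_vs_baseline(real_seconds, "Writing to HDFS (MR time)")

# Print the approaches from fastest to slowest relative to the HDFS write.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{ratio:6.2f}x  {name}")
```

By this measure the file-based load stays within about 1.5x of the HDFS write, while row-by-row SQL inserts from the reducers are more than ten times slower.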