A great post on Yahoo’s blog about the Hadoop I/O pipeline: http://developer.yahoo.net/blogs/hadoop/2009/08/the_anatomy_of_hadoop_io_pipel.html
I have been playing around with FUSE. Hopefully, I will get some time to work on it! I am planning to build a provenance framework on top of it.
mbox was designed as a mechanism for storing the emails that users receive. Basically, an mbox is a single file per user, so file-locking constraints apply. For example, while the user is reading their emails, the mail server cannot deliver new emails to that user’s mailbox.
To counteract this, maildir was introduced. maildir introduced the concept of one file per message, which largely removes the need for locking, and it uses 3 directories: tmp, new and cur. An incoming email is first written to a file in the tmp directory; once the file is complete, it is hard-linked (or renamed) into the new directory and removed from tmp. When the user accesses the email, the mail client moves the file from the new directory to the cur directory. When an email is deleted, its file is simply removed from the cur directory.
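Here is a minimal sketch of that file-per-message flow, assuming the usual maildir layout of tmp, new and cur. The helper names (`make_maildir`, `unique_name`, `deliver`, `mark_seen`) are made up for illustration, and a real mail server does more work to guarantee unique names and safe writes.

```python
import os
import socket
import time

_counter = 0  # per-process counter so two deliveries in one second differ


def make_maildir(root):
    """Create the three maildir subdirectories if they do not exist."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def unique_name():
    """Hypothetical naming scheme: time, pid, counter and host name."""
    global _counter
    _counter += 1
    return "%d.%d_%d.%s" % (time.time(), os.getpid(), _counter,
                            socket.gethostname())


def deliver(root, message_bytes):
    """Write the message under tmp/, then move it into new/.

    Readers never look in tmp/, so they never see a half-written file,
    and no lock on the mailbox is needed.
    """
    name = unique_name()
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    with open(tmp_path, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, new_path)  # same filesystem, so this is atomic
    return name


def mark_seen(root, name):
    """Move a message from new/ to cur/, adding the 'seen' flag suffix."""
    src = os.path.join(root, "new", name)
    dst = os.path.join(root, "cur", name + ":2,S")
    os.rename(src, dst)
    return dst
```

Delivery and reading touch different directories and different files, which is why the mail server and the mail client do not have to block each other the way they do with a single mbox file.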
Some of the performance comparisons between these boxes are shown here.
New email servers use a database as the message store to improve the performance of the mail system. Exchange Server, for example, uses a database to store emails. This is an innovative concept, as a database can significantly improve the performance of the system. However, PostPath reports that the database requires frequent read and write operations that affect performance. The article also cites the complications arising from backup, restore, database corruption, compaction and, finally, disaster recovery. Another interesting point is that Exchange Server can be circumvented to use Linux storage solutions to overcome these database-related issues.
Some examples maintained by Robert Love: http://www.kernel.org/pub/linux/kernel/people/rml/inotify/utils/
> In inotify example, on any event, the name of the file is printed. I
> am looking for absolute path to the file and the filename. I have
> tried programming it, but was unsuccessful. Can you give me some pointers?
“Ah, okay. There is no nontrivial way to do this inside of the kernel (because, among other reasons, file paths are not definite to get to any one file. Also, it is nontrivial to assemble paths inside of the kernel).
So what you need to do is save the mapping back from a given watch descriptor (wd) to the path.
So if you create a watch at /foo/bar/ and it is wd=1, when an event on
wd=1 arrives with payload “baz” you somehow (hash seems smart) map wd=1 back to /foo/bar/ and append baz.
Of course, if you are only watching a single thing, you can hardcode the mapping backward.”
An article about Inotify: http://www-128.ibm.com/developerworks/linux/library/l-inotify.html?ca=dgr-lnxw07Inotify
Kernel Patch: http://www.kernel.org/pub/linux/kernel/people/rml/inotify/utils/