Since I know a lot of the forum works in IT, I figured I'd ask here.
I'm trying to work out a way to take large volumes of unclassified email chains (stored as flat files in the form ID.txt, which can easily be XML instead) and group them by similarity (i.e. what the email is about).
So say 27 of them appear to be about component xyz, which fails to run when you do various things. I'd want those to get grouped together. I'm not looking for 100% accuracy, more a way of targeting places for humans to look.
I'm thinking of trying out Mahout (Hadoop), but was wondering if anyone else has tried this?
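To make the kind of grouping I mean concrete, here is a rough single-machine sketch in plain Python. It is not Mahout and not tuned for large volumes: it just weights words with TF-IDF, compares emails by cosine similarity, and greedily drops each email into the first group whose first member is similar enough. The function names (`tokenize`, `tfidf_vectors`, `group_emails`) and the `threshold` value are made up for illustration, and the input is assumed to be a dict of file ID to email text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict of term -> weight) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many emails does each term appear?
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())
    vectors = []
    for counts in counts_per_doc:
        # Smoothed IDF; a term present in every email gets weight 0.
        vec = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def group_emails(emails, threshold=0.2):
    """Greedily group emails (dict of id -> text) by similarity.

    Each email joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    The threshold is an arbitrary assumption and would need tuning.
    """
    ids = list(emails)
    vectors = tfidf_vectors([emails[i] for i in ids])
    groups = []  # each group is a list of indices into `ids`
    for idx, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return [[ids[i] for i in group] for group in groups]
```

Calling `group_emails({"101.txt": "...", "102.txt": "..."})` returns a list of groups of file IDs, which is about the level of "places for humans to look" I'm after. The greedy pass compares against every existing group, so it is only a way to sanity-check the idea on a sample before trying something that scales.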