I have a dataset of around 600k individual words and their frequency of use in two corpora. I'm comparing the two: one corpus is 'correct', the other is not. I want to show how much corruption there is in the wrong corpus by visualizing the entirety of it as a box, with histograms in the bottom-left and top-right corners showing false positives and false negatives eating into the 'clean' data.
I could just say the data is x, of which y% are false positives and z% are false negatives. But it's just not the same, is it?
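For reference, the plain-numbers version might be computed roughly like this. It is only a sketch under assumptions not stated in the question: each corpus is a dict mapping word to count, a "false positive" is a word that occurs in the wrong corpus but never in the correct one, a "false negative" is the reverse, and percentages are taken over the combined vocabulary. The function name `corruption_summary` and the toy data are made up for illustration.

```python
def corruption_summary(correct, suspect):
    """Return (false_positive_pct, false_negative_pct) over the combined vocabulary.

    Assumed definitions (not from the question):
      false positive = word with count > 0 in `suspect` but 0 in `correct`
      false negative = word with count > 0 in `correct` but 0 in `suspect`
    """
    vocab = set(correct) | set(suspect)
    if not vocab:
        return 0.0, 0.0
    false_pos = sum(
        1 for w in vocab if suspect.get(w, 0) > 0 and correct.get(w, 0) == 0
    )
    false_neg = sum(
        1 for w in vocab if correct.get(w, 0) > 0 and suspect.get(w, 0) == 0
    )
    total = len(vocab)
    return 100.0 * false_pos / total, 100.0 * false_neg / total


# Toy data standing in for the real ~600k-word frequency tables.
correct = {"the": 120, "cat": 7, "sat": 3, "mat": 2}
suspect = {"the": 118, "cat": 7, "teh": 2, "sat": 0, "mat": 2}

fp, fn = corruption_summary(correct, suspect)
print(f"false positives: {fp:.1f}%, false negatives: {fn:.1f}%")
```

The two percentages are the quantities the corner histograms would need to encode, so the same counts could feed whichever plotting library ends up drawing the box.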
Make pretty pictures.