  • Make pretty pictures.

    I have a dataset of around 600k individual words and their frequency of use in two corpora. I'm comparing the two (one corpus is 'correct', the other is not). I wanted to show how much corruption there was in the wrong corpus by visualizing the entirety of it (a box), with histograms on the bottom-left and top-right showing false positives and false negatives eating into the 'clean' data.

    I could just say the data is x, y% are false positives, z% are false negatives. But it's just not the same, is it?
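
For what it's worth, the plain-numbers version could be sketched roughly like this. It is only a sketch under my own assumptions, not a description of the actual dataset: each corpus is a word-to-count mapping, a "false positive" is any surplus occurrence of a word in the wrong corpus relative to the correct one, a "false negative" is any shortfall, and the function name `corruption_rates` is made up for illustration.

```python
def corruption_rates(correct, observed):
    """Return (false_positive_pct, false_negative_pct).

    Assumption: both arguments map word -> occurrence count, and the
    percentages are taken relative to the total count in `correct`.
    """
    total = sum(correct.values())
    if total == 0:
        raise ValueError("the correct corpus must contain at least one word")

    false_pos = 0  # occurrences the wrong corpus has but should not
    false_neg = 0  # occurrences the wrong corpus is missing
    for word in correct.keys() | observed.keys():
        diff = observed.get(word, 0) - correct.get(word, 0)
        if diff > 0:
            false_pos += diff
        else:
            false_neg -= diff

    return 100 * false_pos / total, 100 * false_neg / total


# Toy data: "cat" was corrupted into "czt" twice.
correct = {"the": 10, "cat": 5, "sat": 5}
observed = {"the": 10, "cat": 3, "czt": 2, "sat": 5}
fp_pct, fn_pct = corruption_rates(correct, observed)
print(f"{fp_pct:.1f}% false positives, {fn_pct:.1f}% false negatives")
```

The per-word `diff` values are what the two histograms would be drawn from; the two percentages are just their sums.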
