You are reading a single comment by @number21 and its replies.
  • Do you mind me asking what the corpus is? I might attempt something similar at some point, so this looks really interesting.

    I can't really help with the visualising, though. I'd probably want to try it in Power BI or Splunk or something, but I don't even know how we'd structure the data yet.

  • Happy to chat about it. The corpus is duplicate copies of certain eighteenth-century texts, both OCR'd and keyed, but this is a subset of a larger corpus which is only OCR'd. We use the texts for "proper" historical work, but we're trying to use digital methods to do it. What I'm doing now is trying to say something about the quality of the OCR'd corpus, and which methods may work best or worst on texts with OCR errors.
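
    For anyone wanting to try something similar: one common way to put a number on OCR quality when you have both an OCR'd and a keyed version of the same text is the character error rate (edit distance divided by the length of the keyed text). This is only a rough sketch of that general idea, not necessarily what is being done with this corpus; the function names and the sample strings are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(keyed: str, ocr: str) -> float:
    """Edits needed to turn the OCR text into the keyed text, per keyed character."""
    if not keyed:
        raise ValueError("keyed reference text must not be empty")
    return levenshtein(keyed, ocr) / len(keyed)


if __name__ == "__main__":
    # Hypothetical example: the long s misread as 'f', a typical OCR error
    # in eighteenth-century print.
    print(character_error_rate("possession", "poffeffion"))
```

    This pure-Python version is quadratic in text length, so for whole books you would probably compare page by page or line by line, or swap in a faster edit-distance library.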
