Okay, I'm probably way out on this, but it really looks like a find-and-replace with wildcards would do it (Word can replace 'everything between this character and that character' with a space, or with nothing). If you had a galley of the text with all its formatting, there would be common ranges of characters you could use to identify the sections you want to clean out.
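For anyone who wants to try the idea outside Word, here's a rough sketch of that "everything between this character and that character" replace as a Python regex. This is an analogue, not Word's own feature. In Word you'd tick "Use wildcards" and search for something like `\[*\]`, assuming the note markers are wrapped in square brackets. The brackets and the function name `strip_between` are just illustrative assumptions; swap in whatever delimiters your galley actually uses.

```python
import re

def strip_between(text: str, start: str, end: str, replacement: str = "") -> str:
    """Replace each run from `start` to the nearest following `end` (inclusive).

    Hypothetical helper: a regex analogue of a Word wildcard find/replace.
    """
    # re.escape lets delimiters like "[" or "(" be matched literally;
    # ".*?" is non-greedy, so "[1] rose[2]" is treated as two separate matches.
    pattern = re.escape(start) + ".*?" + re.escape(end)
    # DOTALL lets a marked section span line breaks.
    return re.sub(pattern, replacement, text, flags=re.DOTALL)

sample = "The river[1] rose[2] overnight."
print(strip_between(sample, "[", "]"))  # The river rose overnight.
```

Passing `replacement=" "` gives the "replace with a space" variant instead of deleting outright.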
[edit: as suggested up there!]
This is essentially what I'm doing (in Word's 'field code' view, removing every instance of "HYPERLINK" breaks the code, and the notes are gone when I paste again). But it's also what I want to avoid, because it turns a mundane but doable process into something much more time-consuming by adding a third app and a few more key-combo steps.
Stupid computers.
Cheers for looking, I'll grab some HTML now that I'm back at my desk.
However, just to clarify: it doesn't corrupt anything in terms of representing the original text, but it keeps that text (albeit not as superscript), which is a corruption of the original source material.
[edit: and just saw you edited! Ha. Here's an example of the above anyway, since I'd already found it: