  • Providing a very small fragment of the source HTML, suitably edited to remove context, would really help in understanding exactly what the underlying issue is.

    I've just tested copying superscript HTML characters into Word (Paste Special > Unformatted text) and into Notepad++. Neither corrupts anything; the superscript just comes in as regular unformatted text.

    [edit: our replies crossed. The thing is, the "2" is part of the HTML's text content, so any method that strips it out probably needs to identify it by the HTML formatting tags that surround it.]

  • Cheers for looking. I'll grab some HTML now that I'm back at my desk.

    However, just to clarify: it doesn't corrupt anything in terms of representing the original text, but it keeps the text (albeit not as superscript), which is a corruption of the original source material.

    [edit: and I just saw you edited! Ha, here's an example of the above anyway, since I'd already found it:

    When Mr. <span class="lineunder">Faulkner</span> delivered me your former letter (for I have since had one sent me hither by Mr. <span class="lineunder">Pope</span><a href="/item/swifjoOU0040435a1c/nts/002" title="2 [marked '1' in source]" class="notecall_nts">2</a>) I was just got up from my bed
    

    ]

  • Okay, I'm probably way off on this, but it really looks like a find-and-replace with wildcards would do it (Word can replace "everything between this character and that character" with a space, or with nothing). If you had a galley of the text with all its formatting, there would be common ranges of characters you could use to identify the sections you want to clean out.

    ::edit:: as suggested up there!
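The point about using the surrounding tags can be sketched in Python with the standard-library `html.parser`. This is a minimal sketch, assuming the footnote calls are always `<a>` elements carrying the `notecall_nts` class, as in the single sample posted above; the names `NoteCallStripper`, `NOTE_CLASS` and `strip_note_calls` are hypothetical. It keeps all other text and drops only the contents of those anchors.

```python
from html.parser import HTMLParser

# Assumption: footnote calls are <a> elements with class "notecall_nts",
# as in the one sample posted; other editions may mark them differently.
NOTE_CLASS = "notecall_nts"


class NoteCallStripper(HTMLParser):
    """Collects text content, skipping the text inside footnote-call anchors."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True, so entities arrive decoded
        self.parts = []
        self.in_note_call = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            classes = (dict(attrs).get("class") or "").split()
            if NOTE_CLASS in classes:
                self.in_note_call = True

    def handle_endtag(self, tag):
        # <a> elements cannot nest in HTML, so the next </a> closes the note call
        if tag == "a":
            self.in_note_call = False

    def handle_data(self, data):
        if not self.in_note_call:
            self.parts.append(data)


def strip_note_calls(html_text: str) -> str:
    """Return the plain text of html_text with footnote-call numbers removed."""
    parser = NoteCallStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


sample = ('by Mr. <span class="lineunder">Pope</span>'
          '<a href="/item/swifjoOU0040435a1c/nts/002" '
          'title="2 [marked \'1\' in source]" class="notecall_nts">2</a>) '
          'I was just got up')
print(strip_note_calls(sample))  # → by Mr. Pope) I was just got up
```

Unlike a blanket "strip all tags" pass, this keeps words such as *Pope* that sit inside other formatting (the `lineunder` spans) and removes only the note numbers.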
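The wildcard find-and-replace idea works best on the raw HTML rather than on pasted text, because that is where the boundaries of each note call are visible. Below is a rough Python analogue using regular expressions rather than Word's own wildcard syntax. The patterns `NOTE_CALL` and `ANY_TAG` and the function `clean` are hypothetical, built from the one sample above, and regex matching on HTML is brittle if the markup varies (for example, attributes in single quotes or a differently named class).

```python
import re
from html import unescape

# Hypothetical pattern for the sample's footnote calls: an opening <a ...> tag
# whose attributes include class="notecall_nts", its text, and the closing </a>.
NOTE_CALL = re.compile(r'<a\b[^>]*\bclass="notecall_nts"[^>]*>.*?</a>', re.DOTALL)
# Any remaining tag, e.g. <span class="lineunder"> and </span>.
ANY_TAG = re.compile(r"<[^>]+>")


def clean(html_text: str) -> str:
    """Remove footnote calls (number included), then all other tags."""
    without_calls = NOTE_CALL.sub("", html_text)  # "everything between <a ... and </a>"
    plain = ANY_TAG.sub("", without_calls)       # "everything between < and >"
    return unescape(plain)                        # decode entities such as &amp;


line = ('When Mr. <span class="lineunder">Faulkner</span> delivered me your '
        'former letter (for I have since had one sent me hither by Mr. '
        '<span class="lineunder">Pope</span><a href="/item/swifjoOU0040435a1c/nts/002" '
        'title="2 [marked \'1\' in source]" class="notecall_nts">2</a>) '
        'I was just got up from my bed')
print(clean(line))
```

The order matters: the note calls are removed first, while their `<a>` boundaries still identify them; only then are the remaining formatting tags stripped.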
