  • It's behind a paywall, but there's nothing special happening on the surface. If you were going to suggest scraping somehow, that won't work. I got in trouble for that earlier in the week.

  • Providing a very small fragment of source HTML, suitably edited to remove context, would really help in trying to understand exactly what the underlying issue is.

    I've just tested copying superscript HTML characters into Word (Paste Special > Unformatted text) and it doesn't corrupt anything. Pasted into Notepad++, it just comes in as regular unformatted text.

    [edit: crossed over with your previous reply. Thing is, the "2" is part of the text content in HTML, so a method to strip it out probably needs to distinguish it by looking at the HTML formatting tags which surround it]
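To illustrate the point about distinguishing the "2" by its surrounding tags, here is a minimal sketch in Python, assuming the superscript is wrapped in a `<sup>` element (the actual markup hasn't been posted, so that is a guess). It uses only the standard library's `html.parser`; the names `SupStripper` and `strip_superscripts` and the sample fragment are made up for the example. It works on a fragment you already have saved or copied and does not fetch anything from the site.

```python
from html.parser import HTMLParser


class SupStripper(HTMLParser):
    """Collects text content, skipping anything inside <sup>...</sup>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._sup_depth = 0   # > 0 while we are inside a <sup> element
        self._parts = []      # text chunks found outside <sup>

    def handle_starttag(self, tag, attrs):
        if tag == "sup":
            self._sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "sup" and self._sup_depth > 0:
            self._sup_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not part of a superscript.
        if self._sup_depth == 0:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def strip_superscripts(html):
    """Return the plain text of an HTML fragment without <sup> content."""
    parser = SupStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


if __name__ == "__main__":
    # Hypothetical fragment; the real source HTML may look different.
    sample = "<p>as noted earlier<sup>2</sup>, page 2 is unaffected</p>"
    print(strip_superscripts(sample))
```

An ordinary "2" in the running text is kept, because only text nested inside `<sup>` is dropped. If the page uses a styled `<span>` or the Unicode character "²" instead of a `<sup>` tag, this sketch won't catch it and would need adjusting once a real fragment is available.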
