javascript - How to force the browser to stop parsing dynamically inserted code to HTML 4?

April 15, 2012

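I need to convert old HTML into a PDF file. I have a JAR for this, but it only accepts legitimate XHTML code, so I have to parse the old HTML code into something the JAR will accept. Since I know what the HTML code I have to parse looks like, my idea is to use the HTML parser by John Resig to parse only some tags (img, br, meta) straight to XML, which has the needed effect (mostly closing the tags) on them.

My actual attempt looks like this: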

    function fixTags() {
        // Void elements that need a closing slash in XHTML
        var tagsToParse = new Array("br", "img", "input", "meta");

        for (var i = 0; i < tagsToParse.length; i++) {
            var elements = document.getElementsByTagName(tagsToParse[i]);
            for (var j = 0; j < elements.length; j++) {
                // HTMLtoXML comes from John Resig's HTML parser
                elements[j].outerHTML = HTMLtoXML(elements[j].outerHTML);
            }
        }
    }

The problem here is that the browser interprets the new code of the element as HTML 4, which leads it to change back the stuff I wanted to change. For example, <br> becomes <br/> after going through the parser, but the browser interprets it as HTML 4, and the outerHTML property of the element is <br> again.

My first attempt to solve this was to force the document to be XHTML temporarily:

    var root = document.getElementsByTagName("html")[0];
    root.setAttribute("xml", "http://www.w3.org/1999/xhtml");

But that doesn't seem to bother the browser at all in its behaviour.

The "obvious" solution of building a string tree out of the DOM, replacing the strings there, and traversing the tree to get the string I want seems a bit heavy and complex for this "little" problem, which is why I ask you.

So if anybody has an idea for an easier solution, I would be happy. The application is IE-only, so IE-exclusive solutions are accepted as well.
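For reference, the DOM-walking approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the names `VOID_TAGS`, `escapeXml` and `serializeAsXhtml` are hypothetical, the list of void tags is a guess at what the documents contain, and script or style content is not treated specially. The `specified` check is there because older IE versions may list attributes that were never set in the markup.

```javascript
// Tags written as <tag /> in the output (assumed list; extend as needed).
var VOID_TAGS = { br: true, img: true, input: true, meta: true, link: true, hr: true };

// Escape characters that are not allowed raw in XML text or attribute values.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Recursively serialize a DOM node (or any object with the same
// nodeType / nodeName / attributes / childNodes shape) as XHTML.
function serializeAsXhtml(node) {
    if (node.nodeType === 3) {            // text node
        return escapeXml(node.nodeValue);
    }
    if (node.nodeType !== 1) {            // skip comments, doctype, etc.
        return "";
    }
    var name = node.nodeName.toLowerCase();
    var out = "<" + name;
    var attrs = node.attributes || [];
    for (var i = 0; i < attrs.length; i++) {
        if (attrs[i].specified === false) {
            continue;                     // attribute not present in the markup
        }
        out += " " + attrs[i].name.toLowerCase() + '="' + escapeXml(attrs[i].value) + '"';
    }
    if (VOID_TAGS[name] === true) {
        return out + " />";               // self-close void elements
    }
    out += ">";
    var children = node.childNodes || [];
    for (var j = 0; j < children.length; j++) {
        out += serializeAsXhtml(children[j]);
    }
    return out + "</" + name + ">";
}
```

In a browser this would be called as `serializeAsXhtml(document.documentElement)`; the result is a string, so the browser never gets a chance to re-parse it as HTML 4.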

For your use case, it's easiest to use an existing HTML -> XHTML converter, for example: http://www.it.uc3m.es/jaf/html2xhtml/simple-form.html

If you want to do it in the browser, a naive solution would be to try this, using naive regexes (you shouldn't use regexps to parse XML) and XMLSerializer.

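    var serializer = new XMLSerializer();
    // Rewrite each listed void tag as a self-closing tag. The pattern also
    // consumes the closing ">" so it is not left behind after " />".
    var xml = serializer.serializeToString(document).replace(/<(img|meta|input|br|link)([^>]*)>/gi, function (ignore, tagName, attributes) {
        return '<' + tagName + attributes + ' />';
    });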

You can use a less naive regex if that doesn't work, but I think that for a document that can be converted to PDF in the first place, this should do the trick.

Edit: note that the regex assumes none of the tags are self-closing before the operation.
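If some of the tags may already be self-closing, the pattern can be loosened to swallow an optional trailing slash. This is a minimal sketch, not a robust parser: `closeVoidTags` is a hypothetical helper name, the tag list is the same one used above, and a `>` inside an attribute value will still break it.

```javascript
// Hypothetical helper: rewrites the listed void tags as <tag ... />,
// whether or not they were already self-closed in the input.
function closeVoidTags(markup) {
    // \b keeps the tag name from matching a longer name that starts the same way;
    // the lazy attribute group leaves any trailing whitespace and "/" to the rest of the pattern.
    var pattern = /<(img|meta|input|br|link)\b([^>]*?)\s*\/?>/gi;
    return markup.replace(pattern, function (ignore, tagName, attributes) {
        return '<' + tagName.toLowerCase() + attributes + ' />';
    });
}
```

It would be applied to the serialized string, for example `closeVoidTags(serializer.serializeToString(document))`. Tag names are lowercased because XHTML element names are case-sensitive and defined in lowercase.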
