JavaScript sanitize HTML string and remove ID, class and other attributes - javascript

I need help sanitizing HTML text provided by a user. I have the following HTML code:
var htmlStr = `<p id="test" class="mydemo">TEhis is test</p>
<pre class="css">
<html>
<body class="test"></body>
</html>
</pre>`;
I want to remove the ID, class, and any other attribute from all tags other than <PRE> and <CODE>, using plain JavaScript.
I tried the following, but I am not getting the proper output:
sanitizeHtml(html: any) {
let temp = document.createElement('div');
temp.innerHTML = html;
// let t1 = temp.querySelectorAll('*');
temp.querySelectorAll('*').forEach(node => {
if(node.nodeName !== 'PRE') {
return node.removeAttribute('id');
}
})
console.log(temp);
// return html.replace(/\s*(\w+)=\"[^\"]+\"/gim, '').replace(/<script>[\w\W\s\S]+<\/script>/gim);
}
Please let me know if you need further information on it.

This is a little mechanical, and perhaps not the optimal solution, but you could achieve this by chaining .replace() calls with the following regular expressions to sanitise your HTML string as needed:
function sanitizeHtml(html) {
var htmlSanitized = html
.replace(/<(?:pre|code)\b[\w\s"=]*>/gi, function(match) {
// Add a placeholder to attributes on pre and code elements to prevent
// removal of these in the subsequent step
return match.replace(/=/g, 'EQUALS');
})
.replace(/\w+="\w+"/gi, '')
.replace(/\s+>/gi, '>')
// Restore every placeholder (the g flag matters when several tags were protected)
.replace(/EQUALS/g, '=');
return htmlSanitized;
}
var htmlStr = `<p id="test" class="mydemo">TEhis is test</p>
<pre class="css">
<html>
<body class="test"></body>
</html>
</pre>`;
console.log(sanitizeHtml(htmlStr));
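Note that the pattern \w+="\w+" above only matches attribute values made of word characters, so something like class="my-demo" would be left in place. As a rough sketch of one alternative (not the answer's own method), the function below processes one opening tag at a time and drops its attributes unless the tag name is in an allow list. The name stripAttributes and the keepTags parameter are made up for illustration. It is still regex-based and assumes attribute values never contain < or >, so for untrusted input a real HTML parser or sanitizer library is the safer choice.

```javascript
// Hypothetical helper: removes all attributes from opening tags,
// except for tags whose name appears in keepTags.
function stripAttributes(html, keepTags = ['pre', 'code']) {
  const keep = new Set(keepTags.map((tag) => tag.toLowerCase()));
  // Groups: tag name, optional attribute text, optional self-closing slash.
  const openingTag = /<([a-zA-Z][\w-]*)(\s[^<>]*?)?(\/?)>/g;
  return html.replace(openingTag, (match, name, attrs, selfClose) => {
    if (keep.has(name.toLowerCase())) {
      return match; // leave <pre> and <code> untouched
    }
    return `<${name}${selfClose}>`;
  });
}

const sample = '<p id="test" class="my-demo">Hi</p>\n<pre class="css">text</pre>';
console.log(stripAttributes(sample));
```

Closing tags are never matched because the pattern requires a letter right after the <, so only opening tags are rewritten.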

Related

HTML Change Within Jquery Textarea Value

I want to extract html codes from a textarea value but failed.
I want to detect and replace images with textarea value.
Below is an example of what I want to do.
TEXTAREA
<textarea class="editor"><img src="x1"><img src="x2"></textarea>
The code below is an example of what I want to do, I know it's wrong.
var editor_images = $('.editor').val().find('img');
editor_images.each(function(key, value) {
$(this).attr('src','example');
});
If you want to replace multiple attributes or tags, then your question may be too broad. However, the example below gives you an idea of how to replace an image attribute within the textarea:
function replaceValueOfTextArea(searchAttr, replaceAttr, value) {
const editor = document.querySelector('.editor');
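// match() returns null when the value contains no <img> tags, so fall back to an empty list
const imgs = editor.value.match(/<img[a-zA-Z0-9="' ]+>/g) || [];
let textAreaNewValue = '';
for (const img of imgs) {
const regMatch = new RegExp(`(?<!img)${searchAttr}`, "gi");
const match = img.match(regMatch);
if (match) {
// Inside a character class "|" is a literal character, so ["'] is enough to mean "either quote"
const regAttr = new RegExp(`${searchAttr}=["'][^"']+["']`, "gi");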
textAreaNewValue += img.replace(regAttr, `${replaceAttr}="${value}"`);
} else {
textAreaNewValue += img;
}
}
editor.value = String(textAreaNewValue);
}
replaceValueOfTextArea('src', 'src', 'https://example.com');
<textarea class="editor"><img src="x1"><img alt="x2"></textarea>
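One thing to be aware of: the function above rebuilds the textarea value from the matched <img> tags only, so any other text in the textarea is dropped. A minimal sketch of a variant that works on the whole string and leaves everything else in place might look like this. The name setImgAttribute is hypothetical, and it assumes attr is a plain attribute name and value contains no double quotes.

```javascript
// Hypothetical helper: sets one attribute on every <img> tag in an HTML string.
// Tags that do not already carry the attribute are returned unchanged.
function setImgAttribute(html, attr, value) {
  // Matches e.g. ` src="x1"` or ` src='x1'`; \2 refers back to the opening quote.
  const attrPattern = new RegExp(`(\\s${attr}\\s*=\\s*)(["']).*?\\2`, 'i');
  return html.replace(/<img\b[^>]*>/gi, (tag) =>
    tag.replace(attrPattern, (match, prefix) => `${prefix}"${value}"`)
  );
}

const before = 'Intro <img src="x1"><img alt="x2"> outro';
console.log(setImgAttribute(before, 'src', 'https://example.com'));
// In the browser this could be applied as:
// editor.value = setImgAttribute(editor.value, 'src', 'https://example.com');
```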
You can use jQuery's $.parseHTML() to parse an HTML string into DOM nodes. Then you can use this method to turn them back into HTML before reinserting them in your <textarea>:
// Get contents of editor as HTML and parse into individual <img> nodes:
let nodes = $.parseHTML( $('.editor').val() )
// Map through <img> nodes and change src attribute, and return as HTML text:
let html = nodes.map(function(node){
$(node).attr('src', 'example')
return $('<div>').append($(node).clone()).html();
})
// Insert HTML text back into editor (a <textarea> holds its content in its value):
$('.editor').val( html.join('') )
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<textarea class="editor"><img src="x1"><img src="x2"></textarea>

How to inject html tags within a script in Javascript?

I have a string where I would like to inject a <br> tag based on array values.
My html code:
'Hello people. `<span>`Hello to everyone`</span>`';
<script>
let array = ['Hello', 'to everyone'];
</script>
I need to inject <br> tag between 'Hello' and 'to everyone' inside <span> based on contents of array. How can I do it without replacing the tags and not affecting the first 'Hello' word?
Expected output:
'Hello people. `<span>`Hello `<br>` to everyone`</span>`'
You can do this by clearing the contents of the <span> element and then iterating over the array with forEach, appending each entry to the span with a line break (<br>) between entries. You can create the line break with document.createElement('br') and add it to the span with elem.appendChild().
Here is the full code:
<head>
<title>hello world</title>
</head>
<body>
<p>Hello people. <span id="message">Hello to everyone</span></p>
<script>
function split_words() {
const array = ['Hello', 'to everyone'];
const elem = document.getElementById('message');
// Clear the span, then rebuild it from the array entries
elem.textContent = '';
array.forEach((word, index) => {
// Insert a <br> between entries, but not after the last one
if (index > 0) {
elem.appendChild(document.createElement('br'));
}
elem.appendChild(document.createTextNode(word));
});
}
// Assign the function itself; split_words() would call it immediately instead
window.onload = split_words;
</script>
</body>
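You could just use <br/> within the text (a plain \n only renders as a line break when the element's white-space style preserves newlines, e.g. pre-line).
Your code would look something like this: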
document.write(`Hello people. <span>Hello <br> to everyone</span>`);
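If the markup is only available as a string, a small sketch like the following would insert the <br> inside the <span> only, so the first "Hello" is left alone. The function name injectBreak is made up for this example, and it assumes the array entries appear inside the span joined by a single space and that spans are not nested.

```javascript
// Hypothetical helper: inside each <span>, puts " <br> " between the given parts.
function injectBreak(html, parts) {
  const joined = parts.join(' ');          // e.g. "Hello to everyone"
  const withBreak = parts.join(' <br> ');  // e.g. "Hello <br> to everyone"
  return html.replace(
    /(<span\b[^>]*>)([\s\S]*?)(<\/span>)/g,
    (match, open, inner, close) => open + inner.split(joined).join(withBreak) + close
  );
}

const source = 'Hello people. <span>Hello to everyone</span>';
console.log(injectBreak(source, ['Hello', 'to everyone']));
```

Text outside the span never reaches the inner replacement, which is why the leading "Hello people." is not affected.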

How to insert elements into html document from string?

I have a string of html code stored in the localStorage, and what I want is to convert that string into a document and add that doc to an existing page. So far I came up with:
var data = localStorage.getItem("data");
var frag = document.createRange().createContextualFragment(data);
document.body.appendChild(frag);
but in the page the document fragment is just a simple string.
EDIT
I currently have the html:
<html>
<head>
</head>
<body>
</body>
</html>
The string I saved for test purpose to the localStorage was <p>Test</p>
The result I am trying to get:
<html>
<head>
</head>
<body>
<p>Test</p>
</body>
</html>
The result I get:
<html>
<head>
</head>
<body>
"<p>Test</p>"
</body>
</html>
If the text in local storage is HTML, you can insert it before, at the beginning of, at the end of, or after any existing element by using insertAdjacentHTML. For example, to append the HTML held in the variable html at the end of the document body:
document.body.insertAdjacentHTML("beforeend", html);
Example:
const html = "<p>This is a new paragraph with <em>emphasized</em> text.</p>";
document.body.insertAdjacentHTML("beforeend", html);
<p>This paragraph is already on the page.</p>
You can also use the innerHTML property of an existing element if you want to completely remove that element's current contents and replace them with what's defined in the HTML string:
someElement.innerHTML = html;
Example:
const html = "This is the new content for the paragraph, with <em>emphasized</em> text.";
document.getElementById("existing-paragraph").innerHTML = html;
<p id="existing-paragraph">This paragraph is already on the page.</p>
If it's not HTML, you can put it in an element (such as a p or div) and append that somewhere via appendChild or insertBefore, e.g.:
const p = document.createElement("p");
p.textContent = text;
document.body.appendChild(p);
Example:
const text = "This is plain text, so things like <this> don't get misinterpreted as HTML.";
const p = document.createElement("p");
p.textContent = text;
document.body.appendChild(p);
Or just append it as raw text using createTextNode:
const textNode = document.createTextNode(text);
document.body.appendChild(textNode);
Example:
const text = "This is plain text, so things like <this> don't get misinterpreted as HTML.";
const textNode = document.createTextNode(text);
document.body.appendChild(textNode);
There's lots more to explore on MDN.
In the comments we've figured out that the text in local storage has already been HTML-encoded, like this:
&lt;p&gt;Testing &lt;em&gt; one two three&lt;/em&gt;&lt;/p&gt;
That means that whatever code put the text in local storage encoded it before doing that (because local storage doesn't do that; it faithfully stores and returns the exact string you give it). The best solution is probably to update that code so that it doesn't do that.
If you can't update that code, you can still interpret that text as HTML; you just have to do it twice: once to turn the &lt; and similar entities back into < characters, and again to insert and parse the resulting HTML. The easy way to do that is to create an element (a div, for instance), set its innerHTML, and then read its textContent.
Example:
const textFromLocalStorage = "&lt;p&gt;Testing &lt;em&gt; one two three&lt;/em&gt;&lt;/p&gt;";
const div = document.createElement("div");
div.innerHTML = textFromLocalStorage;
const decodedHtml = div.textContent;
document.body.insertAdjacentHTML("beforeend", decodedHtml);
<p>This paragraph is already on the page.</p>
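To make the first of those two decoding steps concrete without a DOM, here is a minimal sketch of what it amounts to. The helper decodeBasicEntities is hypothetical and only covers five common entities, so in the browser the innerHTML/textContent approach above is the more complete option.

```javascript
// Hypothetical helper: decodes a handful of common HTML entities in one pass.
const BASIC_ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&amp;': '&',
  '&quot;': '"',
  '&#39;': "'",
};

function decodeBasicEntities(encoded) {
  // A single pass means "&amp;lt;" becomes "&lt;" and is not decoded a second time.
  return encoded.replace(/&(?:lt|gt|amp|quot|#39);/g, (entity) => BASIC_ENTITIES[entity]);
}

const stored = '&lt;p&gt;Testing &lt;em&gt; one two three&lt;/em&gt;&lt;/p&gt;';
console.log(decodeBasicEntities(stored));
```

The decoded string is ordinary HTML again, so it could then be passed to insertAdjacentHTML as in the earlier examples.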
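appendChild() expects a DOM node as its argument, not a string of HTML.
If the string ends up on the page as literal text, it was inserted as a text node rather than being parsed.
To have it parsed as HTML, assign it to the innerHTML property, or parse it first and append the resulting node.
Perhaps something like this: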
const DomParser = new DOMParser();
let data = localStorage.getItem("data");
let frag = DomParser.parseFromString( data, 'text/html').body.firstChild
document.body.appendChild(frag);
This assumes data contains a single top-level HTML element (which may have any number of child elements).
Sample code:
const DomParser = new DOMParser();
let data = '<p>Test</p>' // eq: localStorage.getItem("data");
let frag = DomParser.parseFromString( data, 'text/html').body.firstChild;
document.body.appendChild(frag);
p {
font-size: 30px;
color: red;
}

Javascript remove one html tag with a specific structure

I can remove all html tags from the text but I cannot remove just the structure in span tags with data-word inside ...
function strip(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
return tmp.textContent || tmp.innerText || "";
}
Original is:
I <span data-word="word1" class="synonyms" title="word2">word3</span> <b>word4</b>.
The result should be:
I word3 <b>word4</b>.
With the script above, the result I get is: I word3 word4. So the remaining HTML is not preserved.
It is code from Strip HTML from Text JavaScript.
Select the elements you want to remove, replace each one with its own contents, and then read the container's innerHTML.
function stripDataWordTags(container) {
var node = container.cloneNode(true);
Array.prototype.slice.call(node.getElementsByClassName("synonyms"))
.forEach(function(a) {
// Move the children out in front of the element so nested markup is kept as markup
while (a.firstChild) {
a.parentElement.insertBefore(a.firstChild, a);
}
a.parentElement.removeChild(a);
});
return node.innerHTML;
}
// Demo and usage:
alert(stripDataWordTags(document.getElementById("test")));
<div id="test">
I <span data-word="test" class="synonyms">love</span> <b>ECMAScript 5</b>.
</div>
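If you only have the markup as a string and want to target the data-word attribute from the question rather than a class name, a rough string-based sketch could look like the following. The name unwrapDataWordSpans is made up, and the regular expression assumes the matching spans are not nested inside each other; the DOM approach above does not have that limitation.

```javascript
// Hypothetical helper: removes <span data-word="..."> wrappers but keeps their contents.
function unwrapDataWordSpans(html) {
  // $1 is whatever sits between the opening <span ... data-word=...> and its </span>.
  return html.replace(/<span\b[^>]*\bdata-word=[^>]*>([\s\S]*?)<\/span>/gi, '$1');
}

const original = 'I <span data-word="word1" class="synonyms" title="word2">word3</span> <b>word4</b>.';
console.log(unwrapDataWordSpans(original));
```

Spans without a data-word attribute, and all other tags such as <b>, are left as they are.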

Make HTML text bold

I wrote this function, which takes a word as input and wraps it in a <b> tag so that it is bold when rendered in HTML. But when it actually gets rendered, the word is not bold; it just has the literal <b> tag around it.
Here is the function:
function delimiter(input, value) {
return input.replace(new RegExp('(\\b)(' + value + ')(\\b)','ig'), '$1<b>$2</b>$3');
}
On providing the value and input, e.g. "message" and "This is a test message":
The output is: This is a test <b>message</b> (with the tags shown as literal text)
The desired output is: This is a test message (with "message" rendered in bold)
Even replacing the value with value.bold(), returns the same thing.
EDIT
This is the HTML together with the JS that I'm working on:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Test</title>
<script>
function myFunction(){
var children = document.body.childNodes;
for(var len = children.length, child=0; child<len; child++){
if (children[child].nodeType === 3){ // textnode
var highLight = new Array('abcd', 'edge', 'rss feeds');
var contents = children[child].nodeValue;
var output = contents;
for(var i =0;i<highLight.length;i++){
output = delimiter(output, highLight[i]);
}
children[child].nodeValue= output;
}
}
}
function delimiter(input, value) {
return unescape(input.replace(new RegExp('(\\b)(' + value + ')(\\b)','ig'), '$1<b>$2</b>$3'));
}
</script>
</head>
<body>
<img src="http://some.web.site/image.jpg" title="knorex"/>
These words are highlighted: abcd, edge, rss feeds while these words are not: knewedge, abcdefgh, rss feedssss
<input type ="button" value="Button" onclick = "myFunction()">
</body>
</html>
I'm basically getting the result of the delimiter function and changing the nodeValue of a child node.
Is it possible there is something wrong with the way I'm taking back what the function is returning to me?
This is what I do:
children[child].nodeValue = output;
You need to have the markup processed as HTML, instead of being just set to replace existing content in a text node. For this, replace the statement
children[child].nodeValue= output;
by the following:
var newNode = document.createElement('span');
newNode.innerHTML = output;
document.body.replaceChild(newNode, children[child]);
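Since the output is now parsed as HTML, any < or & already present in the original text would be parsed as markup too. As a hedged sketch (the helper names escapeRegExp, escapeHtml, and highlightWords are illustrative and not part of the question's code), the string could be built so that only the inserted <b> tags are live markup. It assumes words is a non-empty array.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escape characters that would otherwise be parsed as HTML.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps whole-word matches in <b> and escapes everything else.
function highlightWords(text, words) {
  const pattern = new RegExp('\\b(' + words.map(escapeRegExp).join('|') + ')\\b', 'gi');
  let result = '';
  let last = 0;
  for (const match of text.matchAll(pattern)) {
    // Escape the text before the match, then wrap the (escaped) match in <b>.
    result += escapeHtml(text.slice(last, match.index)) + '<b>' + escapeHtml(match[0]) + '</b>';
    last = match.index + match[0].length;
  }
  return result + escapeHtml(text.slice(last));
}

console.log(highlightWords('abcd, edge, rss feeds but not knewedge', ['abcd', 'edge', 'rss feeds']));
```

The returned string could then be assigned to newNode.innerHTML in place of output.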
