JavaScript - Efficiently find all elements containing one of a large set of strings - javascript

I have a set of strings and I need to find all all of the occurrences in an HTML document. Where the string occurs is important because I need to handle each case differently:
String is all or part of an attribute. e.g., the string is foo: <input value="foo"> -> Add class ATTR to the element.
String is the full text of an element. e.g., <button>foo</button> -> Add class TEXT to the element.
String is inline in the text of an element. e.g., <p>I love foo</p> -> Wrap the text in a span tag with class TEXT.
Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>.
The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go.
For the attributes, I can do something like this:
sortedStrings.each(function(it) {
document.body.innerHTML.replace(new RegExp('(\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) {
Again, that seems inefficient.
Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient.
Any answer that offers performance improvements gets an upvote :)
EDIT: I went with a modification on Bob's answer. delim is an optional delimiter around the string (to differentiate it from normal text), and keys is the list of strings.
function dfs(iterator,scope) {
scope = scope || document.body;
$(scope).children().each(function() {
return dfs(iterator,this);
var escapeChars = /['\/.*+?|()[\]{}\\]/g;
function safe(text) {
return text.replace(escapeChars, '\\$1');
function eachKey(iterator) {
var key, lit, i, len, exp;
for(i = 0, len = keys.length; i < len; i++) {
key = keys[i].trim();
lit = (delim + key + delim);
exp = new RegExp(delim + '(' + safe(key) + ')' + delim,'g');
$(function() {
keys = keys.sort(function(a,b) {
return b.length - a.length;
dfs(function() {
var a, attr, html, val, el = $(this);
eachKey(function(key,lit,exp) {
// check attributes
for(a in el[0].attributes) {
attr = el[0].attributes[a].nodeName;
val = el.attr(attr);
if(exp.test(val)) {
// check all content
html = el.html().trim();
if(html === lit) {
el.html(key); // remove delims
} else if(exp.test(html)) {
// check partial content
Under the assumption that the traversal is the most expensive operation, this seems optimal, although improvements are still welcome.

Trying to parse HTML with regex is a mug's game. It simply can't handle even the basic strucures of HTML, never mind the quirks. There's so much wrong with your snippet already. (Doesn't detect unquoted attributes; fails for a wide variety of punctuation in it due to lack of HTML-escaping, regex-escaping or CSS-escaping(*); failure for attributes with - in; strange non-use of replace...)
So, use the DOM. Yes, that'll mean a traversal. But then so does a selector like the [attr*=] you're using already.
var needle= 'foo';
$('*').each(function() {
var tag= this.tagName.toLowerCase();
if (tag==='script' || tag==='style' || tag==='textarea' || tag==='option') return;
// Find text in attribute values
for (var attri= this.attributes.length; attri-->0;)
if (this.attributes[attri].value.indexOf(needle)!==-1)
// Find text in child text nodes
for (var childi= this.childNodes.length; childi-->0;) {
var child= this.childNodes[childi];
if (child.nodeType!=3) continue;
// Sole text content of parent: add class directly to parent
if ( && element.childNodes.length===1) {
// Else find index of each occurence in text, and wrap each in span
var parts=;
for (var parti= parts.length; parti-->1;) {
var span= document.createElement('span');
span.className= 'TEXT';
var ix=[parti].length;
var trail= child.splitText(ix);
this.insertBefore(span, trail);
(The reverse-loops are necessary as this is a destructive iteration of content.)
(*: escape doesn't do any of those things. It's more like URL-encoding, but it's not really that either. It's almost always the wrong thing; avoid.)

There is really no good way to do this. Your last requirement makes you have to traverse the entire dom.
for the first 2 requirements i would select all elements by tag name, and interate over them inserting the stuff as needed.
only performance improvement i can think of is to do this on the server side at all costs, this may even mean an extra post to have your faster server do the work, otherwise this can be really slow on say, IE6


Finding HTML tags by using `content`'s of them from a Google Chrome extension [duplicate]

How can I find DIV with certain text? For example:
SomeText, text continues.
Trying to use something like this:
var text = document.querySelector('div[SomeText*]').innerTEXT;
But ofcourse it will not work. How can I do it?
OP's question is about plain JavaScript and not jQuery.
Although there are plenty of answers and I like #Pawan Nogariya answer, please check this alternative out.
You can use XPATH in JavaScript. More info on the MDN article here.
The document.evaluate() method evaluates an XPATH query/expression. So you can pass XPATH expressions there, traverse into the HTML document and locate the desired element.
In XPATH you can select an element, by the text node like the following, whch gets the div that has the following text node.
//div[text()="Hello World"]
To get an element that contains some text use the following:
//div[contains(., 'Hello')]
The contains() method in XPATH takes a node as first parameter and the text to search for as second parameter.
Check this plunk here, this is an example use of XPATH in JavaScript
Here is a code snippet:
var headings = document.evaluate("//h1[contains(., 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
var thisHeading = headings.iterateNext();
console.log(thisHeading); // Prints the html element in console
console.log(thisHeading.textContent); // prints the text content in console
thisHeading.innerHTML += "<br />Modified contents";
As you can see, I can grab the HTML element and modify it as I like.
You could use this pretty simple solution:
.find(el => el.textContent === 'SomeText, text continues.');
The Array.from will convert the NodeList to an array (there are multiple methods to do this like the spread operator or slice)
The result now being an array allows for using the Array.find method, you can then put in any predicate. You could also check the textContent with a regex or whatever you like.
Note that Array.from and Array.find are ES2015 features. Te be compatible with older browsers like IE10 without a transpiler:'div'))
.filter(function (el) {
return el.textContent === 'SomeText, text continues.'
Since you have asked it in javascript so you can have something like this
function contains(selector, text) {
var elements = document.querySelectorAll(selector);
return, function(element){
return RegExp(text).test(element.textContent);
And then call it like this
contains('div', 'sometext'); // find "div" that contain "sometext"
contains('div', /^sometext/); // find "div" that start with "sometext"
contains('div', /sometext$/i); // find "div" that end with "sometext", case-insensitive
This solution does the following:
Uses the ES6 spread operator to convert the NodeList of all divs to an array.
Provides output if the div contains the query string, not just if it exactly equals the query string (which happens for some of the other answers). e.g. It should provide output not just for 'SomeText' but also for 'SomeText, text continues'.
Outputs the entire div contents, not just the query string. e.g. For 'SomeText, text continues' it should output that whole string, not just 'SomeText'.
Allows for multiple divs to contain the string, not just a single div.
[...document.querySelectorAll('div')] // get all the divs in an array
.map(div => div.innerHTML) // get their contents
.filter(txt => txt.includes('SomeText')) // keep only those containing the query
.forEach(txt => console.log(txt)); // output the entire contents of those
<div>SomeText, text continues.</div>
<div>Not in this div.</div>
<div>Here is more SomeText.</div>
Coming across this in 2021, I found using XPATH too complicated (need to learn something else) for something that should be rather simple.
Came up with this:
function querySelectorIncludesText (selector, text){
return Array.from(document.querySelectorAll(selector))
.find(el => el.textContent.includes(text));
querySelectorIncludesText('button', 'Send')
Note that I decided to use includes and not a strict comparison, because that's what I really needed, feel free to adapt.
You might need those polyfills if you want to support all browsers:
* String.prototype.includes() polyfill
* #see
if (!String.prototype.includes) {
String.prototype.includes = function (search, start) {
'use strict';
if (search instanceof RegExp) {
throw TypeError('first argument must not be a RegExp');
if (start === undefined) {
start = 0;
return this.indexOf(search, start) !== -1;
You best see if you have a parent element of the div you are querying. If so get the parent element and perform an element.querySelectorAll("div"). Once you get the nodeList apply a filter on it over the innerText property. Assume that a parent element of the div that we are querying has an id of container. You can normally access container directly from the id but let's do it the proper way.
var conty = document.getElementById("container"),
divs = conty.querySelectorAll("div"),
myDiv = [...divs].filter(e => e.innerText == "SomeText");
So that's it.
If you don't want to use jquery or something like that then you can try this:
function findByText(rootElement, text){
var filter = {
acceptNode: function(node){
// look for nodes that are text_nodes and include the following string.
if(node.nodeType === document.TEXT_NODE && node.nodeValue.includes(text)){
return NodeFilter.FILTER_ACCEPT;
return NodeFilter.FILTER_REJECT;
var nodes = [];
var walker = document.createTreeWalker(rootElement, NodeFilter.SHOW_TEXT, filter, false);
//give me the element containing the node
return nodes;
//call it like
var nodes = findByText(document.body,'SomeText');
//then do what you will with nodes[];
for(var i = 0; i < nodes.length; i++){
//do something with nodes[i]
Once you have the nodes in an array that contain the text you can do something with them. Like alert each one or print to console. One caveat is that this may not necessarily grab divs per se, this will grab the parent of the textnode that has the text you are looking for.
Google has this as a top result for For those who need to find a node with certain text.
By way of update, a nodelist is now iterable in modern browsers without having to convert it to an array.
The solution can use forEach like so.
var elList = document.querySelectorAll(".some .selector");
elList.forEach(function(el) {
if (el.innerHTML.indexOf("needle") !== -1) {
// Do what you like with el
// The needle is case sensitive
This worked for me to do a find/replace text inside a nodelist when a normal selector could not choose just one node so I had to filter each node one by one to check it for the needle.
Use XPath and document.evaluate(), and make sure to use text() and not . for the contains() argument, or else you will have the entire HTML, or outermost div element matched.
var headings = document.evaluate("//h1[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
or ignore leading and trailing whitespace
var headings = document.evaluate("//h1[contains(normalize-space(text()), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
or match all tag types (div, h1, p, etc.)
var headings = document.evaluate("//*[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
Then iterate
let thisHeading;
while(thisHeading = headings.iterateNext()){
// thisHeading contains matched node
Here's the XPath approach but with a minimum of XPath jargon.
Regular selection based on element attribute values (for comparison):
// for matching <element class="foo bar baz">...</element> by 'bar'
var things = document.querySelectorAll('[class*="bar"]');
for (var i = 0; i < things.length; i++) {
things[i].style.outline = '1px solid red';
XPath selection based on text within element.
// for matching <element>foo bar baz</element> by 'bar'
var things = document.evaluate('//*[contains(text(),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
things.snapshotItem(i).style.outline = '1px solid red';
And here's with case-insensitivity since text is more volatile:
// for matching <element>foo bar baz</element> by 'bar' case-insensitively
var things = document.evaluate('//*[contains(translate(text(),"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz"),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
things.snapshotItem(i).style.outline = '1px solid red';
There are lots of great solutions here already. However, to provide a more streamlined solution and one more in keeping with the idea of a querySelector behavior and syntax, I opted for a solution that extends Object with a couple prototype functions. Both of these functions use regular expressions for matching text, however, a string can be provided as a loose search parameter.
Simply implement the following functions:
// find all elements with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// regex: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
Object.prototype.queryInnerTextAll = function(selector, regex) {
if (typeof(regex) === 'string') regex = new RegExp(regex, 'i');
const elements = [...this.querySelectorAll(selector)];
const rtn = elements.filter((e)=>{
return e.innerText.match(regex);
return rtn.length === 0 ? null : rtn
// find the first element with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// regex: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
Object.prototype.queryInnerText = function(selector, text){
return this.queryInnerTextAll(selector, text)[0];
With these functions implemented, you can now make calls as follows:
document.queryInnerTextAll('', 'go');
This would find all divs containing the link class with the word go in the innerText (eg. Go Left or GO down or go right or It's Good)
document.queryInnerText('', 'go');
This would work exactly as the example above except it would return only the first matching element.
document.queryInnerTextAll('a', /^Next$/);
Find all links with the exact text Next (case-sensitive). This will exclude links that contain the word Next along with other text.
document.queryInnerText('a', /next/i);
Find the first link that contains the word next, regardless of case (eg. Next Page or Go to next)
e = document.querySelector('#page');
e.queryInnerText('button', /Continue/);
This performs a search within a container element for a button containing the text, Continue (case-sensitive). (eg. Continue or Continue to Next but not continue)
I had similar problem.
Function that return all element which include text from arg.
This works for me:
function getElementsByText(document, str, tag = '*') {
return [...document.querySelectorAll(tag)]
el => (el.text && el.text.includes(str))
|| (el.children.length === 0 && el.outerText && el.outerText.includes(str)))
Since there are no limits to the length of text in a data attribute, use data attributes! And then you can use regular css selectors to select your element(s) like the OP wants.
for (const element of document.querySelectorAll("*")) {
element.dataset.myInnerText = element.innerText;
document.querySelector("*[data-my-inner-text='Different text.']").style.color="blue";
<div>SomeText, text continues.</div>
<div>Different text.</div>
Ideally you do the data attribute setting part on document load and narrow down the querySelectorAll selector a bit for performance.
I was looking for a way to do something similar using a Regex, and decided to build something of my own that I wanted to share if others are looking for a similar solution.
function getElementsByTextContent(tag, regex) {
const results = Array.from(document.querySelectorAll(tag))
.reduce((acc, el) => {
if (el.textContent && el.textContent.match(regex) !== null) {
return acc;
}, []);
return results;

A generic way to extract and replace text from the DOM [duplicate]

This question already has an answer here:
Replace each word in webpage's paragraphs with a button containing that text
(1 answer)
Closed 5 years ago.
I have two wrappers:
function wrapSentences(str, tmpl) {
return str.replace(/[^\.!\?]+[\.!\?]+/g, tmpl || "<sentence>$&</sentence>")
function wrapWords(str, tmpl) {
return str.replace(/\w+/g, tmpl || "<word>$&</word>");
I use these in our extension to wrap every word and sentence on any webpage the user visits for TTS and settings purposes.
document.body is the most atomic element on every website, but doing body.innerHTML = wrapWords(body.innerText) will (obviously) replace any element that was in between the different text nodes, thus breaking (the visual part of) the website. I'm looking for a way to find any closest element around any text without knowing anything specific about that element, so I can replace it with a wrapped equivalent without altering the website in any way.
I found several examples that go to the deepest child, but they all rely on passing something (node or id) the extension has no way of knowing about. We will use rangy for highlighting, but have the same issue... I always end up having to pass a node or id that the extension is unable to be aware of when visiting random sites.
One of the examples that needs a node passed:
function replaceTextNodes(node, newText) {
if (node.nodeType === 3) {
//Filter out text nodes that contain only whitespace
if (!/^\s*$/.test( { = newText;
} else if (node.hasChildNodes()) {
for (let i = 0, len = node.childNodes.length; i < len; ++i) {
replaceTextNodes(node.childNodes[i], newText);
I'll be happy to explain it better if needed. I fear my wording may not always be the best, I'm aware of that.
It looks like what you want is all the text nodes on the page... This question might have your answer.
Using the function from the first answer:
Edit: now wrapping text in <word> nodes, not just their textContent
function textNodesUnder(el){
var n, a=[], walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode()) a.push(n);
return a;
exp = /(?:(\W+)|(\w+))/g
.filter(t => !/^\s*$/.test(t.textContent))
.forEach(t => {
let s = t.textContent, match
while(match = exp.exec(s)) {
let el
if(match[1] !== undefined) {
el = document.createTextNode(match[1])
else {
el = document.createElement("word")
el.textContent = match[2]
t.parentNode.insertBefore(el, t)

Remove Text from Element Without Removing Reference

I would like to replace all the text in some element (including text in children) with some other text. For example, the html
<div id="myText">
This is some text.
This is some other text.
<p id="toHide">
This is even more text.
Click this text to hide it.
should become
<div id="myText">
That is some text.
That is some other text.
<p id="toHide">
That is even more text.
Click That text to hide it.
Essentially, I've replaced all of /this/gi with "That". However, I cannot use the following:
$("#myText").innerHTML = $("#myText").innerHTML.replace(/this/gi, "");
This is because I keep a lot of references to the children of myText. This references will be erased. I realize that in simple cases, I can just update these references, but I have a fairly large file, and many references (and it would be troublesome and error prone to have to update every reference every time this function is called).
I also store some data not visible to innerHTML. For example, I use
$("#toHide").test = "test";
This is lost when writing to innerHTML.
How can I replace text in a div without innerHTML (preferably without jquery)?
Here's a solution:
var n, walker = document.createTreeWalker(document.getElementById("myText"), NodeFilter.SHOW_TEXT);
while (n = walker.nextNode()) {
n.nodeValue = n.nodeValue.replace(/this/ig, "that");
Basically, walk all the text nodes, and substitute their values.
For better compatibility, here's some reusable code:
function visitTextNodes(el, callback) {
if (el.nodeType === 3) {
for (var i=0; i < el.childNodes.length; ++i) {
visitTextNodes(el.childNodes[i], callback);
Then you can do:
visitTextNodes(document.getElementById("myText"), function(el) {
el.nodeValue = el.nodeValue.replace(/this/ig, "that");
You can use DOM methods (a.k.a. the old and safe way)
function replaceText(el, pattern, txt) {
for(var i=0; i<el.childNodes.length; ++i) {
var node = el.childNodes[i];
case 1: // Element
replaceText(node, pattern, txt); continue;
case 3: // Text node
node.nodeValue = node.nodeValue.replace(/this/gi, "that"); continue;
Here my version of replaceText:
function replaceText(elem) {
if(elem.nodeType === Node.TEXT_NODE) {
elem.nodeValue = elem.nodeValue.replace(/this/gi, 'that')
var children = elem.childNodes
for(var i = 0, len = children.length; i < len; ++i)
NB this take an element as the first parameter and traverse all children, hence it works even with complex elements.
Here the updated fiddle:

Replacing content of html document

I am trying to replace a word in an html document with selected word using javascript.
var node=document.body;
var childs=node.childNodes;
var n=childs.length,i=0;
while (i < n) {
if (node.nodeType == 3) {
if (node.textContent) {
but string is not getting replaced...pls help
Add document.body=node; at the end. When you set node to equal body you are copying the value, not editing it by reference.
I'm not sure why you're trying to work with the text node directly. console.log on nodeValue shows that the textContent of displayed tags is neither retrieved nor set in your code.
This works great. Live demo here (click).
<p>something to be replaced.</p>
and the js:
var childs = document.body.childNodes;
var len = childs.length;
for (var i=0; i<len; ++i) {
var node=childs[i];
if (node.nodeName === 'P') {
node.textContent = node.textContent.replace("to be replaced","was replaced");
There is a much simpler method using the String replace method. For example, you can convert the body of the page into a string and use regular expressions to replace the word. This means that you can avoid having to traverse the entire DOM and node lists, which is unnecessarily slow for your task.

ID Ends With in pure Javascript

I am working in a Javascript library that brings in jQuery for one thing: an "ends with" selector. It looks like this:
It will find the elements in which the id ends with "foo".
I am looking to do this without jQuery (straight JavaScript). How might you go about this? I'd also like it to be as efficient as reasonably possible.
Use querySelectorAll, not available in all browsers (like IE 5/6/7/8) though. It basically works like jQuery:
You will need to iterate over all elements on the page and then use string functions to test it. The only optimizations I can think of is changing the starting point - i.e. not document.body but some other element where you know your element will be a child of - or you could use document.getElementsByTagName() to get an element list if you know the tag name of the elements.
However, your task would be much easier if you could use some 3rd-party-javascript, e.g. Sizzle (4k minified, the same selector engine jQuery uses).
So, using everything that was said, I put together this code. Assuming my elements are all inputs, then the following code is probably the best I am going to get?
String.prototype.endsWith = function(suffix) {
return this.indexOf(suffix, this.length - suffix.length) !== -1;
function getInputsThatEndWith(text) {
var result = new Array();
var inputs = document.getElementsByTagName("input");
for(var i=0; i < inputs.length; i++) {
return result;
I put it on JsFiddle:
#ThiefMaster touched on how you can do the check, but here's the actual code:
function idEndsWith(str)
if (document.querySelectorAll)
return document.querySelectorAll('[id$="'+str+'"]');
var all,
elements = [],
all = document.getElementsByTagName('*');
len = all.length;
regex = new RegExp(str+'$');
for (i = 0; i < len; i++)
if (regex.test(all[i].id))
return elements;
This can be enhanced in a number of ways. It currently iterates through the entire dom, but would be more efficient if it had a context:
function idEndsWith(str, context)
if (!context)
context = document;
...CODE... //replace all occurrences of "document" with "context"
There is no validation/escaping on the str variable in this function, the assumption is that it'll only receive a string of chars.
Suggested changes to your answer:
RegExp.quote = function(str) {
return str.replace(/([.?*+^$[\]\\(){}-])/g, "\\$1");
}; // from
String.prototype.endsWith = function(suffix) {
return !!this.match(new RegExp(RegExp.quote(suffix) + '$'));
function getInputsThatEndWith(text) {
var results = [],
inputs = document.getElementsByTagName("input"),
numInputs = inputs.length,
for(var i=0; i < numInputs; i++) {
var input = inputs[i];
if( results.push(input);
return results;
Implementing String.endsWith using a regex instead of indexOf() is mostly a matter of preference, but I figured it was worth including for variety. If you aren't concerned about escaping special characters in the suffix, you can remove the RegExp.quote() bit, and just use
new RegExp(suffix + '$').
If you know the type of DOM elements you are targeting,
then get a list of references to them using getElementsByTagName , and then iterate over them.
You can use this optimization to fasten the iterations:
ignore the elements not having id.
target the nearest known parent of elements you want to seek, lets say your element is inside a div with id='myContainer', then you can get a restricted subset using
document.getElementById('myContainer').getElementsByTagName('*') , and then iterate over them.