Finding HTML tags by using `content`'s of them from a Google Chrome extension [duplicate] - javascript

How can I find a DIV with certain text? For example:
<div>SomeText, text continues.</div>
I am trying to use something like this:
var text = document.querySelector('div[SomeText*]').innerTEXT;
But of course it will not work. How can I do it?

OP's question is about plain JavaScript and not jQuery.
Although there are plenty of answers, and I like @Pawan Nogariya's answer, please check out this alternative.
You can use XPATH in JavaScript. More info on the MDN article here.
The document.evaluate() method evaluates an XPATH query/expression. So you can pass XPATH expressions there, traverse into the HTML document and locate the desired element.
In XPATH you can select an element by its text node like the following, which gets the div that has the following text node.
//div[text()="Hello World"]
To get an element that contains some text use the following:
//div[contains(., 'Hello')]
The contains() method in XPATH takes a node as first parameter and the text to search for as second parameter.
Check this plunk here, this is an example use of XPATH in JavaScript
Here is a code snippet:
var headings = document.evaluate("//h1[contains(., 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
var thisHeading = headings.iterateNext();
console.log(thisHeading); // Prints the html element in console
console.log(thisHeading.textContent); // prints the text content in console
thisHeading.innerHTML += "<br />Modified contents";
As you can see, I can grab the HTML element and modify it as I like.
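One practical wrinkle with the XPath approach is that the search text has to be embedded in the expression as a string literal, and XPath 1.0 has no escape character for quotes. The following is a minimal sketch of one way to build the contains() expression for arbitrary text. The names xpathLiteral and xpathContains are hypothetical helpers, not part of any browser API.

```javascript
// Hypothetical helper: quote a JavaScript string as an XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes("'")) return "'" + text + "'";
  if (!text.includes('"')) return '"' + text + '"';
  // Both quote types are present: split on single quotes and glue the
  // pieces back together with concat(), quoting each "'" separately.
  return "concat(" + text.split("'").map(part => "'" + part + "'").join(", \"'\", ") + ")";
}

// Hypothetical helper: expression selecting <tag> elements whose text contains `text`.
function xpathContains(tag, text) {
  return "//" + tag + "[contains(., " + xpathLiteral(text) + ")]";
}
```

The resulting string can be passed as the first argument of document.evaluate(), as in the snippet above.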

You could use this pretty simple solution:
Array.from(document.querySelectorAll('div'))
  .find(el => el.textContent === 'SomeText, text continues.');
The Array.from will convert the NodeList to an array (there are multiple methods to do this, like the spread operator or slice).
Because the result is now an array, you can use the Array.find method with any predicate. You could also check the textContent with a regex or whatever you like.
Note that Array.from and Array.find are ES2015 features. To be compatible with older browsers like IE10 without a transpiler:
Array.prototype.slice.call(document.querySelectorAll('div'))
  .filter(function (el) {
    return el.textContent === 'SomeText, text continues.';
  })[0];
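An exact comparison on textContent is sensitive to line breaks and indentation inside the element. A minimal sketch of a whitespace-tolerant variant follows. The names normalizeText and findByExactText are hypothetical, and the normalization only approximates what XPath's normalize-space() does.

```javascript
// Hypothetical helper: collapse runs of whitespace and trim both ends.
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: first element whose normalized text equals `wanted`.
// `elements` can be an array or a NodeList; returns undefined when nothing matches.
function findByExactText(elements, wanted) {
  const target = normalizeText(wanted);
  return Array.from(elements)
    .find(el => normalizeText(el.textContent) === target);
}
```

In a page this could be called as findByExactText(document.querySelectorAll('div'), 'SomeText, text continues.').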

Since you asked for plain JavaScript, you can have something like this:
function contains(selector, text) {
  var elements = document.querySelectorAll(selector);
  return Array.prototype.filter.call(elements, function(element){
    return RegExp(text).test(element.textContent);
  });
}
And then call it like this
contains('div', 'sometext'); // find "div" that contain "sometext"
contains('div', /^sometext/); // find "div" that start with "sometext"
contains('div', /sometext$/i); // find "div" that end with "sometext", case-insensitive
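Because the string argument is handed to RegExp as a pattern, characters such as . or ( in the search text are interpreted as regex syntax. The sketch below, under the assumption that plain strings should be matched literally, escapes them first. The names escapeRegExp and textMatches are hypothetical.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of the text test: strings are matched literally,
// RegExp objects are used as given.
function textMatches(content, query) {
  const pattern = query instanceof RegExp ? query : new RegExp(escapeRegExp(query));
  return pattern.test(content);
}
```

Inside contains(), the test line could then read textMatches(element.textContent, text).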

This solution does the following:
Uses the ES6 spread operator to convert the NodeList of all divs to an array.
Provides output if the div contains the query string, not just if it exactly equals the query string (which happens for some of the other answers). e.g. It should provide output not just for 'SomeText' but also for 'SomeText, text continues'.
Outputs the entire div contents, not just the query string. e.g. For 'SomeText, text continues' it should output that whole string, not just 'SomeText'.
Allows for multiple divs to contain the string, not just a single div.
[...document.querySelectorAll('div')] // get all the divs in an array
.map(div => div.innerHTML) // get their contents
.filter(txt => txt.includes('SomeText')) // keep only those containing the query
.forEach(txt => console.log(txt)); // output the entire contents of those
<div>SomeText, text continues.</div>
<div>Not in this div.</div>
<div>Here is more SomeText.</div>

Coming across this in 2021, I found using XPATH too complicated (need to learn something else) for something that should be rather simple.
Came up with this:
function querySelectorIncludesText (selector, text){
  return Array.from(document.querySelectorAll(selector))
    .find(el => el.textContent.includes(text));
}
querySelectorIncludesText('button', 'Send')
Note that I decided to use includes and not a strict comparison, because that's what I really needed, feel free to adapt.
You might need this polyfill if you want to support all browsers:
/**
 * String.prototype.includes() polyfill
 */
if (!String.prototype.includes) {
  String.prototype.includes = function (search, start) {
    'use strict';
    if (search instanceof RegExp) {
      throw TypeError('first argument must not be a RegExp');
    }
    if (start === undefined) {
      start = 0;
    }
    return this.indexOf(search, start) !== -1;
  };
}

You best see if you have a parent element of the div you are querying. If so get the parent element and perform an element.querySelectorAll("div"). Once you get the nodeList apply a filter on it over the innerText property. Assume that a parent element of the div that we are querying has an id of container. You can normally access container directly from the id but let's do it the proper way.
var conty = document.getElementById("container"),
divs = conty.querySelectorAll("div"),
myDiv = [...divs].filter(e => e.innerText == "SomeText");
So that's it.

If you don't want to use jquery or something like that then you can try this:
function findByText(rootElement, text){
  var filter = {
    acceptNode: function(node){
      // look for nodes that are text_nodes and include the following string.
      if(node.nodeType === document.TEXT_NODE && node.nodeValue.includes(text)){
        return NodeFilter.FILTER_ACCEPT;
      }
      return NodeFilter.FILTER_REJECT;
    }
  };
  var nodes = [];
  var walker = document.createTreeWalker(rootElement, NodeFilter.SHOW_TEXT, filter, false);
  while(walker.nextNode()){
    //give me the element containing the node
    nodes.push(walker.currentNode.parentNode);
  }
  return nodes;
}
//call it like
var nodes = findByText(document.body,'SomeText');
//then do what you will with nodes[];
for(var i = 0; i < nodes.length; i++){
  //do something with nodes[i]
}
Once you have the nodes in an array that contain the text you can do something with them. Like alert each one or print to console. One caveat is that this may not necessarily grab divs per se, this will grab the parent of the textnode that has the text you are looking for.

Google has this as a top result for those who need to find a node with certain text.
By way of update, a nodelist is now iterable in modern browsers without having to convert it to an array.
The solution can use forEach like so.
var elList = document.querySelectorAll(".some .selector");
elList.forEach(function(el) {
  if (el.innerHTML.indexOf("needle") !== -1) {
    // Do what you like with el
    // The needle is case sensitive
  }
});
This worked for me to do a find/replace text inside a nodelist when a normal selector could not choose just one node so I had to filter each node one by one to check it for the needle.

Use XPath and document.evaluate(), and make sure to use text() and not . for the contains() argument, or else you will have the entire HTML, or outermost div element matched.
var headings = document.evaluate("//h1[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
or ignore leading and trailing whitespace
var headings = document.evaluate("//h1[contains(normalize-space(text()), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
or match all tag types (div, h1, p, etc.)
var headings = document.evaluate("//*[contains(text(), 'Hello')]", document, null, XPathResult.ANY_TYPE, null );
Then iterate
let thisHeading;
while(thisHeading = headings.iterateNext()){
  // thisHeading contains matched node
}
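The remark above about matching the outermost element also applies when filtering on textContent in plain JavaScript, because an ancestor's textContent includes the text of all its descendants. A minimal sketch of one way to keep only the innermost matches follows. The name innermostWithText is hypothetical, and the function only relies on the textContent and children properties.

```javascript
// Hypothetical helper: collect the deepest elements under `root` whose
// textContent contains `text`. An element is reported only when none of
// its child elements contains the text on its own.
function innermostWithText(root, text) {
  const hits = [];
  function visit(el) {
    if (!el.textContent.includes(text)) return false;
    let childHit = false;
    for (const child of Array.from(el.children)) {
      if (visit(child)) childHit = true;
    }
    if (!childHit) hits.push(el);
    return true;
  }
  visit(root);
  return hits;
}
```

In a page, innermostWithText(document.body, 'Hello') would skip the body and wrapper elements and return only the elements closest to the text.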

Here's the XPath approach but with a minimum of XPath jargon.
Regular selection based on element attribute values (for comparison):
// for matching <element class="foo bar baz">...</element> by 'bar'
var things = document.querySelectorAll('[class*="bar"]');
for (var i = 0; i < things.length; i++) {
  things[i].style.outline = '1px solid red';
}
XPath selection based on text within element.
// for matching <element>foo bar baz</element> by 'bar'
var things = document.evaluate('//*[contains(text(),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
  things.snapshotItem(i).style.outline = '1px solid red';
}
And here's with case-insensitivity since text is more volatile:
// for matching <element>foo bar baz</element> by 'bar' case-insensitively
var things = document.evaluate('//*[contains(translate(text(),"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz"),"bar")]',document,null,XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,null);
for (var i = 0; i < things.snapshotLength; i++) {
  things.snapshotItem(i).style.outline = '1px solid red';
}

There are lots of great solutions here already. However, to provide a more streamlined solution and one more in keeping with the idea of a querySelector behavior and syntax, I opted for a solution that extends Object with a couple prototype functions. Both of these functions use regular expressions for matching text, however, a string can be provided as a loose search parameter.
Simply implement the following functions:
// find all elements with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// regex: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
Object.prototype.queryInnerTextAll = function(selector, regex) {
  if (typeof(regex) === 'string') regex = new RegExp(regex, 'i');
  const elements = [...this.querySelectorAll(selector)];
  const rtn = elements.filter((e)=>{
    return e.innerText.match(regex);
  });
  return rtn.length === 0 ? null : rtn
}
// find the first element with inner text matching a given regular expression
// args:
// selector: string query selector to use for identifying elements on which we
// should check innerText
// regex: A regular expression for matching innerText; if a string is provided,
// a case-insensitive search is performed for any element containing the string.
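Object.prototype.queryInnerText = function(selector, text){
  // queryInnerTextAll returns null when nothing matches, so guard before indexing
  const matches = this.queryInnerTextAll(selector, text);
  return matches === null ? null : matches[0];
}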
With these functions implemented, you can now make calls as follows:
document.queryInnerTextAll('div.link', 'go');
This would find all divs with the link class that have the word go in their innerText (e.g. Go Left or GO down or go right or It's Good).
document.queryInnerText('div.link', 'go');
This would work exactly as the example above except it would return only the first matching element.
document.queryInnerTextAll('a', /^Next$/);
Find all links with the exact text Next (case-sensitive). This will exclude links that contain the word Next along with other text.
document.queryInnerText('a', /next/i);
Find the first link that contains the word next, regardless of case (eg. Next Page or Go to next)
e = document.querySelector('#page');
e.queryInnerText('button', /Continue/);
This performs a search within a container element for a button containing the text, Continue (case-sensitive). (eg. Continue or Continue to Next but not continue)

I had a similar problem.
Here is a function that returns all elements that include the text from the argument.
This works for me:
function getElementsByText(document, str, tag = '*') {
  return [...document.querySelectorAll(tag)]
    .filter(
      el => (el.text && el.text.includes(str))
        || (el.children.length === 0 && el.outerText && el.outerText.includes(str)));
}

Since there are no limits to the length of text in a data attribute, use data attributes! And then you can use regular css selectors to select your element(s) like the OP wants.
for (const element of document.querySelectorAll("*")) {
  element.dataset.myInnerText = element.innerText;
}
document.querySelector("*[data-my-inner-text='Different text.']").style.color="blue";
<div>SomeText, text continues.</div>
<div>Different text.</div>
Ideally you do the data attribute setting part on document load and narrow down the querySelectorAll selector a bit for performance.

I was looking for a way to do something similar using a Regex, and decided to build something of my own that I wanted to share if others are looking for a similar solution.
function getElementsByTextContent(tag, regex) {
  const results = Array.from(document.querySelectorAll(tag))
    .reduce((acc, el) => {
      if (el.textContent && el.textContent.match(regex) !== null) {
        acc.push(el);
      }
      return acc;
    }, []);
  return results;
}


Two blocks in getElementById [duplicate]

doStuff(document.getElementById("myCircle1" "myCircle2" "myCircle3" "myCircle4"));
This doesn't work, so do I need a comma or semi-colon to make this work?
document.getElementById() only supports one name at a time and only returns a single node not an array of nodes. You have several different options:
You could implement your own function that takes multiple ids and returns multiple elements.
You could use document.querySelectorAll(), which allows you to specify multiple ids in a CSS selector string.
You could put a common class name on all those nodes and use document.getElementsByClassName() with a single class name.
Examples of each option:
doStuff(document.querySelectorAll("#myCircle1, #myCircle2, #myCircle3, #myCircle4"));
// put a common class on each object
function getElementsById(ids) {
  var idList = ids.split(" ");
  var results = [], item;
  for (var i = 0; i < idList.length; i++) {
    item = document.getElementById(idList[i]);
    if (item) {
      results.push(item);
    }
  }
  return results;
}
doStuff(getElementsById("myCircle1 myCircle2 myCircle3 myCircle4"));
This will not work; getElementById queries only one element at a time.
You can use document.querySelectorAll("#myCircle1, #myCircle2") to query more than one element.
ES6 or newer
With newer versions of JavaScript, you can also convert the results into an array to traverse them easily.
const elementsList = document.querySelectorAll("#myCircle1, #myCircle2");
const elementsArray = [...elementsList];
// Now you can use cool array prototypes
elementsArray.forEach(element => {
  // do something with each element
});
How to query a list of IDs in ES6
Another easy way if you have an array of IDs is to use the language to build your query, example:
const ids = ['myCircle1', 'myCircle2', 'myCircle3'];
const elements = document.querySelectorAll(ids.map(id => `#${id}`).join(', '));
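Building #id selectors from raw strings breaks for ids that are not valid CSS identifiers, for example an id that starts with a digit. A minimal sketch of an alternative, assuming the ids contain no line breaks, uses attribute selectors instead. The name idsToSelector is hypothetical; in browsers, CSS.escape() is another option for keeping the #id form.

```javascript
// Hypothetical helper: build a selector list from raw id values.
// [id="..."] is used instead of #id so that ids starting with a digit
// (e.g. "73536573") do not need CSS identifier escaping.
function idsToSelector(ids) {
  return ids
    // Backslashes and double quotes must be escaped inside the quoted value.
    .map(id => '[id="' + String(id).replace(/["\\]/g, '\\$&') + '"]')
    .join(', ');
}
```

The result can be passed to document.querySelectorAll(), e.g. document.querySelectorAll(idsToSelector(ids)).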
No, it won't work.
document.getElementById() method accepts only one argument.
However, you may always set classes to the elements and use getElementsByClassName() instead. Another option for modern browsers is to use querySelectorAll() method:
document.querySelectorAll("#myCircle1, #myCircle2, #myCircle3, #myCircle4");
I suggest using ES5 array methods:
["myCircle1","myCircle2","myCircle3","myCircle4"] // Array of IDs
  .map(document.getElementById, document) // Array of elements
  .forEach(doStuff);
Then doStuff will be called once for each element, and will receive 3 arguments: the element, the index of the element inside the array of elements, and the array of elements.
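The second argument to map matters here: getElementById expects this to be the document, and map would otherwise call it without one. The sketch below illustrates both points outside a browser with a stand-in object. The names fakeDocument and seen are hypothetical and exist only for this illustration.

```javascript
// Stand-in for `document`: a lookup object whose method relies on `this`.
const fakeDocument = {
  nodes: { a: { id: 'a' }, b: { id: 'b' } },
  getElementById(id) { return this.nodes[id] || null; }
};

// Records what forEach passes to the callback: element, index, whole array.
const seen = [];
function doStuff(element, index, array) {
  seen.push(element.id + ':' + index + '/' + array.length);
}

['a', 'b']
  .map(fakeDocument.getElementById, fakeDocument) // second argument becomes `this`
  .forEach(doStuff);
```

After this runs, seen holds one entry per element, each built from the three arguments that forEach supplied.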
getElementByID is exactly that - get an element by id.
Maybe you want to give those elements a circle class and getElementsByClassName
document.getElementById() only takes one argument. You can give them a class name and use getElementsByClassName() .
I don't know whether something like this works in JS; in PHP and Python, which I use quite often, it is possible.
Maybe just use a for loop like:
function doStuff(){
  for(var i=1; i<=4; i++){
    var el = document.getElementById("myCircle"+i);
    // do something with el
  }
}
Vulgo has the right idea on this thread. I believe his solution is the easiest of the bunch, although his answer could have been a little more in-depth. Here is something that worked for me. I have provided an example.
<h1 id="hello1">Hello World</h1>
<h2 id="hello2">Random</h2>
<button id="click">Click To Hide</button>
document.getElementById('click').addEventListener('click', function(){ doStuff(); });
function doStuff() {
  for(var i=1; i<=2; i++){
    var el = document.getElementById("hello" + i);
    el.style.display = 'none';
  }
}
Obviously just change the integers in the for loop to account for however many elements you are targeting, which in this example was 2.
The best way to do it is to define a function and pass it the ID that you want to grab from the DOM; then, every time you want to grab an element by ID (for example, to store it in an array), you can call the function.
<p id="testing">Demo test!</p>
function grabbingId(element){
  var storeId = document.getElementById(element);
  return storeId;
}
grabbingId("testing").style.color = "red";
You can use something like this with an array and a for loop.
<p id='first'>??????</p>
<p id='second'>??????</p>
<p id='third'>??????</p>
<p id='fourth'>??????</p>
<p id='fifth'>??????</p>
<button id="change" onclick="changeColor()">color red</button>
var ids = ['first','second','third','fourth','fifth'];
function changeColor() {
  for (var i = 0; i < ids.length; i++) {
    document.getElementById(ids[i]).style.color = 'red';
  }
}
Something like this worked flawlessly for me:
doStuff(
  document.getElementById("myCircle1"),
  document.getElementById("myCircle2"),
  document.getElementById("myCircle3")
);
Use jQuery or similar to get access to the collection of elements in a single statement. Of course, you need to put something like this in your HTML's "head" section:
<script type='text/javascript' src='url/to/my/jquery.1.xx.yy.js' ...>
So here is the magic:
.- First of all, let's suppose that you have some divs with IDs as you wrote, i.e.,
...some html...
<div id='myCircle1'>some_inner_html_tags</div>
...more html...
<div id='myCircle2'>more_html_tags_here</div>
<div id='myCircleN'>more_and_more_tags_again</div>
.- With this 'spell' jQuery will return a collection of objects representing all div elements with IDs containing the entire string "myCircle" anywhere:
$("div[id*='myCircle']")
That is all! Note that you no longer need to deal with details like the numeric suffix, and that you can manipulate all the divs in a single statement, animate them... Voilà!
Try this in your browser's script console (press F12) right now!
As stated by jfriend00,
document.getElementById() only supports one name at a time and only returns a single node not an array of nodes.
However, here's some example code I created, to which you can give one id or a comma-separated list of ids. It will give you one or many elements in an array. If there are any errors, it will return an array with an Error as the only entry.
function safelyGetElementsByIds(ids){
  if(typeof ids !== 'string') return [new Error('ids must be a comma separated string of ids or a single id string')];
  ids = ids.split(",");
  let elements = [];
  for(let i=0, len = ids.length; i<len; i++){
    const currId = ids[i];
    const currElement = (document.getElementById(currId) || new Error(currId + ' is not an HTML Element'));
    if(currElement instanceof Error) return [currElement];
    elements.push(currElement);
  }
  return elements;
}
safelyGetElementsByIds('realId1'); //returns [<HTML Element>]
safelyGetElementsByIds('fakeId1'); //returns [Error : fakeId1 is not an HTML Element]
safelyGetElementsByIds('realId1,realId2,realId3'); //returns [<HTML Element>,<HTML Element>,<HTML Element>]
safelyGetElementsByIds('realId1,realId2,fakeId3'); //returns [Error : fakeId3 is not an HTML Element]
If, like me, you want to create an or-like construction, where any one of the elements may be available on the page, you could use querySelector. Given a selector list, querySelector returns the first element in document order that matches any of the selectors; the order of the ids in the list does not set a priority.
The difference with querySelectorAll is that it only finds a single element, so looping is not necessary.
document.querySelector('#myCircle1, #myCircle2, #myCircle3, #myCircle4');
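If the order of the ids is meant as a priority order, note that a selector list does not express priority: querySelector returns the first match in document order. A minimal sketch that tries the ids in the order given follows. The name firstExisting is hypothetical, and the lookup function is injected so the sketch can also run outside a browser.

```javascript
// Hypothetical helper: return the first element found, trying ids in the
// order given. In a page, `lookup` would be id => document.getElementById(id).
function firstExisting(ids, lookup) {
  for (const id of ids) {
    const el = lookup(id);
    if (el) return el;
  }
  return null; // none of the ids exists
}
```

In a page this could be called as firstExisting(['myCircle1', 'myCircle2'], id => document.getElementById(id)).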
Here is the solution:
if (
  document.getElementById('73536573').value != '' &&
  document.getElementById('1081743273').value != '' &&
  document.getElementById('357118391').value != '' &&
  document.getElementById('1238321094').value != '' &&
  document.getElementById('1118122010').value != ''
) {
  // every field has a non-empty value
}
You can do it with document.getElementById. Here is how.
function doStuff(condition) {
  if (condition) { // add your condition here
    document.getElementById('First ID');
    document.getElementById('Second ID');
  }
}
There you go! xD

How to compare if an HTML element exists in the node array?

selectedContentWrap: HTML nodes.
htmlVarTag: a string.
How do I check if the HTML element exists in the nodes?
htmlVarTag is a string, and I don't understand how to convert it so that I can check whether the nodes contain a tag like that and, if so, remove it.
Here is the output of my nodes that is stored in selectedContentWrap:
var checkingElement = $scope.checkIfHTMLinside(selectedContentWrap,htmlVarTag );
$scope.checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
  var node = htmlVarTag.parentNode;
  while (node != null) {
    if (node == selectedContentWrap) {
      return true;
    }
    node = node.parentNode;
  }
  return false;
}
Well, if you could paste the content of selectedContentWrap I would be able to test this code, but I think this would work:
// Code goes here
var checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
  for (var item of selectedContentWrap) {
    if (item.nodeName.toLowerCase() == htmlVarTag.toLowerCase()){
      return true;
    }
  }
  return false;
}
The simplest way is to use angular.element, which provides a subset of jQuery-compatible methods:
$scope.checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
  // use filter() on array and return filtered array length as boolean
  return selectedContentWrap.filter(function(str){
    // return length of tag collection found as boolean
    return angular.element('<div>').append(str).find(htmlVarTag).length
  }).length
}
It's still not 100% clear whether the objective is only to look for a specific tag or for any tags (i.e. to differentiate from text only),
or, as casually mentioned, to actually remove the tag.
If you want to remove the tag, it's not clear whether you simply want to unwrap it or remove its content also ... both are easily achieved using angular.element.
Try using: node.innerHTML and checking against that
Is it just me, or do you post a question on Stack Overflow and, 20 minutes of testing later, figure it out?
The answer is that selectedContentWrap already holds a list of nodes, so all I need to do is compare them; a simple for loop with an if will fit.
To compare the names I just need to use .nodeName, as that works cross-browser (correct me if I am wrong).
Some devs suggest using a "dictionary of tag names and anonymous closures instead", but I couldn't find anything. If anyone has this library, could you please post it to the question?
Here is my code.
var node = selectedContentWrap;
console.log('node that is selectedwrapper', selectedContentWrap)
for (var i = 0; i < selectedContentWrap.length; i++) {
  console.log('tag name is ',selectedContentWrap[i].nodeName);
  var temptagname = selectedContentWrap[i].nodeName; // for debugging
  if(selectedContentWrap[i].nodeName == 'B' ){
    console.log('contains element B');
  }
}

How to check if element contains string after converting everything to lowercase

I'm using a simple input field to search through a list on my website using this code:
$('#f-search').keyup(function() {
  var q = $('#f-search').val().toLowerCase();
  $("#f-list .f // toLowerCase // :contains('q')").css('border-color', '#900');
});
My problem is that the list elements (.f) contain unpredictable capital letters, so in order to accurately check it against the input I need to convert it to lower case, but I don't know how to do that and then use :contains. For example, if one .f contains "WoWoWoWoWzzzziees" but the user typed "wow", it wouldn't be a match with my current code, but I'd like it to be.
What you want is:
$("#f-list .f").filter(function() {
  return $(this)
    .text() // or .html() or .val(), depends on the element type
    .toLowerCase()
    .indexOf(q) != -1;
}).css('border-color', '#900');
This compares the lower-cased text inside the elements selected by "#f-list .f" against q and applies the CSS change to those that contain it.
If you also want the list to be reset each time, you can do this:
$("#f-list .f").css('border-color', 'WHATEVER IT WAS').filter(function() {
  return $(this)
    .text() // or .html() or .val(), depends on the element type
    .toLowerCase()
    .indexOf(q) != -1;
}).css('border-color', '#900');
For better performance you could cache your list like this:
var f_list = $("#f-list .f"),
    f_search = $('#f-search');
f_search.keyup(function() {
  var q = f_search.val().toLowerCase();
  f_list.css('border-color', 'WHATEVER IT WAS').filter(function() {
    return $(this)
      .text() // or .html() or .val(), depends on the element type
      .toLowerCase()
      .indexOf(q) != -1;
  }).css('border-color', '#900');
});
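The matching rule itself does not depend on jQuery. The following is a minimal sketch of it as plain functions, which also separates matches from non-matches so one group can be highlighted and the other reset. The names containsIgnoreCase and partitionByQuery are hypothetical.

```javascript
// Hypothetical helper: true when `text` contains `query`, ignoring case.
function containsIgnoreCase(text, query) {
  return text.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

// Hypothetical helper: split items into matches and non-matches.
// `getText` extracts the text to search from each item.
function partitionByQuery(items, query, getText) {
  const hits = [];
  const misses = [];
  for (const item of items) {
    (containsIgnoreCase(getText(item), query) ? hits : misses).push(item);
  }
  return { hits, misses };
}
```

With jQuery, the cached list could be passed as partitionByQuery(f_list.toArray(), q, el => $(el).text()).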
You can go as far as creating your own custom :contains selector in jQuery:
From here and here and here:
jQuery.expr[":"].Contains = jQuery.expr.createPseudo(function(arg) {
  return function( elem ) {
    return jQuery(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >= 0;
  };
});
And use it like this (please note the new selector is :Contains with uppercase C, and that the value of q is concatenated into the selector):
$("#f-list .f:Contains('" + q + "')").css('border-color', '#900');

How to filter elements returned by QuerySelectorAll

I'm working on a javascript library, and I use this function to match elements:
$ = function (a)
{
  var x;
  if (typeof a !== "string" || typeof a === "undefined"){ return a;}
  //Pick the quickest method for each kind of selector
  if(a.match(/^#([\w\-]+$)/))
  {
    return document.getElementById(a.split('#')[1]);
  }
  else if(a.match(/^([\w\-]+)$/))
  {
    x = document.getElementsByTagName(a);
  }
  else
  {
    x = document.querySelectorAll(a);
  }
  //Return the single object if applicable
  return (x.length === 1) ? x[0] : x;
};
There are occasions where I would want to filter the result of this function, like pick out a div span, or a #id div or some other fairly simple selector.
How can I filter these results? Can I create a document fragment, and use the querySelectorAll method on that fragment, or do I have to resort to manual string manipulation?
I only care about modern browsers and IE8+.
If you want to look at the rest of my library, it's here:
To clarify, I want to be able to do something like $_(selector).children(other_selector) and return the children elements matching that selector.
So here's my potential solution to the simplest selectors:
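tag_reg = /^([\w\-]+)$/;
id_reg = /#([\w\-]+$)/;
class_reg = /\.([\w\-]+)$/;
function _sel_filter(filter, curr_sel) {
  var i, len = curr_sel.length, matches = [];
  if(typeof filter !== "string") {
    return filter;
  }
  if(filter.match(tag_reg)) {
    //Filter by tag
    for(i = 0; i < len; i++) {
      if(curr_sel[i].tagName.toLowerCase() == filter.toLowerCase()) {
        matches.push(curr_sel[i]);
      }
    }
  }
  else if(filter.match(class_reg)) {
    //Filter by class
    for(i = 0; i < len; i++) {
      if((" " + curr_sel[i].className + " ").indexOf(" " + filter.match(class_reg)[1] + " ") !== -1) {
        matches.push(curr_sel[i]);
      }
    }
  }
  else if(filter.match(id_reg)) {
    return document.getElementById(filter.match(id_reg)[1]);
  }
  else {
    console.log(filter+" is not a valid filter");
  }
  return (matches.length === 1) ? matches[0] : matches;
}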
It takes a tag like div, an id, or a class selector, and returns the matching elements with the curr_sel argument.
I don't want to have to resort to a full selector engine, so is there a better way?
I don't think I understand the question. Why would you want to "filter" the result of querySelectorAll(), which is in fact some kind of filter itself? If you query for div span or, even better, #id div, those results are already filtered, no?
However, you can apply Array.prototype.filter to the static result of querySelectorAll as follows:
var filter = Array.prototype.filter,
    result = document.querySelectorAll('div'),
    filtered = filter.call( result, function( node ) {
      return !!node.querySelectorAll('span').length;
    });
That code would first use querySelectorAll() to query for all <div> nodes within the document. Afterwards it'll filter for <div> nodes which contain at least one <span>. That code doesn't make much sense and is just for demonstrative purposes (just in case some SO member wants to create a donk comment)
You can also filter with Element.compareDocumentPosition. It'll also tell you whether elements are disconnected, following, preceding, or contained. See MDC .compareDocumentPosition()
Note: NodeList is not a genuine array, that is to say it doesn't have the array methods like slice, some, map etc. To convert it into an array, try Array.from(nodeList).
For example:
let highlightedItems = Array.from(userList.querySelectorAll(".highlighted"));
highlightedItems.filter((item) => {
  // return true to keep item
});
Most concise way in 2019 is with spread syntax ... plus an array literal [...], which work great with iterable objects like the NodeList returned by querySelectorAll:
[...document.querySelectorAll(".myClass")].filter(el=>{/*your code here*/})
Some browsers that support querySelectorAll (qsa) also support a non-standard, vendor-prefixed matchesSelector method, since standardized as Element.matches(), that will return a boolean representing whether the element matches the selector provided. So you could iterate the collection and apply that method, retaining positive results.
In browsers that don't have a matchesSelector, you'd probably need to build your own selector based method similar to the selector engine you're building.
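In current browsers the standardized form of this method is Element.matches(). A minimal sketch of filtering an existing collection with it follows. The name filterBySelector is hypothetical.

```javascript
// Hypothetical helper: keep the elements that match `selector`.
// `elements` can be an array, a NodeList, or an HTMLCollection.
function filterBySelector(elements, selector) {
  return Array.from(elements).filter(el => el.matches(selector));
}
```

In a page this could be called as filterBySelector(document.getElementsByTagName('div'), '#id div').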

JavaScript - Efficiently find all elements containing one of a large set of strings

I have a set of strings and I need to find all of the occurrences in an HTML document. Where the string occurs is important because I need to handle each case differently:
String is all or part of an attribute. e.g., the string is foo: <input value="foo"> -> Add class ATTR to the element.
String is the full text of an element. e.g., <button>foo</button> -> Add class TEXT to the element.
String is inline in the text of an element. e.g., <p>I love foo</p> -> Wrap the text in a span tag with class TEXT.
Also, I need to match the longest string first. e.g., if I have foo and foobar, then <p>I love foobar</p> should become <p>I love <span class="TEXT">foobar</span></p>, not <p>I love <span class="TEXT">foo</span>bar</p>.
The inline text is easy enough: Sort the strings descending by length and find and replace each in document.body.innerHTML with <span class="TEXT">$1</span>, although I'm not sure if that is the most efficient way to go.
For the attributes, I can do something like this:
sortedStrings.each(function(it) {
document.body.innerHTML.replace(new RegExp('(\S+?)="[^"]*'+escapeRegExChars(it)+'[^"]*"','g'),function(s,attr) {
Again, that seems inefficient.
Lastly, for the full text elements, a depth first search of the document that compares the innerHTML to each string will work, but for a large number of strings, it seems very inefficient.
Any answer that offers performance improvements gets an upvote :)
EDIT: I went with a modification on Bob's answer. delim is an optional delimiter around the string (to differentiate it from normal text), and keys is the list of strings.
function dfs(iterator,scope) {
  scope = scope || document.body;
  $(scope).children().each(function() {
    return dfs(iterator,this);
  });
  return iterator.call(scope);
}
var escapeChars = /['\/.*+?|()[\]{}\\]/g;
function safe(text) {
  // $& inserts the matched character; the pattern has no capture group for $1
  return text.replace(escapeChars, '\\$&');
}
function eachKey(iterator) {
  var key, lit, i, len, exp;
  for(i = 0, len = keys.length; i < len; i++) {
    key = keys[i].trim();
    lit = (delim + key + delim);
    exp = new RegExp(delim + '(' + safe(key) + ')' + delim,'g');
    iterator(key,lit,exp);
  }
}
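$(function() {
  keys = keys.sort(function(a,b) {
    return b.length - a.length;
  });
  dfs(function() {
    var a, attr, html, val, el = $(this);
    eachKey(function(key,lit,exp) {
      // check attributes
      for(a = 0; a < el[0].attributes.length; a++) {
        attr = el[0].attributes[a].nodeName;
        val = el.attr(attr);
        exp.lastIndex = 0; // exp is global, so reset before each test
        if(exp.test(val)) {
          el.addClass('ATTR');
        }
      }
      // check all content
      html = el.html().trim();
      exp.lastIndex = 0;
      if(html === lit) {
        el.addClass('TEXT');
        el.html(key); // remove delims
      } else if(exp.test(html)) {
        // check partial content
        el.html(html.replace(exp, '<span class="TEXT">$1</span>'));
      }
    });
  });
});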
Under the assumption that the traversal is the most expensive operation, this seems optimal, although improvements are still welcome.
Trying to parse HTML with regex is a mug's game. It simply can't handle even the basic structures of HTML, never mind the quirks. There's so much wrong with your snippet already. (Doesn't detect unquoted attributes; fails for a wide variety of punctuation in it due to lack of HTML-escaping, regex-escaping or CSS-escaping(*); failure for attributes with - in; strange non-use of replace...)
So, use the DOM. Yes, that'll mean a traversal. But then so does a selector like the [attr*=] you're using already.
var needle= 'foo';
$('*').each(function() {
    var tag= this.tagName.toLowerCase();
    if (tag==='script' || tag==='style' || tag==='textarea' || tag==='option') return;
    // Find text in attribute values
    for (var attri= this.attributes.length; attri-->0;)
        if (this.attributes[attri].value.indexOf(needle)!==-1)
            $(this).addClass('ATTR');
    // Find text in child text nodes
    for (var childi= this.childNodes.length; childi-->0;) {
        var child= this.childNodes[childi];
        if (child.nodeType!=3) continue;
        // Sole text content of parent: add class directly to parent
        if (child.data==needle && this.childNodes.length===1) {
            $(this).addClass('TEXT');
            break;
        }
        // Else find index of each occurrence in text, and wrap each in span
        var parts= child.data.split(needle);
        for (var parti= parts.length; parti-->1;) {
            var span= document.createElement('span');
            span.className= 'TEXT';
            var ix= child.data.length-parts[parti].length;
            var trail= child.splitText(ix);
            span.appendChild(child.splitText(ix-needle.length));
            this.insertBefore(span, trail);
        }
    }
});
(The reverse-loops are necessary as this is a destructive iteration of content.)
(*: escape doesn't do any of those things. It's more like URL-encoding, but it's not really that either. It's almost always the wrong thing; avoid.)
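The loop above handles a single needle. For a set of strings where the longest must win, one option is a single alternation with the keys sorted by length, applied to the text of each text node rather than to innerHTML. The following is a minimal sketch under the assumption that keys contains at least one non-empty string. The names escapeForRegExp and splitByKeys are hypothetical.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split `text` into plain and matched segments,
// preferring the longest key at each position.
function splitByKeys(text, keys) {
  // Longest first, so "foobar" is tried before "foo" in the alternation.
  const sorted = keys.slice().sort((a, b) => b.length - a.length);
  const pattern = new RegExp('(' + sorted.map(escapeForRegExp).join('|') + ')', 'g');
  // split() with a capturing group keeps the matches at the odd indexes.
  return text.split(pattern).map((part, i) => ({ text: part, match: i % 2 === 1 }));
}
```

Each segment with match set to true could then become a span with class TEXT, and every other segment a plain text node.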
There is really no good way to do this. Your last requirement means you have to traverse the entire DOM.
For the first 2 requirements I would select all elements by tag name and iterate over them, inserting the stuff as needed.
The only performance improvement I can think of is to do this on the server side at all costs; this may even mean an extra POST to have your faster server do the work. Otherwise this can be really slow on, say, IE6.