Is it possible to use jQuery to grab the HTML of another web page into a div? - javascript

I am trying to integrate with the FireShot API so that, given a URL, I can grab the HTML of another web page into a div and then take a screenshot of it.
Some things I will need to do after getting the HTML:
grab <link> & <script> from <head>
grab <body> into <div>
But first, it seems that when I try to do a
$.get("http://google.com", function(data) { ... });
I get a 200 in firebug colored red. I think it has to do with sites not allowing you to grab their page with JS? Then is opening a window the best I can do? But how might I control the other page with jQuery or call fsapi on that page?
UPDATE
I tried to do something like below to do something when the new window is ready, but FireBug says "Permission denied to access property 'document'"
w = window.open($url.val());
setTimeout(function() { // if I dont do this, I always get about:blank, is there a better way around this?
$(w.document).ready(function() {
console.log(w.document.body);
});
}, 1000);

I believe the cross-site security model in JavaScript is blocking this. You'd likely have to proxy the content through your own domain.
There are a couple of other options for getting around the cross-site security constraints, but I'm not sure I'd promote them.

If the other page is on the same domain as your hosting page, yes, you can. Please refer to jQuery's .load() API.
Otherwise, the browser's same-origin policy prevents you from doing so. In that case, you can use an iframe instead of a div.
Some jQuery plugins, e.g. thickbox, can load pages into an appropriate container automatically.
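To make the "same domain" condition concrete, here is a minimal sketch of the comparison the browser effectively makes. The helper name isSameOrigin is my own, not a jQuery or browser API; it relies only on the standard URL object.

```javascript
// Hypothetical helper: two absolute URLs are same-origin when
// scheme, host and port all match.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}
```

Note that a subdomain or a different scheme already counts as a different origin, which is why .load() against another site fails.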

Unless I am mistaken, I do not believe you can AJAX a page cross domain (e.g. from domain1.com to domain2.com). To get around this, you can have a PHP "proxy" script that does the "getting" of the page and then passes it to JS.
For example, in JS you would get() http://mydomain.com/get/?domain=http://google.com and then do what you need to do!
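A small sketch of the client side of that idea, assuming the proxy lives at the example address above and reads a "domain" query parameter. The helper name buildProxyUrl is hypothetical. The target URL should be encoded so that its own query string does not get mixed up with the proxy's.

```javascript
// Assumed proxy endpoint, taken from the example URL in this answer.
var PROXY_ENDPOINT = "http://mydomain.com/get/";

// Hypothetical helper: build the same-origin URL that asks the
// proxy to fetch targetUrl on our behalf.
function buildProxyUrl(targetUrl) {
  return PROXY_ENDPOINT + "?domain=" + encodeURIComponent(targetUrl);
}

// Usage in the browser, assuming jQuery is loaded:
// $.get(buildProxyUrl("http://google.com"), function (html) { /* ... */ });
```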

Related

Load external page and Replace text

Would it be possible to load an external page inside a container and replace text elements?
We work with ad campaigns and earn a percentage whenever a user signs up.
Can a script replace certain words? For instance “User” to “Usuario” or “Password” to “Contraseña” without affecting the original website or its functions.
Note: These links always pass through a redirection.
Example:
http://a2g-secure.com/?E=/0yTeQmWHoKOlN6zUciCXQwUzfnVGPGN&s1=
Note 2: Using an iframe is out of the question due to “Same-origin policy”.
I'm not sure if this answers your question, but you might find it useful.
(Perhaps you might give a step-by-step example of what you're trying to accomplish?)
If we assume that a browser attempts to retrieve page P from a proxy which first retrieves the content of page P from its actual home and then performs some transformation on its content before returning that page content to the browser, what you're describing is a Reverse HTTP Proxy and is a very well-known page serving technique.
Rather than performing complex transformations at the server (which require specialized knowledge of the page layout), this technique is usually used to inject a single line into the retrieved source that calls a JavaScript file to actually perform the required transformation at the browser.
So in essence:
Browser requests Page P from Proxy 1.
Proxy 1 retrieves the actual Page P from its real home, Server 2.
Proxy 1 adds the line <script src="//proxy1.com/transform.js"></script> to the source of Page P.
Proxy 1 then returns the modified source of Page P to Browser.
Once the Browser has received the page content, the JavaScript file is also retrieved, which can then modify the page contents in any way required.
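The injection step in the list above could look roughly like this on the proxy. It is only a sketch: the function name injectTransformScript is made up, and it assumes the proxy already holds the fetched page as a string.

```javascript
// Hypothetical helper for the proxy: add a <script> tag that loads
// the transformation file just before </body>, or at the very end
// when the page has no closing body tag.
function injectTransformScript(html, scriptSrc) {
  var tag = '<script src="' + scriptSrc + '"></script>';
  var closeBody = /<\/body\s*>/i;
  if (closeBody.test(html)) {
    // A replacer function keeps "$" in the tag from being read as a pattern.
    return html.replace(closeBody, function (match) {
      return tag + match;
    });
  }
  return html + tag;
}
```

Placing the tag at the end of the body means the page content exists by the time transform.js runs.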
This technique can solve your "Same-origin policy" issue: load the iframe from a URL on the same server that served the parent page, and have that server act as the proxy, like:
http://example.com/?proxy_target=//server2.com/pageP.html
Thus, the browser only "sees" content from a single server.
You would need to load the external page server-side, and then you can do whatever you want with it. You can do serverside string replacement, or you can do it later in javascript.
But, remember that as soon as you add a whole webpage into for example a div in your own page, the css from your page will affect it.
Plus, you would need to manipulate all the links in the documents, to have absolute urls. If the page depends on ajax, there is pretty much no way to accomplish what you want to do.
If on the other hand the pages you will be loading are static html, it is possible, though there are a lot of things you need to take care of before you can actually present the page to the user, like adjusting links, urls to stylesheets and so on.
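As an illustration of the link adjustment mentioned above, here is a rough sketch that rewrites quoted href and src attributes to absolute URLs. The function name absolutizeUrls is hypothetical, and a regex like this only covers simple static HTML (quoted attributes, no srcset, no URLs inside CSS), so treat it as a starting point.

```javascript
// Hypothetical helper: resolve href/src values against the URL the
// page was originally fetched from.
function absolutizeUrls(html, baseUrl) {
  return html.replace(/\b(href|src)=(["'])(.*?)\2/gi, function (match, attr, quote, value) {
    try {
      return attr + "=" + quote + new URL(value, baseUrl).href + quote;
    } catch (e) {
      return match; // leave values that cannot be parsed untouched
    }
  });
}
```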
It seems you are trying to localize a website on the fly, using your server as a proxy for that content. Does it make sense? If that's the case, depending on the size of your operation, there are several proxy translation services out there (I'll name them if needed).
Basically, they scrape a website, providing a way for you to translate and host the translated content. Of course, this depends on your relationship with the content providers. You should also take this into consideration, since modifying content, even for translation, can be a copyright problem.
All things considered, if you trust the provider's javascript, the solution involves scraping the content, as mentioned in other answers, and serving that modified content. You really need to trust the origin...
update per request
http://www.easyling.com
http://www.smartling.com
http://www.motionpoint.com
http://www.lionbridge.com/solutions/translation-proxy/
http://www.sajan.com/translation-proxy-technology-and-traditional-website-translation-understanding-your-options/
They are all aimed at enterprise-grade projects, but I would say Easyling is the most accessible.
Hope this helps.
Using the .load() callback function, this will replace the text:
$(function(){
$("#Content").load("http://example.com?user=Usuario", function() {
// "Usuario" stands in for the value of the "user" query parameter
$(this).html($(this).html().replace("user", "Usuario"));
});
});
For redirection you can use:
// similar behavior as an HTTP redirect
window.location.replace("url");
// similar behavior as clicking on a link
window.location.href = "url";
The answer is NO, not without using a server-side proxy. For a really good overview of how to use a proxy, see this YUI page: https://developer.yahoo.com/javascript/howto-proxy.html (Be patient, as it will take time to load, but the illustrations are worth it!)
When I try this in jsfiddle to see what the 3 callback parameters contain, the error below appears:
$(function() {
$(this).load('https://stackoverflow.com/questions/36003367/load-external-page-and-replace-text', function(responseText, textStatus, jqXHR){
debugger;
});
});
ERROR:
XMLHttpRequest cannot load https://stackoverflow.com/questions/36003367/load-external-page-and-replace-text.
No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://fiddle.jshell.net' is therefore not allowed access.
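For what it's worth, the check behind that error message can be approximated as follows. This is a deliberately simplified sketch (it ignores credentials and preflight requests), and isOriginAllowed is a made-up name, not a browser API.

```javascript
// Simplified model of the browser's check. headerValue is the value of
// the Access-Control-Allow-Origin response header, or null when absent.
function isOriginAllowed(headerValue, requestOrigin) {
  if (headerValue === null || headerValue === undefined) {
    return false; // no header: the response is withheld from the script
  }
  var value = headerValue.trim();
  return value === "*" || value === requestOrigin;
}
```

In the jsfiddle case the header is missing entirely, so the script on https://fiddle.jshell.net never gets to see the response.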

Permission issues checking if parent site is my parent domain within iframe

I've read several of the questions on this but am still a little confused.
For example: OK, I can't post examples because of hyperlink limitations
Here is my exact situation.
I have a site at mydomain.com
One of the pages has an iframe to another page at sub.mydomain.com
I am trying to prepare an onload script that redirects to mydomain.com if the page is not in an iframe, or if the domain of the page containing the iframe is not mydomain.com.
After the initial permission issues I realised the problem with sub domains counting as separate domains.
One of the posts above says that "could each use either foo.mydomain.com or just mydomain.com"
So I tried (for testing):
onload="document.domain='mydomain.com';alert(parent.location.href);"
This produced the error (http replaced with lar
Error: Permission denied for <http://sub.mydomain.net> (document.domain=<http://mydomain.net>) to get property Location.href from <http://mydomain.net> (document.domain has not been set).
Source File: http://sub.mydomain.net/?pageID=1&framed=1
Line: 1
Removing the alert produces no errors.
Maybe I am going about this the wrong way since I do not need to interact with the parent just read its domain if there is one.
Something like a nice simple top.domain. For read-only access there must be a way, so that people can prevent their own pages being used within other people's sites.
You can't (easily) do this because of security restrictions.
This answer from #2771397 might point you in the right direction.
OK, while looking at the error console I still had open when I got home a wee lightbulb lit up. I am pretty new to javascript (can you tell ;) but I thought "If it has try/catch"...
well here is a hack at least to get the name of the top domain and an example of how I will use it in my site to show content only if the page is a frame in the correct domain.
Firstly the header will have the following partially PHP generated function:
function getParentDomain()
{
try
{
var wibble=top.location.href;
}
catch(err)
{
if (err.message.indexOf('http://mydomain.com')!=-1)
{
createCookie('IAmAWomble','value')
}
}
}
Basically the value will be something based on the PHP session I think. This will be executed at page load.
If the page is not within the proper site or if javascript is not enabled then the cookie will not be created.
PHP will then attempt to read the correct value from the cookie and show the content or an error message as appropriate.
I do see a slight flaw in this for first visit since page load will run after PHP has generated the content but I'm sure I can work around this somehow. I thought I'd post because this is at least what I was initially asking for and that is a way to read the URL of a parent site if it is in a different domain to the site in the frame.
IIUC you want to use the window.parent attribute: “A reference to the parent of the current window or subframe.”
Presumably, window.parent.document.location.host contains the domain name of the container page, though the browser only allows reading it when the parent is on the same origin.
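A defensive way to read it might look like the sketch below. The function name describeParent is hypothetical; it takes the window object as a parameter and wraps the access in try/catch, because a parent on a different origin throws a permission error like the one quoted in the question.

```javascript
// Hypothetical helper: report whether the page is framed and, when the
// browser permits it, the host of the parent page.
function describeParent(win) {
  if (win.parent === win) {
    return { framed: false, host: null }; // top-level page
  }
  try {
    return { framed: true, host: win.parent.location.host };
  } catch (err) {
    // Cross-origin parent: the browser refuses to reveal its location.
    return { framed: true, host: null };
  }
}

// Usage in the page: var info = describeParent(window);
```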

JQuery Cross-Domain .load() (self-constructing widget)

I'm creating a widget for people to use on their websites, however to keep the widget future-proof, i want it to be self constructing using an external javascript.
So the widget code i would ask people to put on their websites would be something like:
<div id="my_widget"></div>
<script type="text/javascript" src="http://www.external-domain.com/mywidget.js"></script>
mywidget.js would then use jquery's .load() to populate the #my_widget div with an iframe.
Problem is this doesn't work....
what do i need to do?
(note i dont want to have the iframe in the code i give to my customers)
It depends on what URL you are specifying in the load function. If this URL is not hosted on the same domain as the page containing the script, it won't work due to the same-origin policy restriction. One possible workaround to make cross domain ajax calls is to use JSON-P if you have control over the server side script which is used in the load function.
Here's the idea behind JSON-P:
You provide a server side script hosted on Domain A which will return JSONP formatted response. Domain A is your domain for which you have full control.
This server side script could be called from Domain B using AJAX.
Let's suppose that http://domainA.com/myscript?jsoncallback=foo returns the following response:
foo({ result: '<strong>Hello World</strong>' });
Now inside mywidget.js you could call this script:
$.getJSON('http://domainA.com/myscript?jsoncallback=?', function(data) {
$('#my_widget').html(data.result);
});
All that is left is to tell the users to include the mywidget.js script and provide a placeholder with id="my_widget" to host the results (you could even generate this placeholder in the success callback).
Remark: When using JSONP you are limited to GET requests only. This means that there's a limit on the size of the request you can send.
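On the Domain A side, producing that response is mostly string concatenation. A minimal sketch, assuming the server is also written in JavaScript (the function name jsonpResponse is hypothetical); the callback name comes from the query string, so it is worth validating before echoing it back:

```javascript
// Hypothetical server-side helper: wrap the data in a call to the
// callback named by the jsoncallback parameter.
function jsonpResponse(callbackName, data) {
  // Only accept a plain identifier, since the name is echoed into
  // script that the including page will execute.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error("Invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}
```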
You have total control of their page since you're executing your code on their site.
You can create an iframe with document.createElement("iframe"), inject it anywhere on the page with document.getElementById("my_widget").appendChild(iframe), and do whatever else you feel like.
One thing to be careful about is not cluttering their namespace... avoid any common names and make up your own (__my_widget or whatever else is weird). And try to keep the number of globals you add as low as 1, or even none if possible.
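Putting both points together, mywidget.js could be shaped roughly like this. It is a sketch only: the widget page URL in the usage comment is an assumption, and the document is passed in as a parameter so the function is easy to try outside a browser.

```javascript
// Everything lives under one deliberately odd global name.
var __my_widget = (function () {
  // Create an iframe and append it to the placeholder element.
  function mount(doc, containerId, src) {
    var container = doc.getElementById(containerId);
    if (!container) {
      return null; // placeholder div is missing on the host page
    }
    var iframe = doc.createElement("iframe");
    iframe.src = src;
    container.appendChild(iframe);
    return iframe;
  }
  return { mount: mount };
})();

// Usage inside mywidget.js (the widget URL is hypothetical):
// __my_widget.mount(document, "my_widget", "http://www.external-domain.com/widget.html");
```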
Don't use load, use an iframe if you're just trying to load html from your site.

Is there a way to mitigate downloading of resources (images/css and js files) with Javascript?

I have a html page on my localhost - get_description.html.
The snippet below is part of the code:
<input type="text" id="url"/>
<button id="get_description_button">Get description</button>
<iframe id="description_container" src="#"/>
When the button is clicked the src of the iframe is set to the url entered in the textbox. The pages fetched this way are very big with lots of linked files. What I am interested in on the page is a block of text contained in a <div id="description"> element.
Is there a way to mitigate downloading of resources linked in the page that loads into the iframe?
I don't want to use curl because the data is only available to logged in users and the steps to take with curl to get the content is too complicated. The iframe is simple as I use this on a box which sends the right cookies to identify the request as coming from a logged in user, but the problem is that it is very wasteful to get nearly 1 MB of data to keep 1 KB of it and throw out the rest.
Edit
If the proposed method just works in Firefox it is fine, so I added Firefox tag. Also, it is possible that the answer actually is from the realm of Firefox add-on techniques, so I added that tag as well.
The problem is not that I cannot get at what I'm looking for, rather, the problem is the easy iframe method is wasteful.
I know that Firefox does allow loading only the text of a page. If you open a page and press Ctrl+U you are taken to the 'view page source' window. There, links behave as normal and are clickable; if you click on a link in source view, the source of the new page is loaded into the view source window without the linked resources being downloaded, which is exactly what I'm trying to get. But I don't know how to access this behaviour.
Another example is the Adblock add-on. It somehow kills elements before they get loaded. With plain Javascript this is not possible, because it is only triggered too late to intervene in good time.
The Same Origin Policy forbids any web page from accessing the contents of a web page on a different domain, so basically you cannot do that.
However, it seems that some browsers allow access to another page's content when the request comes from a local web page, which seems to be your case.
Safari and IE 6/7/8 are browsers that allow a local web page to do so via XMLHttpRequest (source: Google Browser Security Handbook), so you may want to use one of those browsers to do what you need (note that future versions of those browsers may no longer allow this).
Apart from this solution I only see two possibilities:
If the web pages you need to fetch content from are somehow controlled by you, you can create a simpler interface that lets other web pages get the content you need (for example, by allowing JSONP requests).
If the web pages you need to fetch content from are not controlled by you, the only solution I see is to fetch the content server side, logging in from the server directly (I know that you don't want to do so, but I don't see any other possibility if the previous ones are not practicable).
Hope it helps.
Actually I've seen Cross Domain jQuery .load request before, here: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The author claims that code like this, found on that page,
$('#container').load('http://google.com'); // SERIOUSLY!
$.ajax({
url: 'http://news.bbc.co.uk',
type: 'GET',
success: function(res) {
var headline = $(res.responseText).find('a.tsh').text();
alert(headline);
}
});
// Works with $.get too!
would work. (The BBC code might not work because of the recent redesign, but you get the idea)
Apparently it is using YQL wrapped into a jQuery plugin to do the trick. Now I cannot say I fully understand what he is doing there but it appears to work, and fits the bill. Once you load the data I suppose it is a simple matter of filtering out the data that you need.
If you prefer something that works at the browser level, may I suggest Mozilla's Jetpack framework for lightweight extensions. I've not yet read the documentation in its entirety but it should contain the APIs needed for this to work.
There are various ways to go about this in AJAX, I'm going to show the jQuery way for brevity as one option, though you could do this in vanilla JavaScript as well.
Instead of an <iframe> you can just use a container, let's say a <div> like this:
<div id="description_container"></div>
Then to load it:
$(function() {
$("#get_description_button").click(function() {
$("#description_container").load($("input").val() + " #description");
});
});
This uses the .load() method, which takes a string in this format: .load("url selector"). It then takes the matching element in the fetched page and places its content inside the container you're loading into, in this case #description_container.
This is just the jQuery route, mainly to illustrate that yes, you can do what you want, but you don't have to do it exactly like this, just showing the concept is getting what you want from an AJAX request, rather than in an <iframe>.
Your description sounds like you are fetching pages from the same domain (you said that you need to be logged in and have session credentials), so have you tried to use an async request via XMLHttpRequest? It might complain if the html on a page is particularly messed up, but you should still be able to get the raw text via .responseText and extract what you need with a regex.
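A rough sketch of the regex step, assuming the block has no nested <div> inside it (the lazy match stops at the first closing tag). The function name extractDescription is made up for this example.

```javascript
// Hypothetical helper: pull the inner HTML of <div id="description">
// out of the raw page text, or return null when it is not found.
function extractDescription(html) {
  var pattern = /<div[^>]*\bid=(["'])description\1[^>]*>([\s\S]*?)<\/div>/i;
  var match = pattern.exec(html);
  return match ? match[2].trim() : null;
}

// Usage in the browser, where url is the address typed into the text box:
// var xhr = new XMLHttpRequest();
// xhr.open("GET", url);
// xhr.onload = function () { console.log(extractDescription(xhr.responseText)); };
// xhr.send();
```

Because only the HTML text is requested, none of the linked images, stylesheets or scripts are downloaded.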

Cross domain iframe content load detection

I have a rather interesting problem. I have a parent page that will create a modal jquery dialog with an iframe contained within the dialog. The iframe will be populated with content from a 3rd party domain. My issue is that I need to create some dialog level javascript that can detect if the content of the iframe loaded successfully and if it hasn't within a 5 second time frame, then to close the dialog and return the user to the parent page.
I have researched numerous solutions and only two are of any true value.
Get the remote site to include a javascript line of document.domain = 'our-domain.com'.
Use a URL fragment hack, but again I would need to request that the remote site modify the URL by appending '#some_value' to the end of it, and my dialog window would have to poll the URL until it either sees the fragment or times out.
Are these honestly the only options I have to work with? Is there not a simpler way to just detect this?
I have been researching if there's a way to poll for http response errors, but this still remains confined to the same restrictions.
Any help would be immensely appreciated.
Thanks
The easiest way (if you can get code added to the external sites) is to have them add an invisible iframe pointing to a special html file on your domain. This could then use parent.parent.foo() to notify the original window about the load event.
Listening for the "load" event will only tell you if the window loaded, not what was loaded or if the document is ready for interaction.
Nicholas Zakas has an article about detecting if an iframe loaded: http://www.nczonline.net/blog/2009/09/15/iframes-onload-and-documentdomain/. Basically you have this code snippet:
var iframe = document.createElement("iframe");
iframe.src = "simpleinner.htm";
if (iframe.attachEvent){
iframe.attachEvent("onload", function(){
alert("Local iframe is now loaded.");
});
} else {
iframe.onload = function(){
alert("Local iframe is now loaded.");
};
}
document.body.appendChild(iframe);
I haven't tested it, but I'm pretty sure jQuery should be able to handle it by doing something like $("#iframe").load(function () { alert("Local iframe is now loaded."); });
You could try using postMessage for communication between frames.
This will require the remote site to include some specific JavaScript to post a message to the parent document when it has finished loading.
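A minimal sketch of the receiving side, assuming the remote page agrees to send the string "iframe-loaded" (that message, the origins in the comments and the function names are all made up for this example). The handler checks event.origin so that messages from other frames are ignored.

```javascript
// Hypothetical factory: build a "message" event handler that calls
// onLoaded only for the agreed message from the expected origin.
function makeLoadedHandler(expectedOrigin, onLoaded) {
  return function (event) {
    if (event.origin !== expectedOrigin) {
      return; // ignore messages from any other sender
    }
    if (event.data !== "iframe-loaded") {
      return; // not the message we agreed on
    }
    onLoaded();
  };
}

// Parent page:
// window.addEventListener("message", makeLoadedHandler("https://third-party.example", markDialogReady));
// Remote page, once it has finished loading:
// window.parent.postMessage("iframe-loaded", "https://our-domain.com");
```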
It's possible to do this with an onload handler on the iframe itself. Unfortunately (surprise!) IE makes it difficult. The only way I could get this to work was to compose HTML for the iframe, then append it to the document with innerHTML. Then I have to poll to see when the iframe appears in the DOM, which varies depending on if the page is loading. Here's a link to the source: http://svn.openlaszlo.org/openlaszlo/trunk/lps/includes/source/iframemanager.js
See create(), __finishCreate() and gotload(). Feel free to take a copy of this and use it yourself!
Regards,
Max Carlson
OpenLaszlo.org
This is how I detected the loading of a Cross-Domain Iframe,
Set a unique id for the iframe (you may use any sort of identifier, it doesn't matter)
<iframe id="crossDomainIframe" src=""> </iframe>
Set a load event listener on the iframe:
document.getElementById("crossDomainIframe").addEventListener('load',
function actionToPerform(){
//Do your onLoad actions here
}
)
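To get the 5 second cut-off from the question, the load listener can be paired with a timer. The sketch below is one way to do it; the function name watchIframeLoad and the optional timers parameter (there only so the logic can be exercised without a real clock) are my own additions. Keep in mind that "load" only says that something loaded, not that it was the page you wanted.

```javascript
// Hypothetical helper: call onLoaded if the iframe fires "load" within
// timeoutMs, otherwise call onTimeout. Exactly one of them runs.
function watchIframeLoad(iframe, timeoutMs, onLoaded, onTimeout, timers) {
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (id) { clearTimeout(id); }
  };
  var done = false;
  var timerId = timers.set(function () {
    if (done) { return; }
    done = true;
    onTimeout();
  }, timeoutMs);
  iframe.addEventListener("load", function () {
    if (done) { return; } // the timeout already fired
    done = true;
    timers.clear(timerId);
    onLoaded();
  });
}

// Usage, with closeDialog being whatever closes your dialog:
// watchIframeLoad(document.getElementById("crossDomainIframe"), 5000, function () {}, closeDialog);
```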
In any case you will need some sort of cooperation from the other domain's server, as you are trying to abuse the Same Origin Policy (SOP)
The first solution document.domain=... won't work if domains are different. It works only for subdomains and ports, as described in the link above.
The only option that allows cross domain communication without polling is JSONP or script injection with a JS function callback. This method is available in all Google APIs and works well.
We've explained on our blog a way to sandbox those calls in an iframe to secure them. While postMessage is better now, the window.name hack has the advantage of working on old browsers.
Ironically, SOP does not prevent you from POSTing anything to another domain. But you won't be able to read the response.
