How to detect user visited external website - javascript

External website A offers a form that can be filled out only once. Once a user has filled it out, the form is hidden (via cookies) when they visit website A again.
Now I want to detect whether a user has already been on website A. Basically, I think, I need to request website A "in the name of" this user and parse the response.
I tried embedding, iframes, cross-domain requests, and cross-domain requests through a proxy server. Either the browser restrictions block me, or I can request the website, but with a different session!
How can this be done?

Without the co-operation of the other website: it cannot. Browsers are designed to make that sort of invasion of privacy impossible.
If the other site is willing to expose that information, you could use Ajax via JSONP or CORS, or you could redirect the user to a URL on the other site which, in turn, redirects back to your site with a query string that indicates whether the form has been filled out.
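For example, if website A cooperated by exposing a CORS-enabled status endpoint, the check from your page could look roughly like this (the URL, the response shape, and the credentials policy are all assumptions for illustration):

// Sketch only: assumes website A serves a hypothetical endpoint that reports
// whether the current visitor's session has already filled out the form, and
// that its CORS policy allows your origin with credentials.
fetch('https://website-a.example/api/form-status', { credentials: 'include' })
  .then(response => response.json())
  .then(status => {
    if (status.filled) {
      console.log('User has already filled out the form on website A');
    } else {
      console.log('User has not filled out the form yet');
    }
  })
  .catch(err => console.error('Could not reach website A:', err));

For the redirect variant, website A would send the user back to something like https://your-site.example/return?formFilled=1 and your page would simply read that query parameter.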

Related

Is there a standard way of Identifying 'Domain not Owned' sites when using http/https?

I am using a node webshot library to take an image of a web site, say at http://x.y.z.com/blah. If the website exists I get a nice image. If the website does not exist I may or may not get an error. If I get an error I can use a default image. However, I am finding that some domains are redirected to the infamous domain-selling sites or a "search for" domain site. For example, http://notawebsite.com.org is redirected to http://www.com.org/?notfound=notawebsite.com.org. I have also checked DNS to see if I can invalidate the site ahead of time, but it resolves fine (to the www.com.org address). So is there anything I can do to determine if a URL is redirected to one of these domain search/selling sites?
Is there a standard way of Identifying 'Domain not Owned' sites when using http/https?
No, not really. In the example you cite, the server for http://notawebsite.com.org returns a 301 redirect. It seems to me that if you're getting a redirect to a different domain (not just a redirect to a different page on the same domain, and not just a redirect from http to https on the same domain), then you can decide that the URL you were attempting to access is apparently not active on its own.
There is no standard way to know whether the site you are redirected to is just a domain seller vs. an actual active domain. You could manually investigate a bunch of sites you get redirects on and teach your code how to identify some common domain sellers doing this, but that would be a somewhat unending task that probably needs regular human intervention to tell the difference between a real site and a domain-selling site. You could, in the end, build up a blacklist of domain sellers' domains and refuse to catalog any URL that redirects to any domain on your blacklist. But it would probably take some manual intervention to build and maintain the blacklist.
You also have no way of knowing for sure that all URLs on a given domain redirect in the same way, but you can certainly say that the URL you tried to get the snapshot from is not directly active on its own. If the user goes to that URL in their browser, they won't see any content for that domain, because the redirect will change the URL.
So is there anything I can do to determine if a URL is redirected to one of these domain search/selling sites?
Build your own blacklist of reseller domains that show up in redirects like this. Then whenever you attempt to request a page URL for purposes of grabbing a webshot and you get a 3xx status code back from the request, you check the redirect domain to see if it is on your blacklist.
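A rough sketch of that check in Node (the blacklist entries are placeholders you would collect yourself, and only a single redirect hop is inspected here):

// Sketch only: detect a 3xx redirect to a different domain and check the
// destination against a hand-maintained blacklist of reseller domains.
const http = require('http');

const blacklistedDomains = new Set(['www.com.org', 'some-domain-seller.example']); // illustrative

function checkRedirect(targetUrl, callback) {
  const original = new URL(targetUrl);
  http.get(targetUrl, res => {
    res.resume(); // discard the body; only the status code and headers matter here
    if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
      const destination = new URL(res.headers.location, targetUrl);
      callback(null, {
        redirected: true,
        crossDomain: destination.hostname !== original.hostname,
        blacklisted: blacklistedDomains.has(destination.hostname)
      });
    } else {
      callback(null, { redirected: false });
    }
  }).on('error', callback);
}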

automatic login to a website

I have got a 3rd-party website which my customer wants me to log into in order to download some data periodically.
The data is customer specific, and password protected.
I have the username/password, and I have searched for ways to do the login automatically so that I can pull data, but so far with no success.
This is a method that I have tried:
http://crunchify.com/automatic-html-login-using-post-method-autologin-a-website-on-double-click/
When I look at the login page of the website I am trying to log into (view source), I don't see the login form, but if I click "inspect element" in Chrome on the fields of the page, it does show that there is a login form hiding in there.
Any suggestions?
Edit:
Here is the website which I need to auto-login to: http://portal.dorad.co.il/#/Login. Unfortunately it's not in English. The first field is the username, the second field is the password, and the button is the login button.
Edit2:
Taking pomeh's advice, I was able to find the jQuery code that is being triggered when the text boxes are modified. Now I want to run this script manually using element.DomContainer.Eval
(function(n,t){function vi(n){var t=n.length,r=i.type(n);return i.isWindow(n)?!1:1===n.nodeType&&t?!0:"array"===r||"function"!==r&&(0===t||"number"==typeof t&&t>0&&t-1 in n)}function ne(n){var t=li[n]={};return i.each(n.match(s)||[],function(n,i){t[i]=!0}),t}function uu(n,r,u,f){if(i.acceptData(n)){var s,h,c=i.expando,a="string"==typeof r,l=n.nodeType,o=l?i.cache:n,e=l?n[c]:n[c]&&c;if(e&&o[e]&&(f||o[e].data)||!a||u!==t)return e||(l?n[c]=e=tt.pop()||i.guid++:e=c),o[e]||(o[e]={},l||(o[e].toJSON=i.noop)),("object"==typeof r||"function"==typeof r)&&
...
(t=n(this);r=r.not(t),t.removeData(f),r.length||clearTimeout(c)},add:function(t){function s(t,u,e){var s=n(this),o=n.data(this,f);o.w=u!==i?u:s.width(),o.h=e!==i?e:s.height(),r.apply(this,arguments)}if(!u[o]&&this[e])return!1;var r;if(n.isFunction(t))return r=t,s;r=t.handler,t.handler=s}}}(jQuery,this)
I am not sure how to activate it and give it the relevant data.
If you have the right mix of technical requirements then you want Single-Site-Sign-On (SSSO).
Not all of my clients have SSL and I don't want my user name and password on all of their sites. They are however all on the same server. Since my site supports SSL I can log in to my own site securely.
What you need to do, conceptually speaking, is log the IP of the administrator account along with a date/time stamp. Then, if you visit your client's website (again, on the same server) from that same IP, you can have your scripting language check that log. I require a short time span (anywhere between 30 seconds and two minutes tops) and the same IP address. You can add additional technical requirements to strengthen security, of course, though your options will be limited as the domain name will be different. If the visit matches the criteria, emulate the user being authenticated (statically, obviously, since you likely won't/shouldn't have your administrative account information on their site) and you can be automatically signed in.
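A minimal sketch of that idea in Node.js, assuming both sites run on the same server and can share a small file; the path, field names, and time window below are illustrative, not part of any standard:

// Sketch only: record the admin's IP on your own site, then let the client's
// site (same server) auto-authenticate a visitor who matches IP + time window.
const fs = require('fs');

const LOG_FILE = '/var/www/shared/admin-logins.json'; // hypothetical shared location
const MAX_AGE_MS = 2 * 60 * 1000;                     // two-minute window, as described above

// Called by your own site (over SSL) right after the administrator logs in.
function recordAdminLogin(ip) {
  fs.writeFileSync(LOG_FILE, JSON.stringify({ ip: ip, time: Date.now() }));
}

// Called by the client's site to decide whether to treat this visitor as the admin.
function isRecentAdmin(ip) {
  try {
    const entry = JSON.parse(fs.readFileSync(LOG_FILE, 'utf8'));
    return entry.ip === ip && (Date.now() - entry.time) <= MAX_AGE_MS;
  } catch (err) {
    return false; // no record yet, or unreadable file
  }
}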
Maybe you could do this using a web scraping framework like:
Goutte for PHP (https://github.com/fabpot/goutte)
Scrapy for Python (http://scrapy.org/)
node.io for Node.js (https://github.com/chriso/node.io)
request for Node.js (https://github.com/mikeal/request)
WatiN for .Net (http://watin.org/)
In any case, I think a client-side solution will bring a lot of problems. Maybe you can log into it using a form tag which points to the page, but you won't be able to manipulate the page afterwards. Also, you may not be able to use AJAX due to CORS restrictions. You could embed the target page as an iframe, but you can't manipulate the page either, because of the different domains used (you can do that under certain conditions, but it's hard to achieve, imho). So a server-side solution sounds better to me.
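For instance, a server-side login with the request module listed above might look roughly like this; the URLs and form field names are guesses and would have to be taken from the real login form you found via "inspect element":

// Sketch only: POST the same fields the hidden login form submits, keep the
// session cookie in a jar, then download the protected data with that session.
const request = require('request');

const jar = request.jar(); // holds the session cookie between requests

request.post({
  url: 'http://portal.example.com/login',               // hypothetical login endpoint
  form: { username: 'myUser', password: 'myPassword' }, // hypothetical field names
  jar: jar,
  followAllRedirects: true
}, (err, res, body) => {
  if (err) return console.error('Login failed:', err);
  request.get({ url: 'http://portal.example.com/data/export', jar: jar }, (err2, res2, data) => {
    if (err2) return console.error('Download failed:', err2);
    console.log('Downloaded', data.length, 'bytes of customer data');
  });
});

Since the page in question builds its login form with JavaScript, you may first need to watch the browser's network tab to see which request the form actually sends, and reproduce that request instead.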

Google API: Authorized JavaScript Origins

I'm implementing a Google+ Sign-In for our web service, and stumbled on "Authorized JavaScript Origins". Our clients have web addresses either as a sub-domain of our main domain, or as a custom domain name. Since the login page is under that sub-domain (or custom domain), and in order to make the Google+ Sign-In button work, that custom domain/sub-domain has to be (manually) entered in the "Authorized JavaScript Origins" list (with both http and https).
Does anybody know a way to do that automatically (through some API maybe)?
If not, then how do you do it?
Not sure if there is an API for this. At first glance I don't see one. The alternative (aside from manually adding domains all the time) is to use a hidden iframe on each site - this iframe would come from your domain and would be the only thing that calls google services. The main sites would communicate with the iframe (postMessage) to tell it what to send google. This of course, opens up a security risk (anybody could load your iframe into their page and do bad things on your behalf) so you'll want to make sure that the iframe code refuses to do anything unless it's running within a page on a known-good domain.
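A bare-bones sketch of that bridge; the origins and the message format are made up for illustration:

// Sketch only. On each sub-domain / custom-domain page: embed and talk to the bridge.
// <iframe id="authBridge" src="https://auth.main-domain.example/bridge.html"></iframe>
const bridge = document.getElementById('authBridge');
bridge.contentWindow.postMessage({ action: 'signIn' }, 'https://auth.main-domain.example');

// Inside bridge.html, served from the one origin you registered with Google:
const allowedParents = ['https://client1.main-domain.example', 'https://customdomain.example'];
window.addEventListener('message', event => {
  if (!allowedParents.includes(event.origin)) return; // refuse unknown domains
  if (event.data && event.data.action === 'signIn') {
    // ...call the Google sign-in API from here, then report back to the parent page...
    event.source.postMessage({ action: 'signInResult', ok: true }, event.origin);
  }
});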
You can also have a common URL which all subdomains point to when trying to log in with Google. Then have this URL redirect to your actual Google login path. Beats having to deal with an iframe this way.
Finally I made it work; however, there may be some fixes to apply.
So a server hosts many domains and subdomains (children), all of which need Google sign-in, and there is a main domain (parent).
I implemented a general login page on the parent, which the children open via window.open() as a popup. Since the client is already in a popup, it is very likely that auth2 cannot open another popup, so the parent does the Google auth with the {ux_mode: 'redirect'} parameter in gapi.auth2.SignInOptions.
The process then continues to your callback page, which you provided via another gapi.auth2.SignInOptions parameter, redirect_uri, and which is also on the parent.
On this page Google has provided you the golden id_token, which you must verify on your server. And this was the main twist: use this information to create a token of your own on the server. The parent asked the server to create it, but send it to the child on the client side (for example via a query parameter) so the child can use it later.
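A condensed sketch of the parent login page described above; the client ID and URLs are placeholders, and the callback still needs your own server-side verification of the id_token:

// Sketch only: parent-domain login page that the child sites open via window.open().
gapi.load('auth2', () => {
  gapi.auth2.init({
    client_id: 'YOUR_CLIENT_ID.apps.googleusercontent.com' // placeholder
  }).then(auth2 => {
    // Already inside a popup, so redirect instead of opening another popup.
    auth2.signIn({
      ux_mode: 'redirect',
      redirect_uri: 'https://parent.example/google-callback' // placeholder, on the parent domain
    });
  });
});

// On https://parent.example/google-callback: read the id_token, send it to your server
// for verification, receive your own token back, and pass it to the child window
// (for example via a query parameter, as described above).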
I will happily take any advice about security leaks, or any comment that may ease the process just a little.

How to make a user scan the html of another website

Is there a way for me, in JavaScript, to enable a user to parse an HTML page as they would see it?
So imagine a button on my website: if they click on it, I get a JavaScript string which contains the entire HTML page of e.g. bbc.co.uk, as that user sees it.
Is that possible?
Arbitrary 3rd-party websites? No. If you could do that, you could read people's bank statements from their online banking, their email from webmail services, and so on. This security measure is called the same-origin policy.
You can read data from co-operating websites via CORS (for HTTP requests) and postMessage (for frames).
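With a cooperating site, the frame variant could look roughly like this; the origins are made up, and the other page has to opt in by posting its own markup:

// Sketch only. On the cooperating site's page, loaded in an iframe on your site:
window.parent.postMessage(document.documentElement.outerHTML, 'https://your-site.example');

// On your site, listening for the markup that the framed page chose to share:
window.addEventListener('message', event => {
  if (event.origin !== 'https://cooperating-site.example') return; // accept only the known frame
  const html = event.data; // the page's HTML, as that user sees it
  console.log('Received', html.length, 'characters of HTML');
});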

Facebook Connect help

According to the Facebook API documentation, most of the work is handled through javascript.
That means that all the processing is done, and then the front end checks if the user is connected to Facebook/authorized, right?
My question is:
Suppose a user goes to my site for the first time ever.
He clicks on "facebook connect". The JavaScript verifies him as authentic, and it "redirects" to another page on my server. From then on, how do I know that the user is actually authenticated to my website, since everything is done on the frontend?
I think this is correct, but aren't there some security issues?
- After the user clicks Login, Facebook redirects to a page on my site, AND they also create a cookie with a specific "Facebook ID" that is retrieved only from this user. My backend will "read" the cookie and grab that ID... and then associate it with my userID.
If that is correct...then it doesn't make sense. What if people steal other people's "facebook ID" and then forge the cookie? And then my backend sees the cookie and thinks it's the real user...?
Am I confused? If I am confused, please help me re-organize my understanding and tell me how it actually works.
Facebook Connect uses a clever (or insane, depending on your point of view) hack to achieve cross-site communication between your site and Facebook's authentication system from within the browser.
The way it works is as follows:
Your site includes a very simple static HTML file, known as the cross-domain communications channel. This file is called xd_receiver.htm in the FB docs, but it can be named anything you like.
Your site's login page includes a reference to the Javascript library hosted on Facebook's server.
When a user logs in via the "Connect" button, it calls a function in Facebook's JS API which pops up a login dialog. This login box has an invisible iframe in which the cross-domain communications file is loaded.
The user fills out the form and submits it, posting the form to Facebook.
Facebook checks the login. If it's successful, it communicates this to your site. Here's where that cross-domain stuff comes in:
Because of cross-domain security policies, Facebook's login window cannot inspect the DOM tree of documents hosted on your server. But the login window can update the src attribute of any iframe within it, and this is used to communicate with the cross-domain communications file hosted on your page.
When the cross-domain communications file receives a communication indicating that the login was successful, it uses Javascript to set some cookies containing the user's ID and session. Since this file lives on your server, those cookies have your domain and your backend can receive them.
Any further communication in Facebook's direction can be accomplished by inserting another nested iframe in the other iframe -- this second-level iframe lives on Facebook's server instead of yours.
The cookies are secure (in theory) because the data is signed with the secret key that Facebook generated for you when you signed up for the developer program. The JS library uses your public key (the "API key") to validate the cookies.
Theoretically, Facebook's Javascript library handles this all automatically once you've set everything up. In practice, I've found it doesn't always work exactly smoothly.
For a more detailed explanation of the mechanics of cross-domain communication using iframes, see this article from MSDN.
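To make the iframe trick concrete, here is an illustrative, non-Facebook-specific sketch of fragment-based messaging from that era:

// Sketch only. On the other domain's page: send "login_ok" to your domain by
// pointing an iframe at your static receiver file with the message in the fragment.
var receiver = document.createElement('iframe');
receiver.src = 'https://your-site.example/xd_receiver.htm#login_ok';
document.body.appendChild(receiver);

// Inside xd_receiver.htm (hosted on your domain): the document is same-origin
// with your site, so it can read its own fragment and set cookies for your domain.
var message = window.location.hash.slice(1); // "login_ok"
if (message === 'login_ok') {
  document.cookie = 'fb_login=1; path=/';
}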
Please someone correct me if I'm wrong - as I am also trying to figure all this stuff out myself. My understanding of the security of the cookies is that there is also a special signature cookie. This cookie is created by combining the data of the other cookies, appending your application secret that only you and FB know, and MD5-hashing the result. You can then test this hash server-side to make sure the data can be trusted as coming from FB, since it could not easily be duplicated by a hacker.
A more charming explanation can be found here - scroll about halfway down the page.
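In code, the scheme Scott describes would look something like the following Node-flavored sketch; the cookie names and the exact concatenation order are assumptions and should be checked against the FB documentation:

// Sketch only: rebuild the signature from the other cookies plus the app secret
// and compare it with the signature cookie before trusting the data.
const crypto = require('crypto');

function verifySignature(fbCookies, appSecret) {
  // fbCookies: e.g. { uid: '...', session_key: '...', expires: '...', sig: '...' } (illustrative)
  const { sig, ...rest } = fbCookies;
  const payload = Object.keys(rest)
    .sort()                             // sort the keys alphabetically
    .map(key => key + '=' + rest[key])  // key=value pairs
    .join('');                          // concatenated with no separator
  const expected = crypto.createHash('md5').update(payload + appSecret).digest('hex');
  return expected === sig;              // trust the cookies only if the hashes match
}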
Same issues here, and I think Scott is closer to the solution.
Also I'm using "http://developers.facebook.com/docs/?u=facebook.jslib-alpha.FB.init", their open-source JS framework. So things are a little different.
For me, via the open-source JS framework, Facebook provides and sets a session on my site with a signature. So what I am thinking is to recreate that signature on my side: if they both match, then the user is who he says he is.
So basically, if a user wanted to save something to my database, I would grab the session signature set up by Facebook, recreate that signature with PHP, and validate it against the one Facebook gave me?
if ($_SESSION['facebookSignature'] == $reGeneratedSignature) {
    // save to database
} else {
    // go away, I don't trust you
}
But how do you regenerate that signature? preferably without making more calls to Facebook?
