I am a noob with Angular and with making JavaScript crawlable. I've been searching for information, but I don't really get it so far.
I am working on an AngularJS app that uses client-side JSON.
There is a navigation with pages, but each link uses a function getPage(n) to slice a chunk of the JSON, which Angular then renders.
Is it OK to put href="#!page=n" on each link? When I add that #! hash to the URL, press Enter, and a function renders the right items, is that enough to make it crawlable?
I've read something about snapshots, but doesn't that require Java? My web host is not really flexible; it does NOT support Tomcat or Node.js.
I think it's much better practice these days to use HTML5 history.pushState, and thus provide a unique URL for every page.
More information here.
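As a rough sketch of what that could look like with AngularJS and ngRoute (the route names and template file are placeholders, not anything from your app):

```javascript
// Hypothetical route setup: real URLs via history.pushState instead of #! hashes.
// Note: html5Mode(true) needs a <base href="/"> tag in index.html (Angular 1.3+)
// and a server that returns the app shell for every route.
angular.module('app', ['ngRoute'])
  .config(function ($locationProvider, $routeProvider) {
    $locationProvider.html5Mode(true);
    $routeProvider
      .when('/page/:pageNumber', {
        templateUrl: 'page.html',   // placeholder template
        controller: 'PageCtrl'
      })
      .otherwise({ redirectTo: '/page/1' });
  })
  .controller('PageCtrl', function ($scope, $routeParams) {
    // Every page now has its own crawlable URL, e.g. /page/3
    $scope.page = parseInt($routeParams.pageNumber, 10);
  });
```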
Check out this older Stack Overflow question: Making angular crawlable - Beginning of Project
A friend of mine uses - https://prerender.io/
Both of these solutions essentially serve cached versions of your rendered views, so the crawler can index your site.
I'm currently working on a project that scrapes grocery store pages for data given a search query (e.g., cereal) and displays the results in a Spinner view. However, I'm having some difficulty finding a way to scrape the data off the pages. I tried using Jsoup, as that was the consensus online, but it doesn't support JavaScript.
The issue is that most, if not all, sites like these use DOM storage for up-to-date stock listings and prices. That's why libraries like Jsoup won't work: they return the HTML as it looks without JavaScript. I currently have a prototype that displays the page via a WebView, but I see no way of getting the data out of it.
I've tried to research ways around this, but to be quite honest it's been confusing to find an elegant solution, if one even exists.
If anyone can help, or at the very least point me in the right direction, that would be most appreciated! Thanks ^_^
Selenium would be a good option for web scraping: https://www.selenium.dev/. It drives a real browser, so it has access to the website's fully rendered DOM. In my experience, dynamically generated web pages can be difficult to scrape, and regular expressions will be your friend: https://regexone.com/
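Your question is about an Android app, but the idea is the same in any Selenium binding. Here is a rough sketch using the Node.js selenium-webdriver package; the URL and CSS selectors are made-up placeholders you'd swap for the store's real ones:

```javascript
// Hypothetical scrape: drive a real browser so JavaScript-rendered prices exist in the DOM.
const { Builder, By, until } = require('selenium-webdriver');

(async function scrapePrices() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example-grocer.test/search?q=cereal'); // placeholder URL
    // Wait until the JS-rendered product list actually appears.
    await driver.wait(until.elementLocated(By.css('.product-tile')), 10000);
    const tiles = await driver.findElements(By.css('.product-tile'));
    for (const tile of tiles) {
      const name = await tile.findElement(By.css('.product-name')).getText();
      const price = await tile.findElement(By.css('.product-price')).getText();
      console.log(name, price);
    }
  } finally {
    await driver.quit();
  }
})();
```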
My friend and I have created a website that we want to use as an experiment for a school project.
https://www.daniellindgren.se/
But we are encountering some problems when we want Google's bots to crawl the subpages, like CV and contact.
When we use Google's webmaster tools to see how the indexing is going, it says that Google can't crawl anything other than the start page.
We have built a sitemap and have also declared it in robots.txt.
But we read somewhere that Mithril can cause problems for Google's bots because its links to subpages start with a "?".
Is there any workaround we can use, or what other solution is there? Should we maybe try to remake it as a single-page application instead?
I don't see any "?" in the links on your site, and in general Google should be able to index SPAs nowadays.
But it doesn't always work, so an option could be to use Mithril to render the templates server-side as well. Depending on your backend, it may take a little bit of work. If you're using Node.js, it's easy with mithril-node-render; if not, I recommend Haxe and mithril-hx for cross-platform support.
Then you need to change the routing strategy so that a request from outside the application hits the server as well. Unless you think about it from the beginning, you'll probably need to rewrite quite a bit of the backend to make it more isomorphic.
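As a minimal sketch of the Node.js case, assuming Express together with the mithril and mithril-node-render packages (the CvPage component below is just a placeholder for one of your own views):

```javascript
// Hypothetical server-side render of a Mithril view so crawlers get real HTML.
const express = require('express');
const m = require('mithril/hyperscript');      // view builder that works outside the browser
const render = require('mithril-node-render'); // turns a vnode tree into an HTML string

const CvPage = {
  view: () => m('main', [m('h1', 'CV'), m('p', 'Work experience goes here.')])
};

const app = express();
app.get('/cv', async (req, res) => {
  const html = await render(m(CvPage)); // resolves to a plain HTML string
  res.send('<!doctype html><html><body>' + html + '</body></html>');
});
app.listen(3000);
```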
Your site doesn't have much client-side functionality, however, so as it stands I'd treat it as a non-SPA and use Mithril where you want some dynamic, AJAX-driven functionality.
I am helping to design an HTML client for a game that I am collaborating on with others.
This client is going to need to have multiple scenes/pages like the login, the lobby, the actual game page, etc.
Normally, I would be just fine with navigating between pages. However, the client needs to maintain a constant connection with the server via a WebSocket, so it cannot navigate away from the page.
Option 1: Put everything in one file
Rather than having each scene in its own separate page, just cram all the HTML together.
Then, when one scene is needed, simply hide all the other scenes.
I do not think this is the way to go: from what I know about HTML and web development, it is not a very smart practice.
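For reference, the whole approach is only a few lines, something like this hypothetical sketch (the scene ids are made up):

```javascript
// Hypothetical scene switcher: every scene is a <div class="scene">, only one is shown.
function showScene(sceneId) {
  document.querySelectorAll('.scene').forEach(function (el) {
    el.style.display = (el.id === sceneId) ? 'block' : 'none';
  });
}

showScene('lobby'); // e.g. after the login succeeds over the WebSocket
```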
Option 2: jQuery's .load()
Using jQuery's handy-dandy .load() method, an external HTML file can easily be loaded into the current page.
To me, this seems like it might be the best way. However, I am not very familiar with how this method behaves, so I do not know if it will cause bumps in the road ahead.
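Roughly what that could look like (the container id and file names are placeholders):

```javascript
// Hypothetical scene loader: pull an external HTML fragment into the single long-lived page.
function loadScene(name) {
  // The WebSocket connection lives outside #scene, so it survives every swap.
  $('#scene').load(name + '.html', function (response, status) {
    if (status === 'error') {
      console.error('Could not load scene:', name);
    }
  });
}

loadScene('lobby'); // replaces the contents of <div id="scene"> with lobby.html
```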
Option 3: ???
This is where I need help. Unless one of the two above options is the best way, what is the best option for my situation?
Notes
I am not looking for speed here; it's okay if there is some delay between loads.
I'm not sure about your backend, but it seems like you are looking for a single-page app solution.
I recommend AngularJS, currently maintained by Google.
Others have mentioned Angular, but I'll just throw out a recommendation for KnockoutJS as well. Knockout has a less steep learning curve than AngularJS, and though it doesn't have as many features as Angular, the latest version does have better browser support.
Both Knockout and Angular have excellent documentation and tutorials available via their websites.
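For a quick taste, a minimal Knockout view model for switching scenes might look roughly like this (the scene names and the binding shown in the comment are placeholders):

```javascript
// Hypothetical Knockout setup: the visible scene is just an observable the markup binds to,
// e.g. <div data-bind="visible: currentScene() === 'lobby'"> ... </div>
function ClientViewModel() {
  var self = this;
  self.currentScene = ko.observable('login');                 // 'login' | 'lobby' | 'game'
  self.goTo = function (scene) { self.currentScene(scene); }; // called from click bindings
}
ko.applyBindings(new ClientViewModel());
```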
I would recommend Angular 2. Since you seem to be a beginner in front-end development (correct me if I'm wrong), starting with Angular 2 rather than 1 would be great, because then you only have to learn this really good JS framework once. It will help you keep everything organized and will keep your markup from getting messy.
Are there any best practices for implementing a long-lived JavaScript app, i.e. a web app that consists of a single page and loads other pages into the content area via AJAX? (Gmail is a good example of this.)
I've already read about the pros and cons, SEO, performance, etc. (http://stackoverflow.com/questions/1499129/one-page-only-javascript-applications); I'm interested in patterns for how to implement this.
I'd like to avoid large frameworks (e.g. Cappuccino, Echo2, SproutCore, Claypool).
How would I manage dynamically loading content while maintaining the #hash portion of the URL (for bookmarking)?
Don't get me wrong, I have an idea how to implement this myself, but this problem must have been solved before.
Are there articles on this? Maybe a tiny JavaScript library?
Thanks!
Mark
I found jQuery Address (http://www.asual.com/jquery/address/) extremely easy to set up. $.address.change() lets you know whenever the address changes (it works with back and forward as well), and you just parse self.location.hash and build your app from there. It seems lightweight enough as well, if you can handle using jQuery.
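For reference, a bare-bones version of that setup (the hash values and the renderView function are placeholders, not part of the plugin):

```javascript
// Hypothetical jQuery Address wiring: re-render whenever the #hash changes (including back/forward).
$.address.change(function (event) {
  // event.value is the part after '#', e.g. '/inbox' for http://example.com/#/inbox
  var route = event.value || '/';
  renderView(route); // your own function that swaps the content area
});

function renderView(route) {
  $('#content').load('fragments' + route + '.html'); // placeholder loading strategy
}
```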
Here is an article to help you with the history/bookmarking problem: http://codinginparadise.org/weblog/2005/09/ajax-dhtmlhistory-and-historystorage.html. It's quite old, but the solution still works.
I've made several of these "long-lived" apps, and one thing you should take into account is IE's tendency to leak memory.
I would also recommend using a JS library like jQuery to help you with the AJAX and DHTML.
Have you heard about JavaScript pushState?
http://badassjs.com/post/840846392/location-hash-is-dead-long-live-html5-pushstate
It's meant to replace location.hash
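The raw API is small; a hedged sketch of the pattern (loadContent here is a placeholder for your own AJAX call):

```javascript
// Hypothetical pushState navigation: real URLs instead of location.hash.
function navigate(url) {
  loadContent(url);                         // your own AJAX call that fills the content area
  history.pushState({ url: url }, '', url); // updates the address bar without reloading
}

// Back/forward buttons fire popstate; restore the matching content.
window.addEventListener('popstate', function (event) {
  if (event.state && event.state.url) {
    loadContent(event.state.url);
  }
});
```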
Up to now, for DB-driven web sites, I've used PHP (and CodeIgniter) to populate the data within the page prior to rendering. What I'm thinking about doing now is developing a JavaScript (via jQuery) page, making it as interactive as possible, and then connecting to the DB through AJAX/JSON calls, so NO data is populated on the screen prior to rendering.
WHY? The idea is that I could, some day, hook the same web page up to different data sources: a true separation of page from data, linked only via AJAX.
I think the biggest issue could be performance. Are there other things to watch out for? What's the best approach to handling security (stateless/sessionless)?
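A minimal sketch of the pattern you're describing, with jQuery (the endpoint and fields are made up):

```javascript
// Hypothetical client-side data load: the page ships empty and is populated after render.
$(function () {
  $.getJSON('/api/products', function (products) {   // placeholder endpoint
    var rows = products.map(function (p) {
      return '<tr><td>' + p.name + '</td><td>' + p.price + '</td></tr>';
    });
    $('#product-table tbody').html(rows.join(''));
  }).fail(function () {
    $('#product-table tbody').html('<tr><td colspan="2">Could not load data.</td></tr>');
  });
});
```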
The biggest question is accessibility. What about people using screen readers, for whom JavaScript may not work? What about those on mobile phones (non-smartphones), again with very limited or no JavaScript support? What about people who have simply disabled JS? Even these days, you simply can't assume that everyone can use JS.
I like the original idea, but perhaps this would be better done via a simple server-side wrapper, which calls out to your data source but which can be quickly and easily changed to point at a different one.
This is definitely something I've considered doing, but you'd probably want to develop some kind of framework (or see if someone already has) if you're going to do it. Brute-forcing this kind of thing will lead to a lot of redundant code and unnecessary hair loss. Perhaps a jQuery plugin? I'd be very interested to see what you come up with.