Terence.Tech | Coding projects of Terence Chisholm
You can crawl a site and capture information from all of its pages, or crawl multiple sites. All in one go.
Unlike other site crawlers, which are built on libraries like PhantomJS and other things you have to be an expert to understand... this crawler is proper ole school JavaScript, with a few bits of jQuery too.
That means you can view and download the source, and tweak it... with basic JavaScript knowledge. Awesome.
How to use it:
- Set up Safari on your Mac with the Develop menu enabled, then check "Disable Site-specific Hacks" and "Disable Cross-Origin Restrictions" in that menu (the sketch after these steps shows why the second one matters)
- Then, go to terence.tech/crawler
- Type in your home page root URL (make sure you are NOT logged into a CMS for the site you want to crawl), then
- Press CRAWL
Easy.
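For the curious: once those Safari restrictions are off, the whole trick boils down to a fetch-and-parse loop you could write yourself in ole school JavaScript. Here's a minimal sketch of the idea. It is not the crawler's actual code, and names like crawlQueue and enqueue are made up for illustration:

    var startUrl = "https://example.com/"; // your home page root URL
    var visited = {};                      // pages we've already queued
    var crawlQueue = [startUrl];
    visited[startUrl] = true;

    // Queue a link if it's on the same site and we haven't seen it yet.
    function enqueue(link) {
      if (link.indexOf(startUrl) === 0 && !visited[link]) {
        visited[link] = true;
        crawlQueue.push(link);
      }
    }

    function crawlNext() {
      var url = crawlQueue.shift();
      if (!url) { return; } // queue drained: the crawl is finished

      // With "Disable Cross-Origin Restrictions" checked, Safari lets this
      // cross-origin GET through -- that's why the setup step matters.
      $.get(url, function (html) {
        // Parse the fetched HTML without running its scripts.
        var doc = new DOMParser().parseFromString(html, "text/html");

        // Resolve every link against the page URL and queue the new ones.
        $(doc).find("a[href]").each(function () {
          try { enqueue(new URL($(this).attr("href"), url).href); }
          catch (e) { /* ignore hrefs that aren't real URLs */ }
        });

        crawlNext(); // move on to the next page in the queue
      });
    }

    crawlNext();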
After it's done, you can copy and paste a spreadsheet of SEO and other page data into Excel or Numbers (on the Mac). You can also do fancy stuff like:
- Find the "needle in the haystack": type in some code or text you want to check each page for, and your crawl data spreadsheet will mark the pages that contain it (there's a rough sketch of this after the list).
- Extract code from each page: either the entire page source, or just the bits matching a Class-Name or DIV-ID (also shown in the sketch below).
- Crawl as many sites as you like at the same time, with options to set which domains to include in the crawl, which pages to omit, and wildcard settings for crawling anything you can think of.
- Save your crawl configuration as a shareable link, and bookmark it or email it to co-workers.
- And the usual crawler stuff, like finding broken links, SEO content audits, URL redirects, etc.
- See a bit of the process Google probably goes through when it crawls your pages. (It uses JavaScript too.)
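Here's the rough sketch promised in the list above: one way the per-page data could be put together, with the needle-in-the-haystack check and a Class-Name/DIV-ID extraction included. The function and field names are invented for illustration, not lifted from the crawler:

    // Build one spreadsheet row for a fetched page (illustrative only).
    function reportPage(url, html, needle, selector) {
      var doc = new DOMParser().parseFromString(html, "text/html");

      var title = doc.title || "";
      var metaDesc = $(doc).find('meta[name="description"]').attr("content") || "";

      // Needle in the haystack: flag the page if its source contains the text.
      var hasNeedle = (needle && html.indexOf(needle) !== -1) ? "FOUND" : "";

      // Extract by Class-Name or DIV-ID, e.g. ".article-body" or "#content".
      var extracted = selector ? ($(doc).find(selector).html() || "") : "";

      // Tab-separated, so a block of these rows pastes straight into
      // Excel or Numbers with each field in its own column.
      return [url, title, metaDesc, hasNeedle, extracted].join("\t");
    }

So something like reportPage(url, html, "analytics.js", "#content") would give you one ready-to-paste row.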
You can set advanced configurations for your crawl by going to
http://terence.tech/crawler/#config, clicking the (X) to close the controller layer, and typing in the configuration settings you want. Click the link to save them as a shareable URL, then press CRAWL to start your crawl.
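If you're wondering what saving a configuration as a shareable URL amounts to, it's essentially packing your settings into the page's hash, so the whole setup lives in the address itself. A hedged sketch; the option names here (includeDomains, omitPages, needle) are invented, not necessarily what the crawler calls them:

    var config = {
      roots: ["https://example.com/"],
      includeDomains: ["example.com", "blog.example.com"],
      omitPages: ["/wp-admin/*", "*.pdf"], // wildcard-style omit patterns
      needle: "analytics.js"
    };

    // Pack the settings into the URL hash: the resulting address can be
    // bookmarked or emailed, and nothing needs to be stored on a server.
    location.hash = "config=" + encodeURIComponent(JSON.stringify(config));

    // On page load, restore the settings from a shared link, if present.
    var match = location.hash.match(/config=([^&]+)/);
    var restored = match ? JSON.parse(decodeURIComponent(match[1])) : null;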