22 | Crawling and Indexing with Charlie Williams

Alina Ghost

Charlie and I met at a BrightonSEO speakers’ party a couple of years ago, so we thought we’d catch up about technical SEO, specifically crawling and indexing, to help you with the fundamentals.

Twitter: @pagesauce

Email: Charlie@chopped.io

Site: Chopped.io

Charlie works as an SEO consultant at Chopped, having previously worked for Screaming Frog and SEOptimiser, and has been in the industry for nearly 10 years. He has worked on large and small sites and specialises in technical SEO and content, in other words on-site optimisation.

Crawling Basics

The Screaming Frog SEO Spider lets you crawl a website and replicate what a search engine crawler would do. It offers information about your meta data, canonical tags and much more. It’s free for up to 500 URLs, which includes images and JavaScript files (so not just pages). Other tools that offer free demos are Sitebulb, DeepCrawl and Ryte. The latter offers free whole projects as a trial.

3 Steps to Starting an SEO Tech Audit

1) Crawl the site to check that you find all the pages you expect to find. It’s simple to do but also one of the easiest steps to forget. There are lots of data points, but you need to know what is happening on your site. If you find unexpected things, investigate until you understand what’s happening.

2) Go through the basics, such as title tags and meta descriptions. Are they pointing to the right URL? These are also the quick wins, such as finding 404 pages to redirect or retire with a 410. Check your redirects too, especially if you’re going through a migration, to make sure they’re doing what you expect them to.

3) Work on your site structure and information architecture. Is it tiering the layers correctly? Which pages are the most important, and are any of them missing? A migration is a good excuse to review your architecture. The structure of a website needs to make sense, so do user testing: have users taken the right journey and, if not, what would be a better journey and which page should you show them? Search engines see metrics that show whether your site helps customers, converts, and keeps people on the page and engaged. Search traffic is great, but it’s only useful if it makes you money, so UX is very important.

Example: We started doing a lot of user testing ourselves at Amara, looking at how people react to the header of the page and whether they find the correct pages. We’re working out how content is performing for the business and how to make sure that inspirational content is shown in the right place at the right time.
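
The quick wins from step 2 are easy to check once you have a crawl export. Here is a minimal Python sketch, assuming each crawled page is a record with `url`, `status`, `title` and `description` fields. Those field names and the sample rows are made up for illustration and will not match any particular tool’s export.

```python
from collections import Counter

def audit_crawl(rows):
    """Return basic issues found in a list of crawled-page records."""
    # Only pages that returned 200 are expected to carry a title and description.
    live = [r for r in rows if r["status"] == 200]
    title_counts = Counter(r["title"] for r in live if r["title"])
    return {
        "not_found": [r["url"] for r in rows if r["status"] == 404],
        "missing_title": [r["url"] for r in live if not r["title"]],
        "missing_description": [r["url"] for r in live if not r["description"]],
        "duplicate_title": [
            r["url"] for r in live
            if r["title"] and title_counts[r["title"]] > 1
        ],
    }

# Made-up sample rows standing in for a crawl export.
rows = [
    {"url": "/", "status": 200, "title": "Home", "description": "Welcome"},
    {"url": "/sofas", "status": 200, "title": "Sofas", "description": ""},
    {"url": "/sofas?page=2", "status": 200, "title": "Sofas", "description": "More sofas"},
    {"url": "/old-sale", "status": 404, "title": "", "description": ""},
]
issues = audit_crawl(rows)
for name, urls in issues.items():
    print(name, urls)
```

The 404s are your candidates for a redirect or a 410, and the duplicate titles are worth a closer look.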

Google’s Feedback Loop

Google has the best feedback loop you can imagine! Every time they serve a search result they see how people react; they know what you typed and what else you might be looking for. Google has learnt from this feedback loop when you’re trying to ‘buy’ a dishwasher rather than read about one.

We have to do the same thing with user testing and judge how customers are reacting to our navigation, pages, content and layouts. SEOs are now required to do a better job of serving that.

A recent tweet made me laugh: someone wanted to know how the next generation interacts with Google, so he asked his daughter. She said that she clicks on the bottom links on the first page of Google because she feels sorry for them!

@sfmorris tweeted:

[Image: google feedback loop test]

We need to funnel customers through to their intent, offering them the right answers, so the categorisation in the header has to be correct. That also gives crawlers an understanding of the hierarchy of your pages.

As an SEO you need to tweak the organisation so that it’s sensible and to remove silos (orphan pages). Group items and themes together, whether that’s content on a certain subject matter or ecommerce subcategories. The fundamentals of getting pages crawled and indexed stay the same: have a smart structure and content that’s easy to discover and index.
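
Orphan pages are simple to spot once you have your internal links as data. The sketch below is one possible approach, not a description of what any crawler does internally: it walks the links from the homepage and reports any known URL it never reaches. The link map and URL list are invented examples; in practice they might come from a crawl export and your sitemap.

```python
from collections import deque

def find_orphans(links, known_urls, start="/"):
    """Return known URLs that cannot be reached by following links from start."""
    seen = {start}
    queue = deque([start])
    # Breadth-first walk over the internal link graph.
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(known_urls) - seen)

# Invented example: page -> pages it links to.
links = {
    "/": ["/sofas", "/blog"],
    "/sofas": ["/sofas/corner"],
    "/blog": ["/"],
    "/landing-2017": ["/sofas"],  # links out, but nothing links to it
}
known_urls = ["/", "/sofas", "/sofas/corner", "/blog", "/landing-2017"]
orphans = find_orphans(links, known_urls)
print(orphans)
```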

10:00. It doesn’t matter whether you have a small or a big site. Obviously bigger sites need clever internal architecture for their pages to be discovered, but the rules are the same. They only change when more technologies are involved, such as JavaScript rendered client side or server side. Crawling and indexing is a technical process that’s followed through step by step, so your job as an SEO is to help sites through it by putting signposts along the way. Google wants to match the reader or customer to the piece of content or product, so making Google’s life easier helps you rank.

AMP, for example, is something the industry argues about: should we do it or not? Crawling is a huge expense for Google because the processing power required is massive, so if we make things easier for them we’re saving them money. You also want the search process to work well, so you want crawling to be efficient and clear: no 404s, good site speed and the best code you can use.

AMP & mobile first indexing

18:00. Google has made a conscious effort to give sites with good site speed and clean code the same visibility as AMP pages, without penalising them for not having AMP, which many SEOs have noted. AMP was mostly for larger companies with legacy sites, to help them show in the search results, but Google is encouraging people not just to create AMP pages but to improve site speed on a responsive site (rather than having a separate mobile site). Nick Wilsdon from Vodafone has been really successful with AMP product pages.

19:30. Mobile first indexing has changed indexing and crawling. Mobile has overtaken desktop in terms of traffic, so it’s important to understand how a page differs depending on what it’s rendered on. A fully responsive website may have the same internal link architecture, but a separate mobile site is where it gets tricky. Crawl the site with a mobile user agent and compare the results to the desktop crawl. The mobile-rendered version of the pages can show key differences that you may not want to see. A fully responsive site should show little difference.

Russ Jones from Moz wrote about the change in the link graph across the internet. Comparing large sites where linking had changed on the mobile version, he found that they lost two thirds of their external links because content was cut off. These are some of the problems you might be facing. If you lose links then some pages are no longer found, so they are seen as orphan pages and suffer in performance as a result. We show less content on mobile because it’s a different experience, and as a site owner you should cater for this, but at the same time know the implications it has for your crawlability.

23:48. Charlie gives an example of a medium-sized ecommerce store. It had an MPU-style side box with a set of links pointing to subcategories. On mobile the box wasn’t visible, although it was still there in the code. Rendering the page showed there was no column, so Google would see the links as hidden (not cloaking, just a mobile-friendly layout), which meant those internal links no longer counted and the website shrank by 75%. If Google were to obey the rendering, all the subcategories would be seen as less important. You could be losing out.
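
To spot this kind of difference yourself, you can compare the links found in the desktop and mobile versions of the same page. This is only a rough sketch using Python’s standard library, and it assumes you have already saved the rendered HTML from a desktop crawl and a mobile crawl. The HTML snippets are invented. It compares markup only, so links that are still in the code but hidden by CSS would need a rendering crawler to catch.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

def links_missing_on_mobile(desktop_html, mobile_html):
    """Return links present in the desktop HTML but absent from the mobile HTML."""
    return sorted(extract_links(desktop_html) - extract_links(mobile_html))

# Invented snippets: the side box of subcategory links is dropped on mobile.
desktop_html = (
    '<nav><a href="/sofas">Sofas</a></nav>'
    '<aside><a href="/sofas/corner">Corner</a>'
    '<a href="/sofas/leather">Leather</a></aside>'
)
mobile_html = '<nav><a href="/sofas">Sofas</a></nav>'
missing = links_missing_on_mobile(desktop_html, mobile_html)
print(missing)
```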

Common issues in blogs and sites

25:50. I often find sites with a pagination issue, where there is no code to tell search engines the page number information. You can use the main SEO plugins like Yoast to add it at the click of a button, using rel=next and rel=prev, but sometimes we see sites add more code than is necessary.
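
For reference, this is roughly what that pagination markup looks like. The small Python sketch below builds the rel=prev and rel=next tags for one page in a series. The `?page=` URL pattern is an assumption for the example, and plugins may format their output slightly differently.

```python
def pagination_links(base_url, page, last_page):
    """Build rel=prev/next <link> tags for one page of a paginated series."""
    def page_url(n):
        # Assumption: page 1 lives at the base URL with no query string.
        return base_url if n == 1 else f"{base_url}?page={n}"

    tags = []
    if page > 1:
        tags.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return tags

tags = pagination_links("https://example.com/blog", 2, 3)
print("\n".join(tags))
```

The first page only gets a rel=next tag and the last page only gets rel=prev, which is all the markup that is needed.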

One of the main things that happens a lot with WordPress or similar platforms is that people have dummy pages for authors, page types and so on. When the site goes live, those pages go live too and are linked to without anyone realising it, creating empty and duplicate content.

29:10. Migrations have been very common recently (HTTP to HTTPS, etc.). Charlie says that by the time a client comes to him wanting a migration, it’s most likely that the site has already been through various iterations. Patrick Stox from Search Engine Land wrote about using the Wayback Machine API to pull the history of a website. Find out how many pages have existed on the site and what’s happening to them now.
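
As a starting point for the Wayback Machine idea, here is a hedged Python sketch. It assumes the Internet Archive’s CDX server endpoint, which may not be the exact method used in the article mentioned above. It only builds the query URL and parses a response, and the sample response is made up so that nothing here needs a network connection.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(domain, limit=1000):
    """Build a CDX query listing distinct URLs captured under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",        # include subdomains as well
        "output": "json",
        "fl": "original,statuscode",  # only return these two fields
        "collapse": "urlkey",         # one row per distinct URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def parse_cdx_json(text):
    """Turn CDX JSON output (first row is the header) into a list of dicts."""
    rows = json.loads(text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]

query_url = build_cdx_url("example.com")

# Made-up response body in the shape the JSON output uses.
sample = '[["original","statuscode"],["http://example.com/","200"],["http://example.com/old","301"]]'
history = parse_cdx_json(sample)
print(query_url)
print(history)
```

You could then crawl the historic URLs to see which ones now 404 and need a redirect.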

Charlie worked on a migration recently where nothing had been migrated, and found that 80% of the top 30 pages were 404ing. Link equity was being lost and wasted, so he used 301 redirects to put it back in place. He saw a 97% organic increase because of this change. It’s worth taking the time to go back through and investigate.

Subdomain to Subfolder International Site Migration

33:30. Be vigilant about 301 redirects and check your log files. I went through this with Amara in the summer of 2018 and we saw 11% growth on top of our YOY growth for the year. Putting everything under one domain is worth it. Indexing and crawling play a big part in keeping things tidy. Make things as easy as possible.
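
Checking log files for redirects can be scripted. The Python sketch below assumes your server writes the common “combined” log format, and the log lines are invented. It counts how often a bot was served a 301 for each path. Matching on the user agent string alone does not prove a request really came from Googlebot, so treat the numbers as a guide.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent parts of a combined log line.
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def bot_redirects(log_lines, bot_token="Googlebot"):
    """Count 301 responses per path for user agents containing bot_token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        if bot_token in match["agent"] and match["status"] == "301":
            counts[match["path"]] += 1
    return counts

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Invented log lines for illustration.
log_lines = [
    f'66.249.66.1 - - [12/Jul/2018:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 301 0 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [12/Jul/2018:10:05:00 +0000] "GET /robots.txt HTTP/1.1" 301 0 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [12/Jul/2018:10:06:00 +0000] "GET /fr/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [12/Jul/2018:10:07:00 +0000] "GET /old HTTP/1.1" 301 0 "-" "Mozilla/5.0"',
]
counts = bot_redirects(log_lines)
print(counts.most_common())
```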

36:00. To summarise, technical SEO can be seen as too techy, or as something that you look at once and never do again, but actually you need to work on it all the time. Crawling and indexing are fundamental; if they’re not working then you’re wasting an opportunity. Content needs to be found, and supporting great content with schema markup and links is essential too. Technical SEO can be seen as overwhelming or geeky, but if we can do it then everyone can. Just learn the principles, and remember that SEOs are really friendly. Put it into practice and you’ll learn fast and do cool things. You can guide how bots crawl your site, what they crawl and how they regard it using the robots.txt file and tagging.

38:30. Looking at the Amara log files, I noticed that after the international site migration we had hundreds of 301 redirects on our robots.txt file. After working back through it, the theory was that bots visiting our subdomains used to be sent to each subdomain’s robots.txt file, and since the migration they were being redirected to a subfolder robots.txt file, even though we had collated everything into one robots.txt. I therefore asked our web dev team to cut down the redirects so that bots go directly to the new file rather than through a chain. This did improve our bot traffic pathway.
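
Cutting down a chain like this means pointing every old URL straight at its final destination. Here is a minimal Python sketch of that idea, assuming you have your redirect rules as a simple source-to-target map. The hostnames are placeholders and not the real Amara URLs.

```python
def flatten_redirects(redirects):
    """Map every redirecting URL straight to its final destination."""
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that no longer redirects.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened

# Placeholder rules: subdomain -> subfolder -> single robots.txt.
redirects = {
    "https://fr.example.com/robots.txt": "https://www.example.com/fr/robots.txt",
    "https://www.example.com/fr/robots.txt": "https://www.example.com/robots.txt",
}
flat = flatten_redirects(redirects)
for source, target in flat.items():
    print(source, "->", target)
```

Every source now needs one hop instead of two, and a loop raises an error rather than running forever.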

Robots.txt is the first file that bots check to see what they’re able to do. Technical SEO offers fresh challenges all the time.
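
If you want to test your rules before a bot does, Python’s standard library includes a robots.txt parser. The rules and URLs below are invented for the example. Bear in mind that this parser may not treat wildcards the same way Google does, so use it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /basket/
Disallow: /search
"""

def build_parser(robots_txt):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
can_crawl_category = parser.can_fetch("Googlebot", "https://example.com/sofas/")
can_crawl_basket = parser.can_fetch("Googlebot", "https://example.com/basket/")
print(can_crawl_category, can_crawl_basket)
```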

Tip: start your own blog or website so that you can tinker around with the code.

Have patience and put some time into understanding it. It’s doable!

Music credit: I Dunno (Grapes of Wrath Mix) by spinningmerkaba (c) copyright 2017 Licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/jlbrock44/56346 Ft: Jlang, 4nsic, grapes.
