Search Engine Optimization (SEO) is often overlooked by advertisers in favor of the ‘quick wins’ that paid media can offer. While a PPC campaign does have its advantages, it’ll only get you so far, and disregarding the benefits of SEO can be costly for advertisers. With approximately 75% of searchers ignoring paid advertisements on the SERP (choosing to focus solely on the organic results), identifying how Google crawls your site and optimizing your website to complement Google’s algorithms has never been so important.
Here is an example of the impact that SEO can have on your site, and how NMPi by Incubeta helped online book merchant Alibris obtain actionable recommendations to help improve their website’s performance.
Server Log Reporting for Alibris.
Alibris is an online store that sells new books, used books, out-of-print books, rare books, and other media through an online network of independent booksellers. Due to the nature of Alibris’ separate mobile and desktop configuration, it was not possible to emulate how Google crawls its website. Having access to this data is fundamental to the SEO process as it would allow us to identify any indexing issues Alibris was having, and implement solutions to fix them which would boost their product ranking on the SERP. Even Google’s own analytical platforms such as Google Search Console couldn’t provide Alibris with a clear, single source of truth when analyzing their website’s crawlability. NMPi by Incubeta was tasked with determining how often the website (both mobile and desktop) was being accessed by various web crawlers and agents (including Google’s own web crawlers).
Because this data was not available through Alibris’ analytics platforms, NMPi by Incubeta had to access the logs kept by their cloud security company, CloudFlare. These logs generate over 100 million rows of data every single week, and the report we created needed to be updated in near real-time, which involved storing and processing large quantities of data. We used Google’s BigQuery database to hold the gigabytes of data for processing and Google’s DataStudio service to handle the reporting. We then partnered with Incubeta’s technology team to produce a server log report.
The problem we faced was that CloudFlare purged its data fairly regularly. We had a good source of fresh data within the logs, but no historical data, which meant that the report was unable to show any historical trends. To combat this, we worked with the client to gain access to their web server logs. The data stored in these logs wasn’t purged often, so we were able to find enough information here to satisfy our historical data requirements. However, the web server logs didn’t store data in a “clean” manner, so when we tried to import a sample directly to BigQuery, the import job failed. We had to spend time writing rules that identified what the errors were so that we could fix them in the huge raw files. Once the logs were imported, we then had to familiarize ourselves with the CloudFlare API and write a script that checked CloudFlare every 5 minutes and imported the latest log events.
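To give a feel for what "rules that identified what the errors were" might look like, here is a minimal Python sketch. It is not the code used on the project. It assumes the raw files were in the common Apache/Nginx "combined" log format, which the case study does not confirm, and the rule labels and function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format. The actual format of the
# Alibris web server logs is not stated in the case study.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def diagnose(line):
    """Return None for a clean line, or a short label naming the problem."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return "unparseable_layout"
    try:
        datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return "bad_timestamp"
    if not re.fullmatch(r"[1-5]\d\d", match["status"]):
        return "bad_status"
    if len(match["request"].split(" ")) != 3:
        return "bad_request_line"
    return None


def split_clean(lines):
    """Separate importable lines from problem lines, counting each problem."""
    clean, problems = [], Counter()
    for line in lines:
        label = diagnose(line)
        if label is None:
            clean.append(line)
        else:
            problems[label] += 1
    return clean, problems
```

Running `split_clean` over a sample of a raw file returns the lines that are safe to import plus a tally of the problems found, which shows where a fix is needed before the full file is loaded.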
Returning to the server logs, we then wrote code that converted the server log format into the CloudFlare format. We downloaded all of the server logs (going back to July 2019), cleaned them, and converted them to match CloudFlare’s data. Alibris now had a real-time database of server events, with historical data going back to July 2019. With the data in place, we connected it to DataStudio and worked with the SEO team to create a final report.
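The conversion step can be sketched roughly as follows. This is an illustration under assumptions, not the project's converter: the input is assumed to be an already cleaned "combined" format line, the output keys are modeled on a small subset of Cloudflare's HTTP request log field names, and the schema actually used in BigQuery is not described in the case study. The function names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone

# Assumption: cleaned lines in Apache/Nginx "combined" log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def to_cloudflare_style(line):
    """Map one server log line onto Cloudflare-style field names."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed combined format")
    method, uri, _protocol = match["request"].split(" ", 2)
    started = datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z")
    size = match["size"]
    return {
        "ClientIP": match["ip"],
        "ClientRequestMethod": method,
        "ClientRequestURI": uri,
        "ClientRequestUserAgent": match["agent"],
        "EdgeResponseStatus": int(match["status"]),
        # Servers log "-" when no body was sent; store that as zero bytes.
        "EdgeResponseBytes": 0 if size == "-" else int(size),
        # Normalize every timestamp to UTC so both sources line up.
        "EdgeStartTimestamp": started.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
    }


def to_ndjson(lines):
    """Serialize converted records as newline-delimited JSON."""
    return "\n".join(json.dumps(to_cloudflare_style(line)) for line in lines)
```

Newline-delimited JSON is used here because BigQuery can load it directly. Giving historical and live records the same field names means one table and one set of report queries can serve both.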
Knowing that high volumes of raw log data would be difficult to interpret, we processed the data so that it could be presented in a digestible manner through dynamic charts and tables. This allowed us to analyze the data more effectively and to inform our SEO strategy going forward.
The server log report gave us access to unbiased, real-world data, to compare against site crawls, Google Search Console, and data from other 3rd party tools – allowing us to identify a high number of crawlability roadblocks and errors originating from Googlebot requests such as:
- Server-side redirect chains and loops (72% of Googlebot requests, June 2020)
- Broken resources & inaccessible pages (1% of Googlebot requests, June 2020)
- Working pages (only 27% of Googlebot requests made in June resolved with a 200 status)
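Percentages like these can be derived from the log records with a simple aggregation. The Python sketch below is a simplified illustration, not the report's actual logic. It assumes records shaped like Cloudflare-style dictionaries with `EdgeResponseStatus` and `ClientRequestUserAgent` keys, it buckets by status class only (finding redirect chains and loops would require following each redirect), and it trusts the user agent string, which can be spoofed. The function names are hypothetical.

```python
from collections import Counter


def status_class(status):
    """Bucket an HTTP status code into a coarse report category."""
    if 200 <= status < 300:
        return "working"
    if 300 <= status < 400:
        return "redirect"
    return "error"  # everything else, including 4xx and 5xx


def googlebot_status_shares(records):
    """Return the percentage of Googlebot requests in each category."""
    counts = Counter(
        status_class(record["EdgeResponseStatus"])
        for record in records
        # Assumption: a substring match is enough to pick out Googlebot.
        if "Googlebot" in record["ClientRequestUserAgent"]
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

In a production report, the same grouping would more likely be expressed as a query over the BigQuery table, with Googlebot traffic verified rather than matched by name.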
Through our combined efforts, we were able to provide Alibris with a clear, actionable strategy to tackle their website’s crawl inefficiencies and boost indexation numbers, which in turn improved their website’s performance across all channels.