SEO, Data Science & Correlative Analysis For Google Organic Traffic


This article will touch on how data science can be used in SEO and look at how correlative analysis can be used during the content creation process. For those not familiar with these topics, there will be examples and pictures, but as should be expected when covering any complicated topic, the scope of the article will be limited to its main purpose.

It always bothers me when I see the recent trend of presenting SEO as ‘scientific search engine optimisation’. I understand that it’s a great way to allay the fears of CEOs and business owners, but it does rankle.

All digital marketing efforts should be based on some hard data, and SEOs have been doing this since the year dot. Whether that was pushing the limits of keyword densities back in the day, or enumerating backlink profiles to judge competitiveness, these were always data-driven approaches.

There should be no place for the ‘publish and pray’ method that most within the industry will have encountered at some stage.

How Can Data Science Help Organic Rankings?

As search engine optimization specialists, we have a huge amount of data at our fingertips. Whether this is pulled from Google dorks.


From proprietary data sources such as Ahrefs, Majestic etc.


Or from a private calculated method using our own metric. I’m thinking back to ‘thirst’ as a way of calculating whether a keyword was worth pursuing, which relied on Moz’s link data, keyword volume and Keyword Planner’s CPC.

The point being that a data-driven approach to SEO, and digital marketing in general, will:

  • Reduce the time to rank
  • Uncover key phrases that you can compete for
  • Indicate the resources needed to compete
  • Dictate your content strategy
  • Stop the misallocation of resources

These are only some of the benefits; the list could be much longer. The point is that if you are not taking a data-driven approach to your decisions and allocation of resources, you are going to lose to the little guy with a smaller budget and more R&R time.

What Is Correlative Analysis & How Can It Help Me Rank?

The last two years have seen a huge compression in the SERPs, where many of the articles for highly competitive keywords all look pretty similar and cover the topic in the same way.

Why is this?

Cue the rise in correlative analysis tools.

We’ve always matched our content to the search intent Google shows in the SERPs and then tailored content creation around what was already ranking. But correlative analysis tools for SEO have taken this arduous process and simplified it.

Correlative analysis tools allow us to scrape the top 100 results for a search term and see which factors correlate with higher rankings. The simplest example of this is word count.

If Google is rewarding articles with a word count of between 1000 and 1200 words, then this is where we need to aim with our article.

Likewise, if Google is rewarding sites that load in under a second, or that have a keyword density of 3.2% or …
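
To make the idea concrete, here’s a minimal sketch (in Python, with invented numbers) of the correlation step these tools perform. It uses Spearman’s rank correlation, a common choice for ranked data; real tools compare hundreds of factors, not just word count.

```python
# Illustrative sketch of the core idea behind correlative analysis tools:
# correlate one on-page factor (word count) with SERP position.
# The figures below are invented for demonstration only.

def ranks(values):
    """Rank values ascending (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank_pos, idx in enumerate(order, start=1):
        r[idx] = rank_pos
    return r

def spearman(xs, ys):
    """Spearman's rank correlation coefficient for tie-free data."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# (SERP position, word count) for ten hypothetical scraped results
results = [(1, 1180), (2, 1150), (3, 1210), (4, 980), (5, 1100),
           (6, 870), (7, 760), (8, 1020), (9, 640), (10, 590)]
positions = [p for p, _ in results]
word_counts = [w for _, w in results]

rho = spearman(positions, word_counts)
# A strongly negative rho here means better positions go with higher word counts.
print(f"Spearman rho: {rho:.2f}")
```

In this toy data the correlation is strongly negative, which is exactly the kind of signal that would tell you to aim for a longer article.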

To understand how deep into this you can go, take a look at the small part of a Cora report below. This is from some months ago, and the developer adds new metrics all the time.


The key is not to have everything perfectly lined up, but to hit the important notes. Like everything, it’s a case of time/cost vs reward.

Whilst Cora is great for folks who want to get at the data directly, you’ll need a fair amount of experience to interpret it correctly.

There are a number of other tools on the market that hide a lot of the data and only present the user with limited highlighted insights. I’m only going to cover the ones that I’ve used to give a flavour of what’s out there.

Different Tools For Different Tasks

You wouldn’t use a hammer to cut down a tree, nor a saw to mend a fence. Some of these tools are suited to certain workflows and SEO goals, such as quickly optimising existing content or giving your content writers the best briefs possible, whilst others will help you build the perfect page structure for your PWA’s landing pages.



Set up by brothers from a little Polish town, SurferSEO has a very intuitive interface that allows you to focus on the main aspects of page optimisation. There are a large number of options still available under the hood, but Surfer does an excellent job of keeping things simple. If you need your writers to access the tool, they won’t be lost, and they’ve recently added a content writing module.

The page audit tool is great at highlighting where you can improve and bring your page into line with the competition. It covers basic things like the keyword in the title, URL and meta description; exact and partial keywords in your H tags, content and codebase; and more technical aspects such as Time To First Byte (TTFB) and schema.


It’s perhaps important to note that SurferSEO uses something called True Density to calculate the optimal occurrences of words and phrases on a page rather than the more common TF-IDF method. They’ve done a great job of explaining the difference on their blog.



If the other two tools on this list have opted for a streamlined approach that hides the data behind carefully designed interfaces, Cora takes the opposite route.

It’s a self-hosted Windows application that can take 30 minutes to run, covering almost every aspect of a page that you could want to analyse. At last count it compared over 400 factors and dumped everything out into an Excel spreadsheet so that you could examine the data yourself.


To be clear, these are not the types of spreadsheets that you can send off to a writer; the data has to be interpreted, but it can uncover important insights. It calculates correlation using two methods and then ranks the signals for you, giving you actionable information.


Page Optimiser Pro

Kyle Roof’s leaked test, showing how he ranked a local site using only ‘lorem ipsum’ content with keywords sprinkled precisely into the right places, confirmed what many of us had believed for years.

Google cannot parse your content and assign it some sort of quality score. The amount of processing power required to do this just doesn’t make it feasible on any large scale. That’s not to say that they don’t have some algo that they run on highly competitive key phrases, but when you consider the breadth of content published all the time, even Google is not capable of something like this.


Kyle Roof capitalised on his fame and brought POP to market. I got in at the beta and saw some great results using it, but I don’t want to comment on it too much, as I moved to SurferSEO. Kyle’s team have continued to update its features, including various options for the aggressiveness of optimisation, a content editor and much more.

There are other tools out there; SEO PowerSuite has a module for TF-IDF content optimisation and there are a couple of others, but these don’t offer the same breadth of data that the three above do.

Depending on your site(s), your workflow and your goal, one of these tools may be more suited than the others. Below I’ve pulled out three use cases, but your mileage may vary.

Task #1 – Page Optimisation

The optimisation of an existing page is probably the most common task I’ve used correlation analysis tools for. The main reason for this is simple: you can get much quicker results working with an existing piece of content than by publishing a new article.

If you just need to check keyword densities and TF-IDF, there’s a great and affordable tool for the task. You add your keyword and target URL, and it breaks down how you stack up against each competitor and highlights any common phrases missing from your content.


Perhaps the one caveat with this simple tool is that your article does have to be published and available to be crawled for it to do its thing, so it can’t be used earlier in the content creation process.

NB – Whilst the prevailing wisdom is to keep your keyword density below 2.5%, like most SEO ideas, you’ll quickly find that many of your successful competitors are a lot more aggressive.
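
The density calculation itself is trivial to reproduce. Here’s a rough sketch, counting exact-phrase occurrences against total word count; the sample text is invented and deliberately over-stuffed, so don’t read its number as a target:

```python
# Quick sketch of a keyword-density check. Density here is the number
# of words consumed by exact-phrase matches as a share of all words.
# The sample text is a made-up, heavily over-optimised toy example.

def keyword_density(text, phrase):
    words = text.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == phrase_words)
    return 100 * (hits * n) / len(words)

sample = ("Basketball court hire in Dublin made easy. Book a basketball "
          "court online and confirm your basketball court hire instantly.")
density = keyword_density(sample, "basketball court")
print(f"Keyword density: {density:.1f}%")
```

A real check would also strip punctuation and count partial matches, but even this crude version is enough to compare your page against a scraped competitor’s.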

Page Optimizer Pro and SurferSEO are equally apt for revising and updating content. They give you a much better and deeper analysis of things to improve on your page, but at an additional cost.

Task #2 – Content Creation & Content Briefs

Producing high-quality content briefs for your writers is essential. If you leave the entire process up to the writer, you run the risk of getting back a poorly researched article that might not match the target search intent. It should all be a system: quality inputs produce quality outputs.

Giving your writers a topical structure including key phrases you want mentioned will always lead to better articles, particularly if you have a high churn of writers that need to become familiar with your site’s audience, tone and your structural expectations of the content.

Both Page Optimizer Pro and SurferSEO have features that can help with the process without you getting too bogged down in the weeds.

Both Surfer and POP have a content creation tool baked in to assist your writers in ticking the needed boxes as they write. Alternatively, you can review the lexical recommendations and include what you want in your briefs.

Task #3 – Templating

If you are building a progressive web application with hundreds or thousands of landing pages for different target phrases or geographic locations, you need to have the optimal structure across all these endpoints.

To give you a better understanding, let me give you the example of a site that is targeting the following types of keywords across the UK and Ireland.

Rugby Pitches & Clubs in {CITY/TOWN} (and the derivatives which would include rent, pitch etc)

Basketball Court Hire in {CITY/TOWN} (and the derivatives)

{CITY/TOWN} Sports Facility Hire


You get the idea: each of their circa 9,000 pages is hyper-targeted to a set of keywords and a geo location.
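
Purely as an illustration (the template strings, sports and slugs below are invented, not the site’s actual markup), the page generation side of this pattern might look like:

```python
# Hypothetical sketch of generating hyper-targeted landing pages from a
# single optimised template, filling {CITY/TOWN}-style placeholders.
# All names and formats here are invented for demonstration.

TITLE_TEMPLATE = "{sport} in {location} | Book Online"
H1_TEMPLATE = "{sport} in {location}"

sports = ["Rugby Pitches & Clubs", "Basketball Court Hire"]
locations = ["Dublin", "Cork", "Belfast"]

pages = [
    {
        "title": TITLE_TEMPLATE.format(sport=s, location=loc),
        "h1": H1_TEMPLATE.format(sport=s, location=loc),
        "slug": "/" + s.lower().replace(" & ", "-").replace(" ", "-")
                + "/" + loc.lower(),
    }
    for s in sports
    for loc in locations
]

for page in pages[:2]:
    print(page["title"], "->", page["slug"])
```

The point is that one well-researched template fans out to every keyword-location combination, which is exactly why getting its structure right up front matters so much.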

Some things to consider when building your landing page template would be:

  • How many times your keyword is used in H2 tags
  • The optimal keyword density
  • LSI keywords that should be used in the body of the article
  • The number of partial match phrases needed in H tags
  • Word count

The process for building out a template for this type of situation might work like this.

  • Run Cora reports for key locations that you have ascertained to be competitive.
  • Average out the results.
  • Pick out the key factors that correlate with higher rankings.
  • Hand off your structural requirements to the web designers.
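
The averaging step above might look something like this sketch, with invented factor names and numbers standing in for real Cora output:

```python
# Rough sketch of averaging per-location report rows into one set of
# template targets. Factor names and values are invented stand-ins
# for the columns a real Cora export would contain.
from statistics import mean

# One dict per location report
reports = [
    {"word_count": 820, "h2_keyword_uses": 2, "keyword_density": 2.1},
    {"word_count": 960, "h2_keyword_uses": 3, "keyword_density": 2.8},
    {"word_count": 905, "h2_keyword_uses": 2, "keyword_density": 2.4},
]

# Average each factor across all locations to get template-wide targets.
targets = {
    factor: round(mean(r[factor] for r in reports), 1)
    for factor in reports[0]
}
print(targets)
```

In practice you would weight the more competitive locations more heavily and keep only the factors that correlated with rankings in the previous step, but the mechanics are this simple.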

Obviously, the greater the number of locations you use to build your data set, the better the results will be across all locations. And the better your page template, the more organic visitors you’ll get with fewer links and less need to rebuild the pages in the future.

NB/TL;DR – Avoid changing your URL, page content or codebase for low-volume, low-competition keywords; it’s always better to do it right the first time. Whilst Google does reward some types of queries for freshness, it’s become much better at discerning which types of queries should get a bump when you update or add content, and changing your content or codebase will often result in a temporary drop in rankings until GoogleBot comes back and spends the processing power to re-evaluate your page.