Recently a client of mine sent me an alarming email he received – a mention.com alert showing that a website he owns is listed on a website-vulnerability list. The listed website is actually a sub-domain address for a marketing software that my client uses; thus, the potential exploit is not on his server, but on the marketing software itself. Also, this particular potential exploit isn't of real concern to us (nor to the marketing software's development team, apparently 🙂 ).
I quickly realized that if the reported exploit is common to all users of the software, I can use the list of exploitable websites to find all the clients of this marketing software. This is important, as everyone who uses that particular software is a potential prospect for my client.
But how do I take a list of thousands of potentially exploitable domains and check who uses this specific marketing software?
Luckily, this is rather easy in theory – when surfing directly to a sub-domain that is redirected to the marketing software's servers, the user is redirected to the main website, coupled with a unique URL parameter.
So let's say that the URL for the marketing software is marketing.clientdomain.com; surfing to this address will redirect the user to clientdomain.com/?ms=1.
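At the HTTP level, the exchange behind that redirect roughly looks like this (the exact status code – 301 or 302 – depends on how the redirect is set up, and the domains are of course placeholders):

    GET / HTTP/1.1
    Host: marketing.clientdomain.com

    HTTP/1.1 302 Found
    Location: http://clientdomain.com/?ms=1

The Location header is the piece of information we're after: if the target URL contains the tell-tale parameter, the domain almost certainly belongs to a client of the marketing software.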
I knew then that, in order to filter the domains that use the marketing software, I'd have to check each domain for the URL it redirects to, and then filter the list in a spreadsheet. However, manually checking each link in a browser is definitely not an option when you've got a list with hundreds or more potential domains.
So how did I filter the list?
Obviously, I googled. It is important to know what to google. We are looking for a way to check domain redirection. If you do SEO, you already know the answer, but for the rest of you out there, the answer has to do with the HTTP header response. What I needed was a tool that checks redirection per URL, so I googled for 'check redirect url', which turned up many SEO tools – usually online websites that offer the ability to check one particular URL for redirection. This, of course, isn't sufficient when you have many URLs to check.
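If you're curious what such a per-URL check does under the hood, here is a minimal sketch in Python (standard library only; the domain is a placeholder) – it asks the server for the header response without following the redirect:

    import httplib  # Python 2 standard library

    # Ask for the headers only, without following the redirect.
    conn = httplib.HTTPConnection('marketing.clientdomain.com')
    conn.request('HEAD', '/')
    response = conn.getresponse()
    print response.status                  # e.g. 301 or 302
    print response.getheader('Location')   # where the redirect points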
My next google search contained the word 'Excel' – obviously, if I have a list of domains and I need to check which of them fit a criterion, doing the whole redirect-checking process would be most convenient from within Excel. And indeed, Excel can be used for this purpose rather easily, with an Excel add-on called "SEOTools for Excel". I recommend downloading and installing their free version; it can be useful for many purposes, including this one (but only on a small scale, as I'll explain further in this post).
Using SEOTools for Excel, I managed to run a small-scale test on multiple domains – and indeed, some of them included the URL parameter indicating that they (almost certainly) use the marketing software I was looking for. The problem, however, was that I once again faced a scaling issue.
Excel is a great tool, but it's actually one of the worst tools for this type of work – Excel is very slow at executing URL retrieval and is inherently ill-suited for our purposes: the Excel UI gets stuck while it executes its functions. While that's not an issue for a list of 100 URLs, when you have over 10,000 of them, Excel becomes hell to use. The fact that Excel will refresh the cell contents (thus requesting the information again and again) under many circumstances adds insult to injury. So…
Excel is not the solution!
I knew at this point that I had to find a way to automate this process with a dedicated tool. The online ones, which I mentioned earlier, were mostly focused on scanning one domain. Even those that support multiple domains are limited (to bulks of 100 per scan), and there is always the risk of your information being stolen.
If a tool doesn't exist, build one – or get someone to build it for you.
I started by googling the same strings, but this time with the word 'Python'. Python is a very common programming language that is simple to use and understand. I knew that even though I'm not a programmer, the chances I'd find a working example and be able to understand how the code works were very good. I was not wrong.
I found the following code somewhere on stackoverflow.com:
import urllib2
for url in ["http://url1", "http://url2"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.geturl()  # the final URL, after any redirects
    except urllib2.HTTPError, e:
        print url, e.getcode()
This simple code checks the URLs hard-coded into it for their header response and prints the result on screen. This is nice, but a bit too simple for our purposes.
I had the domain list in an external file; the list was very big, and it also contained some broken URLs. The code above handles only one specific type of exception (error during execution), but URL retrieval can fail for many other reasons. Also, printing the results inside a command-line window isn't practical either.
In the end, with the help of a programmer friend, we modified the code to do the following:
- Check all the URLs in a file called source.csv
- Write all the successful responses to a text file called output.txt
- Ignore all exceptions – we don't really care why a URL can't be retrieved in this use-case
The resulting code is attached to this post, together with an example source CSV.
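For those who just want the gist, here is a minimal sketch of what such a script can look like (Python 2, standard library only; the file names match the ones above, but the CSV layout – one URL in the first column of each row – is my assumption):

    import csv
    import urllib2

    # Read URLs from source.csv, write the redirect results to output.txt.
    with open('source.csv', 'rb') as source, open('output.txt', 'w') as output:
        for row in csv.reader(source):
            try:
                url = row[0]  # assumes one URL per row, in the first column
                connection = urllib2.urlopen(url)
                # Record the original URL and where it ended up.
                output.write('%s,%s\n' % (url, connection.geturl()))
            except Exception:
                pass  # ignore every failure; we don't care why it broke

The output can then be opened in a spreadsheet and filtered for rows whose final URL contains the tell-tale parameter (?ms=1 in our example).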
The end result?
I created a huge list of companies that use a particular marketing software – all of them potential leads for my client.
The list cost nothing to create and is based on publicly available information that was harvested by someone else 🙂
There are, however, paid services that will do the scanning and technology identification for you. They are not cheap, but well worth it if you build your potential-lead lists based on the technologies that companies use – a known and recommended one is BuiltWith.