Lance Hawk speaks here on specific data acquisition and authentication software that helps document the findings, and outlines the role of Google services in facilitating investigative research.
Okay, what are the general processes here? Well, there’s probably a couple of tools you should have. First off, you’ve probably heard the terms ‘screen scraper’, ‘screen grabber’, something like that. The one I like, pretty slick, and it’s free for me – is FastStone Capture. With this tool, you can actually take it right out to a Word document, Power Point; you can take video files out. You can basically use a hotkey combination, it’s called up, and when you see something you like – press a key and it will record this, press a key and it will record that.
There are several others out there: Snagit, Hyperionics, CamStudio. I’ve just used FastStone Capture, it’s been out there quite a while.
Now, if you ever capture something and it looks like it’s going to litigation, the second rule of forensics, after acquisition, is what’s called ‘authentication’. Say I capture a graphic from this gentleman’s company website. How do I show that it wasn’t modified from the time I captured it to the time it’s produced as evidence in court? You use something called authentication for that. There is a nice slick little program that does all the levels of authentication called HashCalc (see screenshot).
So if I were to actually run, say, that graphic that I scraped off this website, I’d run it through HashCalc. It gives you a hash value, which is just a string of characters computed from the file by a mathematical algorithm, so you note that down. And when you go to court, you can run the same calculation against the file again and say: “See, I can prove that I captured it on this date, and from that date forward it wasn’t altered”. The same thing is done in computer forensics, so you have to use the same type of process.
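As a rough illustration of what a hash calculator does with a captured file, here is a minimal Python sketch using the standard `hashlib` module. The function names `file_digests` and `is_unaltered` and the choice of algorithms are assumptions for illustration, not something taken from the talk or from HashCalc itself.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm name: hex digest} for the file at `path`."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so a large capture does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def is_unaltered(path, recorded_sha256):
    """True if the file still matches the SHA-256 value noted at capture time."""
    current = file_digests(path, ("sha256",))["sha256"]
    return current == recorded_sha256.lower()
```

The idea is to record the digests on the day of capture and recompute them later. If even one byte of the file has changed, the values will no longer match.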
I’ll now talk about one of the things I do, especially for overseas. Well, you’re gonna capture whatever it is, but somehow you need some kind of an indexing system. The indexing system we use is actually dtSearch. dtSearch is the indexing system used by FTK1, if anybody is a forensics examiner here. Now, for 2000 bucks, what I’ve done is I deployed dtSearch around the world in different locations. And if we make an acquisition overseas, then what we’ll do is we will feed it through this tool called dtSearch, so it’ll index all the emails, it’ll index all the documents, it’ll index the PowerPoints – all of that stuff. An alternative to dtSearch that a lot of people like is something called Copernic, and it actually does have a free release for home use only.
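To give a feel for what "indexing" means here, the following is a toy inverted index in Python. It is only a sketch of the general idea: the talk does not describe how dtSearch or Copernic work internally, and the names `build_index` and `search` and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return sorted names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Once the index is built, a lookup only touches the words in the query rather than rereading every email and document, which is why an index pays off on a large acquisition.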
One other tool I debated on putting up here has to do with capturing website content. And I use the tool called BlackWidow (see image). It does a great job of going behind the scenes. I use this a lot in case we have an investigation somewhere in the Middle East – we are selling our product and they actually have our graphic files. So I use BlackWidow to rip their website and pull out all the JPEGs and all the documents. Therefore I am able to do a compare between what they have and what we have, and see if it’s an exact match in every way, shape or form. So you might consider a tool like that. There are some limitations depending on the coding and everything else, but I think you need some of these tools to at least start.
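The "exact match" comparison can be sketched in Python by hashing every file in two folders and pairing up identical digests. This assumes the ripped site has already been saved to a local folder; the function names and folder layout are hypothetical, and this is not a description of how BlackWidow itself works.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of one file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def exact_matches(our_dir, their_dir):
    """Return sorted (our_file, their_file) name pairs whose bytes are identical."""
    ours = {}
    for path in Path(our_dir).rglob("*"):
        if path.is_file():
            ours.setdefault(sha256_of(path), []).append(path.name)
    matches = []
    for path in Path(their_dir).rglob("*"):
        if path.is_file():
            # Same digest means the two files are byte-for-byte the same.
            for our_name in ours.get(sha256_of(path), []):
                matches.append((our_name, path.name))
    return sorted(matches)
```

Note that this only finds byte-identical copies. A graphic that has been resized, recompressed, or renamed with edits will hash differently and would need a visual or other comparison.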
Okay, now let’s actually get into another important thing. This changed recently, so this is actually relatively new. There used to be a ‘Preferences’ button for Google, but you now see this ‘little wheel’ (see image). And in the wheel, the number one mistake I think people make when they do searching, especially with Google or Bing, or something like that – is keeping the default value. That’s an important point, that’s generally set to use moderate filtering. Now, that’s good, and it’s bad in a way. It’s good because it screens out a lot of explicit crap. And it’s also bad because, once again, it screens out a lot of explicit crap – that’s a technical term, IT term, I still maintain.
You’ve got to be careful here because – I don’t know your place of business, whether they have some kind of rule set – this isn’t supposed to be, you know, a crap protector where that’s supposed to be On or Off. Most businesses say they want some filtering done because they don’t want somebody to do some search on the word ‘Titanic’ one time. You know, do a search on ‘Titanic’ and see what might come up with ‘Moderate’ filtering versus ‘Do Not Filter’.
At home, yes, I always recommend possibly even almost up to strict filtering. But a lot of times, probably once every 2-3 months, I get a call where an auditor has done something wrong. And they say: “Could you please look, I can’t believe we didn’t find anything?” – I say: “Did you change your filter settings?”. They say: “Oh, let me know where filter settings are”. A lot of people don’t know that, so one good point.
And Google has got 10 Billion+ web pages, approximately 18% of what’s out there, which is impressive. So you wanna set your preferences.
Now, cache is king. Cache has saved my butt in many investigations. People find out you are investigating them, looking into something. What are they gonna do? They’re gonna change their website, if it’s a website investigation that you’re doing. Well, this little line here – ‘Cached’; think of it as a backup, almost an online backup. You go to ‘Cached’, and the site shows up a lot faster, and it’ll be basically the latest backup of whatever that website is. ‘Cached’ is great, especially if there is congestion, everything else – you wanna go to ‘Cached’.
Another good thing, you don’t use it too much and a lot of people don’t realize its importance, is ‘Similar’. Anybody know what ‘Similar’ means? Okay, it’s good to find out maybe your competition. For example, ‘Air Products’ makes the product called ‘Surfynol 104’, and we had people who were stealing that. I could actually go and do a search on ‘Surfynol 104’, look at ‘Similar’, and it will give you, like, your competitors’ opposing products. If I would click that there, it would give ‘Air Products’ competitors – you know, people who sell something similar. It’s just good to know about.
‘Google Alerts’ – hopefully people are using them. If not, they are great to use. You basically set them up for a variety of types, whether it’s news, it actually could be a separate blog, it could be a separate website, video group, whatever. And the one thing I would recommend though, you don’t want it ‘as-it-happens’ because sometimes you get hammered with stuff coming at you. I mean once a day, once a week probably should be fine. And then ‘Email Length’ could be, you know, 20 results – up to 50 results. I have it set for 50 but, you know, it can be like ‘Air Products and Chemicals’ – if that comes up in the news, I wanna know about it several times a day.
One of the best blog searches is the ‘Google Blog Search’. What’s nice about that is that it’s not restricted to the Blogger blogs. Anything that publishes what’s known as a site feed, Google will basically capture. And once again, you have that ‘More’ coming off Google to get to some of this stuff.
If you do any investigation, you are going to see almost everything I talk about today: case in point, current investigation where we are working on somebody who has done something bad overseas, you know, dealing with the email, dealing with misappropriation of company assets, all of that stuff. Well, Internet history is very important: whether they’re in, what they are doing. So I’ll go through the whole thing from Google to the meta search engines, to the blogs, to the tweet searches – and you’ll see actually the process here.
The con with ‘Google Blog Search’ is that most progressive places publish their site feed, but if they don’t – Google won’t cover it.
The ‘Advanced Blog Search’ (see screenshot) gives you a lot more capability. You can customize your search, put exclusions in, as well as dates. Dates become very important when it comes to investigation. So the capability to restrict the date is really fantastic – unfortunately, you don’t see that too much in other blog searching.
Use the plus sign (+) to search for a common word. Like ‘Air Products’ and then the word ‘and’, you know, you would use the plus sign. Use the minus sign (-) to exclude.
If you wanna search for Lance Hawk, and Lance Hawk is all you want – put the quotes around it (" "), and people just don’t do it. Also, it actually does have a wildcard capability, which is just the Period (.).
Now, Google advanced operators – I highly recommend this (see image). If you do a search on that, there is a gentleman by the name of Johnny Long, I don’t know if people here know of him, but he is a great guy. He is the father probably of Google hacking. And there is a lot you can do with this. I use Google advanced operators a lot, especially when I am searching websites and I am looking for just, say, Excel files or just documents. Just by knowing how to use a few operators, you can really fine tune a search. This is very high level, and this could be a whole session just in itself.
Okay, ‘site:’ operator is to be used if you wanna restrict the search to a specific site. A lot of times I am interested in somebody who possibly stole a chemical from us, and I think it’s, say, ‘ACME’ company, I can restrict it to just ‘ACME’ company with just the ‘site:’ operator.
‘filetype:’ operator – this is probably what I use more than anything, searching for PDFs, or if I am just searching for JPEGs, or just searching for TIFF files, or something like that, so use the ‘filetype:’ operator.
‘link:’ operator is used if you wanna find pages that link to a specific URL. ‘cache:’ – I use it once in a while. ‘intitle:’ – again, I use that more when I have the title of a document that I think might have made its way out, so I put the document name with ‘intitle:’. And ‘inurl:’ – just as it says.
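To show how these operators combine, here is a small Python sketch that assembles a query string from the pieces discussed above (quotes, minus sign, ‘site:’, ‘filetype:’, ‘intitle:’, ‘inurl:’) and turns it into a search URL. The helper names `build_query` and `search_url` are hypothetical, `example.com` is a placeholder domain, and which operators a search engine honors can change over time, so treat this as an illustration rather than a guaranteed recipe.

```python
from urllib.parse import urlencode


def build_query(terms="", exact=None, exclude=(), site=None,
                filetype=None, intitle=None, inurl=None):
    """Assemble a search string from plain terms and advanced operators."""
    parts = []
    if terms:
        parts.append(terms)
    if exact:
        parts.append(f'"{exact}"')          # quotes: exact phrase only
    parts.extend(f"-{word}" for word in exclude)  # minus sign: exclude a word
    if site:
        parts.append(f"site:{site}")        # restrict to one site
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type
    if intitle:
        parts.append(f"intitle:{intitle}")  # term must appear in the title
    if inurl:
        parts.append(f"inurl:{inurl}")      # term must appear in the URL
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for a browser address bar (assumed Google URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `build_query(exact="Surfynol 104", site="example.com", filetype="pdf")` produces the string `"Surfynol 104" site:example.com filetype:pdf`, which can be pasted straight into the search box.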
Once again, Johnny Long actually has a free white paper you can download, and he has a couple of books out.
1 – FTK (Forensic Toolkit) is computer forensics software that scans a hard drive looking for various information. It can for example locate deleted emails and scan a disk for text strings to use them as a password dictionary to crack encryption.