Time for Open Source Intelligence and the 'Deep Web'


By Tyson Johnson, Vice President of Business Development, BrightPlanet

“Deep Web” is a loose term for the portion of the Internet that search engines do not typically reach. The Deep Web is often mistaken for the “Dark Web,” and the media frequently use the two terms interchangeably. While you browse the Internet, the Deep Web is usually right in front of you; you may just not have noticed it yet. Whether you are looking through thousands of unstructured Web pages or trying to answer narrowly targeted questions, the Deep Web and Surface Web co-exist and can help you answer some of your toughest security questions, from “Where is the next protest taking place?” to “Who’s selling my company’s goods online fraudulently?” To understand how to leverage Open Source Intelligence (OSINT) from both the Surface Web and the Deep Web, it’s important to understand first where they are and what you can find there.

II. What is the Deep Web?


The Deep Web is a part of the Internet not accessible to link-crawling search engines like Google. The only way a user can access this portion of the Internet is by typing a query into a Web search form, thereby retrieving content within a database that is not linked by standard Web pages. In layman’s terms, the only way to access the Deep Web is by conducting a search within a particular website.

The Surface Web is the portion of the Internet that can be found via link-crawling techniques. Link-crawling means following HTML hyperlinks from one page to another. Google can find this Surface Web data very easily. Surface Web search engines (Google/Bing/Yahoo!) can also lead you to websites that hold unstructured Deep Web content. Think of searching for government court cases at the Commonwealth Courts Portal. Google can take you to the portal page, but it can’t find the results of your searches within the Courts Portal. By entering a search query into this database, you are completing a Deep Web search and finding Deep Web content. There are millions of disparate sources online today that contain Deep Web information: everything from government databases and travel sites to Web pages requiring logins and even some social media pages.
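The difference between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-page `SITE` dictionary, its paths, and the court-case text are hypothetical stand-ins for a real website, and `crawl` is a simplified breadth-first link crawler, not how any particular search engine works. The search results page exists on the site, but no hyperlink points to it, so the crawler never reaches it.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical site: URL path -> HTML body. The search results page is
# only produced by submitting the form; no page links to it.
SITE = {
    "/": '<a href="/about">About</a> <a href="/portal">Court portal</a>',
    "/about": '<a href="/">Home</a>',
    "/portal": '<form action="/search"><input name="q"></form>',
    "/search?q=smith": "<p>Case 123: Smith v. Jones</p>",
}


def crawl(start):
    """Breadth-first link crawl; returns the set of pages reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        parser = LinkExtractor()
        parser.feed(SITE.get(page, ""))
        for link in parser.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


surface = crawl("/")          # pages a link crawler can discover
deep = set(SITE) - surface    # pages reachable only through the search form
print("Surface:", sorted(surface))
print("Deep:", sorted(deep))
```

The crawler finds the portal page itself, just as Google can, but the result behind the form stays out of reach until someone submits a query.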

Dark Web and Deep Web – Not the Same Thing!

The Dark Web refers to any Web page that has been intentionally concealed, either to hide in plain sight or to reside within a separate but public layer of the standard Internet. The Internet is built around Web pages that reference other Web pages; a destination Web page with no inbound links is effectively concealed, because neither users nor search engines can follow a link to it. One example of this would be a blog posting that has not been published yet. The blog post may exist on the public Internet, but unless you know the exact URL, you will not find it. Other examples of Dark Web content and techniques include:

  • Search boxes that will reveal a Web page or answer if a special keyword is searched. Try this by searching “distance from Sioux Falls to New York” on Google.
  • Sub-domain names that are never linked to; for example, “internal.brightplanet.com”
  • Relying on special HTTP headers to show a different version of a Web page
  • Images that are published but never actually referenced, for example “/image/logo_back.gif”
  • Virtual private networks that exist within the public Internet, which often require additional software to access.

A specific (and the most famous) example of Dark Web content is the TOR (The Onion Router) Network.

Hidden within the public Web is an entire network of different content which can only be accessed through a special Web browser called the TOR browser. The TOR browser and TOR network give users a completely anonymous browsing experience by rerouting traffic through dedicated proxy servers worldwide. Unlike a traditional Web exchange, which finds the fastest direct route to get data from the requested Web page, TOR anonymizes users by sending all data along a random route and wrapping the source and final destination addresses of the request in many layers of encryption (similar to an onion with multiple layers).
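The layering idea can be illustrated with a toy Python sketch. It is not TOR's actual protocol: the relay names, keys, and destination are hypothetical, and `xor_cipher` is a deliberately simple stand-in for real encryption that should never be used to protect anything. What it does show is the principle described above: the sender wraps the request once per relay, and each relay can remove only its own layer, which reveals nothing except the next hop.

```python
import hashlib
import json


def xor_cipher(key, data):
    """Toy symmetric cipher (XOR with a SHA-256 keystream). NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    # Applying the same function twice with the same key restores the data.
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical relays and the key the sender shares with each one.
RELAY_KEYS = {"relay-a": b"key-a", "relay-b": b"key-b", "relay-c": b"key-c"}


def wrap(message, destination, route):
    """Build the onion from the innermost layer outward."""
    next_hop, payload = destination, message.encode()
    for relay in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = xor_cipher(RELAY_KEYS[relay], layer)
        next_hop = relay
    return payload


def peel(relay, onion):
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(xor_cipher(RELAY_KEYS[relay], onion))
    return layer["next"], bytes.fromhex(layer["data"])


route = ["relay-a", "relay-b", "relay-c"]
onion = wrap("GET /index.html", "hidden-site.example", route)

hops = []
hop, data = route[0], onion
while hop in RELAY_KEYS:
    hop, data = peel(hop, data)
    hops.append(hop)

print("Path taken:", hops)
print("Delivered:", data.decode())
```

In this sketch the first relay sees where the traffic came from but not where it ends up, and the last relay sees the destination but not the original sender.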

While personal freedom and privacy are admirable goals of the TOR network, the ability to traverse the Internet with complete anonymity nurtures a platform ripe for what is considered illegal activity in some countries, including:

  • Controlled substance marketplaces
  • Armories selling all kinds of weapons
  • Unauthorized leaks of sensitive information
  • Money laundering
  • Copyright infringement
  • Credit card fraud and identity theft

III. Who can use Web data or OSINT?

Virtually any sector can benefit from gathering and analyzing data from the different areas of the Web. The following two case studies look at industries where pioneering organizations have already realized the potential of Web data at scale.

OSINT and Brand Protection

A Fortune 100 company in a high-margin industry was hemorrhaging potential profits to overseas counterfeiters. These counterfeiters advertised brand name products at a fraction of the retail price on trade boards, fly-by-night websites, message boards, and social media.

The company’s traditional strategy included hiring an external brand protection firm; however, this solution did not scale to the wide scope of the Internet. Fraudulent websites were siphoning off legitimate profits, and customers were using fake and illegitimate products that could cause physical harm or even death.

A scalable process was developed to monitor key areas of the Internet for any mention of the company’s brand name products. Websites, message boards, trade boards, and social media were automatically monitored and collected.

Websites were then flagged for counterfeit activity, accumulated, and delivered to the Fortune 100 company via customized weekly reports. The reports also contained competitors’ product information, contact information, e-commerce data, and WHOIS data to help build a target list of websites selling fraudulent goods. Automated monitoring and collection at scale brought a whole new level of awareness of illegal online markets, which could now be targeted more accurately and thoroughly.
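A minimal sketch of the flagging step might look like the following. Everything specific here is an assumption made for illustration: the brand term, the retail price, the 50 percent threshold, and the example pages are hypothetical, and the rule itself (brand mention plus a price far below retail) is just one plausible heuristic, not the process the company actually used.

```python
import re

# Hypothetical values chosen only for this illustration.
BRAND_TERMS = ["acme widget"]
RETAIL_PRICE = 100.0
DISCOUNT_THRESHOLD = 0.5  # flag prices below 50% of retail

PRICE_RE = re.compile(r"\$(\d+(?:\.\d{1,2})?)")


def flag_listing(text):
    """Flag text that names the brand and quotes a suspiciously low price."""
    lowered = text.lower()
    if not any(term in lowered for term in BRAND_TERMS):
        return False
    prices = [float(p) for p in PRICE_RE.findall(text)]
    return any(p < RETAIL_PRICE * DISCOUNT_THRESHOLD for p in prices)


# Hypothetical collected pages: URL -> harvested text.
PAGES = {
    "http://example.com/a": "Genuine Acme Widget, only $19.99, ships from overseas",
    "http://example.com/b": "Acme Widget review: is it worth the $100 price?",
    "http://example.com/c": "Generic gadget for $5",
}

flagged = [url for url, text in PAGES.items() if flag_listing(text)]
print(flagged)
```

A simple rule like this only produces candidates; an analyst would still review each flagged site before it goes into a report.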

OSINT and Law Enforcement

Utilizing OSINT and Web data can be the missing piece needed to crack a difficult case, identify new threats, and monitor communication that may be vital in keeping communities safe. Criminals exploit whatever technology is available; therefore, it becomes necessary for law enforcement to monitor the same technology.

Pattern and trend analysis derived from OSINT can paint a virtual picture of a criminal’s pattern of life. For example, if an individual uses a social network to advertise illegal drugs at the same time every day, it’s likely that person will continue advertising drugs within the same time-frame until that person is caught.
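The time-of-day pattern in that example can be sketched as a simple hourly count. The timestamps below are invented for illustration, and `hourly_profile` is a hypothetical helper name; a real analysis would work from collected post metadata and would need to account for time zones.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps of posts collected from one account.
POSTS = [
    "2015-03-02T21:05:00",
    "2015-03-03T21:40:00",
    "2015-03-04T21:15:00",
    "2015-03-05T09:30:00",
    "2015-03-06T21:55:00",
]


def hourly_profile(timestamps):
    """Count how many posts fall in each hour of the day."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


profile = hourly_profile(POSTS)
peak_hour, count = profile.most_common(1)[0]
share = count / len(POSTS)
print(f"Most active hour: {peak_hour}:00 ({share:.0%} of posts)")
```

A strong concentration in one hour, as in this made-up data, is the kind of signal that suggests when the next post is likely to appear.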

OSINT can be used to track threats and potential attacks by scanning online communication for violent terms and for conversations among individuals of interest. Between all of the social media outlets, message boards, and forums, monitoring what is being said and who is saying it is extremely difficult.

Embracing OSINT and Deep Web

It is time for security risk management practitioners to embrace and utilize OSINT data, whether for improving the insights into ongoing Threat Risk Vulnerability Assessments, monitoring in real time for security threats during events, or monitoring for threats to their physical assets. A security professional’s goal is not only to reduce the likelihood and impact of a threat event, but also to show internal stakeholders a return on investment by creating efficiencies and reducing exposures. Leveraging open sources is no longer simply the domain of sales and marketing; it’s time for security leadership to get engaged.