The K Experience

How many websites on the WWW?
Present - Here and Now
Thursday, 10 April 2008

Netcraft Site Count

Poking around the internet today, I started to wonder, "Exactly how many websites are out there on the internet?" At that moment I started to hope that someone had been keeping count. To my joy, after a bit more poking and searching, I found that someone had: Netcraft. They put the number at around 160 million, which got me thinking about lots of other stuff, like how it's all managed...

I generally try to keep The K Experience tech-free, as there are lots of other sites out there that can fill us in on geek knowledge far better than I can. But on this occasion, I just couldn't ignore my engineering roots. I was looking over website traffic details for certain sites, including this one, when the question above occurred to me (surprisingly, I hadn't thought of it before). The traffic rankings for the sites I was looking at ranged from the thousands to the hundreds of thousands to the millions, but out of how many?

My search led me to Netcraft, a UK-based company started in 1995, which specialises in security services (anti-fraud and anti-phishing tools) and internet research data and analysis. As of March 2008, they had identified 162,662,052 websites, most of them served by Apache (82m), Microsoft (58m), and Google (9m) web servers (Netcraft publishes a further breakdown of the figures). However, only about 40% of those sites are active, meaning available to the public (not private). Obviously, that's not the exact number, as Netcraft does not have the power to count every site everywhere, but it's near enough.
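To put those figures in proportion, here is a minimal sketch that turns the numbers quoted above into percentages. The constant and function names are my own, and since the 40% "active" figure is only approximate, the active-site count it gives is an estimate rather than a Netcraft number.

```python
# Figures as quoted in the post from Netcraft's March 2008 count.
TOTAL_SITES = 162_662_052
SERVER_COUNTS_MILLIONS = {"Apache": 82, "Microsoft": 58, "Google": 9}
ACTIVE_FRACTION = 0.40  # "about 40%" in the post, so only approximate


def share_of_total(count_millions: float, total: int = TOTAL_SITES) -> float:
    """Return the percentage of all counted sites for a count given in millions."""
    return count_millions * 1_000_000 / total * 100


def estimated_active_sites(total: int = TOTAL_SITES,
                           fraction: float = ACTIVE_FRACTION) -> int:
    """Estimate how many of the counted sites are active."""
    return round(total * fraction)


if __name__ == "__main__":
    for name, millions in SERVER_COUNTS_MILLIONS.items():
        print(f"{name}: {share_of_total(millions):.1f}% of all counted sites")
    print(f"Active sites (estimate): {estimated_active_sites():,}")
```

On these numbers Apache accounts for roughly half of all counted sites, and the active ones come to something like 65 million.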

The graph at the top shows the number of sites at about 0 in 1995, probably because that was when the company was created, so they collected no data before that point. In fact, internet data only started being recorded and collected in this way from around that time, '95 or '96. But if we look at the massive development and growth that took place before this, we'll realise that the internet and the World Wide Web had come a long way from their creation to the time of recorded sites. People often make the mistake of referring to the World Wide Web as the internet and vice versa. However, the two are not the same, and have very different histories. The internet is a collection of computers which can communicate with each other, and the World Wide Web is the collection of web pages connected by hyperlinks. So the internet is the infrastructure which the WWW uses to work.

ARPANET Deployment team
Interface Message Processor (IMP) and the ARPANET Deployment Team

Just to go over a bit of history, the internet started off as the ARPANET (Advanced Research Projects Agency Network), developed by DARPA of the United States Department of Defense way back in 1969. The first connection was between the research institute at UCLA (my grad school) and Stanford Research Institute (SRI). This was back in the days when computers were the size of refrigerators. So at that moment I doubt anyone was thinking that the same technology could be used to connect individuals around the world in the way we see and use it now.

Tim Berners-Lee
Tim Berners-Lee, inventor of the World Wide Web and founder/director of W3C

It wasn't until 1989, 20 years later, that Tim Berners-Lee (pictured above), a fellow Brit, invented the WWW, starting from the idea of using hypertext to share information between researchers. The most important part of the planning process (in my opinion anyway) was Berners-Lee's conscious decision to make the WWW freely available to everyone, with no patent and no royalties due. A price could easily have been put on it, which would have made someone ridiculously wealthy but would have restricted its growth and development. However, by making that decision he allowed the web to be accessible to all, giving it room to develop a culture and identity of its own, one that has changed the world in ways no one could ever have imagined. I think it is due to that decision that the World Wide Web is what it is today, with over 160 million websites spanning the globe.

Unfortunately, among those 160 million, there is a ridiculous amount of garbage that we have to wade through. But mechanisms have been put in place to help with this.

One of the earliest webpages taken from 1996.
Yahoo 1996

To begin with, companies such as Yahoo! took it upon themselves to sort through all the sites. There weren't that many back in 1995 when it started, but there were enough for the general surfer to need the service. That is still happening, with Google now taking the lead in that business. Unfortunately, with the system as it is, if a site is not in Google's top 10 or so for its keyword searches, it might as well not exist. Despite the supposed power of Google's magic algorithm, it's not perfect. However, we are now at a stage where the responsibility has moved to the individual surfers to decide what is worth reading and what is not, with Web 2.0 and sites such as StumbleUpon, Digg, Reddit, Fark and lots of others allowing surfers to have a say.

But will this help to sift through all the junk on the net? 160 million is a lot, and more sites are added every day. Can the collective consciousness of all the surfers decide what is really relevant? Even if it could, Google and the other search engines would still very much be the gatekeepers of what the voting sites get to see and what they don't.

But just as the last 20 years since the advent of the WWW have been filled with changes in internet culture, so will the next 20. Currently Berners-Lee is working on an idea to solve this very problem, called the Semantic Web, which some are calling Web 3.0. The Semantic Web will use the semantics (meaning: the study of meaning in communication) of the internet to help it choose what is most relevant to a user's search. On this, Berners-Lee said,

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize." Tim Berners-Lee, 1999
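For the engineers among us, here is a rough idea of what "machines talking to machines" could look like. Semantic Web data is commonly described as simple subject, predicate, object statements (the model used by RDF). The sketch below is only a toy illustration in plain Python, not a real Semantic Web library: the `Triple` type, the predicate names and the `query` helper are all made up for this example, and the facts are just ones mentioned in this post.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One machine-readable statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Facts taken from this post; the predicate names are hypothetical.
FACTS = [
    Triple("Tim Berners-Lee", "invented", "World Wide Web"),
    Triple("Tim Berners-Lee", "directs", "W3C"),
    Triple("Netcraft", "basedIn", "UK"),
    Triple("Netcraft", "foundedIn", "1995"),
]


def query(facts: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the given fields (None matches anything)."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "Who invented what?" answered from the data, not from keyword matching.
    for t in query(FACTS, predicate="invented"):
        print(f"{t.subject} invented {t.obj}")
```

The point is that a program can answer a question from the meaning of the data rather than by matching keywords on a page.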

The term "intelligent agent" that Berners-Lee uses above may not actually refer to artificial intelligence; it is more to do with a software agent acting on behalf of the user to do mundane tasks. However, one could be forgiven for thinking that it might actually lead to AI. There's been quite a bit of talk of the internet becoming conscious of itself, using the seemingly limitless processing power available. So that's where it may need to be headed to handle the number of sites it will have to deal with in the future, for when it gets to 160 billion, which at its current rate of growth is not actually that far off. Intelligent internet? What will that hold for us?
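How far off is "not that far off"? Going from 160 million to 160 billion is a factor of 1,000, which is just under ten doublings. Here is a back-of-the-envelope sketch; I haven't quoted a growth rate in this post, so the doubling times in it are purely hypothetical inputs, not Netcraft figures.

```python
import math

CURRENT_SITES = 160_000_000       # roughly today's count, as quoted above
TARGET_SITES = 160_000_000_000    # the 160 billion mentioned above


def doublings_needed(current: int, target: int) -> float:
    """Number of times the count must double to grow from current to target."""
    return math.log2(target / current)


def years_to_reach(current: int, target: int, doubling_time_years: float) -> float:
    """Years needed if the count doubles once every doubling_time_years."""
    return doublings_needed(current, target) * doubling_time_years


if __name__ == "__main__":
    n = doublings_needed(CURRENT_SITES, TARGET_SITES)
    print(f"Doublings needed: {n:.2f}")
    # Hypothetical doubling times, purely for illustration.
    for doubling_time in (1.0, 2.0, 3.0):
        years = years_to_reach(CURRENT_SITES, TARGET_SITES, doubling_time)
        print(f"Doubling every {doubling_time:g} years: about {years:.0f} years")
```

So if the number of sites doubled every couple of years, we'd be looking at around two decades.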

Sources: Picture of ARPANET machine taken from ed-thelen.org; picture of ARPANET deployment team from Luxorion; screenshot of Yahoo! 1996 website taken from the Wayback Machine; Wikipedia used for information on ARPANET, DARPA, and Tim Berners-Lee; other figures from Netcraft.

 

