AlexaList

Description

Generates a "top sites" list from Alexa (www.alexa.com) and writes it to a text file for use by other tools.

Download

File            Version   Date          Comments
AlexaList.jar   1.0       23-Oct-2012

Running

Usage: AlexaList global/country/category

Example:

AlexaList global
Global top 500 (up to) sites.
Output file '.../results/alexa/alexa.global.2012-02-12.17-15-57'
Downloaded page 1. Found 777 lines. URL: http://www.alexa.com/topsites/global;0
Downloaded page 2. Found 777 lines. URL: http://www.alexa.com/topsites/global;1
...
Downloaded page 20. Found 777 lines. URL: http://www.alexa.com/topsites/global;19

$ head .../results/alexa/alexa.global.2012-02-12.17-15-57
google.com
facebook.com
youtube.com
yahoo.com
baidu.com
wikipedia.org
live.com
qq.com
amazon.com
twitter.com

Technical

A typical Alexa list displays 25 results per page. The URL of a page in the global list is http://www.alexa.com/topsites/global; followed by the page number minus 1. For example, the first page is http://www.alexa.com/topsites/global;0, the second is http://www.alexa.com/topsites/global;1, etc.

Country pages look like this: http://www.alexa.com/topsites/countries;7/US (page 8 in this example).

Category pages look like this: http://www.alexa.com/topsites/category;1/Top/News/Breaking_News.
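
The three URL patterns above can be sketched as a small helper. This is only an illustration of the page-number-minus-1 rule: the class and method names are hypothetical and are not taken from AlexaList.jar.

```java
// Hypothetical helper; names are illustrative, not part of AlexaList.jar.
final class AlexaUrls {
    static final String BASE = "http://www.alexa.com/topsites/";

    // Global list: page 1 maps to the suffix ";0".
    static String global(int page) {
        return BASE + "global;" + index(page);
    }

    // Country list, e.g. country(8, "US") for page 8 of the US list.
    static String country(int page, String countryCode) {
        return BASE + "countries;" + index(page) + "/" + countryCode;
    }

    // Category list, e.g. category(2, "Top/News/Breaking_News").
    static String category(int page, String categoryPath) {
        return BASE + "category;" + index(page) + "/" + categoryPath;
    }

    // Alexa numbers pages from 0, so the URL index is the page number minus 1.
    private static int index(int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1: " + page);
        }
        return page - 1;
    }

    public static void main(String[] args) {
        System.out.println(global(1));
        System.out.println(country(8, "US"));
        System.out.println(category(2, "Top/News/Breaking_News"));
    }
}
```

Running main prints the first global page, the country example, and the category example exactly as quoted in the text.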

Currently each relevant entry in the HTML has the following structure:

<li class="site-listing">
 <div class="count">45</div>
 <div class="desc-container">
  <h2>
   <a href="/siteinfo/craigslist.org">Craigslist.org</a>
  </h2>
  <span class="small topsites-label">craigslist.org</span>
  ...
 </div>
</li>

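
One way to pull the domain out of such an entry is to match the /siteinfo/ link. The sketch below assumes a regular expression is good enough for this markup; it is not necessarily how AlexaList.jar extracts entries, and the class name SiteListingParser is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; assumes each entry links to /siteinfo/<domain>.
final class SiteListingParser {
    // Captures the domain from <a href="/siteinfo/DOMAIN">.
    private static final Pattern SITE_LINK =
            Pattern.compile("<a\\s+href=\"/siteinfo/([^\"/]+)\"");

    // Returns the domains in the order they appear on the page.
    static List<String> parse(String html) {
        List<String> domains = new ArrayList<>();
        Matcher m = SITE_LINK.matcher(html);
        while (m.find()) {
            domains.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return domains;
    }

    public static void main(String[] args) {
        String html = "<li class=\"site-listing\">\n"
                + " <div class=\"count\">45</div>\n"
                + " <div class=\"desc-container\">\n"
                + " <h2>\n"
                + " <a href=\"/siteinfo/craigslist.org\">Craigslist.org</a></h2>\n";
        System.out.println(parse(html)); // prints [craigslist.org]
    }
}
```

Matching on the link rather than on the surrounding div structure keeps the sketch independent of how the rest of the entry is laid out.
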
As of 18-Mar-2012 there are two bugs in Alexa that are relevant to this code:
  1. Sometimes a page is missing its last (25th) item.
  2. There is no "Next" button after the 20th page (which holds the 500th item), but it is sometimes possible to reach page 21 via a direct link.
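
A download loop can be made tolerant of both bugs by counting pages explicitly and accepting short pages. The following is a sketch under that assumption, not the tool's actual code: TopSitesCollector and the fetchPage callback are hypothetical names, and the callback stands in for downloading and parsing one page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical collector; fetchPage returns the domains found on page n (1-based).
final class TopSitesCollector {
    static final int MAX_PAGES = 20;   // 20 pages x 25 entries = 500 sites
    static final int MAX_SITES = 500;

    static List<String> collect(IntFunction<List<String>> fetchPage) {
        List<String> sites = new ArrayList<>();
        // Bug 2: count pages instead of following "Next", so page 21 is never requested.
        for (int page = 1; page <= MAX_PAGES; page++) {
            List<String> entries = fetchPage.apply(page);
            if (entries.isEmpty()) {
                break; // the list ended early (fewer than 500 sites)
            }
            // Bug 1: a page may hold 24 entries instead of 25; keep whatever is there.
            sites.addAll(entries);
        }
        return sites.size() > MAX_SITES
                ? new ArrayList<>(sites.subList(0, MAX_SITES))
                : sites;
    }

    public static void main(String[] args) {
        // Fake source: every page has 25 entries except page 3, which lacks the last one.
        List<String> result = collect(page -> {
            List<String> entries = new ArrayList<>();
            int count = (page == 3) ? 24 : 25;
            for (int i = 1; i <= count; i++) {
                entries.add("site" + page + "-" + i + ".example");
            }
            return entries;
        });
        System.out.println(result.size()); // prints 499
    }
}
```

With one short page the result holds 499 sites, which is consistent with the "up to" wording in the tool's output.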