Next: 4. What are They
Up: Who is Interested in Repository
Previous: 2. The Stony Brook
3. What are People Looking For?
Out of the 1.5 million hits recorded on this site over the one year
interval,
393,467 of them were to primary html and shtml files.
This latter count more accurately represents the number of mouseclicks
performed by users than the total hits, since most of the remaining hits
are on image files associated with these pages.
Therefore, we will limit further analysis to hits on these files.
Because user ID information is not logged on our WWW server, it is difficult
to judge exactly how many different people accounted for these hits.
Based on the roster of machines which accessed the site,
I estimate that roughly 40,000 different people paid a visit during this 10 week study.
Some fraction of hits came from webcrawler
robots instead of human users,
however I believe they had only a minor effect
on our statistics.
Observe that the least frequently
clicked shtml file (containing the copyright notice for the site)
was hit only 298 times versus 17,733 hits for the most frequently accessed
page (the front page).
Table 1:
Hits by Major Section Index
Problem Category 
Index Hits 
Subsection Hits 
Problems 
Data Structures 
6698 
14492 
6 
Numerical Problems 
4499 
12812 
11 
Combinatorial Problems 
3419 
10114 
10 
Graph Problems: Polynomial 
4496 
17492 
12 
Graph Problems: Hard 
3849 
13447 
11 
Computational Geometry 
7031 
25129 
16 
Set and String Problems 
3003 
10776 
9 
Totals 
32995 
104262 
75 

Table 1 reports the number of hits distributed among our
highest level of classification 
the seven major subfields of algorithms.
Two different hit measures are reported for each subfield, first the number of hits
to the menu of problems within the subfield, and second the total number of hits
to individual problem pages within this subfield.
Computational geometry proved to be the most popular subfield by both measures,
although outweighed by the interest in graph problems split across two subtopics.
Data structures recorded the highest ``perproblem'' interest, but
I was surprised by the relative lack of enthusiasm for set and string algorithms.
Table 2:
Hits by Programming Language Index
Programming Language 
Index Hits 
Implementations 
C language 
4776 
37 
C++ 
5397 
11 
Fortran 
868 
6 
Lisp 
698 
1 
Mathematica 
652 
3 
Pascal 
1556 
5 
Totals 
13947 
63 

Table 2 reports the number of hits distributed among
the various programming language submenus.
C++ seems to have supplanted C as the most popular programming language among developers,
although there is clearly a lag in the size of the body of software written
in C++.
C remains the source language for over half the implementations
available on the Algorithm Repository.
User interest in Mathematica rivals that of Fortran, perhaps suggesting that computer
algebra systems are becoming the language of choice for scientific computation.
There was no submenu associated with Java, reflecting what was available when I built
the repository.
The total number of implementations in Table 2 is greater than
56 because seven codes are written in more than one language.
Table 3:
Most and least popular algorithmic problems, by repository hits.
Most Popular Problems 
Hits 
Least Popular Problems 
Hits 
shortestpath 
3660 
generatingsubsets 
854 
kdtrees 
3198 
edgecoloring 
817 
dictionaries 
3022 
satisfiability 
792 
travelingsalesman 
2963 
independentset 
789 
convexhull 
2963 
cryptography 
786 
nearestneighbor 
2936 
textcompression 
767 
minimumspanningtree 
2922 
maintainingarrangements 
727 
voronoidiagrams 
2815 
setpacking 
703 
triangulations 
2786 
planardrawing 
678 
sorting 
2734 
median 
671 
graphdatastructures 
2596 
bandwidth 
629 
stringmatching 
2304 
factoringintegers 
628 
suffixtrees 
2213 
shortestcommonsuperstring 
520 
priorityqueues 
2208 
determinants 
520 
geometricprimitives 
2162 
feedbackset 
483 

Table 3 reports the 15 most popular and least popular
algorithmic problems, as measured by the number of hits the associated pages
received.
Hit counts for all of the 75 problems appears in Table 6.
Several observations can be drawn from this data:

Shortest path (with 3660 hits) was the most popular of the algorithmic
problems over the course of the study.
Much of this interest was no doubt from students in
algorithms courses seeking an edge.
However its coursepartner minimum spanning tree proved less popular,
finishing in seventh place with 2922 hits.

Of the six data structure problems, only
set unionfind (number 31)
failed to make the top 15 cut.

There is more interest in kdtrees (3198 hits) than nearestneighbor searching (2936 hits),
even though the former is most often used as a means to solve the latter.

People seem much more interested in generating permutations than subsets (1474 hits to 854 hits),
presumably reflecting the perceived difficulty of the task.

Surprisingly popular problems include traveling salesman (number 4), suffix trees (number 13),
the knapsack problem (number 18).
These might reflect educational interest, although Table 5 shows that almost twice as
many total .com hits were recorded than total .edu hits.

Surprisingly unpopular problems include independent set (number 64), planar drawing (number 69),
and satisfiability (number 63).
Such obviously commercial problems as cryptography (number 65) and
text compression (number 66)
proved unpopular presumably because better WWW resources exist elsewhere for these problems.

My users (people seeking programs)
rated NPcomplete problems substantially differently than users of the compendium of optimization problems [#!CK98!#]
(people seeking references).
Our five most popular hard problems, in order were traveling salesman (2963 hits), knapsack (1926 hits), graph coloring (1847 hits),
Hamiltonian cycle (1681 hits), and bin packing (1496 hits).
Their most popular hard problem was vertex cover, which with 876 hits
ranked 58th of all problems on our list.
It is interesting to note that only 17,733 hits occurred
to the frontpage of the site,
which suggests that most
visitors never saw the main index of the site.
This implies that most users initially entered the site through
a keywordoriented search engine, and gives credence to the notation
that these hits measure problem interest more than just directionless
wandering through the site.
Next: 4. What are They
Up: Who is Interested in Repository
Previous: 2. The Stony Brook
Steve Skiena
19991015