Next: 4. What are They
Up: Who is Interested in Repository
Previous: 2. The Stony Brook
3. What are People Looking For?
Out of the 1.5 million hits recorded on this site over the one year
interval,
393,467 of them were to primary html and shtml files.
This latter count more accurately represents the number of mouse-clicks
performed by users than the total hits, since most of the remaining hits
are on image files associated with these pages.
Therefore, we will limit further analysis to hits on these files.
Because user ID information is not logged on our WWW server, it is difficult
to judge exactly how many different people accounted for these hits.
Based on the roster of machines which accessed the site,
I estimate that roughly 40,000 different people paid a visit during this 10 week study.
Some fraction of hits came from webcrawler
robots instead of human users,
however I believe they had only a minor effect
on our statistics.
Observe that the least frequently
clicked shtml file (containing the copyright notice for the site)
was hit only 298 times versus 17,733 hits for the most frequently accessed
page (the front page).
Table 1:
Hits by Major Section Index
Problem Category |
Index Hits |
Subsection Hits |
Problems |
Data Structures |
6698 |
14492 |
6 |
Numerical Problems |
4499 |
12812 |
11 |
Combinatorial Problems |
3419 |
10114 |
10 |
Graph Problems: Polynomial |
4496 |
17492 |
12 |
Graph Problems: Hard |
3849 |
13447 |
11 |
Computational Geometry |
7031 |
25129 |
16 |
Set and String Problems |
3003 |
10776 |
9 |
Totals |
32995 |
104262 |
75 |
|
Table 1 reports the number of hits distributed among our
highest level of classification -
the seven major subfields of algorithms.
Two different hit measures are reported for each subfield, first the number of hits
to the menu of problems within the subfield, and second the total number of hits
to individual problem pages within this subfield.
Computational geometry proved to be the most popular subfield by both measures,
although outweighed by the interest in graph problems split across two subtopics.
Data structures recorded the highest ``per-problem'' interest, but
I was surprised by the relative lack of enthusiasm for set and string algorithms.
Table 2:
Hits by Programming Language Index
Programming Language |
Index Hits |
Implementations |
C language |
4776 |
37 |
C++ |
5397 |
11 |
Fortran |
868 |
6 |
Lisp |
698 |
1 |
Mathematica |
652 |
3 |
Pascal |
1556 |
5 |
Totals |
13947 |
63 |
|
Table 2 reports the number of hits distributed among
the various programming language submenus.
C++ seems to have supplanted C as the most popular programming language among developers,
although there is clearly a lag in the size of the body of software written
in C++.
C remains the source language for over half the implementations
available on the Algorithm Repository.
User interest in Mathematica rivals that of Fortran, perhaps suggesting that computer
algebra systems are becoming the language of choice for scientific computation.
There was no submenu associated with Java, reflecting what was available when I built
the repository.
The total number of implementations in Table 2 is greater than
56 because seven codes are written in more than one language.
Table 3:
Most and least popular algorithmic problems, by repository hits.
Most Popular Problems |
Hits |
Least Popular Problems |
Hits |
shortest-path |
3660 |
generating-subsets |
854 |
kd-trees |
3198 |
edge-coloring |
817 |
dictionaries |
3022 |
satisfiability |
792 |
traveling-salesman |
2963 |
independent-set |
789 |
convex-hull |
2963 |
cryptography |
786 |
nearest-neighbor |
2936 |
text-compression |
767 |
minimum-spanning-tree |
2922 |
maintaining-arrangements |
727 |
voronoi-diagrams |
2815 |
set-packing |
703 |
triangulations |
2786 |
planar-drawing |
678 |
sorting |
2734 |
median |
671 |
graph-data-structures |
2596 |
bandwidth |
629 |
string-matching |
2304 |
factoring-integers |
628 |
suffix-trees |
2213 |
shortest-common-superstring |
520 |
priority-queues |
2208 |
determinants |
520 |
geometric-primitives |
2162 |
feedback-set |
483 |
|
Table 3 reports the 15 most popular and least popular
algorithmic problems, as measured by the number of hits the associated pages
received.
Hit counts for all of the 75 problems appears in Table 6.
Several observations can be drawn from this data:
-
Shortest path (with 3660 hits) was the most popular of the algorithmic
problems over the course of the study.
Much of this interest was no doubt from students in
algorithms courses seeking an edge.
However its course-partner minimum spanning tree proved less popular,
finishing in seventh place with 2922 hits.
-
Of the six data structure problems, only
set union-find (number 31)
failed to make the top 15 cut.
-
There is more interest in kd-trees (3198 hits) than nearest-neighbor searching (2936 hits),
even though the former is most often used as a means to solve the latter.
-
People seem much more interested in generating permutations than subsets (1474 hits to 854 hits),
presumably reflecting the perceived difficulty of the task.
-
Surprisingly popular problems include traveling salesman (number 4), suffix trees (number 13),
the knapsack problem (number 18).
These might reflect educational interest, although Table 5 shows that almost twice as
many total .com hits were recorded than total .edu hits.
-
Surprisingly unpopular problems include independent set (number 64), planar drawing (number 69),
and satisfiability (number 63).
Such obviously commercial problems as cryptography (number 65) and
text compression (number 66)
proved unpopular presumably because better WWW resources exist elsewhere for these problems.
-
My users (people seeking programs)
rated NP-complete problems substantially differently than users of the compendium of optimization problems [#!CK-98!#]
(people seeking references).
Our five most popular hard problems, in order were traveling salesman (2963 hits), knapsack (1926 hits), graph coloring (1847 hits),
Hamiltonian cycle (1681 hits), and bin packing (1496 hits).
Their most popular hard problem was vertex cover, which with 876 hits
ranked 58th of all problems on our list.
It is interesting to note that only 17,733 hits occurred
to the front-page of the site,
which suggests that most
visitors never saw the main index of the site.
This implies that most users initially entered the site through
a keyword-oriented search engine, and gives credence to the notation
that these hits measure problem interest more than just directionless
wandering through the site.
Next: 4. What are They
Up: Who is Interested in Repository
Previous: 2. The Stony Brook
Steve Skiena
1999-10-15