Term Project

Performance Evaluation (CSE 605), Fall 2003

Five Week Project (25% of your final grade)

Final report due: Dec 15, 2003 (hard deadline)

10 min project presentation by each group Dec 9 and 11.

General Guidelines

The project should be done in groups of 1-3 students. Each member of the group will get the same grade. Naturally, more substantial work will be expected from larger groups. It is highly recommended that you work in a group rather than alone. The project requires a fair amount of reading and understanding, which is better done in a group setting. It is recommended that the group members divide the work carefully, including the reading, so that you can accomplish a lot with relatively little effort.

Your grade in the project will be determined by (i) the difficulty of your task (40%), (ii) your accomplishments (40%) and (iii) your final report (20%). The difficulty will be judged by the amount of reading you have to do and the complexity of the problem that you will solve. The latter includes the number of tools/programs that need to be developed, the analysis you need to do, etc. Your accomplishment will be judged by the soundness, completeness and clarity of your work.

The final report should be a self-contained document outlining the objectives of the study with the necessary technical background, the methodology you followed (with justifications, if appropriate), a description of your experiments/analysis, the results obtained, important observations or conclusions, limitations of the study (if any) and references. The report should be written in the style of a research paper. There is no stipulated length or formatting requirement, but with tight formatting (i.e., absolutely no unnecessary white space) and 11 point font, I would expect that it would be at least 5 pages. Quality of writing and organization of the report will count more than the length. You should set aside some time exclusively for report writing.

To make sure that you are on the right track and are working on something of appropriate complexity, I strongly recommend (but do not require) that you meet with me at least once after you have formed a good idea about your project. To be meaningful, this meeting should happen early within the second or third week of your project. The purpose of this meeting is to make sure that you are not attempting something too easy or too difficult. A hidden (but critical) agenda is also to make sure that you are actually doing something!

Significant performance in the project will make you eligible for extra credits. If you are on the borderline between two grades, the extra credits will be used to bump you up to the higher grade. If you are not on the borderline, extra credits will not do you any good. All group members will be eligible for the same extra credits.

In the following there are three project ideas. You are not limited to these ideas alone; you can come up with similar ideas within your interest domain. However, the requirement is that your project should involve an interesting and/or challenging modeling or performance evaluation component (similar to the projects here). If you are proposing your own project, you will need to meet with me briefly to explain what you propose to do, so that I know that you are doing something appropriate. You will need my approval before you get started.

Project Idea 1:

Characterization of WWW traffic and related studies

Description

With the popularity of the web, there has been a significant amount of research in the last 5 years characterizing and understanding the nature of WWW traffic. The hope is that understanding the nature of the traffic will lead to a better protocol (HTTP) or system (server, client, caches) design. Many new technologies such as hierarchical/cooperating web caches or persistent HTTP are results of such studies. Your goal is to carry out some specific studies on your own using actual web traffic traces and report your experiences. To give you an idea about what type of studies are useful, I have provided links to some important papers in this area. They are fairly easy papers to follow. Choose a specific set of experiments/analyses described in one or more papers and try to reproduce them using public domain web traces available from the Internet Traffic Archive. It is a good idea to choose a trace different from those the authors used. It is very useful to comment on whether the behavior the authors observed is specific to the trace they used. It is not unusual to see different behaviors, as the nature of web traffic has changed over time.

You do not need to be limited to these references alone. They are just representative. You are welcome to do your own literature survey. A good starting point for further research is of course the references cited in these papers and later/similar papers by the same authors. However, the references/pointers I have provided should be enough for you to gather a broad idea of the field and then focus on some specific issues for your project.

Reference Papers

[If link is missing, download from the appropriate digital library: IEEE or ACM]

Martin Arlitt and Carey Williamson, "Web server workload characterization: The search for invariants," Proc. ACM SIGMETRICS Conference, 1996. An extended version is here.
Hari Balakrishnan et al., "Analyzing stability in wide area network performance," Proc. ACM SIGMETRICS Conference, 1997.
A. Mahanti, C. Williamson and D. Eager, "Traffic analysis of a web proxy caching hierarchy," IEEE Network, May/June 2000. [We do not seem to have access to any proxy trace. But this paper is useful nevertheless.]
M. Arlitt and T. Jin, "Workload characterization study of the 1998 world cup web site," IEEE Network, May/June 2000.
M. Arlitt's web page has some of his publications in this area. Particularly, the ones on cache replacement policies can be useful.
Shudong Jin and Azer Bestavros. Sources and Characteristics of Web Temporal Locality. In Proceedings of Mascots'2000: The IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, San Francisco, CA, August 2000. [See Azer Bestavros's web page for other similar papers.]
Mark Crovella's web site has some very good papers. Look for papers on client access patterns and generating representative workload. The latter paper develops a tool called Surge, that can generate representative workloads for web servers, useful for people who do not have access to traces or cannot deal with them for some reason or want to parameterize the workload. The tool is available via his web site.

Traces

Some web traces are publicly available from the Internet Traffic Archive. Two different types of traces are available: server-side and client-side. Depending on your study, you will choose to work with only one or both types of traces. You will need to figure out the trace format, develop/gather the tools necessary to read the trace, and make sure that the trace you choose to work with contains the details that you need for your study. For example, if you wish to study inter-arrival times of requests to an HTTP server, a server-side trace that does not contain the timestamps of the incoming HTTP requests will not do you any good. Because of time limitations, it is a good idea to stick to just one trace, and possibly study different things with the same trace. On the other hand, studying certain behavior across a wide variety of traces will also be quite interesting. Because of possible disk space limitations, it is a good idea to select a portion of a large trace for your study. If several groups are using the same trace(s), I can coordinate to make the trace available on-line in a shared disk area, so that you do not face storage problems, and also have access to large traces that will make your results statistically more meaningful.
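As a rough illustration of the inter-arrival example above, the following Python sketch extracts request timestamps from a server-side trace and computes the gaps between consecutive requests. It assumes the trace uses the Common Log Format with a bracketed timestamp; the actual traces may use a different format, so the regular expression and the date pattern are assumptions to adapt. The sample lines are made up for illustration and are not taken from any real trace.

```python
from datetime import datetime
import re

# Assumed format: a Common Log Format line with a bracketed timestamp,
# e.g. [01/Jul/1995:00:00:01 -0400]. Adjust for the trace you actually use.
TIMESTAMP_RE = re.compile(r"\[([^\]]+)\]")

def parse_timestamps(lines):
    """Return request times in seconds since the epoch, skipping malformed lines."""
    times = []
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # not a timestamp in the assumed format
        times.append(stamp.timestamp())
    return times

def inter_arrival_times(times):
    """Return the gaps (in seconds) between consecutive requests."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

# Made-up sample lines, for illustration only.
sample = [
    'host1 - - [01/Jul/1995:00:00:01 -0400] "GET /a.html HTTP/1.0" 200 6245',
    'host2 - - [01/Jul/1995:00:00:06 -0400] "GET /b.html HTTP/1.0" 200 3985',
    'host3 - - [01/Jul/1995:00:00:09 -0400] "GET /c.gif HTTP/1.0" 304 0',
]
gaps = inter_arrival_times(parse_timestamps(sample))
print(gaps)
```

Note that many traces record timestamps with one-second resolution, so a large share of the gaps may be zero; that limitation is worth discussing in the report.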

Approach

You should start out by (i) reading the papers, (ii) defining an appropriate scope of your project (e.g., select one or two well-defined studies from the papers that you understood well), (iii) selecting/studying an appropriate trace that you can use.

Project Idea 2:

Simulation Modeling and Evaluation of Multichannel MAC protocol for Wireless Networks

Description:

This project involves investigating the benefit of using multiple subchannels (by splitting a single available channel into N subchannels) and using a CSMA (carrier-sense multiple access) based protocol on each subchannel independently. The protocol is targeted at wireless multi-hop (ad hoc) networks. The CSMA protocol to use could be similar to the IEEE 802.11 WLAN standard. For a basic understanding of how MAC layer simulations are written, look at Chapter 6 of the MD book (available in CS library reserve), where the basic Ethernet (IEEE 802.3) model is described in detail. You can use this model as a base and extend it to wireless; alternatively, I can make available to you a somewhat more sophisticated (but sparsely documented) wireless network simulation model that uses an interface very similar to SMPL.

You will need to extend the models to multiple channels. If you are a little mathematically inclined, it is also of interest to build a simple analytical model of the multichannel protocols. If you are interested, I can tell you some of my thoughts in this direction that you can use as a starting point.
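One quantity an analytical model of CSMA commonly starts from is the normalized propagation delay, i.e., the ratio of propagation delay to packet transmission time. The Python sketch below shows how that ratio changes when a channel is split into N subchannels. It assumes the bandwidth is divided equally with no guard bands or other overhead; the function names and the numeric values are illustrative assumptions, not part of the project specification.

```python
import math

def transmission_time(packet_bits, channel_bps, num_subchannels):
    """Time (s) to send one packet on one of N equal subchannels.

    Assumption: each subchannel gets exactly channel_bps / N, with no
    guard bands or other splitting overhead.
    """
    return packet_bits / (channel_bps / num_subchannels)

def normalized_propagation_delay(prop_delay_s, packet_bits, channel_bps, num_subchannels):
    """Ratio of propagation delay to packet transmission time on one subchannel."""
    return prop_delay_s / transmission_time(packet_bits, channel_bps, num_subchannels)

# Illustrative values only: 8000-bit packets, a 2 Mb/s channel, 1 microsecond delay.
PACKET_BITS = 8000
CHANNEL_BPS = 2_000_000
PROP_DELAY_S = 1e-6

a_single = normalized_propagation_delay(PROP_DELAY_S, PACKET_BITS, CHANNEL_BPS, 1)
a_split = normalized_propagation_delay(PROP_DELAY_S, PACKET_BITS, CHANNEL_BPS, 4)
print(a_single, a_split)
```

Under these assumptions, splitting into N subchannels makes each packet take N times longer to transmit, so the normalized propagation delay on each subchannel shrinks by a factor of N. Whether that translates into a net benefit is what the simulation study is meant to find out.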

References:

My publication page has some papers on multichannel wireless MAC protocols done in our group. You can get basic ideas there.
In the eighties, Ajmone Marsan did some basic work on multichannel Ethernet (see the references in the papers above). I have hard copies that you can borrow to copy. As these papers are much older, soft copies are not available.
You need to know CSMA-based MAC protocols fairly well, particularly 802.3 (Ethernet) and 802.11 (Wireless LAN). A modern networking textbook should have enough to get you started. I can supply more pointers for you to learn the protocols, if you need them.

Approach:

You should get started by defining the scope of your work, including the simulation/analytical modeling approach you will take, the software tools you will use and the actual nature of the protocol you will model.

Project Idea 3:

Cooperative Caching in Irregular Networks

Description:

The goal of this project is to investigate the performance of caching schemes in a network. The network in question can be anything: the Internet, a wireless ad hoc network or a sensor network. Depending on the network, different issues can become critical. But for the most part, you can do this work without assuming a specific network. The basic idea here is that there are several objects that are created/served by several servers, which own the objects. There are also clients that access these objects. The servers and clients are simply pieces of software sitting on network nodes. A node can be a server and/or a client of various different objects. The server updates the object it owns at some rate (update rate). A client accesses the object it is interested in at a given rate (access rate). Each update (access) means that the object needs to be transferred from the server to the caches (from the caches to the clients). This incurs a cost in terms of byte-hops transmitted. The quantity byte-hops per unit time models the load on the network. Now, we want to cache these objects at network nodes such that the overall load on the network is minimized. Note that caching closer to clients improves the access cost, and caching closer to servers improves the update cost.
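To make the byte-hop cost concrete, here is a minimal Python sketch that computes the load for one object under a given cache placement. It rests on assumptions that go beyond the description above: the server pushes every update to every cache along a shortest path, and each client fetches from the nearest copy (a cache or the server itself). The function names, the five-node line topology and the rates are all hypothetical.

```python
from collections import deque

def hop_counts(adjacency, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def network_load(adjacency, server, caches, clients, size_bytes, update_rate):
    """Byte-hops per unit time for one object.

    clients maps a client node to its access rate.
    Assumptions: every update is pushed from the server to each cache, and
    each access is served by the nearest copy (a cache or the server).
    """
    from_server = hop_counts(adjacency, server)
    update_load = sum(update_rate * size_bytes * from_server[c] for c in caches)
    copies = [server, *caches]
    from_copy = {c: hop_counts(adjacency, c) for c in copies}
    access_load = 0
    for client, access_rate in clients.items():
        nearest = min(from_copy[c][client] for c in copies)
        access_load += access_rate * size_bytes * nearest
    return update_load + access_load

# Hypothetical line topology 0-1-2-3-4: server at node 0, one client at node 4.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
clients = {4: 5}  # node 4 accesses the object 5 times per unit time

load_no_cache = network_load(line, 0, [], clients, 1000, 1)
load_cache_at_3 = network_load(line, 0, [3], clients, 1000, 1)
load_cache_at_4 = network_load(line, 0, [4], clients, 1000, 1)
print(load_no_cache, load_cache_at_3, load_cache_at_4)  # 20000 8000 4000
```

In this example the access rate exceeds the update rate, so moving the cache toward the client lowers the total load; reversing the two rates would favor keeping the copy near the server.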

Note that we can assume here that the network is of infinite capacity, so no queue needs to be modeled. The goal is to study the effect on load of "interesting" strategies for placement of caches, choice of number of caches, cache replacement algorithms, etc. The strategies are up to you to figure out. Assume that the network is of random and unknown topology, and that the access pattern is dynamic, i.e., the same nodes do not always act as clients for the same objects and/or the access rates change. Also, assume that memory on the nodes is limited, so that replacement algorithms become important.
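As one possible baseline for the replacement algorithms mentioned above, the Python sketch below implements least-recently-used (LRU) replacement and counts hits and misses. LRU is only an example of a well-known policy, not a required choice. The sketch assumes capacity is measured in number of objects rather than bytes, and the class name and the access sequence are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache; capacity is counted in objects (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def access(self, object_id):
        """Record one access and return True on a hit, False on a miss."""
        if object_id in self.objects:
            self.objects.move_to_end(object_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.objects[object_id] = True
        if len(self.objects) > self.capacity:
            self.objects.popitem(last=False)  # evict the least recently used
        return False

# Hypothetical access sequence on a node that can hold two objects.
cache = LRUCache(capacity=2)
for object_id in ["a", "b", "a", "c", "b", "a"]:
    cache.access(object_id)
print(cache.hits, cache.misses)  # 1 5
```

Other policies can be compared by keeping the same access interface and changing only the eviction rule.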

References:

On the Placement of Web Server Replicas, by Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker, Proc. INFOCOM 2001.
A Survey of Web Caching Schemes for the Internet, Jia Wang, ACM Computer Communication Review, 1999.
Effective Replica Allocation in Ad Hoc Networks for Improving Data Accessibility, Takahiro Hara, Proc. INFOCOM 2001.
Energy-Conserving Data Placement and Asynchronous Multicast in Wireless Sensor Networks, by Sagnik Bhattacharya, Hyung Kim, Shashi Prabh, Tarek Abdelzaher, ACM MobiSys 2003.
The references within the above papers are also good pointers.

Pointers for Literature Search

A good web search engine like Google is always a very good source. Most authors keep their papers on their web sites. Search using the authors' names and maybe some keywords from the title of the paper you are looking for. Also, access to the IEEE digital library and the ACM digital library is free from campus computers. Look there for IEEE and ACM publications.