Much of what you need is already written and provided in ClientJava.zip, so your main job is to work out how many cases you have to handle and what to do for each case. Read this document VERY carefully (with the possible exception of the Cache Issues section) and follow the instructions. Since there are many cases in this experiment, please write the report in a very readable way; I suggest Word or PDF rather than plain TXT.
Sybase Adaptive Server's query optimizer attempts to produce an efficient query plan for each query it executes. An efficient plan minimizes the number of I/O operations required, since I/O represents the major execution cost. The optimizer generally produces several "good" plans, evaluates their cost and chooses the best one.
Associated with each query plan the optimizer must choose an access path to get the data from each table. Depending on the indices you have created, the optimizer's choices include:
Table Scan - Reads all of the data pages of a table.
Index Access - Uses the proper index to find only the data pages needed.
Index Covering - The query can be satisfied using only the search key attributes of an unclustered index. Hence, only index pages need to be accessed; no access to data pages is required.
A table scan requires that the entire table be retrieved and hence may cause a large number of disk reads, with a resulting poor response time. Sybase attempts to reduce this time by transferring multiple pages in a single I/O operation and by prefetching pages - fetching pages that have not yet been requested. With such techniques, table scans can often be done efficiently. However, scans of large tables are still costly. The situation can be improved if you have the right set of indices on your tables. In that case the query plan can use index access or index covering and will require fewer I/O operations.
It is important to keep index entries as small as possible. The page size in Sybase is 4k and you can create an index whose entries are up to 1250 bytes, but with entries that large very few of them fit in a page, so the index tree has more levels and each level has more pages. As a result a search must descend more levels, and the index scan needed to perform a range search must read more pages.
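For a rough sense of the difference (these figures are illustrative approximations that ignore per-page and per-entry overhead): with 4k pages, entries near the 1250-byte maximum fit only about 4096 / 1250 = 3 to a page, while 10-byte entries fit about 4096 / 10 = 400 to a page; an index on a 60000-row table such as Student, used below, would then need on the order of 20000 leaf pages instead of roughly 150.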
Choose carefully between a clustered and an unclustered index. You can create only one clustered index per table, because the data of a table with a clustered index is stored in the order of that index's search key.
The following experiment will demonstrate how indices affect the performance of selection.
It is very important to consider the effect of the data cache when measuring performance. When running a sequence of experiments, the data cache at the start of one experiment will contain the pages left by the previous experiment. If the next experiment uses the same pages, it will find them in the cache and no disk I/O is needed. As a result, no meaningful conclusion can be drawn by comparing the execution times of the two experiments. To avoid this problem it is necessary to make sure that when an experiment is initiated, the pages in the cache are useless. Hence, we need to flush the cache between experiments, and so we need to know how Sybase Adaptive Server uses the cache. When Adaptive Server starts, it allocates memory for executable code and data structures. The remaining memory is used for a data cache, called the default cache. The command sp_cacheconfig returns information about the cache. If you execute the command on RDB you will get:
Cache Name                     Status   Type     Config Value  Run Value
------------------------------ -------- -------- ------------  ------------
default data cache             Active   Default       0.00 Mb      32.00 Mb
In this experiment, as in the previous experiment, statistics should be obtained in single-user mode by running on the bench machine. The default cache on that machine is 8 Mb. In order to flush pages out of it, we create a small named cache (1 Mb is the smallest allowed) called test, and we bind the databases used in the experiment to test. As a result, all pages of the tables used in this experiment are buffered in test. Executing sp_cacheconfig on the bench machine yields:
Cache Name                     Status   Type     Config Value  Run Value
------------------------------ -------- -------- ------------  ------------
default data cache             Active   Default       0.00 Mb       8.00 Mb
test                           Active   Mixed         1.00 Mb       1.00 Mb
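The test cache and the database bindings have already been set up on the bench machine, so you do not need to do this yourself. For reference only, a named cache of this kind is typically created and bound with commands roughly like the following (the database name is a placeholder, and depending on the ASE version a server restart may be needed before a new cache becomes active):

-- create a 1 Mb named cache that can hold data and index pages
sp_cacheconfig 'test', '1M', 'mixed'
go
-- bind a database to it, so that all of its pages are buffered in test
sp_bindcache 'test', 'your_database'
go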
Generally, a DBMS will use some form of LRU policy in deciding which pages to keep in a cache. It would then appear that by scanning a large (dummy) table between experiments, pages remaining in the cache from one experiment would be overwritten by the more recently referenced pages of the dummy table before the next experiment starts. Unfortunately, Sybase recognizes that when a query plan calls for a table scan (as opposed to using an index), once a page has been read it will not be needed in the immediate future. Hence, an MRU (fetch and discard) policy is used for scans in which pages that have been recently scanned are candidates to be overwritten by other pages that are subsequently scanned. You need to force Sybase to use an index in the query plan for accessing the dummy table in order to get it to use an LRU policy for managing the table's pages in the cache.
The table tppt.tppt.initDummy should be used to flush the cache test. The table is in the tppt database - you do not need to create it.
CREATE TABLE initDummy (
    value    integer,
    attrib   integer,
    Filler1  char(255),
    Filler2  char(255),
    Filler3  char(255),
    Filler4  char(255),
    Filler5  char(255),
    Filler6  char(255),
    Filler7  char(255),
    Filler8  char(255),
    Filler9  char(255),
    Filler10 char(255)
)

CREATE NONCLUSTERED INDEX dummyindex ON initDummy (value)
Since a row must be entirely contained in a single page, and a row of initDummy occupies roughly 2 x 4 + 10 x 255 = 2558 bytes (more than half a 4k page), each page of initDummy contains only one row. The query plan for the statement:
SELECT *
FROM tppt.tppt.initDummy
WHERE value > 100 AND attrib = 0
uses dummyindex (not a table scan), and hence an LRU algorithm is used in test as pages of the table are retrieved. Since the table is initialized so that no row satisfies the condition attrib = 0, the pages are brought into the cache, but the result set produced by the SELECT statement is empty.
The query plan used by Sybase to execute a query can be obtained by setting the showplan option on with the command:
set showplan on
You don't want the DBMS to output the same plan over and over again as you re-execute the same query during an experiment. Therefore, to get the query plan for a particular SQL statement, use jisql to (1) set showplan on, then (2) execute the statement (the query plan will be output), and then (3) set showplan off. Since you don't have an account on the bench machine, you will only be able to do this on the multi-user server (RDB).
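For example, to check that the dummy-table query really uses dummyindex, you could run the following sequence in jisql on RDB:

set showplan on
go
select * from tppt.tppt.initDummy where value > 100 and attrib = 0
go
set showplan off
go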
The experiment consists of two parts and uses two tables.
CREATE TABLE Student (
Id char(9) NOT NULL,
Name char(50) NOT NULL,
Password char(20) NOT NULL,
Validity char(1) NOT NULL,
Town char(20)
)
CREATE TABLE Transcript (
StudId char(9) NOT NULL,
CourseId char(6) NOT NULL,
SectionNo INT,
Semester char(6) NOT NULL,
Year INT,
Grade char(1) NOT NULL
)
This part of the experiment uses only the table Student. There are 60000 rows in Student, and since each row occupies 100 bytes and the default page size in Sybase is 4k bytes, approximately 40 rows fit in one page. Since no key is specified, the table will be created as a heap storage structure. Initialize the table by copying rows from the read-only table tppt.tppt.initStudent using an insert statement:
insert into Student select * from tppt.tppt.initStudent
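For scale (a rough estimate): 60000 rows at about 40 rows per page is roughly 1500 data pages, or about 6 Mb, so the table is several times larger than the 1 Mb test cache, and a scan starting from a flushed cache must read essentially every page from disk.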
Run the experiment using the Student table with no index or with one of the following indexes:
1) create clustered index point1 on Student(Id)
2) create nonclustered index point2 on Student(Id)
3) create nonclustered index point3 on Student(Id, Validity)
4) create nonclustered index point4 on Student(Validity, Id)
5) create clustered index range1 on Student(Town)
6) create nonclustered index range2 on Student(Town)
7) create nonclustered index range3 on Student(Town, Id)
8) create nonclustered index range4 on Student(Id, Town)
Note that it takes much longer to rebuild the table with a clustered index than with an unclustered index. This is because, for a clustered index, the rows themselves have to be stored in sorted order of the search key.
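Only one of these configurations should exist at a time (and, as noted earlier, a table can have at most one clustered index), so drop the current index before creating the next one. For example, to switch from point1 to point2:

drop index Student.point1
go
create nonclustered index point2 on Student(Id)
go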
You should implement two stored procedures to be run as transactions. Flush flushes the cache using the technique discussed previously and must be called before each execution of StudentSearch to empty the cache.
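A minimal sketch of Flush, assuming that the dummy-table query shown earlier is all that is needed to displace the contents of the test cache:

create procedure Flush
as
    -- accessed via dummyindex, so the pages are cached with an LRU policy;
    -- no row has attrib = 0, so the result set is empty
    select * from tppt.tppt.initDummy
    where value > 100 and attrib = 0
go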
StudentSearch: searches Student using a SELECT statement. Create a version of the procedure for each of the following statements:

select Validity from Student where Id = '100003000';   (point query)

select Id from ...   (range query)
The StudentSearch stored procedure should include the "with recompile" option. This guarantees that its query plan is rebuilt each time it is called. Without this option, the query plan created when the stored procedure was first added to the schema would continue to be used, regardless of any indexes you create afterwards.
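For example, the point-query version of the procedure might look roughly like this (the exact form is up to you; the essential part is the with recompile option):

create procedure StudentSearch
with recompile   -- rebuild the query plan on every call
as
    select Validity from Student where Id = '100003000'
go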
This part of the experiment uses both the Student and Transcript tables and involves a join. In addition to the Flush stored procedure, create a stored procedure CourseIdGrade.
CourseIdGrade: Output course Ids and grades using the following select statement.
SELECT T.CourseId, T.Grade
FROM Student S, Transcript T
WHERE S.Id = T.StudId AND S.Name = '100000001'
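A minimal sketch of CourseIdGrade, assuming (by analogy with StudentSearch) that with recompile is used so that the plan reflects whatever indexes currently exist:

create procedure CourseIdGrade
with recompile
as
    select T.CourseId, T.Grade
    from Student S, Transcript T
    where S.Id = T.StudId and S.Name = '100000001'
go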
a) Initialize Student and Transcript using the following commands. This creates a Student table with 5000 rows and a Transcript table with 5000 rows (one course for each student).
INSERT INTO Student SELECT * FROM tppt.tppt.initStudent
WHERE StudId<='100005000'
INSERT INTO Transcript SELECT S.StudId , C.CourseId, 1, 'Fall', 2003, 'A'
FROM tppt.tppt.initStudent S, tppt.tppt.initCourse C
WHERE S.StudId <='100005000' and C.CourseId='CSE501'
Run the experiment with the following options:
1) no indexes on either table
2) CREATE NONCLUSTERED INDEX
3) CREATE NONCLUSTERED INDEX
   CREATE NONCLUSTERED INDEX
b) Initialize tables using the following commands. This creates a Student table with 3000 rows and a Transcript table with 3000 rows (30 students take 100 courses and the rest take none).
INSERT INTO Student SELECT * FROM tppt.tppt.initStudent
WHERE StudId<='100003000'
INSERT INTO Transcript SELECT S.StudId , C.CourseId, 1, 'Fall', 2003, 'A'
FROM tppt.tppt.initStudent S, tppt.tppt.initCourse C
WHERE S.StudId <='100000030' and C.CourseId<='CSE600'
Run the experiment with the following options:
1) no index
2) CREATE NONCLUSTERED INDEX
3) CREATE CLUSTERED INDEX
   CREATE NONCLUSTERED INDEX
int StudentSearch(Connection con);
int Flush(Connection con);

or (depending on whether this is for Part 1 or Part 2)

int CourseIdGrade(Connection con);
int Flush(Connection con);

>>
Flush
StudentSearch

or (depending on whether this is for Part 1 or Part 2)

>>
Flush
CourseIdGrade
1. Turn in a TABLE showing the average response time for StudentSearch and CourseIdGrade measured by the tool for each index type and query.
You can use some tables like:
Index Type | Average response time (PointQuery) | Average response time (RangeQuery)
---------- | ---------------------------------- | ----------------------------------
No index   | *number*                           | *number*
point1     | *number*                           | ---- no experiment ----
point2     | *number*                           | ---- no experiment ----
point3     | *number*                           | ---- no experiment ----
point4     | *number*                           | ---- no experiment ----
range1     | ---- no experiment ----            | *number*
range2     | ---- no experiment ----            | *number*
range3     | ---- no experiment ----            | *number*
range4     | ---- no experiment ----            | *number*
and
Index Type       | Average response time, DB state (a) | Average response time, DB state (b)
---------------- | ----------------------------------- | -----------------------------------
No index         | *number*                            | *number*
StudId           | *number*                            | ---- no experiment ----
StudId, Name     | *number*                            | ---- no experiment ----
StudId           | ---- no experiment ----             | *number*
CourseId, StudId | ---- no experiment ----             | *number*
2. Finish what is required in Step 8: vary the range and determine the point at which the optimizer switches from using the index range2 to not using that index.
3. Compare the response times for the point query, for the range queries, and for the join query. (In each case give the query plan used by Sybase and explain the time you measured on SingleUser in terms of that plan, which you can obtain on MultiUser.)
4. What effect does the introduction of the clustered index on CourseId in Part 2(b) have on the response time? Why?
5. Include in your submission all the original timing report files from TPPT SingleUser.
Last updated on Dec 8, 2008 by Tingbo Hou