CSE-506 (Spring 2019) Homework Assignment #1
		  (100 points, 16% of your overall grade)
			   Version 3 (2/22/2019)
		      Due Thursday 3/7/2019 @ 11:59pm
		  Extended to Saturday 3/9/2019 @ 11:59pm

* PURPOSE:

To get your Linux kernel development environment working; to make small
changes to the kernel and test them out; to learn about system calls.

* BACKGROUND:

Encrypting files is very useful and important nowadays, but many OSs do not
support this feature natively (yet).  Your task is to create a new system
call that can take an input file, encrypt or decrypt it, and then produce an
output file.

Note that while we give you more details below, it is up to you to inspect
the kernel sources to find similar, helpful examples of code; those will
provide you with even greater detail than what we provide here.

The expected amount of written code for this assignment would be 500-700
lines of kernel code, and another 200-300 lines of user-level code; plus
some shell scripts to prove you've tested your code.  Note, however, that a
lot of time may be spent reading existing sources and debugging the code you
write.

* TASK:

Create a Linux kernel module (in vanilla 4.20.y Linux that's in your HW1 GIT
repository) that, when loaded into Linux, will support a new system call
called

	sys_cpenc(infile, outfile, keybuf, keylen, flags)

where "infile" is the name of an input file to encrypt or decrypt, "outfile"
is the output file, "keybuf" is a buffer holding the cipher key, "keylen" is
the length of that buffer, and "flags" determine if you're encrypting or
decrypting.

If "flags & 0x1" is non-zero, then you should encrypt the infile onto outfile.
If "flags & 0x2" is non-zero, then you should decrypt the infile onto outfile.
If "flags & 0x4" is non-zero, then you should just copy the infile to outfile.
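
One possible reading of the three rules above is that exactly one mode bit
must be set per call.  The user-level sketch below assumes that reading; the
macro and function names are hypothetical and are not part of the provided
patch.

```c
#include <errno.h>

/* Hypothetical names for the flag bits described above. */
#define CPENC_ENCRYPT 0x1
#define CPENC_DECRYPT 0x2
#define CPENC_COPY    0x4
#define CPENC_MODES   (CPENC_ENCRYPT | CPENC_DECRYPT | CPENC_COPY)

/*
 * Return 0 if "flags" selects exactly one mode, or -EINVAL otherwise.
 * Assumption: unknown bits and multiple mode bits are both rejected.
 */
static int cpenc_check_flags(unsigned int flags)
{
	unsigned int mode = flags & CPENC_MODES;

	if (flags & ~CPENC_MODES)
		return -EINVAL;		/* an unknown bit is set */
	if (mode == 0 || (mode & (mode - 1)) != 0)
		return -EINVAL;		/* no mode, or more than one */
	return 0;
}
```

The same check could run early in the syscall, before any file is opened.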

An unencrypted (cleartext) file is just a sequence of arbitrary bytes.  An
encrypted (ciphertext) file has two sections.  The first section is a fixed
length "preamble" and contains some information to validate the decryption
key (e.g., a secure hash/checksum of the user-level pass-phrase).  This
first section may include other information as you see fit (e.g., original
file size, and stuff for validating extra-credit part of this
assignment---see below).  The second section is just the input file data,
encrypted as per the cipher block size, cipher key, etc.  With this header,
for example, you can verify in the kernel that the user is passing the same
decryption key that was used to encrypt the file (else error).
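
As a rough illustration of the preamble idea, the sketch below stores a hash
of the key (never the key itself) plus the original file size, and compares
a candidate hash against the stored one.  The struct layout, the 32-byte
hash length, and all names are assumptions made for this example only.

```c
#include <stdint.h>
#include <string.h>

#define CPENC_HASH_LEN 32	/* assumed digest length */

/* Hypothetical fixed-length preamble at the start of a ciphertext file. */
struct cpenc_preamble {
	uint8_t  key_hash[CPENC_HASH_LEN];	/* hash of the cipher key */
	uint64_t orig_size;			/* cleartext size in bytes */
};

/* Return 1 if "hash" matches the hash stored in the preamble, else 0. */
static int cpenc_key_matches(const struct cpenc_preamble *p,
			     const uint8_t hash[CPENC_HASH_LEN])
{
	return memcmp(p->key_hash, hash, CPENC_HASH_LEN) == 0;
}
```

On decryption, a mismatch would be reported as an error before any output
is written.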

The most important thing system calls do first is ensure the validity of the
input they are given.  You must check for ALL possible bad conditions that
could occur as the result of bad inputs to the system call.  In that case,
the system call should return the proper errno value (EINVAL, EPERM, EACCES,
etc.)  Consult the system errno table and pick the right error numbers for
different conditions.

The kinds of errors that could occur early during the system call's
execution are as follows (this is a non-exhaustive list):

- missing arguments passed
- null arguments
- pointers to bad addresses
- keylen and length of keybuf don't match
- invalid flags or combinations of flags
- input file cannot be opened or read
- output file cannot be opened or written
- input or output files are not regular, or they point to the same file
- trying to decrypt a file w/ the wrong key (what errno should you return?)
- ANYTHING else you can think of (the more error checking you do, the better)
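
To illustrate one of the checks in the list above, here is a user-level
analogue of the "regular file" and "same file" tests using stat(2).  Inside
the kernel you would compare the inodes of the opened files instead; the
function name is hypothetical.

```c
#include <sys/stat.h>

/*
 * Return 1 if both paths name the same regular file, 0 if they name
 * distinct regular files, and -1 on error or if either path is not a
 * regular file.
 */
static int same_regular_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	if (!S_ISREG(sa.st_mode) || !S_ISREG(sb.st_mode))
		return -1;
	/* same device and same inode number means same file */
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Note that this sketch treats a missing output file as an error, whereas the
syscall must still handle an output file that does not exist yet.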

After checking for these errors, you should open the input and output files
and begin copying data between the two, optionally encrypting or decrypting
the data before it is written.  Your code must be efficient.  Therefore, do
not waste extra kernel memory (dynamic or stack) for the system call.  Make
sure you're not leaking any memory.  On the other hand, for efficiency, you
should copy the data in chunks that are native to the system this code is
compiled on: the system page size (PAGE_SIZE; the older PAGE_CACHE_SIZE
macro was removed in Linux 4.6).  Hint: allocate one page as a temporary
buffer.

Note that the last page you write could be partially filled.  So your code
should handle files whose size isn't a perfect multiple of the page size, as
well as zero length files.  Also note that ciphers have a native block size
(e.g., 64 bit) and your file may have to be padded to the cipher block size.
Lastly, certain ciphers/modes don't care about blocking sizes so they won't
need padding; I recommend you use the "CTR" mode of encryption, so you don't
have to worry about such padding.
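
The chunked copy described above can be sketched at user level as follows.
This is only an analogue built on read(2) and write(2); the kernel version
would use the kernel's own file I/O routines and a single page-sized buffer.
The function name is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Copy in_fd to out_fd one "chunk" at a time, reusing a single buffer.
 * Returns the number of bytes copied, or -1 on error.
 */
static long copy_in_chunks(int in_fd, int out_fd, size_t chunk)
{
	char *buf = malloc(chunk);
	long total = 0;
	ssize_t n;

	if (!buf)
		return -1;
	while ((n = read(in_fd, buf, chunk)) > 0) {
		ssize_t off = 0;

		/* the last chunk may be short: write exactly n bytes */
		while (off < n) {
			ssize_t w = write(out_fd, buf + off, n - off);

			if (w < 0) {
				if (errno == EINTR)
					continue;
				n = -1;
				break;
			}
			off += w;
		}
		if (n < 0)
			break;
		total += n;
	}
	free(buf);	/* released on every path, so nothing leaks */
	return n < 0 ? -1 : total;
}
```

A zero-length input simply skips the loop and returns 0, and a file whose
size is not a multiple of the chunk size ends with one short write.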

The output file should be created with the user/group ownership of the
running process, and its protection mode should NOT be less restrictive
than that of the input file.

Both the input and output files may be specified using relative or absolute
pathnames.  Do not assume that the files are always in the current working
directory.

If no error occurred, sys_cpenc() should return 0 to the calling process.
If an error occurred, it should return -1 and ensure that errno is set for
the calling process.  Choose your errno's appropriately.

If an error occurred in trying to write some of the output file, the system
call should NOT produce a partial output file.  Instead, remove any
partially-written output file and return the appropriate error code.

Write a C program called "xcpenc" that will call your syscall.  The
program should have no output upon success and use perror() to print out
information about what errors occurred.  The program should take the
following arguments:

- flag: -e to encrypt; -d to decrypt; -c to copy
- flag: -C ARG to specify the type of cipher (as a string name)
  [Note: this flag is mainly for the extra credit part]
- flag: -p ARG to specify the encryption/decryption key if needed
- flag: -h to provide a helpful usage message
- input file name
- output file name
- any other options you see fit.

You can process options using getopt(3).  (Note that specifying the password
on the command line is highly insecure, but it'd make grading easier.  In
reality, one would use getpass(3) to input a password.)  You should be able
to execute the following command:

	./xcpenc -p "this is my password" -e infile outfile

User-level passwords should be at least 6 characters long.  Nevertheless,
you should not just pass the password into the kernel as is: it is too
short.  You need to ensure that you pass a correctly sized encryption key
into the kernel.  You should remove any newline character ('\n'), and then
convert the human readable password into a good length key.  Use a
cryptographic checksum algorithm such as MD5(3) or SHA1(3) to generate a
good key to pass to the kernel (see libssl man pages).  An even better way
would be to use a PKCS#5 library to generate secure hashes (check "man -k
pkcs" for more info).
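
The password clean-up step described above could be sketched as follows.
The hashing step itself (MD5, SHA1, or PKCS#5) is deliberately left out, and
the function and macro names are hypothetical.

```c
#define MIN_PASSWORD_LEN 6	/* minimum length required above */

/*
 * Strip newline characters from "pw" in place.  Return the resulting
 * length, or -1 if fewer than MIN_PASSWORD_LEN characters remain.
 */
static int prepare_password(char *pw)
{
	char *src, *dst;

	for (src = dst = pw; *src; src++)
		if (*src != '\n')
			*dst++ = *src;
	*dst = '\0';
	if (dst - pw < MIN_PASSWORD_LEN)
		return -1;
	return (int)(dst - pw);
}
```

The cleaned string would then be fed to the hash function of your choice,
and the resulting digest (not the string) passed to the kernel as the key.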

To prove you've tested your code, write a series of short /bin/sh shell
scripts to test your code.  I expect to see at least 10 such test scripts
(the more the better).  Each shell script should be numbered as test01.sh,
test02.sh, etc.  Each shell script should start with "#!/bin/sh" and include
a comment explaining WHAT is being tested.  Each script should test one
thing.  Examples are to test successful functionality and verify it, as well
as to test failures (e.g., when passing bad inputs to the syscall).  Here's
an example:

#!/bin/sh
# test basic copy functionality
set -x
echo dummy test > in.test.$$
/bin/rm -f out.test.$$
./xcpenc -c in.test.$$ out.test.$$
retval=$?
if test $retval != 0 ; then
	echo xcpenc failed with error: $retval
	exit $retval
else
	echo xcpenc program succeeded
fi
# now verify that the two files are the same
if cmp in.test.$$ out.test.$$ ; then
	echo "xcpenc: input and output files contents are the same"
	exit 0
else
	echo "xcpenc: input and output files contents DIFFER"
	exit 1
fi

* SYSTEM CALLS IN the Linux Kernel:

As of kernel 2.6, a kernel module is not allowed to override system calls
(long story, I'll tell you in class :-).  So I am giving you a patch that can
add a new syscall to Linux.  Note that the patch was written for an older
64-bit kernel, and it will not apply cleanly on your 4.20 git repo; so
you'll have to manually patch your code.

Note also that the patch not only updates kernel code, but also creates a
sample CSE-506/ subfolder under your kernel tree, if one doesn't already
exist.  The files in the CSE-506/ folder provide a working example of a
module that can hook into the kernel and set the new syscall dispatch
routine, as well as sample user code.  Study these files carefully to
understand what the code does.  And don't forget to git-add (and
commit+push) those files!

The patch creates a single syscall that takes only one parameter: a "void*",
into which you'd have to pack your args depending on the mode.

You can download the patch from:

	http://www.cs.stonybrook.edu/~ezk/cse506-s19/cse506-syscall.patch

It is recommended you first get a working kernel in your VM without the
special syscall patch (this'll be a challenge on its own).  Only then apply
the patch and test it.  Afterwards, you can get started with the heart of
the assignment -- the file encrypting system call.

* A BASELINE KERNEL TEMPLATE

To make getting started easier, we've provided you with a baseline template
and your own Virtual Machine (VM).  The template VM includes a working Linux
kernel you'd have to configure and build.  See the online class instructions
for how to start your own VM (using VMware VSphere client) and log in to it.
In this assignment, you will do all programming in your own personal VM.

To get root privileges, use sudo, but find the proper instructions here:

1. login to the scm machine, then run
2. cat /scm/cse506git-s19/.p
3. once you login as "root" to your VM, change the root passwd asap with the
   command "passwd" and follow the prompts.

You will have to login as root to your own VM, then you'll need to compile
the kernel and the test software:

# cd /usr/src
# git clone ssh://USER@scm.cs.stonybrook.edu:130/scm/cse506git-s19/hw1-USER
	(where "USER" is your CS userid)
# cd hw1-USER
# git checkout wrapfs
	NOTE: the kernel has a "master" vanilla 4.20 branch.  But I created
	a "wrapfs" branch that includes some changes like exporting
	vfs_read() and vfs_write() that you'll need for future
	assignments.  That's why you should switch to the wrapfs branch.
# git branch hw1
	NOTE: this creates a branch "hw1" off of the current branch
	(wrapfs).  Do all your code in YOUR OWN BRANCH.  Be sure to git push
	your changes in this new branch using "git push --all" (otherwise
	only some code in some branches may be pushed but not all code in
	all branches).  One benefit of using a branch is that you can find
	out exactly what you changed since you branched off of "wrapfs"
	using the command "git diff wrapfs hw1".
# make config
	NOTE: Check the online instructions on how to configure a minimal
	      kernel.  Your hw1 will be graded on this minimally configured
	      kernel.  Refer to the "SUBMISSION" section for details.
# make
# make modules
# make modules_install install
# reboot
	NOTE: Ensure you've booted into the 4.20 kernel...

If the above works, download the cse506-syscall.patch and apply it to your
hw1-USER tree.  See the patch(1) program for help how to apply patches.  You
may have to reconfigure your kernel to auto-generate the new system call
vector numbers.  Don't forget to "git add" new files in CSE-506, then "git
commit -a" all new files, and finally "git push" so these changes are pushed
to your remote git repository permanently.  Use "git branch -v" to find out
which branch you're in, and "git status" to find out which files need to be
added, committed, have their index updated, etc.

After rebuilding your patched kernel, you'd have to reinstall the kernel as
per the instructions just above, and reboot again to run the patched kernel
that supports the new syscall.  Once it comes back up, if all works well,
then you can build the overriding syscall module and try the new system
call:

# cd hw1-USER/CSE-506
# make
	NOTE: To build the HW1 sample files.

Check the source files in the CSE-506 subdir and study them.  The
sys_cpenc.c implements a dummy system call that simply returns 0 if you pass
a non-null argument to the system call, and returns EINVAL if you pass zero.
This is your syscall template to implement.

The xcpenc.c file is a sample user level program to pass a number to the
system call.  And the install_module.sh script is used to load up the new
kernel module (and unload an old one first, if any).  To test this system
call, try this:

# sh install_module.sh
# dmesg | tail
	(use this optional command to see the kernel modules loaded.
	 You'll see some messages when a module is un/loaded.)
# ./xcpenc 17
syscall returned 0
# ./xcpenc
syscall returned -1 (errno=22)

Run the "dmesg" command to see the last printk messages from the kernel.

Note that the system call is designed to pass one "void*" arg from userland
to the kernel.  So, in order to pass multiple arguments, pack them into a
single void*, for example:

struct myargs {
	int arg1;
	char arg2[24];
};
struct myargs args;
args.arg1 = 100;
strcpy(args.arg2, "hello world");
rc = mysyscall((void *) &args);
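
Applied to this assignment, the five logical arguments of sys_cpenc() could
be packed along these lines.  The struct name, field order, and helper are
assumptions for illustration; whatever layout you choose must be identical
in the user program and in the kernel module (e.g., via a shared header).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical argument block shared by user and kernel code. */
struct cpenc_args {
	const char *infile;
	const char *outfile;
	const void *keybuf;
	unsigned int keylen;
	unsigned int flags;
};

/* Fill "a" from the five logical arguments of sys_cpenc(). */
static void cpenc_pack(struct cpenc_args *a, const char *infile,
		       const char *outfile, const void *keybuf,
		       unsigned int keylen, unsigned int flags)
{
	memset(a, 0, sizeof(*a));	/* no stale bytes in padding */
	a->infile = infile;
	a->outfile = outfile;
	a->keybuf = keybuf;
	a->keylen = keylen;
	a->flags = flags;
}
```

The packed struct is then passed as "(void *) &args".  On the kernel side,
both the struct and every buffer it points to live in user memory, so each
must be validated and copied in before use.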

* USING THE CIPHERS:

You should perform all of your encryption in "CTR" mode on whole pages (4KB
on Linux x86).  If you use other cipher modes, you may have to pad your
data.

Use the Linux kernel built-in CryptoAPI.  To learn how to use it, see the
kernel documentation that comes with the CryptoAPI option.  You don't need
to be an expert in security or encryption to do this assignment.  Part of
what this assignment will teach you is how to work with someone else's code,
even if all you understand is the API to that code (and not the internals).

For this assignment, use the AES cipher only (i.e., hard-code it in your
kernel code).  (But see the Extra Credit section below.)

* READING FILES FROM INSIDE THE KERNEL

Here's an (old) example function that can open a file from inside the
kernel, read some data off of it, then close it.  This will help you in this
assignment.  You can easily extrapolate from this how to write data to
another file.  (Warning: the code below is from 2.4.  Adapt it as needed.)

/*
 * Read "len" bytes from "filename" into "buf".
 * "buf" is in kernel space.
 */
int
wrapfs_read_file(const char *filename, void *buf, int len)
{
    struct file *filp;
    mm_segment_t oldfs;
    int bytes;

    /* Chroot? Maybe NULL isn't right here */
    filp = filp_open(filename, O_RDONLY, 0);
    if (!filp || IS_ERR(filp)) {
	printk("wrapfs_read_file err %d\n", (int) PTR_ERR(filp));
	return -1;  /* or do something else */
    }

    if (!filp->f_op->read) { /* better: use vfs_read() */
	filp_close(filp, NULL); /* don't leak the open file */
	return -2;  /* file(system) doesn't allow reads */
    }

    /* now read len bytes from offset 0 */
    filp->f_pos = 0; /* start offset */
    oldfs = get_fs();
    set_fs(KERNEL_DS);
    /* better: use vfs_read() */
    bytes = filp->f_op->read(filp, buf, len, &filp->f_pos);
    set_fs(oldfs);

    /* close the file */
    filp_close(filp, NULL);

    return bytes;
}

* TESTING YOUR CODE:

You may choose to hard-code the syscall into your kernel, or do it as a
loadable kernel module (loadable kernel modules make it easier to
unload/reload a new version of the code).  Write user-level code to test
your program carefully.

If you choose a kernel module, then once your module is loaded, the new
system call behavior should exist, and you can run your program on various
input files.  Check that each error condition you coded for works as it
should.  Check that the modified file is changed correctly.

Finally, although you may develop your code on any Linux machine, we will
test your code using the same Virtual Machine distribution (with all
officially released patches applied as of the date this assignment is
released), and using the Linux 4.20.y kernel.  It is YOUR responsibility to
ensure that your code runs well under these conditions.  We will NOT test or
demo your code on your own machine or laptop!  So please plan your work
accordingly to allow yourself enough time to test your code on the machines
for which we have given you a login account (these are the same exact
machines we will test your code on when we grade it).

Additionally, we strongly suggest that you enable CONFIG_DEBUG_SLAB and
other useful debugging features under the "Kernel hacking" configuration
menu.  When grading the homework, we will use a kernel tuned for
debugging---which may expose bugs in your code that you can't easily catch
without debugging support.  So it's better for YOU to have caught and fixed
those bugs before we do.

Lastly, note that even if your system call appears to work well, it's
possible that you've corrupted some memory state in the kernel, and you may
not notice the impact until much later.  If your code begins behaving
strangely after having worked better before, consider rebooting your VM.

* STYLE AND MORE:

Aside from testing the proper functionality of your code, we will also
carefully evaluate the quality of your code.  Be sure to use a consistent
style, document your code well, and break your code into separate functions
and/or source files as it makes sense.

To be sure your code is very clean, it should compile with "gcc -Wall
-Werror" without any errors or warnings!  We'll deduct points for any
warning that we feel should be easy to fix.

Read Documentation/CodingStyle to understand which coding style is preferred
in the kernel and stick to it for this assignment.  Run your kernel code
through the syntax checker in scripts/checkpatch.pl (with the "strict"
option turned on), and fix every warning that comes up.  Cleaner code tends
to be less buggy.

If the various sources you use require common definitions, then do not
duplicate the definitions.  Make use of C's code-sharing facilities such as
common header files.

You must include a README file with this and any assignment.  The README
file should briefly describe what you did, what approach you took, results
of any measurements you might have made, which files are included in your
submission and what they are for, etc.

Remember that while the code must all be yours, if you consulted any online
resources, you MUST clearly list them in detail (e.g., exactly where and
what) in your README and your code.  Feel free to include any other
information you think is helpful to us in this README; it can only help your
grade (esp. for Extra Credit).

* SUBMISSION

You will need to submit all of your sources, headers, scripts, Makefiles,
and README.  Do not commit regenerable files like binaries or temporary
files like "#" and "~" files.  Submit all of your files using GIT.  See
general GIT submission guidelines on the class Web site.  Be sure to commit
any new branch you created as described above AND push those changes.

As part of this assignment, you should also build a 4.20.y kernel that's as
small as you can get (but without breaking the normal CentOS7 boot).  For
example, there are dozens of file systems available: you need at least ext4,
but you don't need XFS or Reiserfs.  Commit your .config kernel file into
GIT, but rename it "kernel.config".  We will grade you on how small your
kernel configuration is with the following exceptions:

1. All start time servers that run by default in the VM provided, should
   start without failing.

2. We won't count "kernel hacking" options: so you may enable as many of
   them as you'd like.

To submit new files, put them under the directory named "CSE-506" inside the
hw1-<user> directory that you checked out.  Remember to git add, commit, and
push this new directory.  Put all new files that you add in this directory.
This may include user-space programs (.c and .h files), README, kernel files
(in case you are implementing the system call as a loadable kernel module),
Makefile, kernel.config, or anything else you deem appropriate.

For existing kernel source to which you make modification, use git add,
commit, and push as mentioned on the class web site.

There must be a Makefile in CSE-506/ directory.  Doing a "make" in CSE-506/
should accomplish the following:

1. Compile the user-space program to produce an executable named "xcpenc".
   This will be used to test your system call.

2. In case you are implementing the system call as a loadable kernel module,
   the "make" command should also produce a sys_xcpenc.ko file which can be
   loaded into the kernel with insmod.

(Use gcc -Wall -Werror in makefile commands.  We will anyway add them if you
don't :-)

The CSE-506/ directory should also contain a "kernel.config" file which will
be used to bring up your kernel.

Note that in case you are implementing the system call directly in the
kernel code (and not as a loadable kernel module), then just compiling and
installing your kernel should activate the system call.

Just to give you an idea how we will grade your submission: We will check
out your git repository.  We will use the kernel.config file in the CSE-506/
subdirectory to compile and then install your kernel.  We will then run
"make" in the CSE-506/ subdirectory.  If your implementation is based on a
loadable module, we will expect sys_xcpenc.ko to be present in CSE-506/
after the make.  We will then insmod it and use CSE-506/xcpenc (also
generated as part of make) to test your system call on various inputs.  Note
that the insmod step will be skipped in case you implement the system call
directly in the kernel.

PLEASE test your submission before submitting it, by doing a dry run of the
above steps.  DO NOT make the common mistake of writing code until the very
last minute, and then trying to figure out how to use GIT and skipping the
testing of what you submitted.  You will lose valuable points if you do not
get to submit on time or if your submission is incomplete!!!

Make sure that you follow the above submission guidelines strictly.  In
particular, do a separate git clone of your committed code to ensure that
you pushed everything you needed and no more.  We *will* deduct points for
not following these instructions precisely.

* EXTRA CREDIT (OPTIONAL, total 25 points)

If you do any of the extra credit work, then your EC code must be wrapped in

	#ifdef EXTRA_CREDIT
		// EC code here
	#else
		// base assignment code here
	#endif

[A] 4 points.

Augment your module to utilize the Initialization Vector (IV) of the
cipher.  Without having to know much about the IV, it is useful to
understand that setting it to a different value each time you encrypt or
decrypt a chunk of bytes produces stronger encryption that is harder to
break.  A common way to set the 16 bytes of the IV is as follows:

- first 8 bytes are the index of the page (or page number) that you are
  encrypting or decrypting (e.g., on an i386 system with a 4096-byte page
  size, bytes 0-4095 are in page 0, bytes 4096-8191 are in page 1, etc.).

- set the remaining 8 bytes to the inode number of the file.

Note: Your first IV information (assuming you "chain" them) should be stored
in the cipher file preamble.
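
The IV layout described above can be sketched as follows.  The byte order
(least significant byte first) is an assumption, since the text does not
specify one, and the function names are hypothetical.

```c
#include <stdint.h>

#define CPENC_IV_LEN 16

/* Store "v" into 8 bytes at "p", least significant byte first. */
static void put_le64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/* Page index of the byte at file offset "off". */
static uint64_t cpenc_page_index(uint64_t off, uint64_t page_size)
{
	return off / page_size;
}

/* First 8 bytes: page index; remaining 8 bytes: inode number. */
static void cpenc_build_iv(uint8_t iv[CPENC_IV_LEN], uint64_t page_index,
			   uint64_t ino)
{
	put_le64(iv, page_index);
	put_le64(iv + 8, ino);
}
```

Using a fixed byte order keeps the IV identical on every machine, which
matters if a file is encrypted on one system and decrypted on another.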

[B] 6 points

Support multiple ciphers.  You should pass the cipher name as a string using
the "-C ARG" option.  Change the system call to accept an extra argument at
the end called "char *cipher".  This variable should be a constant string,
null terminated, whose value can be one of: "blowfish" for the Blowfish
cipher; "des" for DES; "des3_ede" for Triple DES; etc.  The type of cipher
must always be specified and must always be a valid cipher that the Linux
kernel CryptoAPI understands.  All kernel-supported ciphers should be
allowed; return EINVAL if the user specifies an invalid cipher name.  The
cipher name (or ID) should also be stored in the preamble.

[C] 5 points.

Support multiple encryption unit sizes and key lengths.  You will have to
augment the system call as needed to pass the new info, and the user-level
tool.  For example:

	$ ./xcpenc -u 16000 -l 256 -e infile outfile

where -l specifies the key length to 256 bits, and -u specifies that the
encryption unit should be in whole chunks of 16000 bytes (instead of the
default 4KB).  If not specified, -u should default to PAGE_SIZE, and -l to
128 bits.  Note that the argument to -l can be any valid key length that the
cipher accepts (for example, AES accepts only 128-, 192-, or 256-bit keys);
however, the argument to -u can be ANY positive number that the cipher will
accept (even odd numbers).

[D] 5 points

The five students who have the smallest working kernel config files will
receive 1-5 points each (smallest config gets 5 points, next smallest gets 4
points, etc.)  The kernel must boot with no errors/warnings showing up during
CentOS7's boot (without modifying the boot sequence scripts).

[E] 5 points

Extra credit at grader's discretion, up to 5 pts, for any particularly
clever solutions/enhancements, or for extra, very nice test scripts.  Be
sure to document anything extra you did in your README so the graders notice
it.

If you submit by the original deadline of 3/7/2019, you automatically get 3
more EC points.

Good luck.

* Copyright Statement

(c) 2019 Erez Zadok
(c) Stony Brook University

DO NOT POST ANY PART OF THIS ASSIGNMENT OR MATERIALS FROM THIS COURSE ONLINE
IN ANY PUBLIC FORUM, WEB SITE, BLOG, ETC.  DO NOT POST ANY OF YOUR CODE,
SOLUTIONS, NOTES, ETC.  DOING SO COULD AFFECT YOUR GRADE AND/OR DEGREE EVEN
AFTER GRADUATION!

* Change History:

1/9/2019: draft 1
1/11/2019: draft 2 (TA review)
1/11/2019: draft 2b (TA review)
1/22/2019: add copyright statement
3/6/2019: EC[E] clarification for submission dates