CSE-506 (Spring 2019) Homework Assignment #1
(100 points, 16% of your overall grade)
Version 3 (2/22/2019)
Due Thursday 3/7/2019 @ 11:59pm
Extended to Saturday 3/9/2019 @ 11:59pm

* PURPOSE:

To get your Linux kernel development environment working; to make small
changes to the kernel and test them out; to learn about system calls.

* BACKGROUND:

Encrypting files is very useful and important nowadays, but many OSs do
not support this feature natively (yet). Your task is to create a new
system call that can take an input file, encrypt or decrypt it, and then
produce an output file.

Note that while we give you more details below, it is up to you to inspect
the kernel sources to find similar, helpful examples of code; those will
provide you with even greater detail than what we provide here.

The expected amount of written code for this assignment would be 500-700
lines of kernel code, and another 200-300 lines of user-level code; plus
some shell scripts to prove you've tested your code. Note, however, that a
lot of time may be spent reading existing sources and debugging the code
you write.

* TASK:

Create a Linux kernel module (in vanilla 4.20.y Linux that's in your HW1
GIT repository) that, when loaded into Linux, will support a new system
call called

    sys_cpenc(infile, outfile, keybuf, keylen, flags)

where "infile" is the name of an input file to encrypt or decrypt,
"outfile" is the output file, "keybuf" is a buffer holding the cipher key,
"keylen" is the length of that buffer, and "flags" determine if you're
encrypting or decrypting.

    - If "flags & 0x1" is non-zero, then you should encrypt the infile
      onto outfile.
    - If "flags & 0x2" is non-zero, then you should decrypt the infile
      onto outfile.
    - If "flags & 0x4" is non-zero, then you should just copy the infile
      to outfile.

An unencrypted (cleartext) file is just a sequence of arbitrary bytes. An
encrypted (ciphertext) file has two sections.
The first section is a fixed-length "preamble" and contains some
information to validate the decryption key (e.g., a secure hash/checksum
of the user-level pass-phrase). This first section may include other
information as you see fit (e.g., original file size, and stuff for
validating the extra-credit part of this assignment---see below). The
second section is just the input file data, encrypted as per the cipher
block size, cipher key, etc. With this header, for example, you can verify
in the kernel that the user is passing the same decryption key that was
used to encrypt the file (else error).

The most important thing system calls do first is ensure the validity of
the input they are given. You must check for ALL possible bad conditions
that could occur as the result of bad inputs to the system call. In that
case, the system call should return the proper errno value (EINVAL, EPERM,
EACCES, etc.). Consult the system errno table and pick the right error
numbers for different conditions.

The kinds of errors that could occur early during the system call's
execution are as follows (this is a non-exhaustive list):

    - missing arguments passed
    - null arguments
    - pointers to bad addresses
    - keylen and length of keybuf don't match
    - invalid flags or combinations of flags
    - input file cannot be opened or read
    - output file cannot be opened or written
    - input or output files are not regular, or they point to the same
      file
    - trying to decrypt a file w/ the wrong key (what errno should you
      return?)
    - ANYTHING else you can think of (the more error checking you do, the
      better)

After checking for these errors, you should open the input and output
files and begin copying data between the two, optionally encrypting or
decrypting the data before it is written.

Your code must be efficient. Therefore, do not waste extra kernel memory
(dynamic or stack) for the system call. Make sure you're not leaking any
memory.
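
As an illustration of the "invalid flags or combinations of flags" check
listed above, here is a minimal sketch in plain C. The names
cpenc_check_flags() and cpenc_check_key() and the CPENC_* macros are
hypothetical, and treating any combination of more than one operation bit
as an error is an assumption; you may decide otherwise, and real kernel
code would also have to validate the user pointers themselves.

```c
#include <errno.h>
#include <stddef.h>

/* Flag bits as given in the TASK section; the macro names are made up. */
#define CPENC_ENCRYPT 0x1u
#define CPENC_DECRYPT 0x2u
#define CPENC_COPY    0x4u
#define CPENC_ALL     (CPENC_ENCRYPT | CPENC_DECRYPT | CPENC_COPY)

/*
 * Hypothetical helper: return 0 if "flags" names exactly one operation,
 * or -EINVAL otherwise (assumption: operations are mutually exclusive).
 */
int cpenc_check_flags(unsigned int flags)
{
    if (flags & ~CPENC_ALL)
        return -EINVAL;     /* unknown bits are set */
    if (flags == 0)
        return -EINVAL;     /* no operation requested */
    if (flags & (flags - 1))
        return -EINVAL;     /* more than one operation bit is set */
    return 0;
}

/*
 * Hypothetical helper: encrypting and decrypting need a key, while a
 * plain copy does not (assumption: a key passed with "copy" is ignored).
 */
int cpenc_check_key(unsigned int flags, const void *keybuf,
                    unsigned int keylen)
{
    if (flags & (CPENC_ENCRYPT | CPENC_DECRYPT)) {
        if (keybuf == NULL || keylen == 0)
            return -EINVAL;
    }
    return 0;
}
```

Returning a negative errno value follows the usual in-kernel convention;
the C library is what turns that into a -1 return with errno set for the
calling process.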
For efficiency, you should also copy the data in chunks that are native to
the system this code is compiled on: the system page size (PAGE_SIZE;
called PAGE_CACHE_SIZE in older kernels). Hint: allocate one page as a
temporary buffer.

Note that the last page you write could be partially filled. So your code
should handle files whose size isn't a perfect multiple of the page size,
as well as zero-length files. Also note that ciphers have a native block
size (e.g., 64 bit) and your file may have to be padded to the cipher
block size. Lastly, certain ciphers/modes don't care about blocking sizes
so they won't need padding; I recommend you use the "CTR" mode of
encryption, so you don't have to worry about such padding.

The output file should be created with the user/group ownership of the
running process, and its protection mode should NOT be less than that of
the input file.

Both the input and output files may be specified using relative or
absolute pathnames. Do not assume that the files are always in the current
working directory.

If no error occurred, sys_cpenc() should return 0 to the calling process.
If an error occurred, it should return -1 and ensure that errno is set for
the calling process. Choose your errno's appropriately.

If an error occurred in trying to write some of the output file, the
system call should NOT produce a partial output file. Instead, remove any
partially-written output file and return the appropriate error code.

Write a C program called "xcpenc" that will call your syscall. The program
should have no output upon success and use perror() to print out
information about what errors occurred.
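
The copy rules above (one page-sized buffer, a possibly partial last
chunk, zero-length inputs, no partial output file on failure, and 0 or -1
with errno as the result) can be tried out in user space before you move
to the kernel. The following is only a sketch under stated assumptions:
copy_in_chunks() is a hypothetical name, sysconf(_SC_PAGESIZE) stands in
for the kernel's PAGE_SIZE, and the checks for non-regular or identical
input/output files are left out. The in-kernel version would use the
kernel's own file and memory interfaces instead of read(2), write(2), and
malloc(3).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * User-space illustration only: copy "infile" to "outfile" one page-sized
 * chunk at a time. Returns 0 on success, or -1 with errno set. On any
 * failure after the output was opened, the output file is removed.
 */
int copy_in_chunks(const char *infile, const char *outfile)
{
    long page = sysconf(_SC_PAGESIZE);  /* stand-in for PAGE_SIZE */
    size_t chunk = page > 0 ? (size_t)page : 4096;
    char *buf = NULL;
    ssize_t n = -1;                     /* stays negative on early errors */
    int ifd, ofd = -1, saved;

    ifd = open(infile, O_RDONLY);
    if (ifd < 0)
        return -1;
    ofd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (ofd < 0)
        goto done;
    buf = malloc(chunk);                /* the single temporary buffer */
    if (!buf)
        goto done;

    /* The last read may return fewer than "chunk" bytes; 0 means EOF. */
    while ((n = read(ifd, buf, chunk)) > 0) {
        ssize_t off = 0;

        while (off < n) {               /* handle short writes */
            ssize_t w = write(ofd, buf + off, (size_t)(n - off));

            if (w <= 0) {
                if (w == 0)
                    errno = EIO;
                n = -1;
                goto done;
            }
            off += w;
        }
    }
done:
    saved = errno;                      /* cleanup must not clobber errno */
    free(buf);
    close(ifd);
    if (ofd >= 0) {
        close(ofd);
        if (n < 0)
            unlink(outfile);            /* never leave a partial output */
    }
    errno = saved;
    return n < 0 ? -1 : 0;
}
```

A caller would follow the convention described above: stay silent when
the function returns 0, and call perror() when it returns -1.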

The program should take the following arguments:

    - flag: -e to encrypt; -d to decrypt; -c to copy
    - flag: -C ARG to specify the type of cipher (as a string name)
      [Note: this flag is mainly for the extra credit part]
    - flag: -p ARG to specify the encryption/decryption key if needed
    - flag: -h to provide a helpful usage message
    - input file name
    - output file name
    - any other options you see fit.

You can process options using getopt(3). (Note that specifying the
password on the command line is highly insecure, but it'd make grading
easier. In reality, one would use getpass(3) to input a password.) You
should be able to execute the following command:

    ./xcpenc -p "this is my password" -e infile outfile

User-level passwords should be at least 6 characters long. Nevertheless,
you should not just pass the password into the kernel as is: it is too
short. You need to ensure that you pass a correctly sized encryption key
into the kernel. You should remove any newline character ('\n'), and then
convert the human-readable password into a good-length key. Use a
cryptographic checksum algorithm such as MD5(3) or SHA1(3) to generate a
good key to pass to the kernel (see libssl man pages). An even better way
would be to use a PKCS#5 library to generate secure hashes (check
"man -k pkcs" for more info).

To prove you've tested your code, write a series of short /bin/sh shell
scripts to test your code. I expect to see at least 10 such test scripts
(the more the better). Each shell script should be numbered as test01.sh,
test02.sh, etc. Each shell script should start with "#!/bin/sh" and
include a comment explaining WHAT is being tested. Each script should test
one thing. Examples are to test successful functionality and verify it, as
well as to test failures (e.g., when passing bad inputs to the syscall).
Here's an example:

    #!/bin/sh
    # test basic copy functionality
    set -x
    echo dummy test > in.test.$$
    /bin/rm -f out.test.$$
    ./xcpenc -c in.test.$$ out.test.$$
    retval=$?
    if test $retval != 0 ; then
        echo xcpenc failed with error: $retval
        exit $retval
    else
        echo xcpenc program succeeded
    fi
    # now verify that the two files are the same
    if cmp in.test.$$ out.test.$$ ; then
        echo "xcpenc: input and output files contents are the same"
        exit 0
    else
        echo "xcpenc: input and output files contents DIFFER"
        exit 1
    fi

* SYSTEM CALLS IN THE LINUX KERNEL:

As of kernel 2.6, a kernel module is not allowed to override system calls
(long story, I'll tell you in class :-) So I am giving you a patch that
can add a new syscall to Linux. Note that the patch was written for an
older 64-bit kernel, and it will not apply cleanly on your 4.20 git repo;
so you'll have to manually patch your code. Note also that the patch not
only updates kernel code, but also creates a sample CSE-506/ subfolder
under your kernel tree, if one doesn't already exist. The files in the
CSE-506/ folder provide a working example of a module that can hook into
the kernel and set the new syscall dispatch routine, as well as sample
user code. Study these files carefully to understand what the code does.
And don't forget to git-add (and commit+push) those files!

The patch creates a single syscall that takes only one parameter: a
"void*", into which you'd have to pack your args depending on the mode.
You can download the patch from:

    http://www.cs.stonybrook.edu/~ezk/cse506-s19/cse506-syscall.patch

It is recommended you first get a working kernel in your VM without the
special syscall patch (this'll be a challenge on its own). Only then apply
the patch and test it. Afterwards, you can get started with the heart of
the assignment -- the file encrypting system call.

* A BASELINE KERNEL TEMPLATE

To make getting started easier, we've provided you with a baseline
template and your own Virtual Machine (VM). The template VM includes a
working Linux kernel you'd have to configure and build. See the online
class instructions for how to start your own VM (using VMware VSphere
client) and login to it.
In this assignment, you will do all programming in your own personal VM.
To get root privileges, use sudo, but find the proper instructions here:

    1. login to the scm machine, then run
    2. cat /scm/cse506git-s19/.p
    3. once you login as "root" to your VM, change the root passwd asap
       with the command "passwd" and follow the prompts.

You will have to login as root to your own VM, then you'll need to compile
the kernel and the test software:

    # cd /usr/src
    # git clone ssh://USER@scm.cs.stonybrook.edu:130/scm/cse506git-s19/hw1-USER
        (where "USER" is your CS userid)
    # cd hw1-USER
    # git checkout wrapfs

NOTE: the kernel has a "master" vanilla 4.20 branch. But I created a
"wrapfs" branch that includes some changes like exporting vfs_read() and
vfs_write() that you'll need for future assignments. That's why you should
switch to the wrapfs branch.

    # git branch hw1

NOTE: this creates a branch "hw1" off of the current branch (wrapfs). Do
all your code in YOUR OWN BRANCH. Be sure to git push your changes in this
new branch using "git push --all" (otherwise only some code in some
branches may be pushed but not all code in all branches). One benefit of
using a branch is that you can find out exactly what you changed since you
branched off of "wrapfs" using the command "git diff wrapfs hw1".

    # make config

NOTE: Check the online instructions for how to configure a minimal kernel.
Your hw1 will be graded on this minimal configured kernel. Refer to the
"SUBMISSION" section for details.

    # make
    # make modules
    # make modules_install install
    # reboot

NOTE: Ensure you've booted into the 4.20 kernel...

If the above works, download the cse506-syscall.patch and apply it to your
hw1-USER tree. See the patch(1) program for help on how to apply patches.
You may have to reconfigure your kernel to auto-generate the new system
call vector numbers.
Don't forget to "git add" new files in CSE-506, then "git commit -a" all
new files, and finally "git push" so these changes are pushed to your
remote git repository permanently. Use "git branch -v" to find out which
branch you're in, and "git status" to find out which files need to be
added, committed, index updated, etc.

After rebuilding your patched kernel, you'd have to reinstall the kernel
as per the instructions just above, and reboot again to run the patched
kernel that supports the new syscall. Once it comes back up, if all works
well, then you can build the overriding syscall module and try the new
system call:

    # cd hw1-USER/CSE-506
    # make

NOTE: This builds the HW1 sample files.

Check the source files in the CSE-506 subdir and study them. The
sys_cpenc.c file implements a dummy system call that simply returns 0 if
you pass a non-null argument to the system call, and returns EINVAL if you
pass zero. This is your syscall template to implement. The xcpenc.c file
is a sample user-level program to pass a number to the system call. And
the install_module.sh script is used to load up the new kernel module (and
unload an old one first, if any). To test this system call, try this:

    # sh install_module.sh
    # dmesg | tail
        (use this optional command to see the kernel modules loaded.
        You'll see some messages when a module is un/loaded.)
    # ./xhw1 17
    syscall returned 0
    # ./xhw1
    syscall returned -1 (errno=22)

Run the "dmesg" command to see the last printk messages from the kernel.

Note that the system call is designed to pass one "void*" arg from
userland to the kernel. So, in order to pass multiple arguments, pack them
into a single void*, for example:

    struct myargs {
        int arg1;
        char arg2[24];
    };
    struct myargs args;

    args.arg1 = 100;
    strcpy(args.arg2, "hello world");
    rc = mysyscall((void *) &args);

* USING THE CIPHERS:

You should perform all of your encryption in "CTR" mode on whole pages
(4KB on Linux x86). If you use other cipher modes, you may have to pad
your data.
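
To make the padding remark concrete, here is a small arithmetic sketch.
The function names padded_len() and ctr_len() are hypothetical, and the
plain round-up rule is an assumption: it models a scheme that pads to the
next block boundary and records the original size elsewhere (e.g., in the
preamble). Some padding schemes, such as PKCS#7, always add at least one
byte, so their result differs when the length is already a multiple of
the block size.

```c
#include <stddef.h>

/*
 * Bytes a block-oriented mode would process for "len" bytes of data when
 * the data is rounded up to a whole number of cipher blocks.
 * "block" is the cipher block size in bytes and must be non-zero.
 */
size_t padded_len(size_t len, size_t block)
{
    return (len + block - 1) / block * block;
}

/* In CTR mode the output is exactly as long as the input: no padding. */
size_t ctr_len(size_t len)
{
    return len;
}
```

For example, with a 16-byte block a 5000-byte file rounds up to 5008
bytes, while a 4096-byte page needs no padding at all; in CTR mode both
stay at their original lengths.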
Use the Linux kernel built-in CryptoAPI. To learn how to use it, see the
kernel documentation that comes with the CryptoAPI option. You don't need
to be an expert in security or encryption to do this assignment. Part of
what this assignment will teach you is how to work with someone else's
code, even if all you understand is the API to that code (and not the
internals).

For this assignment, use the AES cipher only (i.e., hard-code it in your
kernel code). (But see the Extra Credit section below.)

* READING FILES FROM INSIDE THE KERNEL

Here's an (old) example function that can open a file from inside the
kernel, read some data off of it, then close it. This will help you in
this assignment. You can easily extrapolate from this how to write data to
another file. (Warning: the code below is from 2.4. Adapt it as needed.)

    /*
     * Read "len" bytes from "filename" into "buf".
     * "buf" is in kernel space.
     */
    int
    wrapfs_read_file(const char *filename, void *buf, int len)
    {
        struct file *filp;
        mm_segment_t oldfs;
        int bytes;

        /* Chroot? Maybe NULL isn't right here */
        filp = filp_open(filename, O_RDONLY, 0);
        if (!filp || IS_ERR(filp)) {
            printk("wrapfs_read_file err %d\n", (int) PTR_ERR(filp));
            return -1;  /* or do something else */
        }

        if (!filp->f_op->read) {    /* better: use vfs_read() */
            filp_close(filp, NULL); /* don't leak the open file */
            return -2;  /* file(system) doesn't allow reads */
        }

        /* now read len bytes from offset 0 */
        filp->f_pos = 0;    /* start offset */
        oldfs = get_fs();
        set_fs(KERNEL_DS);  /* let read() accept a kernel-space buffer */
        /* better: use vfs_read() */
        bytes = filp->f_op->read(filp, buf, len, &filp->f_pos);
        set_fs(oldfs);

        /* close the file */
        filp_close(filp, NULL);

        return bytes;
    }

* TESTING YOUR CODE:

You may choose to hard-code the syscall into your kernel, or do it as a
loadable kernel module (loadable kernel modules make it easier to
unload/reload a new version of the code).

Write user-level code to test your program carefully.
If you choose a kernel module, then once your module is loaded, the new
system call behavior should exist, and you can run your program on various
input files. Check that each error condition you coded for works as it
should. Check that the modified file is changed correctly.

Finally, although you may develop your code on any Linux machine, we will
test your code using the same Virtual Machine distribution (with all
officially released patches applied as of the date this assignment is
released), and using the Linux 4.20.y kernel. It is YOUR responsibility to
ensure that your code runs well under these conditions. We will NOT test
or demo your code on your own machine or laptop! So please plan your work
accordingly to allow yourself enough time to test your code on the
machines for which we have given you a login account (these are the same
exact machines we will test your code on when we grade it).

Additionally, we strongly suggest that you enable CONFIG_DEBUG_SLAB and
other useful debugging features under the "Kernel hacking" configuration
menu. When grading the homework, we will use a kernel tuned for
debugging---which may expose bugs in your code that you can't easily catch
without debugging support. So it's better for YOU to have caught and fixed
those bugs before we do.

Lastly, note that even if your system call appears to work well, it's
possible that you've corrupted some memory state in the kernel, and you
may not notice the impact until much later. If your code begins behaving
strangely after having worked better before, consider rebooting your VM.

* STYLE AND MORE:

Aside from testing the proper functionality of your code, we will also
carefully evaluate the quality of your code. Be sure to use a consistent
style, document your code well, and break your code into separate
functions and/or source files as it makes sense.

To be sure your code is very clean, it should compile with
"gcc -Wall -Werror" without any errors or warnings!
We'll deduct points for any warning that we feel should be easy to fix.

Read Documentation/CodingStyle to understand which coding style is
preferred in the kernel and stick to it for this assignment. Run your
kernel code through the syntax checker in scripts/checkpatch.pl (with the
"strict" option turned on), and fix every warning that comes up. Cleaner
code tends to be less buggy.

If the various sources you use require common definitions, then do not
duplicate the definitions. Make use of C's code-sharing facilities such as
common header files.

You must include a README file with this and any assignment. The README
file should briefly describe what you did, what approach you took, results
of any measurements you might have made, which files are included in your
submission and what they are for, etc. Remember that while the code must
all be yours, if you consulted any online resources, you MUST clearly list
them in detail (e.g., exactly where and what) in your README and your
code. Feel free to include any other information you think is helpful to
us in this README; it can only help your grade (esp. for Extra Credit).

* SUBMISSION

You will need to submit all of your sources, headers, scripts, Makefiles,
and README. Do not commit regenerable files like binaries or temporary
files like "#" and "~" files. Submit all of your files using GIT. See the
general GIT submission guidelines on the class Web site. Be sure to commit
any new branch you created as described above AND push those changes.

As part of this assignment, you should also build a 4.20.y kernel that's
as small as you can get (but without breaking the normal CentOS7 boot).
For example, there are dozens of file systems available: you need at least
ext4, but you don't need XFS or Reiserfs. Commit your .config kernel file
into GIT, but rename it "kernel.config". We will grade you on how small
your kernel configuration is with the following exceptions:

    1. All start-time servers that run by default in the VM provided
       should start without failing.
    2. We won't count "kernel hacking" options: so you may enable as many
       of them as you'd like.

To submit new files, put them under the directory named "CSE-506" inside
the hw1- directory that you checked out. Remember to git add, commit, and
push this new directory. Put all new files that you add in this directory.
This may include the user-space program (.c and .h files), README, kernel
files (in case you are implementing the system call as a loadable kernel
module), Makefile, kernel.config, or anything else you deem appropriate.

For existing kernel sources that you modify, use git add, commit, and push
as mentioned on the class web site.

There must be a Makefile in the CSE-506/ directory. Doing a "make" in
CSE-506/ should accomplish the following:

    1. Compile the user-space program to produce an executable by the name
       "xcpenc". This will be used to test your system call.
    2. In case you are implementing the system call as a loadable kernel
       module, the "make" command should also produce a sys_xcpenc.ko file
       which can be insmod'ed into the kernel.

(Use gcc -Wall -Werror in makefile commands. We will anyway add them if
you don't :-)

The CSE-506/ directory should also contain a "kernel.config" file which
will be used to bring up your kernel.

Note that in case you are implementing the system call directly in the
kernel code (and not as a loadable kernel module), then just compiling and
installing your kernel should activate the system call.

Just to give you an idea how we will grade your submission: We will check
out your git repository. We will use the kernel.config file in the
CSE-506/ subdirectory to compile and then install your kernel. We will
then do a make in the CSE-506/ subdirectory. If your implementation is
based on a loadable module, we will expect sys_xcpenc.ko to be present in
CSE-506/ after doing a make.
We will then insmod it and use CSE-506/xcpenc (also generated as part of
make) to test your system call on various inputs. Note that the insmod
step will be skipped in case you implement the system call directly in the
kernel.

PLEASE test your submission before submitting it, by doing a dry run of
the above steps. DO NOT make the common mistake of writing code until the
very last minute, and then trying to figure out how to use GIT and
skipping the testing of what you submitted. You will lose valuable points
if you do not get to submit on time or if your submission is incomplete!!!

Make sure that you follow the above submission guidelines strictly. In
particular, do a separate git clone of your committed code to ensure that
you pushed everything you needed and no more. We *will* deduct points for
not following these instructions precisely.

* EXTRA CREDIT (OPTIONAL, total 25 points)

If you do any of the extra credit work, then your EC code must be wrapped
in

    #ifdef EXTRA_CREDIT
        // EC code here
    #else
        // base assignment code here
    #endif

[A] 4 points.

Augment your module to utilize the Initialization Vector (IV) part of the
cipher. Without having to know much about the IV, it is useful to
understand that setting it to a different value each time you encrypt or
decrypt a chunk of bytes produces stronger encryption that is harder to
break. A common way to set the 16 bytes of the IV is as follows:

    - first 8 bytes are the index of the page (or page number) that you
      are encrypting or decrypting (e.g., on an i386 system with a
      4096-byte page size, bytes 0-4095 are in page 0, bytes 4096-8191 are
      in page 1, etc.).
    - set the remaining 8 bytes to the inode number of the file.

Note: Your first IV information (assuming you "chain" them) should be
stored in the cipher file preamble.

[B] 6 points

Support multiple ciphers. You should pass the cipher name as a string
using the "-C ARG" option. Change the system call to accept an extra
argument at the end called "char *cipher".
This variable should be a constant string, null terminated, whose value
can be one of: "blowfish" for the Blowfish cipher; "des" for DES;
"des3_ede" for Triple DES; etc. The type of cipher must always be
specified and must always be a valid cipher that the Linux kernel
CryptoAPI understands. All kernel-supported ciphers should be allowed;
return EINVAL if the user specifies an invalid cipher name. The cipher
name (or ID) should also be stored in the preamble.

[C] 5 points.

Support multiple encryption unit sizes and key lengths. You will have to
augment the system call as needed to pass the new info, and the user-level
tool. For example:

    $ ./xcpenc -u 16000 -l 256 -e infile outfile

where -l specifies the key length to 256 bits, and -u specifies that the
encryption unit should be in whole chunks of 16000 bytes (instead of the
default 4KB). If not specified, -u should default to PAGE_SIZE, and -l to
128 bits. Note that the argument to -l can be any valid key length that
the cipher accepts (for example, Blowfish can't use keys smaller than 128
bits); however, the argument to -u can be ANY positive number that the
cipher will accept (even odd numbers).

[D] 5 points

The five students who have the smallest working kernel config files will
receive 1-5 points each (smallest config gets 5 points, next smallest gets
4 points, etc.) The kernel must boot and no errors/warnings show up during
CentOS7's boot (without modifying the boot sequence scripts).

[E] 5 points

Extra credit at grader's discretion, up to 5 pts, for any particularly
clever solutions/enhancements, or for extra, very nice test scripts. Be
sure to document anything extra you did in your README so the graders
notice it.

If you submit by the original deadline of 3/7/2019, you automatically get
3 more EC points.

Good luck.

* Copyright Statement

(c) 2019 Erez Zadok
(c) Stony Brook University

DO NOT POST ANY PART OF THIS ASSIGNMENT OR MATERIALS FROM THIS COURSE
ONLINE IN ANY PUBLIC FORUM, WEB SITE, BLOG, ETC.
DO NOT POST ANY OF YOUR CODE, SOLUTIONS, NOTES, ETC. DOING SO COULD AFFECT
YOUR GRADE AND/OR DEGREE EVEN AFTER GRADUATION!

* Change History:

    1/9/2019: draft 1
    1/11/2019: draft 2 (TA review)
    1/11/2019: draft 2b (TA review)
    1/22/2019: add copyright statement
    3/6/2019: EC[E] clarification for submission dates