* Wrapfs code, cont.

In the VFS, functions named getXXX and putXXX are usually ones that increase
and decrease a refcnt on object XXX, respectively.

The purpose of "grabbing" a refcount (meaning increasing it by 1) is to
ensure that no one else working in parallel can drop the object's RC to 0
and free it while you are still using it.

Imagine two processes on a multi-core system: each process can execute a
syscall at the same time!  What if one tries to unlink(file), and another
tries to read(file)?  Both will try to access the same dentry/inode.  Both
will try to access concurrently: this is called a "race".  Only one of them
should win this race.

If unlink() wins, the file is unlinked, and the read should get an error
like "ENOENT".  If the read wins, it should be allowed to read the file, and
the unlink might have to wait, or get an error.  But what if, while read()
is reading the file, the unlink() is allowed to proceed: it decrements the
RC by 1, RC is now 0, and then the kernel deallocates the dentry/inode pair?
In that case, the read will be continuing to work on a free()'d object,
leading to kernel corruptions/crashes (a use-after-free bug).

To prevent this, the read() would "get" the dentry/inode, by grabbing a
refcnt on it.  When read() is done, it can "put" the object, thus releasing
the RC it holds.  This ensures that the dentry/inode objects can NOT be
freed while read() is using them, yet unlink is still allowed to operate.

Reference counting is one form of concurrency management; locks are
another.
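
The get/put pattern can be sketched in user space with C11 atomics.  This is
only an illustration, not VFS code: struct obj, obj_alloc, obj_get, and
obj_put are hypothetical names standing in for a dentry/inode and its
get/put helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object; stands in for a dentry or inode. */
struct obj {
	atomic_int refcnt;	/* number of users currently holding the object */
	int data;
};

/* Allocate an object holding one reference, owned by the caller. */
static struct obj *obj_alloc(int data)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, 1);
	o->data = data;
	return o;
}

/* "get": take one more reference so the object cannot be freed under us. */
static struct obj *obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/*
 * "put": drop one reference.  Only the caller that drops the last
 * reference frees the object.  Returns 1 if the object was freed.
 */
static int obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	if (old == 1) {
		free(o);
		return 1;
	}
	return 0;
}
```

In the read/unlink story above, read() would call obj_get() before touching
the object and obj_put() when done; unlink's own obj_put() then takes the RC
from 2 to 1, not to 0, so nothing is freed until the reader finishes.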

* Other wrapfs methods:

dentry.c only needs to support 2 methods.

file.c:

see splice(2) syscall for splice_read/write methods.  splice moves data
between two file descriptors (one of which must be a pipe) w/o copying the
data through user space.

see fallocate(2) for wrapfs_fallocate: fallocate(2) allows you to reserve
disk space for a file of size X in one call (ideally contiguously), w/o
having the f/s alloc each LBA/block one at a time.
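
A user-space sketch of reserving space in one call.  To stay portable it uses
posix_fallocate(3) rather than calling fallocate(2) directly (on Linux, glibc
implements the former on top of the latter when the f/s supports it).  The
function name reserve_demo and the scratch path are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a scratch file, reserve `size` bytes for it in one call, and
 * return the resulting file size (or -1 on any error).
 */
static off_t reserve_demo(off_t size)
{
	char path[] = "/tmp/falloc_sketch_XXXXXX";	/* hypothetical scratch file */
	struct stat st;
	off_t result = -1;
	int fd;

	assert(size > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	/* posix_fallocate returns 0 or an errno value; it does not set errno. */
	if (posix_fallocate(fd, 0, size) == 0 && fstat(fd, &st) == 0)
		result = st.st_size;
	close(fd);
	unlink(path);
	return result;
}
```

After a successful call the file is already `size` bytes long, even though
nothing was ever written to it.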

readdir() is how you read a directory.  The readdir syscall gets a buffer
(like read(2)), and fills the buffer w/ as many directory entries
<name,inum> as can fit, returning the number of bytes it filled in.  Then
you have to readdir() again and again until it returns 0 bytes (meaning
you've reached the end of the directory).  Same behavior as read(2).

ioctl(2) is a generic syscall that takes a file descriptor, an int, and a
void* buf.  It was designed as a "catch all" syscall, so the kernel can
implement any new functionality w/o having to add/change syscalls.  Changing
the syscall API is very challenging: it requires maintaining backwards
compatibility code and can break a lot of existing code.  ioctl(2) enables
the extensibility of the syscall API.

The ioctl takes a void* and an int.  The int can be any number (2^32
possibilities, about 4 billion), say indicating an operation code; and the
void* can be any buffer of any size, containing any info.  When you call an
ioctl, the kernel, based on the int code, will decide how to interpret the
void*: that is, what struct was put in the void*?

Some ioctls are implemented in the VFS itself, but most times the VFS just
passes the ioctl to the f/s whose file descriptor it belongs to.  For
example, ioctls are used by various f/s (e.g., xfs, btrfs) to manage
snapshots (create, delete, revert), etc.
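
How the int code selects the interpretation of the void* can be sketched
with a user-space stand-in for a f/s ioctl handler.  Everything here is
hypothetical: the SKETCH_IOC_* codes, struct sketch_label, and sketch_ioctl
are invented for illustration.  A real kernel handler would also have to
copy the argument from user space (copy_from_user) instead of dereferencing
the pointer directly.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical operation codes (real ones are built with _IO/_IOR/_IOW). */
#define SKETCH_IOC_GET_VERSION  1
#define SKETCH_IOC_SET_LABEL    2

/* Hypothetical argument struct for SKETCH_IOC_SET_LABEL. */
struct sketch_label {
	char name[16];
};

static char current_label[16] = "none";

/*
 * Handler in the style of a file system's ioctl method: the int code
 * selects the operation and tells us what type the void* points to.
 */
static long sketch_ioctl(unsigned int cmd, void *arg)
{
	assert(arg != NULL);	/* this sketch always expects an argument */

	switch (cmd) {
	case SKETCH_IOC_GET_VERSION:
		/* For this code, arg points to an int that we fill in. */
		*(int *)arg = 3;
		return 0;
	case SKETCH_IOC_SET_LABEL: {
		/* For this code, arg points to a struct sketch_label. */
		const struct sketch_label *l = arg;

		memcpy(current_label, l->name, sizeof(current_label));
		current_label[sizeof(current_label) - 1] = '\0';
		return 0;
	}
	default:
		return -ENOTTY;	/* conventional error for an unknown ioctl */
	}
}
```

The same void* is treated as an int* for one code and as a struct pointer
for another; caller and handler have to agree on that out of band.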

* Tips for developing/debugging kernel code

If your kernel does not crash, you can unmount wrapfs, unmount the lower
f/s, rmmod wrapfs.ko, rebuild the module, then insmod it again, then mount
lower, then mount wrapfs (upper).

You can change wrapfs code, then run "make" and "make modules" at the top of
the linux kernel tree.  It'd recompile all that's needed, but mainly wrapfs.
You can then just

# cd fs/wrapfs
# insmod ./wrapfs.ko

If the kernel crashed, reboot first.  If you get any strange kernel console
messages (run the "dmesg" tool), best to reboot again.  Why: b/c even if the
kernel did not crash, it could now have corrupt data structures that would
confuse you for many iterations.


How to find out which functions to inspect: start by figuring out what the
issue is.  If the issue is, say, with symlinks, then go check ->symlink and
->readlink; if the issue is with ACLs, then go check xattr methods (b/c ACLs
are logically structured above extended attributes).

Also, READ the code for the xfstests regression that you're executing: it'll
tell you what syscalls it's running, and it's documented.

Then, you can go into wrapfs and add some debugging printks, like UDBG (see
wrapfs.h).  Example:

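int foo(int x)
{
  int i = 0;     // result; initialized so we never return garbage

  UDBG;          // trace: entered foo
  if (x > 0) {
    UDBG;        // trace: took the x > 0 branch
    // do something
  } else {
    UDBG;        // trace: took the else branch
    // do something else
  }

  UDBG;          // trace: about to return
  return i;
}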

Adding such simple printk's lets you trace the code simply.  Also, the last
printk to show up before a crash is likely right before the code that
failed: this helps you narrow down where the failure is.
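
To try this tracing style outside the kernel, here is a user-space stand-in.
It assumes UDBG amounts to "print the file, function, and line"; check
wrapfs.h for the real definition, which uses printk.  The udbg_hits counter,
classify, and classify_selftest are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

static int udbg_hits;	/* trace points reached so far (sketch only) */

/* User-space stand-in for UDBG: report where we are, then carry on. */
#define UDBG \
	do { \
		udbg_hits++; \
		fprintf(stderr, "DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__); \
	} while (0)

/* Same shape as foo() above: entry, one branch, and exit are traced. */
static int classify(int x)
{
	int ret;

	UDBG;
	if (x > 0) {
		UDBG;
		ret = 1;
	} else {
		UDBG;
		ret = 0;
	}
	UDBG;
	return ret;
}

/* Usage: every call passes exactly three trace points. */
static void classify_selftest(void)
{
	udbg_hits = 0;
	assert(classify(5) == 1);
	assert(udbg_hits == 3);
	assert(classify(-5) == 0);
	assert(udbg_hits == 6);
}
```

Each message names the function and line, so the last one printed tells you
how far execution got.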

Don't add too many printks via UDBG, b/c it could clutter your code and
result in too many console messages.

If you get an "oops" stack trace (run dmesg), it can help you understand
where the failure happened.  An oops trace includes:

1. it starts with a message such as "null pointer deref in XXX"
2. a stack trace of functions in reverse order
3. the actual function name where the failure happened, with a hex offset of
   the compiled binary inside that function.  So you can get a rough idea
   where in the function the failure took place: start? middle? end?

A few other helpers:

1. You can insert a BUG_ON(cond) that will force a kernel crash if "cond" is
true.  This is an assertion.  It forces the kernel to stop.  Inspect the
console messages, then reboot.

2. You can use a similar macro, called WARN_ON(cond), that will produce a
stack trace, but w/o crashing the kernel/module: it continues executing.

3. You can also just call "dump_stack()" anywhere to get a full stack trace.

Key:

1. don't write a lot of code.
2. write a few lines, test several times, save your code, reboot.
3. writing lots of kernel code will take exponentially longer to dev/debug.

Take snapshots of your VMs at key stages (e.g., before trying a new kernel,
in case it doesn't work).  Good to take a small number of snapshots you can
revert to.  If you don't have any snapshots, you'll waste a lot of time
reinitializing the VM.  If you take too many snapshots, it'll consume too
much space and your VM will be slow.  A good number is 4-6 snapshots.

GIT: commit your code frequently.  Even if it's broken code.  We only grade
the final code you committed.  If you don't preserve your code on the git
server outside your VM and your VM crashes, your actual code could be wiped!

Worse, if you revert a snapshot, your git code will also be reverted: in
that case, you should have committed and git-pushed it outside of your VM,
so after a snapshot restore, you need to do a "git pull" to get back all
your code.