* VFS Inode ops cont.

	int (*link) (struct dentry *,struct inode *,struct dentry *);

Hard-linking a file: the 1st arg (dentry) is the "source" name that you want
to link to (it must exist).

The 2nd arg (inode) is the destination directory in which you want to
create a second name (alias) for the source.

The 3rd arg (dentry) is a negative dentry for the new name you want to
create.

If ->link is successful, the new dentry (3rd arg) becomes positive, pointing
to the same inode as the 1st dentry.  That inode's link count (i_nlink, the
no. of names it has) increases by 1; so does its in-memory refcnt, since one
more dentry now points to it.
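A toy model may help make that bookkeeping concrete.  The structs and
toy_link() below are hypothetical, drastically simplified stand-ins for
struct inode, struct dentry and a f/s's ->link (no locking, no on-disk
directory update).  The comments name the kernel helpers a simple Linux f/s
typically uses for each step.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct inode/dentry. */
struct toy_inode {
    unsigned long ino;
    unsigned int  nlink;   /* no. of names (hard links) for this inode */
    int           count;   /* in-memory references (like i_count) */
};

struct toy_dentry {
    char              name[32];
    struct toy_inode *inode;   /* NULL means a "negative" dentry */
};

/* Sketch of ->link: make new_dentry a 2nd name for old_dentry's inode.
 * dir is unused here; a real f/s would add an entry to dir's data. */
int toy_link(struct toy_dentry *old_dentry, struct toy_inode *dir,
             struct toy_dentry *new_dentry)
{
    (void)dir;
    if (old_dentry->inode == NULL)
        return -ENOENT;                 /* source name must exist */
    if (new_dentry->inode != NULL)
        return -EEXIST;                 /* new name must be negative */

    struct toy_inode *inode = old_dentry->inode;
    inode->nlink++;                     /* one more name   (cf. inc_nlink()) */
    inode->count++;                     /* dentry's ref    (cf. ihold()) */
    new_dentry->inode = inode;          /* neg -> positive (cf. d_instantiate()) */
    return 0;
}
```

After a successful toy_link(), both dentries point at the same toy_inode,
whose nlink went from 1 to 2.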

	int (*rename) (struct user_namespace *,
			struct inode *, // parent dir inode of "src"
			struct dentry *, // the name of "src" object to rename
			struct inode *, // parent dir inode of "dst"
			struct dentry *, // the new "dst" name (negative, or positive if it gets replaced)
			unsigned int); // special flags like RENAME_EXCHANGE

rename(2) in userland is used as rename(src,dst).

Renaming is hard to accomplish b/c you have to (1) remove the old name from
one dir, and (2) add the new name to another (possibly the same) dir,
replacing any object that already has that name.  The kernel has to
accomplish this add+remove of names ATOMICALLY.  In the Linux kernel there
are complicated locking mechanisms to support renaming (TBD).

->rename takes flags (see include/uapi/linux/fs.h).  One flag is
  RENAME_EXCHANGE, which lets a file system that natively supports it SWAP
  two names (atomically).  Another is RENAME_NOREPLACE (fail with EEXIST if
  "dst" already exists).  These flags can't be passed to the regular
  rename(2) syscall, only to the newer renameat2(2) syscall.

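How to swap two names (a.txt and b.txt) in userland?  Two renames won't do:
$ mv b.txt a.txt # "success": a.txt is replaced by b.txt; old a.txt's data is gone!
$ mv a.txt b.txt # also "succeeds", but only moves b.txt's data back

Must use a 3rd, temporary name (like a temp variable in a swap):

$ mv a.txt tmp.txt
$ mv b.txt a.txt
$ mv tmp.txt b.txt

Thus, a RENAME_EXCHANGE has to do the work of THREE renames as one atomic
op: no other process ever sees tmp.txt, or a moment with no a.txt.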
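From userland, the atomic swap is requested with renameat2(2) and the
RENAME_EXCHANGE flag.  A minimal sketch, assuming Linux 3.15+ and a f/s
that supports the flag (e.g., ext4).  It goes through syscall(2) so it
doesn't depend on the C library shipping a renameat2() wrapper;
swap_names() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall(), getpid() */

#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)   /* same value as in <linux/fs.h> */
#endif

/* Atomically swap names a and b; both must already exist.
 * Returns 0, or -errno: e.g., -EINVAL if the f/s doesn't support
 * RENAME_EXCHANGE, -ENOSYS on kernels older than 3.15. */
int swap_names(const char *a, const char *b)
{
    long r = syscall(SYS_renameat2, AT_FDCWD, a, AT_FDCWD, b,
                     (unsigned int)RENAME_EXCHANGE);
    return r == 0 ? 0 : -errno;
}
```

After swap_names("a.txt", "b.txt") returns 0, the two names have traded
inodes in one step, and no tmp.txt ever existed.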

* Lookup & permission

Most inode ops return an int: 0 on success, or a negative errno code
(-ENOENT, -EPERM, etc.) that the syscall eventually hands back to userland
as errno.

One big exception is ->lookup(), which returns a dentry pointer.

	struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
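Since ->lookup returns a pointer, how does it report an error?  Linux
encodes a small negative errno inside the pointer itself, using ERR_PTR(),
PTR_ERR() and IS_ERR() from include/linux/err.h: the top 4095 addresses are
never valid kernel pointers.  Below is a userland re-creation for
illustration only, plus a hypothetical handle_lookup_result() showing how a
caller might interpret the three kinds of return value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Userland copies of the kernel's err.h idea (illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)      { return (void *)(intptr_t)error; }
static inline long  PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool  IS_ERR(const void *ptr)
{
    /* true only for the last MAX_ERRNO addresses, i.e. -1 .. -4095 */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_dentry { const char *name; int positive; };  /* hypothetical */

/* NULL    -> "use the dentry you passed in"
 * ERR_PTR -> a real error, e.g. -EIO
 * other   -> an alternative dentry the f/s wants used instead */
int handle_lookup_result(struct toy_dentry *passed, struct toy_dentry *ret,
                         struct toy_dentry **out)
{
    if (IS_ERR(ret))
        return (int)PTR_ERR(ret);
    *out = ret ? ret : passed;
    return 0;
}
```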

Lookup takes an inode for a directory that exists (1st arg), a name to
look up (inside the 2nd arg, a dentry), and lookup flags (3rd arg).  The
dentry passed as the 2nd arg is going to be a negative dentry.

Then lookup goes into the f/s code, and searches for the object in question.
Only the actual f/s can know HOW to search for named objects in its own
formatted data (e.g., ext4 on hard disks, NFS on network servers, etc.).

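If the name exists, the f/s attaches its inode to the passed dentry,
turning it POSITIVE (typically via d_add(dentry, inode)), and usually
returns NULL, meaning "use the dentry you gave me".  (It may instead return
a different, already existing dentry, e.g., an alias of a directory inode.)
Real errors, such as an I/O failure, come back as an ERR_PTR(-errno).

If the name does not exist, the f/s leaves the dentry NEGATIVE (d_add(dentry,
NULL)).  That negative dentry will be cached by the VFS in the dcache.
Subsequent attempts to look up that name will be answered by the VFS from
the cache (no call into the f/s), and the VFS will return ENOENT to the
caller (syscall).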
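A toy model of negative-dentry caching, to show why a 2nd lookup of a
missing name never reaches the f/s.  Everything here is hypothetical: the
"dcache" is a small fixed array with no eviction (the real one is a hash
table), and toy_fs_lookup() pretends only "foo.c" exists on disk.
fs_lookup_calls counts trips into the slow f/s code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define NCACHE 8

struct toy_dentry {
    char          name[32];   /* longer names are truncated in this toy */
    bool          used;
    bool          positive;
    unsigned long ino;
};

static struct toy_dentry dcache[NCACHE];
static int fs_lookup_calls;

/* Stand-in for a f/s ->lookup: only "foo.c" (ino 12) exists. */
static void toy_fs_lookup(struct toy_dentry *d)
{
    fs_lookup_calls++;
    if (strcmp(d->name, "foo.c") == 0) {
        d->positive = true;
        d->ino = 12;
    } else {
        d->positive = false;          /* stays negative */
    }
}

/* VFS side: check the dcache first; call the f/s only on a miss, and
 * cache the answer either way (positive OR negative dentry). */
int toy_vfs_lookup(const char *name, unsigned long *ino)
{
    struct toy_dentry *hit = NULL, *free_slot = NULL;

    for (int i = 0; i < NCACHE; i++) {
        if (dcache[i].used && strcmp(dcache[i].name, name) == 0) {
            hit = &dcache[i];
            break;
        }
        if (!dcache[i].used && free_slot == NULL)
            free_slot = &dcache[i];
    }
    if (hit == NULL) {                /* dcache miss */
        if (free_slot == NULL)
            return -ENOMEM;           /* toy cache full; a real one evicts */
        hit = free_slot;
        strncpy(hit->name, name, sizeof hit->name - 1);
        hit->used = true;
        toy_fs_lookup(hit);
    }
    if (!hit->positive)
        return -ENOENT;               /* answered by a negative dentry */
    *ino = hit->ino;
    return 0;
}
```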

	int (*permission) (struct user_namespace *, struct inode *, int);

The ->permission method takes an inode for an existing object (dir, file,
etc.) as its 2nd arg, and a 3rd arg, a "mask" of the permissions you're
asking for.  Mask bits include MAY_READ, MAY_WRITE, MAY_EXEC, etc. (see
fs.h).  They are bits, so they can be OR'd together (e.g.,
MAY_WRITE|MAY_EXEC).

The ->permission method will be implemented by the f/s to check if the
current running task (process) has permission to access the inode in
question with the permissions asked for in the flags.

Recall in the kernel there is one "struct task_struct" object per running
process (task).  Each task also carries credentials (struct cred: uid, gid,
etc.), inherited from whoever started it, which determine what it may
access.  At any point in time, you can use "current" (a "struct
task_struct *") to refer to the task currently running on the CPU/core in
question.  It looks like a global variable, but it is really an
architecture-specific macro that yields this CPU's running task.

->permission would check the credentials of the currently running
process/task against the permissions and ownership of the file/inode in
question.  If they match (user, group, other -- like chmod), then
permission is granted and ->permission returns 0 (success); else it
returns a negative error (usually -EACCES).  A f/s that doesn't define
->permission gets the VFS's default mode-bit check, generic_permission().
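A simplified sketch of the classic owner/group/other check, in the spirit
of generic_permission().  toy_permission() is hypothetical; it ignores root
and capabilities, ACLs, and supplementary groups.  The mask values mirror
the kernel's MAY_EXEC/MAY_WRITE/MAY_READ (1, 2, 4), which happen to line up
with the x/w/r bits of each rwx triplet.

```c
#include <errno.h>
#include <sys/types.h>   /* mode_t, uid_t, gid_t */

#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/* Exactly ONE class applies: owner, else group, else other (so an owner
 * denied by the owner bits is not rescued by the "other" bits).
 * Returns 0 if every requested bit is granted, else -EACCES. */
int toy_permission(mode_t mode, uid_t i_uid, gid_t i_gid,
                   uid_t fsuid, gid_t fsgid, int mask)
{
    unsigned int bits;

    if (fsuid == i_uid)
        bits = (mode >> 6) & 7;        /* owner rwx */
    else if (fsgid == i_gid)
        bits = (mode >> 3) & 7;        /* group rwx */
    else
        bits = mode & 7;               /* other rwx */

    mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
    return ((unsigned int)mask & ~bits) ? -EACCES : 0;
}
```

For a 0640 file owned by uid 1000, gid 100: the owner may read and write,
group members may only read, and everyone else gets -EACCES.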

* Pathname resolution in the kernel

Consider a user issuing the command

$ rm /home/jdoe/src/project/foo.c

That would translate into the unlink(2) syscall as

unlink("/home/jdoe/src/project/foo.c");

In the kernel, we will invoke the syscall entry point,
sys_unlink("/home/jdoe/src/project/foo.c"), and now the VFS takes over to
begin issuing a sequence of methods to perform this unlink.


->unlink(inode, dentry)
	inode: the inode for the parent directory named "project".  This
	means that there has to be a positive dentry with name "project".
	dentry: "foo.c"

But how did the dentry+inode with name "project" come to exist in memory?
A: we looked it up!

Q: where did we lookup the "project" dentry?
A: In its parent ("src")

Q: and where did we look that one up?
A: again, its parent, etc.

Where do I start the lookup?  A: I look up "home" in a dentry for "/"; the
latter is called the "root dentry" or "root inode".  Can I look up "/"?  A:
no, there is no parent to look it up in.  The root dentry of every file
system is created at the time you mount the file system.  That root dentry
is allocated and filled in (not via lookup, but "manually") by the file
system's mount code (the ->mount method of its struct file_system_type,
usually via a fill_super() routine).  The root dentry for every f/s is
stored inside "struct super_block" (field s_root).  There is a single
struct super_block for every mounted f/s.

Pathname resolution procedure (last component of this unlink):

->permission(inode for "project", MAY_EXEC): may we search this dir?
	if no perm, return error (EACCES), else cont.
->lookup(inode for "project", dentry called "foo.c")
	if lookup fails, return error (e.g., ENOENT) right here.
	if lookup succeeded, continue
->permission(inode for "project", MAY_WRITE|MAY_EXEC): may we modify this dir?
	if no perm, return error, else cont.
	Note: there's no ->permission check on "foo.c" itself.  Removing a
	name modifies the parent dir, so the dir's permissions are what count
	(except in "sticky" dirs like /tmp, where you must also own the file
	or the dir).
->unlink(inode for "project", dentry called "foo.c")
	if succeeded, returns 0 (which returns from syscall)
	else, return error
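A toy version of that unlink sequence: search permission on the parent,
lookup, write+search permission on the parent, then unlink.  The toy_dir
type and toy_unlink() are hypothetical, and the sticky-bit rule is left
out.  Note that the file's own mode is never consulted.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define TOY_NAMES 4

/* Hypothetical toy directory: its own mode/owner plus a few names. */
struct toy_dir {
    mode_t      mode;
    uid_t       uid;
    gid_t       gid;
    const char *names[TOY_NAMES];   /* NULL = empty slot */
};

/* Simplified owner/group/other mode-bit check on the directory. */
static int dir_permission(const struct toy_dir *d, uid_t uid, gid_t gid,
                          int mask)
{
    unsigned int bits = uid == d->uid ? (d->mode >> 6) & 7
                      : gid == d->gid ? (d->mode >> 3) & 7
                      : d->mode & 7;
    return ((unsigned int)mask & ~bits) ? -EACCES : 0;
}

int toy_unlink(struct toy_dir *dir, const char *name, uid_t uid, gid_t gid)
{
    int err = dir_permission(dir, uid, gid, MAY_EXEC);   /* may we search? */
    if (err)
        return err;

    int slot = -1;                                       /* "->lookup" */
    for (int i = 0; i < TOY_NAMES; i++)
        if (dir->names[i] && strcmp(dir->names[i], name) == 0)
            slot = i;
    if (slot < 0)
        return -ENOENT;

    err = dir_permission(dir, uid, gid, MAY_WRITE | MAY_EXEC); /* modify dir? */
    if (err)
        return err;

    dir->names[slot] = NULL;                             /* "->unlink" */
    return 0;
}
```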

Simplified:

1. Take a pathname like "/home/jdoe/src/project/foo.c"

2. break it up on a delimiter ('/' in unix, '\' in windows)

3. start to look up each component in its parent dir
	e.g., look up "home" in /, etc.

3a if the pathname starts with a "/" (an "absolute" pathname), then begin
   looking up in the process's root (current->fs->root), which normally is
   the "root dentry" stored in the SB of the root f/s.

3b if the pathname does NOT start with a "/", we call it a "relative"
   pathname.  Then, start the lookup from the "current working directory"
   (CWD), which hangs off struct task_struct as current->fs->pwd (a struct
   path, i.e., a mount + dentry pair naming the CWD).

Note: chdir(2) (which the shell's cd uses) changes current->fs->pwd.  The
new CWD's refcnt has to be incremented (+1) so it can't be freed while in
use, and the old CWD's refcnt decremented.

In sum, we perform pairs of lookup+permission, returning errors as soon as
they're discovered, else continue, until we reach the final method.

4. For the actual lookup, first check the dcache: if a cached entry is
found, use it in the next lookup/permission pair (or return it, if it's the
last component).

Call f/s ->lookup only if entry isn't found in dcache.  Once f/s returns a
dentry from ->lookup, then cache it in dcache for next time.

5. The VFS knows what to expect.  Each intermediate component of a pathname
must be a directory.  Only the last "leaf" component (foo.c) can be any
type of object.

So when ->lookup returns successfully on an intermediate component (e.g.,
"src"), the VFS has to check what type it is:

if it is of type DIR, good, continue

if it is of type FILE, return error (ENOTDIR)
- same error if the type happens to be a block/char device

if the type of object found is a SYMLINK, we now have to resolve the
symlink:

We now invoke a "recursive" procedure, to

(a) ask the f/s for the "content" of the symlink, i.e., the pathname it
points to.  (In recent Linux this is the ->get_link inode op; ->readlink
serves the readlink(2) syscall.)

(b) splice that content string into the pathname in place of the symlink
component (e.g., "src"), and resolve it like any other pathname: the usual
stuff as above (parse the pathname on a '/', do another lookup, then
permission, cache lookup, etc.).  If the content starts with '/', start
again from the root; otherwise continue from the dir holding the symlink.

If while resolving this symlink's content, we find another symlink, then we
invoke another "recursive" procedure to fetch its content, then
lookup+permission.

Note: during a single pathname resolution, I may come across symlinks many
times, and each one results in another "recursive" call.  If the no. of
symlinks followed during this one pathname lookup exceeds a threshold (40
in Linux), then we abort the entire lookup and return ELOOP.

6. If while looking up, we cross a mount point, then we have to continue
in the root dir of the f/s mounted there (its root dentry, sb->s_root).

7. At each step, check ->permission: you need search (MAY_EXEC) permission
on every directory you look a name up in.
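A toy path walker tying steps 2 to 5 together: split on '/', start at the
root or the CWD, require every searched component to be a dir (ENOTDIR),
fail on missing names (ENOENT), and follow symlinks recursively with an
ELOOP limit of 40 (assumed to mirror Linux's MAXSYMLINKS).  The node table,
toy_lookup() and toy_path_walk() are hypothetical; the dcache, permission
checks, mount points, ".", ".." and trailing-slash rules are all left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXSYMLINKS 40   /* assumed to mirror Linux's MAXSYMLINKS */

enum toy_type { T_DIR, T_FILE, T_SYMLINK };

/* Hypothetical in-memory tree; a node's index doubles as its "inode no." */
struct toy_node {
    const char   *name;
    int           parent;   /* index of the parent dir */
    enum toy_type type;
    const char   *target;   /* symlink content, else NULL */
};

static const struct toy_node nodes[] = {
    { "/",       0, T_DIR,     NULL },                    /* 0: root (from SB) */
    { "home",    0, T_DIR,     NULL },                    /* 1 */
    { "jdoe",    1, T_DIR,     NULL },                    /* 2 */
    { "src",     2, T_DIR,     NULL },                    /* 3 */
    { "project", 3, T_DIR,     NULL },                    /* 4 */
    { "foo.c",   4, T_FILE,    NULL },                    /* 5 */
    { "cur",     0, T_SYMLINK, "home/jdoe/src/project" }, /* 6: /cur -> ... */
    { "loop",    4, T_SYMLINK, "loop" },                  /* 7: points at itself */
};
#define NNODES ((int)(sizeof nodes / sizeof nodes[0]))

/* Stand-in for ->lookup: find the first len chars of name inside dir. */
static int toy_lookup(int dir, const char *name, size_t len)
{
    for (int i = 1; i < NNODES; i++)
        if (nodes[i].parent == dir && strlen(nodes[i].name) == len &&
            strncmp(nodes[i].name, name, len) == 0)
            return i;
    return -ENOENT;
}

/* Resolve path from cwd (relative) or root (absolute).  *links counts every
 * symlink followed in this whole resolution.  Returns a node index or
 * -ENOENT / -ENOTDIR / -ELOOP. */
static int walk(const char *path, int cwd, int *links)
{
    int cur = (path[0] == '/') ? 0 : cwd;
    const char *p = path;

    for (;;) {
        while (*p == '/')                    /* skip delimiters */
            p++;
        if (*p == '\0')
            return cur;
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);

        if (nodes[cur].type != T_DIR)        /* can only search in dirs */
            return -ENOTDIR;
        int next = toy_lookup(cur, p, len);
        if (next < 0)
            return next;
        if (nodes[next].type == T_SYMLINK) {
            if (++*links > TOY_MAXSYMLINKS)
                return -ELOOP;
            /* relative targets resolve from the dir holding the link */
            next = walk(nodes[next].target, cur, links);
            if (next < 0)
                return next;
        }
        cur = next;
        p += len;
    }
}

int toy_path_walk(const char *path, int cwd)
{
    int links = 0;
    return walk(path, cwd, &links);
}
```

For example, toy_path_walk("project/foo.c", 3) starts at "src" (node 3)
because the path is relative, while "/cur/foo.c" reaches the same file by
way of a symlink.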

Overall: pathname lookup is quite complex, which is why the kernel spends a
lot of time, effort, and code on it.  Traditionally called the namei()
routine, sometimes called lookup_pn() or path_walk(); in current Linux its
core is link_path_walk() in fs/namei.c.


Q: what about '~' in userland?

A: '~' is replaced by the shell (bash, zsh, etc.) with the value of the
environment variable $HOME.  $HOME is usually set to one's home dir, such
as "/home/jdoe".  In short, the kernel never sees a '~', only an actual
pathname.
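A sketch of what the shell does before any syscall is made: replace a
leading "~" (alone or followed by '/') with $HOME.  expand_tilde() is a
hypothetical helper; it does not handle "~user" forms, and where real
shells have fallbacks when HOME is unset, this one just fails.

```c
#define _POSIX_C_SOURCE 200112L   /* POSIX APIs such as setenv() */
#include <stdio.h>
#include <stdlib.h>               /* getenv() */
#include <string.h>

/* Writes the expanded path into out (size n).  Returns 0, or -1 if the
 * result didn't fit or HOME is unset. */
int expand_tilde(const char *path, char *out, size_t n)
{
    int w;

    if (path[0] == '~' && (path[1] == '\0' || path[1] == '/')) {
        const char *home = getenv("HOME");
        if (home == NULL)
            return -1;
        w = snprintf(out, n, "%s%s", home, path + 1);  /* "~/x" -> "$HOME/x" */
    } else {
        w = snprintf(out, n, "%s", path);              /* nothing to expand */
    }
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}
```

With HOME set to "/home/jdoe", "~/src/foo.c" becomes "/home/jdoe/src/foo.c",
and only that expanded string is passed to the kernel.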