* inode->i_private and containers

struct inode, dentry, etc. are VFS-level structures. They're designed to hold
common data that every file system needs. However, every file system also
needs a place to store its own special data per inode, per dentry, per file,
etc. This extra data is stored in a per-f/s structure. For example, ext3
needs to store pointers to disk blocks (LBAs) in "struct ext3_inode"; NFS
needs to store the IP addr and port of the server to send RPC packets to
(struct nfs_inode); encrypting file systems need to store cipher + key info
(struct ecryptfs_inode). Where to store that? File systems may have their own
struct XXX_dentry, struct XXX_superblock, etc. to store extra info for those
structures.

Ideally, we want to be able to find struct ext3_inode from a struct inode.
So the two need to be associated somehow.

Method 1: have a pointer from struct inode to struct XXX_inode.

At the very end of struct inode, there is a "generic" void pointer field:

	void *i_private; /* fs or device private pointer */

Such a private pointer can be used by the f/s to store any extra info, such
as its own struct ext3_inode. Note: the VFS will not touch anything in
inode->i_private, so whoever puts something there has to remove it, free up
any memory allocated, etc.

The VFS designers knew that there would be uses of their VFS structures that
they could not foresee, so they added this void *i_private field as a form of
"extensibility": a way to extend the VFS w/o having to change fields in major
data structures. This worked, allowing file systems to add stuff to VFS
structures w/o changing the VFS structures themselves.

Disadvantage: the inode struct is large and can't fit inside a single CPU
cache line. So when we jump to a pointer stored inside struct inode, most
likely that ptr will lead to a different memory location. So we have to flush
the CPU caches and reload them with data from a different memory location.
This results in memory-bound accesses to these VFS structures. Worse, we
could be bouncing back and forth b/t struct inode and struct ext3_inode, each
time flushing the other.
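To make Method 1 concrete, here's a minimal sketch using a hypothetical
"myfs" file system (all myfs_* names are made up for illustration, not code
from any real f/s):

	#include <linux/fs.h>
	#include <linux/slab.h>

	/* myfs's private per-inode data (hypothetical) */
	struct myfs_inode {
		u32 disk_block;	/* e.g., where this inode's data lives on disk */
	};

	/* stash the private struct when myfs instantiates an inode */
	static int myfs_attach_private(struct inode *inode)
	{
		struct myfs_inode *mi = kzalloc(sizeof(*mi), GFP_KERNEL);

		if (!mi)
			return -ENOMEM;
		inode->i_private = mi;	/* the VFS never touches this field */
		return 0;
	}

	/* every access pays an extra pointer dereference to a separate
	 * allocation -- this is the cache-unfriendly part */
	static inline struct myfs_inode *MYFS_I(struct inode *inode)
	{
		return inode->i_private;
	}

	/* whoever set i_private must also clean it up, e.g., when the
	 * inode is evicted */
	static void myfs_detach_private(struct inode *inode)
	{
		kfree(inode->i_private);
		inode->i_private = NULL;
	}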
Method 2: embed all the per-f/s struct XXX_inode structures inside the inode
struct itself, using a union:

	struct inode {
		// all the usual fields of the inode struct here...
		// ...and at the end, a union of all the per-f/s structs
		union {
			struct ext3_inode  private_ext3_inode;
			struct xfs_inode   private_xfs_inode;
			struct btrfs_inode private_btrfs_inode;
			struct nfs_inode   private_nfs_inode;
			struct XXX_inode   private_XXX_inode;
			// and so on for all file systems supported in the kernel
			void *i_private;   // for anything else
		} u;
	};

With the above, a f/s can access its own inode from the VFS inode as
inode->u.private_ext3_inode. Note, the union stores the actual structures,
not pointers to them. In C, a "union" allows you to store multiple structures
or data types in the same space. You can refer to each one by its name, but
you can use only one of them at a time. The union reserves enough space to
hold the largest of its member fields.

Benefit: the per-f/s structure is laid out in memory in the SAME location as
the inode. So now we minimize the chances of CPU cache-line flushes, b/c the
union fields are more likely to fit in a CPU cache line together with (some
of) the rest of the inode structure.

Disadvantages:

1. Not very extensible: if you want to support a new f/s, and you don't want
   to use the slow i_private field, then you have to change struct inode,
   recompile the kernel, install, and reboot. Worse, if your f/s isn't part
   of mainline, then you have to maintain your own separate kernel tree all
   the time.

2. sizeof() the union is the max() of all member fields. If one member field
   were, say, 10 bytes, and another 50 bytes, then we'd need to reserve 50
   bytes of space. So we're wasting precious physical kernel memory, some of
   which is never used (e.g., 40 bytes of extra space when the f/s in use
   only needs 10). Ironically, this also increases the total size of struct
   inode, and hence somewhat defeats the purpose of reducing CPU cache-line
   flushing.

Method 3: containers.

It used to be the case that the VFS allocated the struct inode itself. But
the VFS doesn't know yet on behalf of whom it is allocating an inode, so we
were stuck with methods 1+2 above. The main change came when the
responsibility for allocating a VFS inode was pushed down to the f/s itself,
with a superblock method called SB->alloc_inode (in some kernels it is called
an "iget" method). See sample code in file 23.c, and the sketches below.

Benefits:

1. Containers allocate exactly as many bytes as needed. So no wasted extra
   memory as with the "union" method.

2. The memory for the VFS and per-f/s inodes is contiguous, and hence more
   likely to fit in CPU caches. So we minimize memory/CPU flushes. Better
   performance.

Disadvantage: coding is a bit more complex, but Linux offers helpers in
header files to make it easy to find the VFS inode from the per-f/s inode and
vice versa, as well as helpers for allocating/freeing these "containers".

Q: Why is the per-f/s inode before the VFS inode and not after?
A: It could have been designed to be after, and you'd get most of the
benefits. But it was decided that the fields at the beginning of struct inode
were more likely to be accessed when the per-f/s inode was used, so by
placing the per-f/s inode at the start of the container, just before the VFS
inode, we get a bit better CPU cache locality.

Q: How does the VFS even know to call a file system to allocate an inode?
A: The VFS will call ->lookup on a f/s iff the object (dentry/inode) is not
cached. It is up to that f/s's ->lookup method (e.g., ext3_lookup,
nfs_lookup) to (1) find out whether the file exists (else return an error);
and (2) if the file exists, invoke its OWN ->alloc_inode to allocate the
container, and return the VFS inode back up to the VFS. [We'll see that when
analyzing wrapfs code.]
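To make the container technique concrete, here is a minimal sketch, again
with hypothetical myfs_* names (real file systems follow this same pattern,
e.g., ext4's EXT4_I() and NFS's NFS_I() helpers; real implementations also
typically allocate from a dedicated kmem_cache rather than kzalloc()):

	#include <linux/fs.h>
	#include <linux/slab.h>

	/* the container: per-f/s fields first, VFS inode embedded after them */
	struct myfs_inode {
		u32 disk_block;		/* myfs-private data (hypothetical) */
		struct inode vfs_inode;	/* the VFS inode lives INSIDE the container */
	};

	/*
	 * VFS inode -> per-f/s inode: container_of() subtracts the offset of
	 * vfs_inode within struct myfs_inode, yielding the container's address.
	 * per-f/s inode -> VFS inode is just &mi->vfs_inode.
	 */
	static inline struct myfs_inode *MYFS_I(struct inode *inode)
	{
		return container_of(inode, struct myfs_inode, vfs_inode);
	}

	/* the ->alloc_inode superblock method: one contiguous allocation */
	static struct inode *myfs_alloc_inode(struct super_block *sb)
	{
		struct myfs_inode *mi = kzalloc(sizeof(*mi), GFP_KERNEL);

		if (!mi)
			return NULL;
		return &mi->vfs_inode;	/* hand the embedded VFS inode up to the VFS */
	}

	/* free the whole container when the VFS is done with the inode */
	static void myfs_free_inode(struct inode *inode)
	{
		kfree(MYFS_I(inode));
	}

Because the per-f/s fields and the VFS inode come from one allocation,
touching either one keeps us in the same region of memory -- the cache
locality win described above.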
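And a hedged sketch of how ->lookup fits into this (myfs_find_entry and
myfs_iget are hypothetical helpers; myfs_iget is where the f/s would
instantiate the inode, ultimately going through ->alloc_inode above):

	#include <linux/fs.h>
	#include <linux/err.h>

	/* hypothetical helpers, assumed to exist elsewhere in myfs */
	u64 myfs_find_entry(struct inode *dir, const struct qstr *name);
	struct inode *myfs_iget(struct super_block *sb, u64 ino);

	static struct dentry *myfs_lookup(struct inode *dir, struct dentry *dentry,
					  unsigned int flags)
	{
		struct inode *inode = NULL;
		u64 ino;

		/* (1) search the on-disk directory for this name */
		ino = myfs_find_entry(dir, &dentry->d_name);
		if (ino) {
			/* (2) instantiate the inode; ends up in ->alloc_inode */
			inode = myfs_iget(dir->i_sb, ino);
			if (IS_ERR(inode))
				return ERR_CAST(inode);
		}
		/* attach the inode (or a negative entry if NULL) to the dentry */
		return d_splice_alias(inode, dentry);
	}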
* struct path

	struct path {
		struct vfsmount *mnt;
		struct dentry *dentry;
	} __randomize_layout;

This struct contains a dentry plus a "mnt" struct, used for advanced
namespaces in Linux. But path->dentry is the most commonly used part.

* struct file

Same stuff: locks, operations, void * private. Also contains the open(2)
mode and the file offset, as expected. Recall that a struct file points to a
dentry, and the dentry points to an inode. Where's the dentry inside struct
file? It's in file->f_path.dentry, and the inode is in
file->f_path.dentry->d_inode.

struct file also contains:

	struct inode *f_inode; /* cached value */

Q: Why make a copy of a ptr to the inode and stuff it inside the file struct?
A: By keeping a cached copy of the inode ptr, we can get from the file to the
inode directly, w/o having to dereference a dentry ptr, which could cause
extra memory accesses and CPU cache flushes. Note this is a cached copy,
namely we do NOT increase the inode's refcount!

* struct file_operations

As expected, methods to read, write, lseek, mmap, etc. -- the kind of things
you do on an open file descriptor.

* struct super_block

The usual stuff, plus:

	struct dentry *s_root;

That is the root dentry of the file system (ref counted).

* struct super_operations

Methods to: alloc/free inodes, flush inodes, sync the file system, ->statfs
(for serving the statfs(2) syscall), and mount/unmount hooks. Also supports
freeze/thaw f/s methods, useful to turn an entire f/s read-only so that no
mutating ops are allowed (e.g., write, mkdir); you can then unfreeze/thaw the
file system later. This is useful when (1) taking snapshots of a file system,
to suspend all writes to the f/s for a short period of time; and also (2) as
an emergency measure, such as when a disk goes bad (e.g., you get an EIO
error): to prevent further damage, the VFS will freeze the entire f/s
indefinitely (letting users copy out as much data as possible before a
permanent failure).

Recall that vfs.txt or vfs.rst in the Linux Documentation/filesystems folder
has more info about all these methods.
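As a rough illustration of how a f/s wires these up, here's a sketch of a
super_operations table using the hypothetical myfs methods from the earlier
sketches (the exact set of hooks varies by kernel version -- e.g., newer
kernels offer ->free_inode where older ones used ->destroy_inode):

	#include <linux/fs.h>
	#include <linux/statfs.h>

	/* hypothetical myfs methods; alloc/free were sketched earlier */
	static struct inode *myfs_alloc_inode(struct super_block *sb);
	static void myfs_free_inode(struct inode *inode);
	static int myfs_statfs(struct dentry *dentry, struct kstatfs *buf);
	static int myfs_freeze_fs(struct super_block *sb);
	static int myfs_unfreeze_fs(struct super_block *sb);

	static const struct super_operations myfs_sops = {
		.alloc_inode	= myfs_alloc_inode,	/* Method 3: alloc the container */
		.free_inode	= myfs_free_inode,	/* release the container */
		.statfs		= myfs_statfs,		/* serve statfs(2) */
		.freeze_fs	= myfs_freeze_fs,	/* quiesce: block mutating ops */
		.unfreeze_fs	= myfs_unfreeze_fs,	/* thaw: allow writes again */
	};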