* Receiving packets from the network

Each NIC has its own hardware address (Ethernet or MAC addr) that uniquely
identifies THIS host interface.  Ethernet addrs are 48 bits long, and the
first 24 bits identify the vendor (e.g., Intel, Broadcom, Apple, etc.).

$ ifconfig en2
en2: flags=8863 mtu 1500
	options=6463
	ether f8:ff:c2:03:bb:1f
	inet 192.168.1.103 netmask 0xffffff00 broadcast 192.168.1.255
	nd6 options=201
	status: active

A NIC first has to "listen" to packets on the wire, and it does so
continuously.  But the NIC only takes action IFF the bits it sees on the
wire are prefixed with this NIC's MAC address (that's why the destination
MAC addr shows up at the head of the packet).

Example, Ethernet MAC addr is 48 bits.  If the NIC sees a packet whose MAC
address does NOT match its own MAC addr, it just ignores the packet.

If the NIC sees a packet whose MAC addr matches, then it has to receive
these bits and place them into its own memory!  ... meaning the NIC has to
ensure it has enough room for the new packet.

1. What if we don't have enough room in the NIC's RAM?

   "Drop the packet": simply don't receive the packet into the NIC's RAM
   (so you don't "drop" anything so much as "never receive" or "ignore" it
   in the first place).

   What happens if we ignore/drop the packet?  The sender (depending on
   protocol) will wait some time for an acknowledgment (ACK), but since
   this NIC never took the packet, clearly the receiver can't ACK it.  Not
   getting an ACK is a signal to the sender to time out and resend the
   packet.  Hopefully the next time the packet is resent, this NIC will be
   able to receive it.

   Some protocols like UDP have no transmission control; they are called
   "unreliable": packets can be lost, reordered, duplicated, etc.  So if
   you use something like UDP and you want guaranteed delivery, you'll
   need to add some "reliability" control on top.  TCP, however, provides
   guaranteed delivery, ordered delivery, and no duplicates.
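
   The receive decision so far (ignore on MAC mismatch, drop when the NIC's
   RAM is full, otherwise store the packet) can be sketched as a toy model.
   This is only an illustration: the class `Nic`, its fields, and the
   method `on_wire` are hypothetical names, not a real driver API, and the
   model checks only the NIC's own address, as the notes do.

```python
from collections import deque


class Nic:
    """Toy model of a NIC's receive decision (hypothetical, not a driver API)."""

    def __init__(self, mac, ram_bytes):
        self.mac = mac.lower()        # this interface's 48-bit address, as text
        self.free_bytes = ram_bytes   # room left in the NIC's own RAM
        self.rx_queue = deque()       # packets waiting to be handed to the OS

    def on_wire(self, dst_mac, payload):
        """Decide what to do with a packet seen on the wire."""
        if dst_mac.lower() != self.mac:
            return "ignored"          # not addressed to us: do nothing
        if len(payload) > self.free_bytes:
            return "dropped"          # no room: never received at all
        self.free_bytes -= len(payload)
        self.rx_queue.append(payload)
        return "received"
```

   With 3000 bytes of NIC RAM, two 1500-byte packets addressed to this NIC
   are received and a third one is dropped, until the OS takes packets off
   the NIC and the memory is freed.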

   Let's say the network is busy, with lots of computers sending packets.
   This eventually leads to busy NICs that can't receive a packet, then
   sender timeouts and retransmissions.

   What if every sender were to resend a "lost" packet as soon as the
   previous one timed out?!  Not a good policy, b/c it's possible that
   everyone will time out around the same time, then "flood" the network
   with retransmissions, which'd overload the network again, resulting in
   more timeouts -- a never-ending cycle of flooding.

   To prevent this form of "synchronized flooding", protocols like TCP use
   an "exponential backoff" timeout: if your timeout for packet 1 was T,
   then next time you wait 2T, next timeout you wait 4T, then 8T, etc.
   Exponential backoff is a form of (self) "throttling of heavy writers".

   In practice, we don't back off exactly 2x the previous timeout value,
   b/c senders that timed out together would then retry together, which is
   the same "synchronized flooding" situation.  Instead, you back off by an
   exponentially growing RANDOM value, e.g., one that's between 2T and 4T.

   TCP also recognizes when the network is no longer congested, and starts
   to send faster again by growing its "congestion window" (cwnd): the
   amount of un-ACKed data it allows itself to have in flight.  It doesn't
   jump back to full speed right away, so as not to overload the network
   again (esp. if multiple senders do the same).  Instead, after a timeout
   TCP restarts from a very small cwnd ("slow start"), and once it gets
   past a threshold it grows cwnd only LINEARLY (about one segment per
   round trip).

   This behavior (exponential backoff vs. slow, cautious recovery) is
   observed by users: any small network hiccup can result in TCP backing
   off so much that apps are effectively "off the net", and it takes a
   long time (minutes or longer) to recover.

   Another possible policy: when the NIC doesn't have room to receive the
   packet, maybe it can DISCARD a packet it already has in its own memory,
   to make room for the new packet that's seen on the wire?  Generally,
   this doesn't change much, b/c you're still dropping/ignoring one pkt
   for another.
   But if you have any sort of packet or "flow" prioritization, you may
   permit discarding lower-priority packets for higher-priority ones.  Or,
   if the packet you want to receive is TCP, you may discard a UDP packet
   in your NIC (b/c UDP doesn't provide any guarantees).  Either way, you
   have to think carefully about fairness (e.g., to avoid starving
   lower-priority transmissions indefinitely).

2. If we have enough room in the NIC's RAM, we can receive the packet

- copy the packet into the NIC's RAM (e.g., 1500B or however long the
  packet is -- packets usually encode their own length).  Knowing the
  length allows the NIC to reserve enough room.

- next: the NIC has to get the packet to the OS itself.  If the NIC can't
  get the packet to the OS quickly enough, then it'll find itself with a
  full memory (and we're at condition #1 above -- NIC RAM is full).

- Thus, the NIC should pass the packet on to the OS ASAP.  Once the NIC has
  a guarantee that the packet was received by the OS, the NIC can free up
  its memory for this packet, and can receive other packets.

- How does a NIC tell the OS "I have a packet for you?"  It raises a
  hardware interrupt assigned to this particular NIC (e.g., PCI card).
  Interrupt numbers are usually configured by the BIOS during h/w bootup.

- when an interrupt is raised, the following happens

(a) whatever the CPU was doing, its state is saved in memory

(b) then the CPU executes an "interrupt handler" from a preconfigured list
    of ihandlers, associated with this specific NIC (or service, like
    "networking interrupts").

(c) interrupt handlers have to run fast!  While they run, most/all other
    interrupts are masked off (blocked).  There are some platforms that
    permit "nested interrupts", but that's a lot more complex to support.
    And there are some exceptions where a high-priority interrupt can
    interrupt another ihandler that's running (e.g., the "clock" interrupt
    is a high-priority one, to ensure we have an accurate clock).
    In effect, when one ihandler runs, other interrupts are blocked, which
    results in "throttling" all others who may interrupt the CPU.  So an
    ihandler has to run fast, so we can resume receiving other interrupts.

(d) a network ihandler runs: it has to take the packet off of the NIC and
    place it into some location in kernel memory.  We want to put it into
    a queue of "just received" packets, for later processing.

    The NIC packet has to be stored into an skb.  So the ihandler has to
    get/alloc a free SKB to put the packet into.  Recall that mem
    allocations can block, and we can't afford to block inside an
    ihandler.  The fastest way is if SKB Services would "preallocate" and
    always keep some free skbs ready to be used by ihandlers.

    Once the ihandler has access to an skb, it can then copy the bytes
    from the NIC to the SKB (using machine instructions), or more likely
    -- set up Direct Memory Access (DMA) processing to copy the NIC bytes
    to the newly assigned SKB.

(e) Once the bytes are copied, there's a reverse signaling mechanism
    (e.g., DMA) to tell the NIC "we're done": at that point, the NIC can
    free the memory for the packet.

(f) One policy is for the ihandler to do as much processing of the packet
    as it can, so it can be given to the user process waiting for it
    (e.g., a __user buffer in a read(2) syscall).  But this isn't a good
    policy, b/c it makes the ihandler run longer (while other interrupts
    are blocked).  A better policy is just to place the packet (and skb)
    into a queue for "further processing" later on.

(Next time we'll discuss this and network queue processing using softirqs)

* Transmitting packets (from OS to the NIC)

Packet information flows from apps in userland, to syscalls, to many
layers inside the OS.  Eventually, a packet to be transmitted is on some
queue of, say, Ethernet frames at the lowest abstraction layer of the
networking layers in Linux (e.g., device drivers).  So these SKBs "just"
need to be given to the NIC.

How does the CPU transmit (xmit) some data from RAM to the NIC itself?
Again, we can use DMA processing or special "IO" instructions designed to
copy bytes over to a specific device address.  Sometimes, devices like
NICs will map their own phys mem into the kernel addr space, so the kernel
can "write" to the NIC's memory.

Can the NIC even take the packet from the OS?!  Does it have room in RAM?
What if the NIC is full?  How can it tell the OS "don't send me any more
stuff now"?  That is, how does the NIC tell the main OS to "throttle
back"?

Every device has a special physical wire that connects it to the main CPU
or the PCI bus, where the CPU can query the state of that "LINK" line.
LINK_STATE_XON means "transmission to device is allowed"; LINK_STATE_XOFF
means "transmission to device is DIS-allowed".

The CPU checks LINK_STATE: if it is "OFF", the CPU will not transmit
anything to the NIC; it just waits some time and retries later.  This
effectively causes the main OS to "throttle back".

Once a NIC has taken a packet from the OS, it'll tell the OS "I have it",
and the NIC will try to xmit the packet on the wire.  Again, the NIC wants
to "get rid" of the packet quickly, so it'll look for a "quiet time" on
the wire, then place the packet on the wire.  NICs "sample" the wire using
various techniques (e.g., CSMA/CD, etc.).

Once the NIC transmits the packet, it can remove it from its own RAM to
make room for more packets to receive from the network (or packets from
the OS to transmit).

For UDP, once an OS has given a pkt to the NIC, the OS can discard the
packet and make room in main OS memory.  For TCP, the OS has to wait until
an ACK is received before it can discard the packet.
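
While a TCP sender holds on to an un-ACKed packet, it retransmits on each
timeout using the randomized exponential backoff described in the receive
section.  A minimal sketch of that timeout schedule follows; the function
name `backoff_timeouts` and its parameters are hypothetical, and the
"between 2T and 4T, then 4T and 8T, ..." ranges are taken from these notes
rather than from any particular TCP implementation.

```python
import random


def backoff_timeouts(base, retries, rng):
    """Return the randomized wait before each retransmission attempt.

    base    -- the initial timeout T (seconds)
    retries -- how many retransmissions to schedule
    rng     -- a random.Random instance (passed in so runs are repeatable)
    """
    timeouts = []
    for attempt in range(1, retries + 1):
        low = base * 2 ** attempt     # 2T, 4T, 8T, ...
        high = 2 * low                # 4T, 8T, 16T, ...
        # Pick a random point in [low, high] so that senders that timed
        # out together do not all retry at the same instant.
        timeouts.append(rng.uniform(low, high))
    return timeouts
```

For example, backoff_timeouts(1.0, 4, random.Random()) returns four waits,
the first somewhere in [2, 4] seconds and the last somewhere in [16, 32].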