commit 3318442766afdb790212c94ac490e7e27a7d68a8
parent e19a06a8fe24bf4199195180d2a84bc6df04db2d
Author: Andrew Laack <andrew@laack.co>
Date: Sun, 12 Oct 2025 16:20:08 -0500
Took notes on filesystems
Diffstat:
12 files changed, 245 insertions(+), 0 deletions(-)
diff --git a/docs/B-Tree.md b/docs/B-Tree.md
@@ -0,0 +1,52 @@
+# B-Tree
+
+**Definition:** A B-Tree is a self-balancing data structure that facilitates log time search, sequential access, insertion, and deletion. This is a generalization of binary search trees that allows nodes to have >2 children.
+
+A B-tree of order m is defined as follows (according to Knuth):
+
+1. Every node has at most m children
+2. Every node, except for the root and the leaves has at least ceil(m/2) children
+3. The root has at least two children unless it is a leaf
+4. A non-leaf node with k children contains k-1 keys
+
+## Algorithms
+
+### Search
+
+Start at the root node. Each key in the root node is ordered so find min(key) s.t. key >= root. We then go to the child i, corresponding to the index of the min key. If there are no smaller keys, we travel to node i+1. We repeat this until finding the key we are searching for.
+
+### Insertion
+
+Search for the location the key should reside. Insert the key. If this insertion results in |key| > m then split the current node in half with the center-most key being propagated to the parent node.
+
+This process is then applied to the parent if it breaks the node key count invariant until the B-tree is in a valid state.
+
+### Deletion
+
+#### Deletion From A Leaf
+
+1) Find the key in the tree
+2) Remove the key
+
+#### Deletion From an Internal Node
+
+1) Find the key in the tree
+2) Remove the key
+3) If the leaf now has fewer than t-1 keys, rebalance the tree
+
+#### Deletion From an Internal Node
+
+1) Find the key k in the tree
+2) If:
+ - The left child of k has at least t keys:
+ - Find the predecessor of k in that subtree and recursively delete it, replacing k with the predecessor
+ - Else if the right child of k has at least t keys:
+ - Find the successor of k in that subtree and recursively delete it, replacing k with the successor
+ - Else if both children have exactly t-1 keys:
+ - Merge k and both children into one node, then recursively delete k from the merged node
+
+#### Rebalancing (node has < t-1 keys)
+
+- Try to borrow from a sibling
+- Else merge the node with a sibling and pull down the separating key from the parent
+- If the parent now has too few keys, recursively rebalance upward
diff --git a/docs/BTRFS.md b/docs/BTRFS.md
@@ -0,0 +1,18 @@
+# BTRFS (B-Tree File System)
+
+**Source:** Wikipedia :\
+
+**Definition:** BTRFS is a copy-on-write file system and logical volume manager for use on Linux.
+
+## COW B-Tree
+
+The COW [B-Tree](B-Tree.md)
+
+## Features
+
+- Pooling
+- Snapshots
+- Subvolumes
+- Integrity Checks
+- Pooling
+- GPL V3 (Compared with [ZFS's](ZFS.md) CDDL)
diff --git a/docs/C.md b/docs/C.md
@@ -0,0 +1,25 @@
+# C
+
+This index tracks c related concepts.
+
+## Links
+
+### Libraries
+
+- [stdio](stdio.md)
+
+### Other
+
+- [Escape Characters](EscapeCharacters.md)
+
+### Types
+
+NOTE: Type sizes are system and OS dependent. As such, stdint should be used in cases where portability and consistency are required.
+
+- [Double](Double.md)
+- [Float](Float.md)
+- Int
+- Long
+- Long Long
+- Char
+- Long Double
diff --git a/docs/ComputerScience.md b/docs/ComputerScience.md
@@ -17,3 +17,4 @@ This is the index for my Computer Science related notes.
- [Computer Architecture](ComputerArchitecture.md)
- [Linux Stuff](LinuxStuff.md)
- [CPP](CPP.md)
+- [C](C.md)
diff --git a/docs/Double.md b/docs/Double.md
@@ -0,0 +1,7 @@
+# Double (Precision Floating Point)
+
+**Source:** C Programming
+
+**Chapter:** 1.1
+
+**Definition:** A double is a double precision floating point number.
diff --git a/docs/EscapeCharacters.md b/docs/EscapeCharacters.md
@@ -0,0 +1,13 @@
+# Escape Characters
+
+**Source:** C Programming
+
+**Chapter:** 2.3
+
+## Some Escape Characters Supported By C
+
+- \n, new line
+- \t, tab
+- \b, backspace (interesting inclusion)
+- \\, backslash
+- \", quote
diff --git a/docs/Ext4.md b/docs/Ext4.md
@@ -0,0 +1,15 @@
+# Ext4 (Fourth Extended File System)
+
+**Source:** [Arch Wiki](https://wiki.archlinux.org/title/Ext4)
+**Definition:** Ext4 is the current default filesystem for Linux. The filesystem provides good performance with acceptable resilience.
+
+## Features
+
+- Block and filename encryption
+- Journaling
+- Checksums
+- ...
+
+## Use Cases
+
+Ext4 is the best filesystem for most general purpose use cases. It provides solid performance and resilience, is an overall improvement to prior Ext filesystems, and destroys NTFS and other Microsoft filesystems.
diff --git a/docs/Float.md b/docs/Float.md
@@ -0,0 +1,15 @@
+# Float
+
+**Source:** C Programming
+
+**Chapter:** 1.1
+
+**Definition:** A float (floating point number) is a data type that can represent fractional numbers.
+
+## Representation
+
+The standard representation for a f32 is as follows:
+
+- 1 bit - Signedness
+- 8 bits - Exponent
+- 23 bits - Mantissa
diff --git a/docs/LinuxStuff.md b/docs/LinuxStuff.md
@@ -3,6 +3,17 @@
These are links to linux stuff that I want to remember, but sometimes forget. Consider, I am starting this on 24/04/16 so I will not include any basic things as I already know them well.
+# Utilities
- [rsync](rsync.md)
- [sed](sed.md)
+
+# File Systems
+
+- [BTRFS](BTRFS.md)
+- [ZFS](ZFS.md)
+- [Ext4](Ext4.md)
+
+# Storage Concepts
+
+- [RAID](RAID.md)
diff --git a/docs/RAID.md b/docs/RAID.md
@@ -0,0 +1,47 @@
+# RAID
+
+**Source:** Wikipedia :\
+
+**Definition:** Redundant Array of Independent Disks (RAID) configurations are storage device configurations that implement at least one of the following: striping, mirroring, parity.
+
+NOTE: There are additional RAID levels, but they are largely unused.
+
+## RAID 0
+
+- [x] Striping
+- [ ] Mirroring
+- [ ] Parity
+
+RAID 0 is quite dangerous. While it benefits from striping and has no overhead cost for mirroring or parity, the loss of one drive is capable of destroying all data given that data is striped across all disks to improve IO speed.
+
+## RAID 1
+
+- [ ] Striping
+- [x] Mirroring
+- [ ] Parity
+
+RAID 1 pairs drives for mirroring. This results in decent read performance and reliability, but poor write performance and storage capacity.
+
+## RAID 5
+
+- [x] Striping
+- [ ] Mirroring
+- [x] Parity
+
+RAID 5 uses striping with distributed parity. This requires all but one drives to be able to operate. Additionally, RAID 5 requires at least three disks. This is a popular approach with three disks.
+
+## RAID 6
+
+- [x] Striping
+- [ ] Mirroring
+- [x] Parity
+
+RAID 6 is basically RAID 5 except it has an additional parity block. This requires four disks, but can remain functional even if two disks fail.
+
+## RAID 10
+
+- [x] Striping
+- [x] Mirroring
+- [ ] Parity
+
+RAID 10 (also referred to as RAID 1+0) combines mirroring and striping for reliability and performance. RAID 10 requires at least four disks and works by striping across pairs. This results in a 50% decrease in storage capacity w.r.t. RAID 0, but can survive one disk failure from each pair before any data is lost. Once data is lost, it is likely all data will be lost. While the worst case for this means it can only survive one failure, this is often not the case.
diff --git a/docs/ZFS.md b/docs/ZFS.md
@@ -0,0 +1,28 @@
+# ZFS (Zettabyte File System)
+
+**Source:** [Arch Wiki](https://wiki.archlinux.org/title/ZFS)
+
+**Definition:** ZFS is a filesystem developed by Sun Microsystems (acquired by Oracle), released under the CDDL. ZFS is a COW filesystem that provides exceptional stability and snapshotting.
+
+## Features
+
+- Checksumming of data and metadata
+- COW Snapshotting
+ - Including the sending/receiving of snapshots (efficiently with streams)
+- Self-healing (often silent)
+- Extremely Large Storage Capacity
+ - Supports up to 16 EB file sizes and 256 quadrillion ZB of storage with no file count limit
+- Pooling
+- Data Deduplication
+ - This can be useful, but it is costly in terms of RAM usage (1-5GB of RAM/TB of storage)
+- Data compression
+- Encryption
+
+## Anti-Features
+
+- Released under the CDDL
+ - Since the CDDL is not compatible with the GPL V3 ZFS can not be shipped as part of the Linux kernel. This results in difficulties with OpenZFS (a fork of ZFS which has continued development in private under the name Oracle ZFS) remaining up to date with the most recent kernel and installation complications (nothing insurmountable, just tedious). This makes it unsuitable for most Linux users. Despite this, it works quite well on FreeBSD.
+
+## Who Should Use This?
+
+Anyone who is looking for a long term storage solution that is not bothered by the CDDL licensing complications. ZFS is likely the most resilient filesystem right now with a multitude of configuration options and exceptional stability.
diff --git a/docs/stdio.md b/docs/stdio.md
@@ -0,0 +1,13 @@
+# stdio
+
+**Source:** C Programming Book
+
+**Chapter:** 7
+
+## Functionality
+
+The C stdio library provides functionality for file IO and writing to stdout.
+
+To me, the only two functions that I did not expect to be in this library but were are `remove()` which deletes a file and `rename()` which renames a file.
+
+I suppose deletion is a standard IO operation, and renaming is also because creation and deletion of files to emulate this is wasteful.