Took notes on information retrieval and software testing - notes - Unnamed repository; edit this file 'description' to name the repository.

commit ec879b2e4dd9dc35ee269af3c44e1cd0f9495296
parent 38b37b107cba99500ea3857613feb424ba17a0a8
Author: Andrew Laack <andrew@laack.co>
Date:   Thu,  1 Jan 2026 04:03:17 -0600

Took notes on information retrieval and software testing

Diffstat:
A docs/AStar.md  | 18 ++++++++++++++++++
M docs/CodeVerification.md  | 5 +++++
M docs/ComputerScience.md  | 1 +
A docs/CosineSimilarity.md  | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
M docs/DirectedFuzzer.md  | 39 +++++++++++++++++++++++++++++++++++++++
M docs/DiscreteMath.md  | 98 ++++++++++++++++++++++++++++++++++++++++----------------------------------------
M docs/FreeSoftware.md  | 8 ++++----
A docs/FuzzingInformationTheoreticPerspective.md  | 8 ++++++++
A docs/InformationRetrieval.md  | 8 ++++++++
A docs/JANUS.md  | 27 +++++++++++++++++++++++++++
A docs/NDCG.md  | 27 +++++++++++++++++++++++++++
M docs/ProbabilisticRobotics.md  | 36 ++++++++++++++++++++++++++++++++++++
A docs/PromptFuzz.md  | 10 ++++++++++
A docs/PropertyBasedTesting.md  | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A docs/TF-IDF.md  | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M docs/WhereToFuzz.md  | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-

16 files changed, 505 insertions(+), 54 deletions(-)
diff --git a/docs/AStar.md b/docs/AStar.md
@@ -0,0 +1,18 @@
+# A*
+
+**Source:** Probabilistic Robotics Video 253
+
+**Definition:** A best-first search algorithm that uses a **heuristic function** (H), an estimate of the remaining cost to the goal, to improve search efficiency.
+
+---
+
+## Steps
+
+Assume we are in an unweighted grid world.
+
+- Define heuristic function
+    - $H(x,y) = \sqrt{(x - x_t)^2 + (y - y_t)^2}$ where $(x_t,y_t)$ are the coordinates of the target
+    - This defines the heuristic function to be the straight-line (Euclidean) distance to the target.
+    - This heuristic is the cost of a straight path with no obstacles, so it never overestimates the true number of steps (it is admissible).
+- Repeatedly expand the reachable state with the minimal value of the cost function:
+    - $f = g + H(x,y)$ where $g$ is the minimum # of steps found so far to get to the state
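The steps above can be sketched in Python. This is a minimal sketch, not the version from the video: it assumes a 4-connected grid (up/down/left/right moves, each costing one step) where `1` marks an obstacle, and the grid encoding and function names are my own.

```python
import heapq
import math

def euclidean(cell, target):
    # straight-line distance to the target: H(x, y) from the notes
    (x, y), (xt, yt) = cell, target
    return math.sqrt((x - xt) ** 2 + (y - yt) ** 2)

def astar(grid, start, target):
    """Shortest path length (in steps) from start to target on a 4-connected,
    unweighted grid, or None if the target is unreachable.
    Cells are (x, y) tuples; grid[y][x] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    best_g = {start: 0}  # g: fewest steps found so far to reach each cell
    frontier = [(euclidean(start, target), 0, start)]  # entries are (f, g, cell)
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)  # expand the state with minimal f = g + H
        if cell in closed:
            continue  # stale heap entry
        if cell == target:
            return g
        closed.add(cell)
        x, y = cell
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < cols and 0 <= ny < rows and grid[ny][nx] == 0:
                ng = g + 1
                if ng < best_g.get((nx, ny), math.inf):
                    best_g[(nx, ny)] = ng
                    heapq.heappush(frontier, (ng + euclidean((nx, ny), target), ng, (nx, ny)))
    return None

if __name__ == "__main__":
    grid = [
        [0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(astar(grid, (0, 0), (0, 2)))  # prints 8 (the path has to go around the wall)
```

Because the Euclidean heuristic never overestimates on this grid, the first time the target is popped its `g` is the optimal step count.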
diff --git a/docs/CodeVerification.md b/docs/CodeVerification.md
@@ -8,4 +8,9 @@ Notes related to correctness verification of code
 - [Cyclomatic Complexity](CyclomaticComplexity.md)
 - [Orion](Orion.md)
 - [Where To Fuzz.md](WhereToFuzz.md)
+- [PromptFuzz](PromptFuzz.md)
+- [NDCG](NDCG.md)
 - [Directed Fuzzer](DirectedFuzzer.md)
+- [Property Based Testing](PropertyBasedTesting.md)
+- [JANUS](JANUS.md)
+- [Fuzzing Information Theoretic Perspective](FuzzingInformationTheoreticPerspective.md)
diff --git a/docs/ComputerScience.md b/docs/ComputerScience.md
@@ -9,6 +9,7 @@ This is the index for my Computer Science related notes.
 - [Math 310](Math310.md) 
 - [Computer Security](ComputerSecurity.md) 
 - [Probabilistic Robotics](ProbabilisticRobotics.md)
+- [Information Retrieval](InformationRetrieval.md)
 
 ## Personal Interest
 
diff --git a/docs/CosineSimilarity.md b/docs/CosineSimilarity.md
@@ -0,0 +1,50 @@
+# Cosine Similarity
+
+**Source:** [Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity)
+
+## Definition
+
+Cosine similarity is the cosine of the angle between two non-zero vectors.
+
+Equivalently, the cosine similarity is the dot product of the vectors divided by the product of their magnitudes:
+
+$\text{cosine similarity} = \frac{A \cdot B}{\|A\| \|B\|}$
+
+The formula is often written with explicit summations over the components, $\frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \sqrt{\sum_{i} B_i^2}}$, but it conveys the same information.
+
+## Meaning + Example Usage
+
+The cosine similarity measures how similar the directions of two vectors are, ignoring their magnitudes. It ranges from -1 to 1, and from 0 to 1 when all components are non-negative (as with term counts).
+
+This can be applied in an information retrieval context to compare two documents. Each unique term in the documents is an axis of the vector space, and the number of times a term occurs in a given document is that document's component along the term's axis.
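To compare documents this way, each document first has to become a term-count vector over a shared vocabulary. A minimal sketch, assuming simple lowercase alphanumeric tokenization with no stemming or stop-word removal; `term_counts` and `document_cosine_similarity` are hypothetical helper names, and returning `0.0` when either document is empty is my own choice, since the similarity is undefined there.

```python
import math
import re
from collections import Counter

def term_counts(text):
    # lowercase alphanumeric tokens -> number of occurrences
    return Counter(re.findall(r"[0-9a-z]+", text.lower()))

def document_cosine_similarity(doc_a, doc_b):
    counts_a, counts_b = term_counts(doc_a), term_counts(doc_b)
    vocabulary = sorted(set(counts_a) | set(counts_b))  # one axis per unique term
    A = [counts_a[t] for t in vocabulary]
    B = [counts_b[t] for t in vocabulary]
    dot = sum(a * b for a, b in zip(A, B))
    norm = math.sqrt(sum(a * a for a in A)) * math.sqrt(sum(b * b for b in B))
    return dot / norm if norm else 0.0  # 0.0 for empty documents (assumption)
```

For example, "the cat sat" vs. "the cat ran" share two of their three terms, giving approximately 0.667 (= 2/3).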
+
+## Implementation
+
+```python3
+import math
+
+def magnitude(v):
+    # Euclidean (L2) length of the vector
+    return math.sqrt(sum(x * x for x in v))
+
+def dot_product(A, B):
+    if len(A) != len(B):
+        raise ValueError("vectors must have the same length")
+    return sum(a * b for a, b in zip(A, B))
+
+def cosine_similarity(A, B):
+    a_len = magnitude(A)
+    b_len = magnitude(B)
+    if a_len == 0 or b_len == 0:
+        # the angle is undefined when either vector is all zeros
+        raise ValueError("cosine similarity is undefined for zero vectors")
+    return dot_product(A, B) / (a_len * b_len)
+
+
+if __name__ == "__main__":
+    A = [0, 4873, 823]
+    B = [0, 487, 48988]
+    print(cosine_similarity(A, B))
+```
diff --git a/docs/DirectedFuzzer.md b/docs/DirectedFuzzer.md
@@ -9,3 +9,42 @@
 One example is regression greybox fuzzing which focuses on recently changed code. The idea behind this is that recently changed code is more likely to impose regressions than existing code which has been, presumably, more thoroughly tested. 
 
 Another example are the metrics used in [Orion](Orion.md) for calculating interfaces of interest to write fuzzing harnesses for. Their approach combines deterministic values like cyclomatic complexity and call graph size with LLM derived values coming from the usage of techniques commonly associated with vulnerabilities (eg. pointer arithmetic and the likes).
+
+## Categorization
+
+A simple way to categorize existing directed fuzzers is by
+
+1. The source of the information
+    - Source code
+        - like code complexity
+    - Binary code
+        - like sanitizer instrumentation
+    - External information
+        - like recently changed code or CVE site information used to find code patterns
+2. Discrete or continuous scoring
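+    - Discrete: a fixed set of target locations is selected
+    - Continuous: every location receives a score used to prioritize it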
+3. Granularity
+    - Block
+    - Function
+        - This seems to be the most common approach for the evaluation of selection methods
+    - Source code line
+4. Scoring mechanism
+    - Two broad categories
+        1. metrics
+            - use code metrics to score individual locations
+            - examples include:
+                - TortoiseFuzz & CollAFL
+                    - Focus on memory access count in selected method
+                - Leopard
+                    - Uses structural complexity and vulnerability metrics which are properties of a code region like pointer arithmetic and nested control structures.
+        2. patterns
+            - use heuristics (code patterns) to select targets
+            - examples include:
+                - ODDFuzz
+                    - selects deserialization methods in Java as targets
+                - StrawFuzzer
+                    - Targets data-storing instructions in an attempt to cause OS crashes
+                - AmpFuzz
+                    - Targets networking-related functions to search for amplification attacks
+                - ParmeSan, SAVIOR, and FishFuzz
+                    - Use sanitizer instrumentation as a heuristic for relevant code
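To make the metrics vs. patterns distinction concrete, here is a toy sketch. The `FunctionInfo` fields are hypothetical stand-ins for what a static analyzer might extract, and the real tools compute their features very differently. The metric approach scores and ranks every function (continuous), while the pattern approach just keeps the functions that match (discrete).

```python
from dataclasses import dataclass

@dataclass
class FunctionInfo:
    name: str
    memory_accesses: int      # hypothetical metric feature
    calls_deserializer: bool  # hypothetical pattern feature

def metric_ranking(functions):
    """Metric-based, continuous: score every function, highest first
    (memory-access count, loosely in the spirit of TortoiseFuzz/CollAFL)."""
    return sorted(functions, key=lambda f: f.memory_accesses, reverse=True)

def pattern_selection(functions):
    """Pattern-based, discrete: keep only functions matching a heuristic
    (calls a deserializer, loosely in the spirit of ODDFuzz)."""
    return [f for f in functions if f.calls_deserializer]
```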
diff --git a/docs/DiscreteMath.md b/docs/DiscreteMath.md
@@ -6,79 +6,79 @@ Discrete math related links.
 
 Unit 1.1 (logic)
 
-- [Proposition.md](Proposition.md)
-- [Negation.md](Negation.md)
-- [Connectives.md](Connectives.md)
-- [Converse.md](Converse.md)
-- [Inverse.md](Inverse.md)
-- [Contrapositive.md](Contrapositive.md)
-- [Biconditional.md](Biconditional.md)
+- [Proposition](Proposition.md)
+- [Negation](Negation.md)
+- [Connectives](Connectives.md)
+- [Converse](Converse.md)
+- [Inverse](Inverse.md)
+- [Contrapositive](Contrapositive.md)
+- [Biconditional](Biconditional.md)
 
 Unit 1.2 (logic)
 
-- [Proposition.md](Proposition.md)
-- [Connectives.md](Connectives.md)
+- [Proposition](Proposition.md)
+- [Connectives](Connectives.md)
 
 Unit 1.3 (logic)
 
-- [Tautology.md](Tautology.md)
-- [Contradiction.md](Contradiction.md)
-- [Contingency.md](Contingency.md)
-- [DemorgansLaw.md](DemorgansLaw.md)
-- [ConditionalDisjunction.md](ConditionalDisjunction.md)
-- [DistributiveLaw.md](DistributiveLaw.md)
-- [WellDefined.md](WellDefined.md)
-- [Commutative.md](Commutative.md)
-- [Satisfiable.md](Satisfiable.md)
+- [Tautology](Tautology.md)
+- [Contradiction](Contradiction.md)
+- [Contingency](Contingency.md)
+- [DemorgansLaw](DemorgansLaw.md)
+- [ConditionalDisjunction](ConditionalDisjunction.md)
+- [DistributiveLaw](DistributiveLaw.md)
+- [WellDefined](WellDefined.md)
+- [Commutative](Commutative.md)
+- [Satisfiable](Satisfiable.md)
 
 Unit 1.4 (proof)
 
-- [Predicate.md](Predicate.md)
-- [PropositionalFunction.md](PropositionalFunction.md)
-- [Quantifiers.md](Quantifiers.md)
-- [Universe.md](Universe.md)
-- [Preconditions.md](Preconditions.md)
-- [Postcondition.md](Postcondition.md)
+- [Predicate](Predicate.md)
+- [PropositionalFunction](PropositionalFunction.md)
+- [Quantifiers](Quantifiers.md)
+- [Universe](Universe.md)
+- [Preconditions](Preconditions.md)
+- [Postcondition](Postcondition.md)
 
 Unit 1.5 (proof)
 
-- [NestedQuantifier.md](NestedQuantifier.md)
+- [NestedQuantifier](NestedQuantifier.md)
 
 Unit 1.6 (proof)
 
-- [LawOfDetachment.md](LawOfDetachment.md)
+- [LawOfDetachment](LawOfDetachment.md)
 
 Unit 1.7 (proof)
 
-- [DirectProof.md](DirectProof.md)
-- [Contrapositive.md](Contrapositive.md)
-- [Contradiction.md](Contradiction.md)
-- [Cases.md](Cases.md)
-- [VacuousProof.md](VacuousProof.md)
-- [ExhaustiveProof.md](ExhaustiveProof.md)
+- [DirectProof](DirectProof.md)
+- [Contrapositive](Contrapositive.md)
+- [Contradiction](Contradiction.md)
+- [Cases](Cases.md)
+- [VacuousProof](VacuousProof.md)
+- [ExhaustiveProof](ExhaustiveProof.md)
 
 Unit 2.1 (sets)
 
-- [Set.md](Set.md)
-- [Subset.md](Subset.md)
-- [PowerSet.md](PowerSet.md)
-- [CartesianProduct.md](CartesianProduct.md)
-- [TruthSet.md](TruthSet.md)
-- [Complement.md](Complement.md)
-- [Multiset.md](Multiset.md)
+- [Set](Set.md)
+- [Subset](Subset.md)
+- [PowerSet](PowerSet.md)
+- [CartesianProduct](CartesianProduct.md)
+- [TruthSet](TruthSet.md)
+- [Complement](Complement.md)
+- [Multiset](Multiset.md)
 
 Unit 2.3 (functions)
 
-- [Range.md](Range.md)
-- [Image.md](Image.md)
-- [Preimage.md](Preimage.md)
-- [Codomain.md](Codomain.md)
-- [Injective.md](Injective.md)
-- [Surjective.md](Surjective.md)
-- [InverseFunction.md](InverseFunction.md)
-- [Floor.md](Floor.md)
-- [Ceiling.md](Ceiling.md)
-- [Bijective.md](Bijective.md)
+- [Range](Range.md)
+- [Image](Image.md)
+- [Preimage](Preimage.md)
+- [Codomain](Codomain.md)
+- [Injective](Injective.md)
+- [Surjective](Surjective.md)
+- [InverseFunction](InverseFunction.md)
+- [Floor](Floor.md)
+- [Ceiling](Ceiling.md)
+- [Bijective](Bijective.md)
 
 Unit 2.4 (sequence + other stuff)
 
diff --git a/docs/FreeSoftware.md b/docs/FreeSoftware.md
@@ -2,10 +2,10 @@
 
 ## Bad Software
 
-- [GIF](GIF.md)
-- [TIFF](TIFF.md)
-- [JPG](JPG.md)
-- [LZW Compression](LZWCompression.md)
+- GIF
+- TIFF
+- JPG
+- LZW Compression
 
 ## Good Software
 
diff --git a/docs/FuzzingInformationTheoreticPerspective.md b/docs/FuzzingInformationTheoreticPerspective.md
@@ -0,0 +1,8 @@
+# Boosting Fuzzer Efficiency: An Information Theoretic Perspective
+
+## Innovations
+
+- Viewing fuzzing from an information-theoretic perspective
+    - not that novel
+- Entropic
+    - Power schedule that assigns more energy (weight) to seeds whose fuzzing yields more information
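A simplified sketch of the entropy-based power schedule idea. The real Entropic estimator is more careful than plain Shannon entropy (see the paper); here I just assume each seed comes with a hypothetical `{feature: hit count}` record of what fuzzing it produced, and give each seed energy proportional to that distribution's entropy.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (bits) of a {feature: hit count} abundance distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)

def assign_energy(seed_feature_counts):
    """Give each seed a share of fuzzing energy proportional to the entropy of
    the features observed when fuzzing it; uniform if no seed has any entropy."""
    entropies = {seed: shannon_entropy(c) for seed, c in seed_feature_counts.items()}
    if not entropies:
        return {}
    total = sum(entropies.values())
    if total == 0:
        return {seed: 1 / len(entropies) for seed in entropies}
    return {seed: e / total for seed, e in entropies.items()}
```

A seed that keeps hitting the same single feature has zero entropy and gets no energy, while one that spreads its hits over many features gets more.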
diff --git a/docs/InformationRetrieval.md b/docs/InformationRetrieval.md
@@ -0,0 +1,8 @@
+# Information Retrieval
+
+## Links
+
+- [TF-IDF](TF-IDF.md)
+- [Cosine Similarity](CosineSimilarity.md)
+- Stemming
+- BM25
diff --git a/docs/JANUS.md b/docs/JANUS.md
@@ -0,0 +1,27 @@
+# JANUS
+
+**Source:** Fuzzing File Systems via Two-Dimensional Input Space Exploration
+
+The two dimensions are data (file system images) and actions (file operations, i.e. syscalls).
+
+## Steps
+
+- Load a fresh OS for every test case
+    - using the Linux Kernel Library (LKL) in user space
+- Mutate the image's metadata
+    - not the data blocks, as they are too big and mutating them slows fuzzing down
+- Then perform image-directed syscall fuzzing
+    - store the generated syscalls and deduce the runtime status of each file object on the image after the syscalls complete
+- Use the speculated status as feedback to generate new syscalls
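A minimal sketch of the "mutate metadata, not data blocks" idea, assuming we already know which byte ranges of the image hold metadata. In JANUS that knowledge comes from parsing the file system image; here `metadata_ranges` and `mutate_metadata` are hypothetical.

```python
import random

def mutate_metadata(image, metadata_ranges, n_mutations, rng=None):
    """Return a copy of `image` (bytes) with up to `n_mutations` random byte
    overwrites, restricted to the (offset, length) `metadata_ranges`."""
    rng = rng or random.Random()
    mutated = bytearray(image)
    positions = [off + i for off, length in metadata_ranges for i in range(length)]
    if not positions:
        return bytes(mutated)
    for _ in range(n_mutations):
        mutated[rng.choice(positions)] = rng.randrange(256)
    return bytes(mutated)
```

Since only the small metadata regions are touched, each mutation stays cheap no matter how large the data blocks are.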
+
+They try to improve existing FS fuzzers by
+
+1. Not fuzzing large blob images
+    - that's slow
+2. Exploiting the relationship between the FS image and file operations (syscalls)
+3. Not using an aging OS
+    - basically, they improve reproducibility by restarting the OS for every test case, which is cheap because the Linux Kernel Library runs in user space
+
+## Useful Ideas
+
+- They found simpler representations than the blob data to describe state
diff --git a/docs/NDCG.md b/docs/NDCG.md
@@ -0,0 +1,27 @@
+# Normalized Discounted Cumulative Gain (NDCG)
+
+**Source:** Where to fuzz paper
+
+**Definition:** Normalized discounted cumulative gain is a metric used for the evaluation of information retrieval systems. 
+
+## Cumulative Gain
+
+$CG_p = \sum_{i=1}^{p} \text{rel}_i$
+
+This metric is a simpler variant of DCG that ignores rank (position / ordering): a relevant result counts the same wherever it appears in the top $p$.
+
+## Discounted Cumulative Gain
+
+$DCG_p = \sum_{i=1}^{p} \frac{\text{rel}_i}{\log_2(i+1)}$
+
+The idea with DCG is that relevant items appearing lower in the results should be penalized: the relevance at position $i$ is divided by $\log_2(i+1)$, so it counts for less the further down the item appears.
+
+## Normalized Discounted Cumulative Gain
+
+$nDCG_p = \frac{DCG_p}{IDCG_p}$
+
+Where $IDCG_p$ is the ideal discounted cumulative gain: the DCG of the results sorted by decreasing relevance.
+
+Basically, the NDCG is the DCG of the actual ordering divided by the DCG of the optimal ordering.
+
+The value of NDCG is that it is comparable across queries and evaluations, since normalizing by the ideal ordering puts every score between 0 and 1.
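The formulas above translate directly into Python. This sketch takes graded relevances in ranked order. As a simplification, the ideal ordering is computed by sorting the same list, whereas a full evaluation would build IDCG from all relevant documents for the query. Returning `0.0` when nothing is relevant is my own convention.

```python
import math

def dcg(relevances, p=None):
    rels = relevances[:p] if p is not None else relevances
    # the item at (1-indexed) position i is discounted by log2(i + 1)
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))

def ndcg(relevances, p=None):
    ideal = sorted(relevances, reverse=True)  # best possible ordering
    idcg = dcg(ideal, p)
    return dcg(relevances, p) / idcg if idcg > 0 else 0.0
```

For example, `ndcg([0, 1])` puts the only relevant item second instead of first, so it scores `1 / log2(3)`, about 0.63.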
diff --git a/docs/ProbabilisticRobotics.md b/docs/ProbabilisticRobotics.md
@@ -14,3 +14,39 @@
 - [Bicycle Motion](BicycleMotion.md)
 
 ### Second Semester
+
+- Videos 233 - 268 
+    - Search / Motion Planning
+        - Shortest Path
+            - BFS
+            - [A\*](AStar.md) - uses heuristic function
+        - Dynamic Programming
+            - The optimal distance to the goal from every location (not just the start) is sometimes useful, e.g. if the robot ends up off its planned path
+- Videos 280 - 312
+    - Smoothing
+        - Adjust the interior points of the planned path so the turns between steps are less abrupt
+            - This uses gradient descent with two hyperparameters: $\alpha$ weights staying close to the original path and $\beta$ weights smoothness
+    - PID Control
+        - Cross track error
+            - Lateral distance between reference trajectory and the vehicle
+        - We want to minimize cross track error
+            - This often overshoots though
+                - P control alone is only marginally stable (it keeps oscillating around the reference), so we add a derivative term (PD control)
+        - PD Control
+            - When we are reducing error, we counter-steer to stop overshoot.
+        - Systematic Bias
+            - These are biases in our system that should be accounted for; otherwise the vehicle settles at an offset from the reference trajectory
+                - Like tire alignment
+        - PID
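+            - P = Proportional (steers against the current cross track error)
+            - I = Integral (accumulates past error, which corrects systematic bias)
+            - D = Derivative (damps oscillation but does not correct bias)
+            - These three terms form the control law: steering $= -\tau_P \, \text{CTE} - \tau_D \frac{d\,\text{CTE}}{dt} - \tau_I \sum \text{CTE}$
+        - Control gains
+            - These are the hyperparameters of the PID controller
+                - Twiddle (a form of coordinate descent) can tune them
+                    - We adjust one gain at a time up or down, keep the change if the error improves, and grow or shrink that gain's step size (bumping factor) accordingly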
+- Videos 323 - 363
+    - SLAM
+        - Simultaneous Localization and Mapping
+            - Localization assumes we already have a map; SLAM builds the map while localizing within it
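The path smoothing step can be sketched as the gradient-descent update usually taught alongside this material. I assume $\alpha$ is `weight_data` (pull toward the original path) and $\beta$ is `weight_smooth` (pull toward the neighbors). The endpoints are held fixed, and the parameter names and defaults are my own choice.

```python
def smooth(path, weight_data=0.5, weight_smooth=0.1, tolerance=1e-6):
    """Gradient-descent path smoothing; path is a list of (x, y) points.
    Endpoints stay fixed; interior points are pulled toward their original
    position (weight_data) and toward their neighbors (weight_smooth)."""
    new = [list(p) for p in path]
    change = tolerance
    while change >= tolerance:  # iterate until the total update is tiny
        change = 0.0
        for i in range(1, len(path) - 1):
            for d in range(len(path[i])):
                old = new[i][d]
                new[i][d] += weight_data * (path[i][d] - new[i][d])
                new[i][d] += weight_smooth * (new[i - 1][d] + new[i + 1][d] - 2 * new[i][d])
                change += abs(old - new[i][d])
    return new
```

A straight path is already smooth and comes back unchanged. At a corner such as `(0, 0) -> (1, 0) -> (1, 1)`, the middle point gets pulled toward the diagonal, settling near `(5/6, 1/6)` with these weights.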
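A minimal discrete PID controller, assuming a unit time step so the derivative is just the change in CTE and the integral is a running sum. `simulate` is a hypothetical toy plant, not the bicycle model from the course. At each step, the offset changes by the steering command plus a constant drift that stands in for systematic bias.

```python
class PIDController:
    """Minimal discrete PID controller; tau_p, tau_d, tau_i are the control gains."""

    def __init__(self, tau_p, tau_d, tau_i):
        self.tau_p, self.tau_d, self.tau_i = tau_p, tau_d, tau_i
        self.prev_cte = None  # previous cross track error (for the D term)
        self.int_cte = 0.0    # accumulated cross track error (for the I term)

    def steer(self, cte):
        diff_cte = 0.0 if self.prev_cte is None else cte - self.prev_cte
        self.prev_cte = cte
        self.int_cte += cte
        return -self.tau_p * cte - self.tau_d * diff_cte - self.tau_i * self.int_cte

def simulate(controller, steps=200, bias=0.1, cte=1.0):
    """Toy plant (assumption): each step the lateral offset changes by the
    steering command plus a constant drift `bias` (the systematic bias)."""
    for _ in range(steps):
        cte = cte + controller.steer(cte) + bias
    return cte
```

With `tau_p=0.5` and a drift of `0.1`, both P and PD control settle at an offset of `bias / tau_p = 0.2`. Adding `tau_i=0.1` drives the CTE to (numerically) zero, which matches the role of the I term in the notes.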
diff --git a/docs/PromptFuzz.md b/docs/PromptFuzz.md
@@ -0,0 +1,10 @@
+# Prompt Fuzz
+
+**Source:** Prompt Fuzzing for Fuzz Driver Generation (paper)
+
+## How it Works
+
+1) Prompt LLM to generate programs that focus on specific APIs
+2) Eliminate programs that fail to execute or trigger false positives
+3) Use code coverage feedback to guide the mutation of the LLM prompts
+4) Convert the arguments of library API calls inside the generated programs from constants to variables that can be mutated during fuzzing
diff --git a/docs/PropertyBasedTesting.md b/docs/PropertyBasedTesting.md
@@ -0,0 +1,60 @@
+# Property Based Testing (PBT)
+
+**Definition:** Property based testing is a testing approach where formal executable specifications (properties) are written for software components and then automated harnesses check these specifications against automatically generated inputs.
+
+## Property-Based Testing In Practice
+
+Paper from UPenn and Jane Street.
+
+- Developers define a property
+- Harness checks this property using many random inputs produced by a generator
+- If a counterexample is found, the developer is notified
+
+The difference between PBT and fuzzing is that fuzzing looks for crashes, whereas PBT attempts to validate properties. In this way, fuzz tests can be seen as a subset of property-based tests where the property being evaluated is that the program doesn't crash.
+
+Python library for property-based testing (Hypothesis):
+
+- https://github.com/HypothesisWorks/hypothesis
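The define property / generate inputs / report counterexample loop can be sketched with only the standard library. This is a hypothetical toy harness: `check_property`, `random_int_list`, and the example properties are made up for illustration. A real library like Hypothesis also shrinks a failing input to a minimal counterexample, which this sketch does not do. The `does_not_crash` property shows the fuzzing-as-a-subset-of-PBT view, since the only thing it checks is that the code runs without raising.

```python
import random

def check_property(prop, generate, trials=1000, seed=0):
    """Return the first generated input for which prop(input) is False or
    raises an exception; return None if every trial passes."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generate(rng)
        try:
            ok = prop(value)
        except Exception:
            ok = False  # a crash also counts as a property violation
        if not ok:
            return value
    return None

def random_int_list(rng):
    # generator: lists of 0-10 small integers
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]

def reverse_twice_is_identity(xs):
    return list(reversed(list(reversed(xs)))) == xs

def does_not_crash(xs):
    # fuzzing-style property: just run the code under test
    max(xs)  # raises ValueError on an empty list
    return True
```

`check_property(reverse_twice_is_identity, random_int_list)` finds no counterexample, while `check_property(does_not_crash, random_int_list)` reports the empty list.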
+
+## Agentic PBT: Finding Bugs Across the Python Ecosystem
+
+This was made primarily by people from Anthropic, so there may be a conflict of interest here. Moreover, they seem to overstate its effectiveness: they found a single bug in NumPy and some other trivial bugs, which doesn't really seem like a successful campaign.
+
+### Agentic property-based testing steps
+
+1. Define a target
+    - Their testing is constrained to Python code (somewhat arbitrarily) and targets only functions, files, or modules
+        - Their approach should be generalizable; I see no reason a diff wouldn't work equally well
+2. Prompt agent with the following instructions (they actually include the prompt in the paper, yay!)
+    1. Analyze the target
+        - Figure out if the target is a function, a file, or a module
+            - Kind of weird they don't specify this already; maybe this is some form of context loading...
+    2. Understand the target
+        - Read documentation, function signatures, source code, use web search, whatever is needed to understand the logic.
+    3. Propose properties
+        - Basically, ask it to use what it has 'learned' to define some properties that should hold
+    4. Write tests to exercise said properties
+    5. Execute and triage tests
+    6. Report bugs
+
+They spent a claimed \$5,474.20 on Opus tokens to find 18 bugs, 17 of which were worth reporting. They thus spent ~\$322 per bug they reported... There was also quite a bit of manual intervention at the end. It seems like it would've been better to add a final step where the agent resolved the issue and then performed some more validation, similar to the process done by Orion. That would only put humans in the loop for reviewing the PR that fixes the problem.
+
+More context about the above:
+
+> Our evaluation demonstrates that LLM-guided property-based testing can systematically uncover
+> bugs missed by traditional testing. With a cost of \$5.56/bug report, and extrapolating from our manual
+> grading that 56% of these are valid bugs, our agent can find bugs for \$9.93/valid bug. This is an upper
+> bound on the real-world cost, where developers with domain expertise can be more judicious with
+> where to target the agent. The diversity of issues, spanning numerical issues to business logic issues,
+> show the power of PBT, and the ability of agents to autonomously mine for such properties.
+
+Okay, so I don't really believe they sampled fairly for the bugs they evaluated, nor do I trust their estimate of the percentage of valid bugs. Despite this, there is a non-zero chance this approach could be useful.
+
+They found one issue in NumPy related to the Wald distribution:
+- https://github.com/numpy/numpy/pull/29609
+
+### Can LLMs Write Good Property-Based Tests?
+
+The version in my hands is from July 2024.
+
+
diff --git a/docs/TF-IDF.md b/docs/TF-IDF.md
@@ -0,0 +1,115 @@
+# TF-IDF
+
+**Source:** Wikipedia
+
+**Definition:** TF-IDF (term frequency-inverse document frequency) is an information retrieval metric used to describe the importance of a word in a given document from a corpus of documents.
+
+Intuitively, this value describes how different the usage of a given term is in the current document compared to how it is used in other documents in the corpus. The idea is that terms used frequently in the current document but rarely elsewhere in the corpus are of interest.
+
+## Calculation
+
+1. Calculate TF(word)
+    - Number of times the word appears in the document / total number of terms in the document
+        - Probability of a random term in the document being the word
+2. Calculate IDF(word)
+    - $\text{IDF}(w) = \log \frac{N}{n_w}$, where $N$ is the number of documents and $n_w$ is the number of documents containing the word
+        - This is the self-information of learning that a document contains the word: $-\log P(\text{a random document contains } w) = \log \frac{N}{n_w}$
+3. Calculate TFIDF(word)
+    - TF(word) * IDF(word)
+        - This amounts to the probability of a random term in the document being the word multiplied by the self-information (negative log-probability) of a random document containing the word.
+            - As such, this value weights how often the word occurs in the document by how informative the word is across the corpus.
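The three steps can be checked on a tiny in-memory corpus of pre-tokenized documents. This is a simplified sketch rather than a file-based tool: the corpus is a hypothetical list of token lists, and returning `0.0` for a word that appears in no document is my own choice, since IDF is undefined there.

```python
import math

def tf(word, doc):
    # fraction of the document's terms that are `word`
    return doc.count(word) / len(doc) if doc else 0.0

def idf(word, corpus):
    # log(total documents / documents containing the word)
    n_containing = sum(1 for doc in corpus if word in doc)
    return math.log(len(corpus) / n_containing) if n_containing else 0.0

def tf_idf(word, doc, corpus):
    return tf(word, doc) * idf(word, corpus)

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"], ["the", "bird"]]
```

In the first document, "cat" gets `(1/3) * log(4/2)`, about 0.231. "the" appears in every document, so its IDF is `log(1) = 0` and its TF-IDF is 0 despite its high frequency.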
+
+## Description of Calculation
+
+TF is high when the current document uses the term frequently. 
+
+IDF is high when most documents don't use the term at all.
+
+Given this, high values occur when infrequently used terms are used frequently in the current document since we multiply the TF and IDF to get the final value.
+
+## Implementation
+
+```python3
+import os
+import math
+import re
+import sys
+
+# get all lowercase alphanumeric words in a file
+def get_words(filename):
+    with open(filename, 'r') as f:
+        text = f.read()
+    text = re.sub('[^0-9a-zA-Z]+', ' ', text)
+    return [w.lower() for w in text.split()]
+
+# term frequency of `word` in every document
+def tfs(prefix, filenames, word):
+    return {filename: tf(prefix, filename, word) for filename in filenames}
+
+def tf(prefix, filename, word):
+    words = get_words(prefix + filename)
+    if len(words) == 0:
+        return 0  # empty documents
+    return words.count(word) / len(words)
+
+
+# IDF for every word in the corpus
+# technically, we might just want the one word and output the value for that
+def idf(prefix, filenames):
+    document_frequency = {}
+    for filename in filenames:
+        # count each word at most once per document (document frequency, not term count)
+        for word in set(get_words(prefix + filename)):
+            document_frequency[word] = document_frequency.get(word, 0) + 1
+    return {word: math.log(len(filenames) / df) for word, df in document_frequency.items()}
+
+if __name__ == "__main__":
+    # usage: python3 tfidf.py WORD TOP_K, or run with no arguments for interactive mode
+    interactive = len(sys.argv) != 3
+    filenames = os.listdir('documents')
+    idf_dict = idf('documents/', filenames)
+
+    while True:
+        if interactive:
+            word = input("Word to find: ").lower()
+            top_k = int(input("Top k elements to show: "))
+        else:
+            word = sys.argv[1].lower()
+            top_k = int(sys.argv[2])
+
+        if word not in idf_dict:
+            print('Word does not appear in any documents')
+        else:
+            tf_dict = tfs('documents/', filenames, word)
+            tfidf = {filename: idf_dict[word] * tf_dict[filename] for filename in filenames}
+            # highest TF-IDF first; ties broken by filename (descending)
+            sorted_items = sorted(tfidf.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
+            for item in sorted_items[:top_k]:  # slicing avoids IndexError if top_k > #docs
+                print(item)
+
+        if not interactive:
+            break
+```
diff --git a/docs/WhereToFuzz.md b/docs/WhereToFuzz.md
@@ -24,7 +24,54 @@ Continuous Scoring:
 - They then use each of the methods to pick out code blocks of interest based on the method's weightings
 - They then check how many of the ground truth issues are covered by each approach
 
-## Questions
+## Questions & Critiques
 
 - How do they deal with potential differences between actual code failure locations and what is tested with OSS-Fuzz
     - One could imagine there are even more issues throughout the codebases that haven't yet been caught by OSS-Fuzz or are insufficiently covered by it, making the training dataset faulty in the sense that it doesn't represent the true distribution of errors / failures / vulnerabilities.
+
+> While there is still no guarantee that OSS-Fuzz identified every bug,
+> the massive amount of time spent on fuzzing these targets is likely
+> to find a large majority of crashes reachable by a fuzzer.
+
+That's kind of lame TBH. That said, if they wrote their own code with a known number of injected vulnerabilities, that would be even more contrived. Ideally, they'd use the entirety of some CVE database, because existing fuzzers might still be missing key issues. What they did seems acceptable, albeit limited.
+
+- Their evals are run against open source C and C++ projects from 2016 - 2023
+    - This seems limited in a few ways
+        - It only covers projects written in two languages that aren't especially popular choices for new projects today
+            - ie. these are likely mature projects
+        - These are open source projects
+            - These are distributionally different than proprietary projects, but in what ways I'm unsure
+
+- They only use issues found by OSS-Fuzz as their metric, and this doesn't weight the issues according to importance (severity)
+    - Maybe this doesn't matter as one vulnerability means the system is vulnerable, but still, it seems prudent to weight specific issues more highly than others.
+
+## Results
+
+Using NDCG (normalized discounted cumulative gain) in two variants, NDCG- (only the most recent function from the stack trace counts as relevant) and NDCG+ (all functions in the stack trace count), to get an under- and overestimate for retrieval, they found the following given their dataset and the approaches they evaluated:
+
+- Leopard-V performs the best followed by Leopard-C, Sanitizer, then CodeT5+.
+    - The top three are deterministic. Some other deterministic approaches performed worse, but no ML approach performed better than (or even on par with) the top three.
+        - that said, the paper is from July 2024 so LLMs were less sophisticated back then, and the literature was more limited.
+            - I'd be curious to see how this stacks up today.
+- Every approach except LineVul outperforms the random baseline by a statistically significant amount
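A sketch of how the two relevance labelings could feed into NDCG for a single crash. This is my guess at the setup, not the paper's code. `ranking` is a target selection method's ordering of functions, `stack_trace` lists the crash's functions with the most recent (crashing) frame first, and the function names in the example are hypothetical.

```python
import math

def ndcg(relevances):
    dcg = sum(r / math.log2(i + 1) for i, r in enumerate(relevances, start=1))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(r / math.log2(i + 1) for i, r in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

def relevance_labels(ranking, stack_trace, strict):
    # strict=True: only the most recent frame stack_trace[0] is relevant (NDCG-)
    # strict=False: every function in the stack trace is relevant (NDCG+)
    relevant = {stack_trace[0]} if strict else set(stack_trace)
    return [1 if fn in relevant else 0 for fn in ranking]

ranking = ["parse", "alloc", "main", "log"]
stack_trace = ["alloc", "parse", "main"]
ndcg_minus = ndcg(relevance_labels(ranking, stack_trace, strict=True))
ndcg_plus = ndcg(relevance_labels(ranking, stack_trace, strict=False))
```

Here the crashing function `alloc` is ranked second, so NDCG- is `1 / log2(3)`, about 0.63. Every stack-trace function sits above the irrelevant `log`, so NDCG+ is 1.0.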
+
+> Target selection methods based on software metrics per-
+> form significantly better than every other considered
+> method. The best software metric, Leopard-V, correctly
+> captures as much as 13% of the crashes with its highest
+> ranking function across the whole corpus of more than
+> 1600 crashes. This makes it the most natural and really only
+> viable candidate for fuzzing approaches which require a
+> discrete selection method.
+
+> Software metric-based target selection methods perform
+> significantly better than any other method across most
+> types of crashes and sanitizers. The only other method
+> close to their performance in some cases is the sanitizer-
+> based one.
+
+## Takeaways
+
+The problem of identifying key areas to fuzz can be seen as an information retrieval problem, where NDCG can be used to measure how well the selected targets match the actual problematic code regions. This actually seems fairly useful for our line number based evals.
+
+The Leopard approach (code metrics) outperforms more sophisticated approaches in identifying potential functions of interest, within the constraints of the survey.
