      1 # Orion
      2 
      3 **Source:** Orion: Fuzzing Workflow Automation
      4 
      5 ## Notes
      6 
      7 This is an approach to fuzzing automation presented in the 'Orion: Fuzzing Workflow Automation' paper by employees of Nvidia. 
      8 
      9 ## Process
     10 
     11 - Harness generation and execution
     12     - Takes target project source code as input
     13     - Constructs a codebase index
     14         - The codebase is chunked on the basis of functions
     15     - Selects interfaces for fuzzing by ranking non-static functions by how likely the system thinks fuzzing them will trigger bugs
     16         - This ranking is done by computing a few metrics:
     17             - Cyclomatic complexity
     18                 - Number of linearly independent paths through a function
     19             - Internal function calls
     20                 - How frequently a function is called by others
     21             - Lines of code
     22             - Callgraph size
     23                 - Number of functions reachable from the given function
     24                     - This seems to be an attempt to place more weight on more important functionality, which might be misguided.
     25                         - If a part of the codebase is that integral, it is likely already better tested.
     26             - Dangerous expressions
     27                 - Constructs like pointer arithmetic, memory management, and bit operations (detected by the LLM...)
     28                     - Honestly, I'd just use regex...
     29             - Sink functions
     30                 - Functions associated with vulnerabilities
     31                     - Again... why are you using LLMs!? Just use regex
     32             - Parsing functions
     33                 - Functions that parse structured inputs
     34                     - fair play for LLM usage.
     35     - Generates seed inputs for each selected function
     36     - Generates harnesses
     37         - A dependency analysis agent identifies setup and teardown processes and header file dependencies
     38         - Constructs compilable fuzz drivers compatible with generated seeds
     39     - The resulting harnesses and seeds are then sent to the fuzzing infra which executes the fuzzer, monitors for errors, and records the results
     40 - Crash handling
     41     - The triage agent filters out harness-related issues
     42     - The triage agent then root-causes the remaining issues and creates minimal repros
     43 - Patching
     44     - The patching agent patches the issue
     45     - The minimal repros from the prior step are re-run to validate that the issue is fixed
     46         - If the issue still exists, the patching agent tries again
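
The target-selection metrics listed under "Process" are mostly computable without an LLM, which is the point of the "just use regex" remarks. Below is a minimal sketch of that idea, not the paper's implementation: the regex patterns, the equal weights, and every function name (`count_matches`, `reachable_count`, `function_metrics`, `rank_functions`) are assumptions made for illustration. The branch count is only a rough stand-in for cyclomatic complexity, and the patterns will produce false positives (for example `>>` in C++ stream code).

```python
import re
from collections import deque

# Hypothetical pattern lists; a real tool would need much broader coverage.
DANGEROUS_EXPR = re.compile(
    r"\b(?:malloc|calloc|realloc|free)\s*\("  # memory management calls
    r"|<<|>>"                                  # bit shifts
    r"|\*\s*\(\s*\w+\s*[+-]"                   # dereferenced pointer arithmetic, e.g. *(p + 1)
)
SINK_CALL = re.compile(r"\b(?:memcpy|strcpy|strcat|sprintf|gets)\s*\(")
BRANCH = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")

# Assumption: equal weights. The notes do not say how the metrics are combined.
DEFAULT_WEIGHTS = {
    "loc": 1.0, "complexity": 1.0, "dangerous": 1.0,
    "sinks": 1.0, "reachable": 1.0, "callers": 1.0,
}


def count_matches(pattern, source):
    """Count non-overlapping matches of a compiled pattern in source text."""
    return sum(1 for _ in pattern.finditer(source))


def reachable_count(callgraph, root):
    """Number of functions reachable from root (excluding root), via BFS."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for callee in callgraph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return len(seen) - 1


def function_metrics(name, body, callgraph):
    """Compute the per-function metrics from the function body and call graph."""
    return {
        "loc": sum(1 for line in body.splitlines() if line.strip()),
        "complexity": count_matches(BRANCH, body) + 1,  # decision points + 1
        "dangerous": count_matches(DANGEROUS_EXPR, body),
        "sinks": count_matches(SINK_CALL, body),
        "reachable": reachable_count(callgraph, name),
        "callers": sum(1 for callees in callgraph.values() if name in callees),
    }


def rank_functions(functions, callgraph, weights=DEFAULT_WEIGHTS):
    """Return function names sorted from highest to lowest weighted score."""
    def score(name):
        m = function_metrics(name, functions[name], callgraph)
        return sum(weights[key] * value for key, value in m.items())
    return sorted(functions, key=score, reverse=True)


if __name__ == "__main__":
    functions = {
        "parse_header": (
            "void parse_header(char *dst, const char *src, int n) {\n"
            "  if (n > 0) {\n"
            "    memcpy(dst, src, n);\n"
            "  }\n"
            "}"
        ),
        "add": "int add(int a, int b) {\n  return a + b;\n}",
    }
    callgraph = {"parse_header": ["add"], "add": []}
    print(rank_functions(functions, callgraph))
```

Parsing functions are the one metric left out here, since recognising "parses structured input" from text alone is where an LLM plausibly earns its keep.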
     47 
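The patch-and-revalidate loop from the "Patching" step can be sketched as follows. This is an illustration under stated assumptions rather than Orion's code: `propose_patch` and `still_crashes` are hypothetical callbacks standing in for the patching agent and the repro runner, and the `max_attempts` bound is my own addition, since the notes only say the agent "tries again".

```python
from typing import Callable, Optional


def patch_until_fixed(
    repro: bytes,
    propose_patch: Callable[[bytes, int], str],
    still_crashes: Callable[[str, bytes], bool],
    max_attempts: int = 3,  # assumption: the notes give no retry limit
) -> Optional[str]:
    """Ask for patches until the minimal repro stops crashing.

    Returns the first patch that makes the repro pass, or None if every
    attempt up to max_attempts still reproduces the crash.
    """
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(repro, attempt)
        if not still_crashes(patch, repro):
            return patch
    return None


if __name__ == "__main__":
    # Stub callbacks: the second proposed patch is the one that works.
    def fake_propose(repro: bytes, attempt: int) -> str:
        return f"patch-v{attempt}"

    def fake_still_crashes(patch: str, repro: bytes) -> bool:
        return patch != "patch-v2"

    print(patch_until_fixed(b"\x00\xff", fake_propose, fake_still_crashes))
```

Note that the only acceptance check in this loop is the repro itself, which is exactly the weakness raised in the questions: a patch can silence one byte buffer without being a correct fix.
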
     48 ## Questions During Reading
     49 
     50 - How do they ensure the bug triaging / patching doesn't result in further regressions?
     51     - It seems likely they are finding real issues, but the patches would be dubious, even if they result in the problem being resolved for a given byte buffer input to the harness.
     52     - Towards the end of the paper they say that a human reviews the result at the end and that they ensure it passes minimal tests
     53         - This seems to mean they basically just validate that it compiles and passes the fuzz repros
     54 - Why does parsing the source code to create call graphs and type indexes improve performance?
     55     - Agentic models should be able to request this information themselves with tool use, without the need for preprocessing.
     56         - This paper was put on arXiv on Sep. 18th, so they clearly had access to tool-using agentic models that could easily do this.
     57     - Also, I don't see ablations so I don't know if it does.
     58 - How did they choose the target identification metrics?
     59 
     60 ## Interesting Questions to Explore
     61 
     62 - Does generating call graphs improve model performance?
     63     - What about other forms of codebase processing for context generation?
     64 - What are the most important metrics for deterministic risk identification?
     65 
     66 ## Important Takeaways
     67 
     68 - They define seeds prior to their harnesses.
     69     - Their rationale seems to be that if they constrain the input and output spaces, LLMs will be better at generating harnesses.