LLMs to Write Fuzzers - Infinite Forest

There has been a lot of talk about the power of OpenAI and Copilot for writing software. Opinions range from massive productivity increases to development slowdowns caused by subtle bugs and generative AI's tendency to make up APIs. In this post I examine OpenAI's code generation ability by attempting to write fuzzers. Fuzzers are security tools that generate random data and feed it into an application in order to find bad behavior.

For this project I chose the newer, coverage-guided Atheris Python fuzzer. There are two primary reasons. First, it is my understanding that OpenAI's models perform best when writing Python. Second, since there is not a lot of existing Atheris code, compared with something like AFL, there is limited training data, which should allow us to really test the model's capabilities. The two chosen test libraries are the Twisted network engine and NumPy. The code for the fuzzers has been collected in https://github.com/xxyyx/fuzz-forest.

The explicit goal is to rely almost exclusively on GPT: I will not write any code myself beyond providing context and prompts. Each fuzzer will run for at least 5 hours so that coverage can be compared.

# Prompt Design

The first step is thinking about the prompt and what context to include. OpenAI and Microsoft provide two good introductory guides for writing code:

https://platform.openai.com/docs/guides/code

https://microsoft.github.io/prompt-engineering/

With these in mind I begin the prompt with context about the language, the library, a link to the code, and a generic example from the documentation:

```python
# write python fuzz tests using atheris library
# https://github.com/google/atheris

"""
# Generic Atheris fuzz Example
# !/usr/bin/python3

import atheris

with atheris.instrument_imports():
    import some_library
    import sys

def TestOneInput(data):
    some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
"""
# When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
```

The guides recommend adding API definitions to combat the AI's tendency to make up API functions and parameters, so I decided to copy and paste the entire API reference for the data provider:

```python
# Atheris FuzzedDataProvider API Reference
# Often, a bytes object is not convenient input to your code being fuzzed. Similar to libFuzzer,
# we provide a FuzzedDataProvider to translate these bytes into other input forms.
# To construct the FuzzedDataProvider, use the following code:
# fdp = atheris.FuzzedDataProvider(input_bytes)
# The FuzzedDataProvider provides the following functions:
# ConsumeBytes(count: int): Consume count bytes.
# ConsumeUnicode(count: int): Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation.
# ConsumeUnicodeNoSurrogates(count: int): Consume unicode characters, but never generate surrogate pair characters.
# ConsumeString(count: int): Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.
# ConsumeInt(int: bytes): Consume a signed integer of the specified size (when written in two's complement notation).
# ConsumeUInt(int: bytes): Consume an unsigned integer of the specified size.
# ConsumeIntInRange(min: int, max: int): Consume an integer in the range [min, max].
# ConsumeIntList(count: int, bytes: int): Consume a list of count integers of size bytes.
# ConsumeIntListInRange(count: int, min: int, max: int): Consume a list of count integers in the range [min, max].
# ConsumeFloat(): Consume an arbitrary floating#point value. Might produce weird values like NaN and Inf.
# ConsumeRegularFloat(): Consume an arbitrary numeric floating#point value; never produces a special type like NaN or Inf.
# ConsumeProbability(): Consume a floating#point value in the range [0, 1].
# ConsumeFloatInRange(min: float, max: float): Consume a floating#point value in the range [min, max].
# ConsumeFloatList(count: int): Consume a list of count arbitrary floating#point values. Might produce weird values like NaN and Inf.
# ConsumeRegularFloatList(count: int): Consume a list of count arbitrary numeric floating#point values; never produces special types like NaN or Inf.
# ConsumeProbabilityList(count: int): Consume a list of count floats in the range [0, 1].
# ConsumeFloatListInRange(count: int, min: float, max: float): Consume a list of count floats in the range [min, max].
# PickValueInList(l: list): Given a list, pick a random value.
# ConsumeBool(): Consume either True or False.
```

With all that written we are up to 790 tokens, which leaves us with plenty of space for more examples. The documentation talks about custom mutators and includes a zlib example, so we include the full thing with context:

```python
# python zlib
# https://docs.python.org/3/library/zlib.html
# function signature is zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
# fuzz test for zlib.decompress():

"""
# An example of fuzzing with a custom mutator in Python.

import atheris

with atheris.instrument_imports():
    import sys
    import zlib

def CustomMutator(data, max_size, seed):
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        decompressed = b'Hi'
    else:
        decompressed = atheris.Mutate(decompressed, len(decompressed))
    return zlib.compress(decompressed)

@atheris.instrument_func  # Instrument the TestOneInput function itself
def TestOneInput(data):
    #The entry point for our fuzzer.
    # This is a callback that will be repeatedly invoked with different arguments
    # after Fuzz() is called.
    # We translate the arbitrary byte string into a format our function being fuzzed
    # can understand, then call it.
    # Args:
    #   data: Bytestring coming from the fuzzing engine.
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        return

    if len(decompressed) < 2:
        return

    try:
        if decompressed.decode() == 'FU':
            raise RuntimeError('Boom')
    except UnicodeDecodeError:
        pass

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == '--no_mutator':
        atheris.Setup(sys.argv, TestOneInput)
    else:
        atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
    atheris.Fuzz()
"""
```

Good start, let's dig into the individual libraries.

# Numpy

The first step is looking through the **fuzz-examples/** directory in the [OSS-Fuzz project](https://github.com/google/oss-fuzz) to see if there are any existing Atheris fuzzers for NumPy. To my delight there is a single [fuzz_loader.py](https://github.com/google/oss-fuzz/blob/master/projects/numpy/fuzz_loader.py) file testing the `np.loadtxt()` function. Let's add that to the prompt as another example following the same context format as above: defining the function signature, a link to the documentation, and specifying that this is the fuzzer.
```python
# Function signature is numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, quotechar=None, like=None)
# https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
# Fuzz test for np.loadtxt():

"""
import atheris
import sys

with atheris.instrument_imports():
    from io import StringIO
    import numpy as np

def get_fuzz_types():
    # Define the rows
    dtype = np.dtype(
        [('f0', np.uint16), ('f1', np.float64), ('f2', 'S7'), ('f3', np.int8)]
    )

    # An expected match
    expected = np.array(
        [
            (1, 2.4, "a", -34),
            (2, 3.1, "b", 29),
            (3, 9.9, "g", 120),
        ],
        dtype=dtype
    )
    return dtype, expected

def TestOneInput(fuzz_data):
    dtype, expected = get_fuzz_types()
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    new_data = StringIO(fdp.ConsumeString(sys.maxsize))
    try:
        np.loadtxt(new_data, dtype=dtype, delimiter=";", skiprows=True)
    # Catch all of the exceptions that are caught in
    # https://github.com/numpy/numpy/blob/main/numpy/lib/tests/test_loadtxt.py
    except StopIteration:
        return
    except ValueError:
        return
    except IndexError:
        return
    except TypeError:
        return
    except RuntimeError:
        return

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
"""
```

That brings us to a total of 1846 tokens. Finally we add some guiding text to the bottom of the prompt:

```python
# Function signature is matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import sys
import atheris
```

Our complete prompt explains what we are doing, how to do it, API references, and multiple examples.
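The way these pieces stack up can be sketched as a small helper. This is only an illustration of the layout described above, not code used for the experiments: the names `Example` and `build_prompt` are hypothetical, and the strings passed in the usage lines are shortened stand-ins for the real sections.

```python
from dataclasses import dataclass


@dataclass
class Example:
    context: str  # comment lines: signature, docs link, "fuzz test for ...:"
    code: str     # source of the example fuzzer


def build_prompt(task_header, api_reference, examples, target_context,
                 code_stub="import sys\nimport atheris\n"):
    """Join the prompt sections in the order used in this post."""
    parts = [task_header.strip(), api_reference.strip()]
    for ex in examples:
        # Each worked example is quoted in a docstring under its context comments.
        parts.append(f'{ex.context.strip()}\n"""\n{ex.code.strip()}\n"""')
    # The prompt ends with the target's context and the first imports,
    # so the completion carries on writing the new fuzzer.
    parts.append(f"{target_context.strip()}\n{code_stub}")
    return "\n\n".join(parts)


prompt = build_prompt(
    task_header="# write python fuzz tests using atheris library",
    api_reference="# Atheris FuzzedDataProvider API Reference",
    examples=[Example("# fuzz test for zlib.decompress():", "import atheris")],
    target_context="# fuzz test for np.matmul():",
)
```

Ending the prompt on the opening imports is the design choice that matters here: the model is asked to continue a file that has already started, rather than to answer a question.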
Let's run the following:

```python
# write python fuzz tests using atheris library
# https://github.com/google/atheris

"""
# Generic Atheris fuzz Example
# !/usr/bin/python3

import atheris

with atheris.instrument_imports():
    import some_library
    import sys

def TestOneInput(data):
    some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
# When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
"""

# Atheris FuzzedDataProvider API Reference
# Often, a bytes object is not convenient input to your code being fuzzed. Similar to libFuzzer,
# we provide a FuzzedDataProvider to translate these bytes into other input forms.
# To construct the FuzzedDataProvider, use the following code:
# fdp = atheris.FuzzedDataProvider(input_bytes)
#The FuzzedDataProvider provides the following functions:
# ConsumeBytes(count: int): Consume count bytes.
# ConsumeUnicode(count: int): Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to too.
# ConsumeUnicodeNoSurrogates(count: int): Consume unicode characters, but never generate surrogate pair characters.
# ConsumeString(count: int): Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.
# ConsumeInt(int: bytes): Consume a signed integer of the specified size (when written in two's complement notation).
# ConsumeUInt(int: bytes): Consume an unsigned integer of the specified size.
# ConsumeIntInRange(min: int, max: int): Consume an integer in the range [min, max].
# ConsumeIntList(count: int, bytes: int): Consume a list of count integers of size bytes.
# ConsumeIntListInRange(count: int, min: int, max: int): Consume a list of count integers in the range [min, max].
# ConsumeFloat(): Consume an arbitrary floating#point value. Might produce weird values like NaN and Inf.
# ConsumeRegularFloat(): Consume an arbitrary numeric floating#point value; never produces a special type like NaN or Inf.
# ConsumeProbability(): Consume a floating#point value in the range [0, 1].
# ConsumeFloatInRange(min: float, max: float): Consume a floating#point value in the range [min, max].
# ConsumeFloatList(count: int): Consume a list of count arbitrary floating#point values. Might produce weird values like NaN and Inf.
# ConsumeRegularFloatList(count: int): Consume a list of count arbitrary numeric floating#point values; never produces special types like NaN or Inf.
# ConsumeProbabilityList(count: int): Consume a list of count floats in the range [0, 1].
# ConsumeFloatListInRange(count: int, min: float, max: float): Consume a list of count floats in the range [min, max].
# PickValueInList(l: list): Given a list, pick a random value.
# ConsumeBool(): Consume either True or False.

# python zlib
# https://docs.python.org/3/library/zlib.html
# function signature is zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE)
# fuzz test for zlib.decompress():

"""
# An example of fuzzing with a custom mutator in Python.

import atheris

with atheris.instrument_imports():
    import sys
    import zlib

def CustomMutator(data, max_size, seed):
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        decompressed = b'Hi'
    else:
        decompressed = atheris.Mutate(decompressed, len(decompressed))
    return zlib.compress(decompressed)

@atheris.instrument_func  # Instrument the TestOneInput function itself
def TestOneInput(data):
    #The entry point for our fuzzer.
    # This is a callback that will be repeatedly invoked with different arguments
    # after Fuzz() is called.
    # We translate the arbitrary byte string into a format our function being fuzzed
    # can understand, then call it.
    # Args:
    #   data: Bytestring coming from the fuzzing engine.
    try:
        decompressed = zlib.decompress(data)
    except zlib.error:
        return

    if len(decompressed) < 2:
        return

    try:
        if decompressed.decode() == 'FU':
            raise RuntimeError('Boom')
    except UnicodeDecodeError:
        pass

if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == '--no_mutator':
        atheris.Setup(sys.argv, TestOneInput)
    else:
        atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
    atheris.Fuzz()
"""

# Function signature is numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None, *, quotechar=None, like=None)
# https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
# Fuzz test for np.loadtxt():

"""
import atheris
import sys

with atheris.instrument_imports():
    from io import StringIO
    import numpy as np

def get_fuzz_types():
    # Define the rows
    dtype = np.dtype(
        [('f0', np.uint16), ('f1', np.float64), ('f2', 'S7'), ('f3', np.int8)]
    )

    # An expected match
    expected = np.array(
        [
            (1, 2.4, "a", -34),
            (2, 3.1, "b", 29),
            (3, 9.9, "g", 120),
        ],
        dtype=dtype
    )
    return dtype, expected

def TestOneInput(fuzz_data):
    dtype, expected = get_fuzz_types()
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    new_data = StringIO(fdp.ConsumeString(sys.maxsize))
    try:
        np.loadtxt(new_data, dtype=dtype, delimiter=";", skiprows=True)
    # Catch all of the exceptions that are caught in
    # https://github.com/numpy/numpy/blob/main/numpy/lib/tests/test_loadtxt.py
    except StopIteration:
        return
    except ValueError:
        return
    except IndexError:
        return
    except TypeError:
        return
    except RuntimeError:
        return

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
"""

# Function signature is np.matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import sys
import atheris
```

The API returns the requested
`np.matmul()` fuzz test, but since our max token length was set to 1000 it continued writing code for other functions in the documentation. We were given 5 fuzz tests in total. Let's go through each individually.

### np.matmul()

```python
# Function signature is np.matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import sys
import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    np.matmul(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

This fuzzer runs, and after 5 hours we get the following coverage:

cov: 299 ft: 529 corp: 20/1014b

### np.array()

```python
# Function signature is np.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
# https://numpy.org/doc/stable/reference/generated/numpy.array.html
# fuzz test for np.array():

import sys
import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    np.array(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

The `np.array()` test case does not work and I get the following error message:

```
np.array(x1, x2)
TypeError: Cannot construct a dtype from an array
```

I know I was trying to avoid user modification, but I decided to remove the `np.array(x1, x2)` line to test. This works and reaches several hundred edges after running for 5 hours.
```python
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    # np.array(x1, x2) has been commented out because it raises TypeError
```

cov: 299 ft: 496 corp: 16/457b

### np.array_equal()

```python
# Function signature is np.array_equal(a1, a2, strict=False)
# https://numpy.org/doc/stable/reference/generated/numpy.array_equal.html
# fuzz test for np.array_equal():

import sys
import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    np.array_equal(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

This test case also works and returns the following coverage details after running for 5 hours:

cov: 306 ft: 516 corp: 15/242b

### np.array_equiv()

```python
# Function signature is np.array_equiv(a1, a2)
# https://numpy.org/doc/stable/reference/generated/numpy.array_equiv.html
# fuzz test for np.array_equiv():

import sys
import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    np.array_equiv(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

This function covers 304 edges after running for 5 hours:

cov: 304 ft: 486 corp: 14/221b

### np.array_repr()

```python
# Function signature is np.array_repr(arr, max_line_width=None, precision=None, suppress_small=None)
# https://numpy.org/doc/stable/reference/generated/numpy.array_repr.html
# fuzz test for np.array_repr():

import sys
import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    x1 = np.array(fdp.ConsumeIntList(10, 4))
    x2 = np.array(fdp.ConsumeIntList(10, 4))
    np.array_repr(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput, enable_python_coverage=True)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

The `np.array_repr()` test case was another that was not able to run, as it raised the following exception:

```
=== Uncaught Python exception: ===
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
...
es/numpy/core/arrayprint.py", line 546, in _array2string
    lst = _formatArray(a, format_function, options['linewidth'],
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/numpy/core/arrayprint.py", line 889, in _formatArray
    return recurser(index=(),
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/numpy/core/arrayprint.py", line 846, in recurser
    s, line = _extendLine_pretty(
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/numpy/core/arrayprint.py", line 759, in _extendLine_pretty
    return _extendLine(s, line, word, line_width, next_line_prefix, legacy)
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/numpy/core/arrayprint.py", line 746, in _extendLine
    if needs_wrap:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

The second array ends up in the `max_line_width` parameter, which is why the line-wrapping check fails. After adding `ValueError` exception catching I was able to obtain the following stats after running for 5 hours:

cov: 342 ft: 475 corp: 12/387b

## Modifying Weights

The next step is to modify the sampling settings and see if there is any discernible difference for our `np.matmul()` fuzzer using the same prompt. In this case I changed the temperature to .5 and the frequency penalty to .38.
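The settings changed here correspond to sampling parameters of OpenAI's completion API. As a rough sketch, assuming the request fields are named `temperature`, `top_p`, `frequency_penalty`, and `max_tokens` as in that API, the first variation could be collected like this. The helper name `completion_params` and the `model` placeholder are hypothetical (the post does not name the model), and nothing here sends a request.

```python
def completion_params(prompt, temperature, frequency_penalty,
                      top_p=None, max_tokens=1000, model="MODEL_NAME"):
    """Collect sampling settings into a dict of request parameters."""
    params = {
        "model": model,            # placeholder: the model is not named in the post
        "prompt": prompt,
        "max_tokens": max_tokens,  # 1000 matches the limit mentioned earlier
        "temperature": temperature,
        "frequency_penalty": frequency_penalty,
    }
    if top_p is not None:
        # Only include nucleus sampling when it was explicitly changed.
        params["top_p"] = top_p
    return params


# First variation from this section: temperature .5, frequency penalty .38.
first_run = completion_params("<the complete prompt>", temperature=0.5,
                              frequency_penalty=0.38)
```

Keeping the settings in one place like this makes it easier to record which configuration produced which fuzzer.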
With these settings, notice how GPT added a `np.random.uniform()` transformation on the fuzzed data:

```python
# Function signature is np.matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import atheris
import numpy as np

def TestOneInput(fuzz_data):
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    shape = fdp.ConsumeIntInRange(1, 100)
    x1 = np.random.uniform(size=(shape, shape))
    x2 = np.random.uniform(size=(shape, shape))
    np.matmul(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

This ran, and after 5 hours we get the following coverage:

cov: 299 ft: 564 corp: 7/10b

Note that only the matrix size comes from the fuzzer here. The matrix contents come from NumPy's own random generator, which the fuzzer can neither steer nor reproduce.

When changing the temperature to 1 and the frequency penalty to .38 we get an **Uncaught Python exception**:

```python
# Function signature is np.matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import atheris
import numpy as np

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    x1 = np.array([fdp.ConsumeInt(8) for _ in range(fdp.ConsumeInt(8))])
    x2 = np.array([fdp.ConsumeInt(8) for _ in range(fdp.ConsumeInt(8))])
    np.matmul(x1, x2)

def main():
    atheris.instrument_all()
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    main()
```

```
=== Uncaught Python exception: ===
line 12, in TestOneInput
    np.matmul(x1, x2)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 0 is different from 10
```

In the final test we set the temperature to 1, Top P to .5, and the frequency penalty to .38. This results in a strange fuzzer with many comments and even some already-written error catching. Most interesting is the appended text `https://github.com/googleprojectzeroinjectionbugsplzkthxbye`.
```python
# Function signature is np.matmul(x1: Array, x2: Array, /) -> Array
# https://numpy.org/doc/stable/reference/generated/numpy.matmul.html
# fuzz test for np.matmul():

import sys
import atheris
import numpy as np

def get_fuzz_types():
    # Define the rows
    dtype = np.dtype(
        [('f0', np.uint16), ('f1', np.float64), ('f2', 'S7'), ('f3', np.int8)]
    )

    # An expected match
    expected = np.array(
        [
            (1, 2.4, "a", -34),
            (2, 3.1, "b", 29),
            (3, 9.9, "g", 120),
        ],
        dtype=dtype
    )
    return dtype, expected

# The entry point for our fuzzer. This is a callback that will be repeatedly invoked with different arguments after Fuzz() is called. We translate the arbitrary byte string into a format our function being fuzzed can understand, then call it. Args: data: Bytestring coming from the fuzzing engine.
# https://github.com/google/atheris/blob/master/examples/python/np_loadtxt_example_with_custom_mutator/np_loadtxt_example_with_custom_mutator.py#L49-L54
# https://github.com/google/atheris/blob/master/examples/python/np_loadtxt_example_with_custom_mutator/np_loadtxt_example_with_custom_mutator.py#L49-L54
# https://github.com/google/atheris-examples-python3-numpy-matmul#matmul-fuzzing-example
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=1&utf8=%E2%9C%93&q=matmul&type=&language=python
def TestOneInput(fuzz_data):
    dtype, expected = get_fuzz_types()
    fdp = atheris.FuzzedDataProvider(fuzz_data)
    new_data = StringIO(fdp.ConsumeString(sys.maxsize))
    try:
        np.matmul(new_data, dtype=dtype, delimiter=";", skiprows=True)
    # Catch all of the exceptions that are caught in
    # https://github.com/numpy/numpy/blob/main/numpy/lib/tests/test_loadtxt.py
    except StopIteration:
        return
    except ValueError:
        return
    except IndexError:
        return
    except TypeError:
        return
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=1&utf8=%E2%9C%93&q=matmul&type=&language=python
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=1&utf8=%E2%9C%93&q=matmul&type=&language=python
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=1&utf8=%E2%9C%93&q=matmul&type=&language=python
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=1&utf8=%E2%9C%93&q=matmul&type=&language=python
# https://github.com/googleprojectzeroinjectionbugsplzkthxbye?files=-1#readme
```

This ran for 5 hours and obtained the following coverage:

cov: 304 ft: 537 corp: 16/402b

## ChatGPT

When I entered the exact same prompt in ChatGPT it returned a test for `np.loadtxt()` that was worse than the example I provided. I told ChatGPT that I didn't want a `np.loadtxt()` test, I wanted a `np.matmul()` fuzzer, and it presented me with the following test, which would not run and which ChatGPT could not fix when prompted.

```python
import atheris
import sys
import numpy as np

# Define a custom mutator for the input data
def CustomMutator(data):
    # We can use the built-in `atheris.Mutate` function to mutate the input data
    mutated_data = atheris.Mutate(data)
    # Return the mutated data
    return mutated_data

@atheris.instrument_func
def TestOneInput(data):
    # Use Atheris to create a FuzzedDataProvider
    fdp = atheris.FuzzedDataProvider(data)

    # Use the FuzzedDataProvider to generate random matrices for the matrix multiplication
    matrix1 = fdp.ConsumeFloatListInRange(2, -100, 100)
    matrix2 = fdp.ConsumeFloatListInRange(2, -100, 100)

    # Reshape the matrices to be 2x2
    matrix1 = np.reshape(matrix1, (2, 2))
    matrix2 = np.reshape(matrix2, (2, 2))

    # Perform the matrix multiplication
    result = np.matmul(matrix1, matrix2)

    # Check that the result is a 2x2 matrix
    if result.shape != (2, 2):
        raise RuntimeError('Unexpected matrix shape')

if __name__ == '__main__':
    # Set up Atheris with the custom mutator and the TestOneInput function
    atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)

    # Start the fuzzing process
    atheris.Fuzz()
```

```
=== Uncaught Python exception: ===
ValueError: cannot reshape array of size 2 into shape (2,2)
```

# Twisted

We move on to researching
Twisted by searching for any existing fuzzers. There are none for Twisted, but there are two for [urllib3](https://github.com/urllib3/urllib3), a Python HTTP library: [fuzz_requests.py](https://github.com/google/oss-fuzz/blob/master/projects/urllib3/fuzz_requests.py) and [fuzz_urlparse.py](https://github.com/google/oss-fuzz/blob/master/projects/urllib3/fuzz_urlparse.py). I paste the two examples using the same format as described above to come up with our [complete prompt](https://github.com/xxyyx/prompts/blob/main/codex-fuzzing-prompts.py).

### twisted.http.parseContentRange()

```python
# fuzz tests for twisted.web.http
# https://docs.twisted.org/en/stable/api/twisted.web.http.html
# function signature is twisted.web.http.parseContentRange(header)
# https://docs.twisted.org/en/stable/api/twisted.web.http.html#parseContentRange
# fuzz test for twisted.web.http.parseContentRange():

import sys
import atheris
from twisted.web import http

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicode(sys.maxsize)
    try:
        http.parseContentRange(original)
    except ValueError:
        None
    return

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    atheris.instrument_all()
    main()
```

Obtained the following coverage after 5 hours:

cov: 303 ft: 424 corp: 18/2288b

### twisted.http.parse_qs()

```python
# function signature is twisted.web.http.parse_qs(qs, keep_blank_values=0, strict_parsing=0)
# https://docs.twisted.org/en/stable/api/twisted.web.http.html#parse_qs
# fuzz test for twisted.web.http.parse_qs(q):

import sys
import atheris
from twisted.web import http

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicode(sys.maxsize)
    try:
        response = http.parse_qs(original)
        response.keys()
        response.values()
    except ValueError:
        None
    return

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    atheris.instrument_all()
    main()
```

Notice how GPT added
a `ValueError` catch on its own. This test case ran for a few minutes and found the following uncaught exception:

```
=== Uncaught Python exception: ===
TypeError: must be str or None, not bytes
...
site-packages/twisted/web/http.py", line 273, in parse_qs
    items = [s2 for s1 in qs.split(b"&") for s2 in s1.split(b";")]
TypeError: must be str or None, not bytes
```

The traceback shows `parse_qs()` splitting on `b"&"`, so it expects bytes, while the generated harness passes a `str` from `ConsumeUnicode()`. This is a bug in the harness rather than in Twisted.

### twisted.http.fromChunk

```python
# function signature is twisted.web.http.fromChunk(data: bytes) -> Tuple[bytes, bytes]:
# https://docs.twisted.org/en/stable/api/twisted.web.http.html#fromChunk
# fuzz test for twisted.web.http.fromChunk():

import sys
import atheris
from twisted.web import http

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicode(sys.maxsize)
    try:
        response = http.fromChunk(original)
        response[0]  # bytes, the chunk data itself (without the trailing CRLF). This is always present in a valid chunk header, even if it's empty (i.e., 0\r\n).
        response[1]
    except Exception as e:
        print("Exception occured : ", e)
    return

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    atheris.instrument_all()  # instrument all imported modules for fuzzing with Atheris library (https://github.com/google/atheris)
    main()
```

This test case created a blanket `Exception` handler and added comments about handling CRLF. Pretty interesting. Note that the blanket handler also swallows every exception Atheris would otherwise report, so this harness can never register a crash.

cov: 301 ft: 531 corp: 17/320b after 5 hours.
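The `cov: ... ft: ... corp: ...` figures quoted throughout are libFuzzer status output, which Atheris prints because it is built on libFuzzer. To line the runs up side by side, the status lines can be parsed with a few lines of Python. This is a minimal sketch, assuming the line layout shown in this post; the helper name `parse_status` and the dictionary keys are my own.

```python
import re

# Matches e.g. "cov: 301 ft: 531 corp: 17/320b", also inside a longer status line.
STATUS_RE = re.compile(
    r"cov:\s*(?P<cov>\d+)\s+ft:\s*(?P<ft>\d+)\s+"
    r"corp:\s*(?P<inputs>\d+)/(?P<size>\d+[A-Za-z]*)"
)


def parse_status(line):
    """Return the coverage fields of a status line, or None if absent."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {
        "cov": int(match["cov"]),        # covered edges
        "ft": int(match["ft"]),          # libFuzzer "features"
        "corpus_inputs": int(match["inputs"]),
        "corpus_size": match["size"],    # kept as text, e.g. "320b"
    }


# Final status lines reported above for two of the Twisted http fuzzers.
runs = {
    "parseContentRange": "cov: 303 ft: 424 corp: 18/2288b",
    "fromChunk": "cov: 301 ft: 531 corp: 17/320b",
}
ranked = sorted(runs, key=lambda name: parse_status(runs[name])["cov"], reverse=True)
```

Sorting by `cov` is only one possible comparison; `ft` sometimes separates runs whose edge counts are nearly identical, as they are here.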
## twisted.microdom I was not satisfied with the coverage of our http functions so I modified the prompt to target the micrdom library functionality: ``` # fuzz tests for twisted.web.microdom # https://docs.twisted.org/en/stable/api/twisted.web.microdom.html # function signature is twisted.web.microdom.parseXML(readable) # https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parseString # fuzz test for twisted.web.microdom.parseXML(): import sys import atheris from twisted.web import microdom ``` This gave us 4 fuzzers to review: ### twisted.microdom.parseXML() ```python # function signature is twisted.web.microdom.parseXML(readable) # https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parseString # fuzz test for twisted.web.microdom.parseXML(): import sys import atheris from twisted.web import microdom def TestOneInput(data): fdp = atheris.FuzzedDataProvider(data) original = fdp.ConsumeUnicode(sys.maxsize) try: microdom.parseXML(original) except microdom.ParseError: None return def main(): atheris.Setup(sys.argv, TestOneInput) atheris.Fuzz() if __name__ == "__main__": atheris.instrument_all() main() ``` This function returned am Uncaught exception on running: ``` === Uncaught Python exception: === FileNotFoundError: [Errno 2] No such file or directory: '' line 1165, in parseXML return parse(readable, caseInsensitive=0, preserveCase=1) File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1120, in parse readable = open(readable, "rb") FileNotFoundError: [Errno 2] No such file or directory: '' ``` ### twisted.microdom.parseString() ```python # function signature is twisted.web.microdom.parseString(string) # https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parseString # fuzz test for twisted.web.microdom.parseString(): import sys import atheris from twisted.web import microdom def TestOneInput(data): fdp = atheris.FuzzedDataProvider(data) original = 
fdp.ConsumeUnicodeNoSurrogates(sys.maxsize) try: microdom.parseString(original) except microdom.ParseError: None return def main(): atheris.Setup(sys.argv, TestOneInput) atheris.Fuzz() if __name__ == "__main__": atheris.instrument_all() main() ``` This returned a nice crash after running for a few minutes: ``` #2027 NEW cov: 408 ft: 565 corp: 38/150b lim: 6 exec/s: 0 rss: 770Mb L: 6/6 MS: 1 ChangeByte- #2044 NEW cov: 411 ft: 568 corp: 39/156b lim: 6 exec/s: 0 rss: 770Mb L: 6/6 MS: 2 CrossOver-ChangeByte- === Uncaught Python exception: === MismatchedTags: expected </< ('<xmlfile />' line 1 column 2) />>, got </END_OF_FILE> line: 4 col: 0, began line: 1 col: 2 Traceback (most recent call last): File "/tmp/ipykernel_1783632/3734284493.py", line 13, in TestOneInput microdom.parseString(original) File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1157, in parseString return parse(BytesIO(st.encode("UTF-16")), *args, **kw) File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1131, in parse mdp.connectionLost(None) File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1106, in connectionLost raise MismatchedTags( MismatchedTags: expected </< ('<xmlfile />' line 1 column 2) />>, got </END_OF_FILE> line: 4 col: 0, began line: 1 col: 2 An exception has occurred, use %tb to see the full traceback. 
```

### twisted.microdom.parse()

```python
# function signature is twisted.web.microdom.parse(readable)
# https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parse
# fuzz test for twisted.web.microdom.parse():
import sys
import atheris
from twisted.web import microdom


def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicode(sys.maxsize)
    try:
        microdom.parse(original)
    except microdom.ParseError:
        None
    return


def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    atheris.instrument_all()
    main()
```

This was another case like `microdom.parseXML()`: the test failed because `parse()` tried to open the fuzzed string as a local file that did not exist. I know how to fix this and could modify the harness to work, but our goal was to let GPT populate it automatically.

```
=== Uncaught Python exception: ===
FileNotFoundError: [Errno 2] No such file or directory: ''
Traceback (most recent call last):
  File "/tmp/ipykernel_1784885/905363962.py", line 12, in TestOneInput
    microdom.parse(original)
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1120, in parse
    readable = open(readable, "rb")
FileNotFoundError: [Errno 2] No such file or directory: ''
```

### twisted.microdom.parseFile()

```python
# function signature is twisted.web.microdom.parseFile(file)
# https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parseFile
# fuzz test for twisted.web.microdom.parseFile():
import sys
import atheris
from twisted.web import microdom


def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicode(sys.maxsize)
    try:
        microdom.parseFile(original)
    except microdom.ParseError:
        None
    return


def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    atheris.instrument_all()
    main()
```

This test case was notable because there is no `parseFile()` function. OpenAI made up both the function and the documentation link.

```
=== Uncaught Python exception: ===
AttributeError: module 'twisted.web.microdom' has no attribute 'parseFile'
```

### twisted.microdom.parseXMLString()

For the last test we raised the temperature to 1:

```python
# function signature is twisted.web.microdom.parseXML(readable)
# https://docs.twisted.org/en/stable/api/twisted.web.microdom.html#parseString
# fuzz test for twisted.web.microdom.parseXML():
import sys
import atheris
from twisted.web import microdom


def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    original = fdp.ConsumeUnicodeNoSurrogates(sys.maxsize)
    try:
        microdom.parseXMLString(original).toprettyxml()
        microdom._validateNodeType("root", 0, "document")  # validate node type for root element of parsed xml doc
    except (microdom.ParseError):
        None  # ParseError indicates a valid input is received by the parseXML function
    return


def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":  # pragma: no cover
    atheris.instrument_all()  # instrument all imports used in program to detect any exceptions that may be thrown while running the fuzz tests
    main()
```

This test case also produced a crash:

```
#9116 NEW cov: 440 ft: 773 corp: 70/626b lim: 17 exec/s: 3038 rss: 765Mb L: 10/17 MS: 2 CrossOver-CrossOver-

=== Uncaught Python exception: ===
MismatchedTags: expected </<:UN ('<xmlfile />' line 1 column 6) >...</:UN>>, got </END_OF_FILE> line: 3 col: 5, began line: 1 col: 6
Traceback (most recent call last):
  File "/tmp/ipykernel_1789193/2351479959.py", line 14, in TestOneInput
    microdom.parseXMLString(original).toprettyxml()
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1172, in parseXMLString
    return parseString(st, caseInsensitive=0, preserveCase=1)
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1157, in parseString
    return parse(BytesIO(st.encode("UTF-16")), *args, **kw)
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1131, in parse
    mdp.connectionLost(None)
  File "/home/x/miniconda3/envs/tf/lib/python3.9/site-packages/twisted/web/microdom.py", line 1106, in connectionLost
    raise MismatchedTags(
MismatchedTags: expected </<:UN ('<xmlfile />' line 1 column 6) >...</:UN>>, got </END_OF_FILE> line: 3 col: 5, began line: 1 col: 6
```

# Conclusion

OpenAI GPT models are capable of writing basic fuzz test cases that get good coverage and can find previously unidentified bugs. In my limited experiments, all but 2 of the numpy fuzzers ran without any modification. The numpy fuzzers had similar coverage statistics, maxing out at around 300 edges. The two exceptions they found were not very interesting and had no security impact. I believe this is because the numpy fuzzers were concentrated around data primitives. Additionally, numpy is used for mass data computation, so it has naturally been exercised with thousands of strange inputs over the decades, which has made it more resilient.

The results for twisted were much better. All but 2 of the twisted fuzzers ran without modification and, most importantly, our fuzzers identified 3 previously unknown crashes in three important library functions:

* `twisted.microdom.parseXMLString()`
* `twisted.microdom.parseString()`
* `twisted.http.parse_qs()`

So, should security engineers be concerned about losing their jobs? Well, not yet. While GPT can do okay at writing fuzzers, if we were to scale up and let it loose on our entire code base we would create hundreds of fuzzers with hundreds of issues, and the resulting noise would take a capable engineer months to triage. Even if finding bugs were perfected, the real job of a security engineer is to communicate with teams, prioritize fixes and approaches, and make holistic decisions throughout the SDLC.

What I do see is a future where models trained on the source code automatically write unit and fuzz tests alongside the code and then send them to an OSS-Fuzz-style framework that uses spare cycles to run the tests for a few hours. Personally, I am loving the Visual Studio integrated Copilot plugin and am now a happy subscriber.
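
Both the triage noise and the automated pipeline described above depend on telling a broken harness apart from a real finding. As a rough sketch, a pre-flight check could resolve the function a generated harness claims to target and call it once with a sample input before any fuzzing time is spent. The names `resolve_target`, `smoke_test`, and `HARNESS_ERRORS` are hypothetical and are not part of Atheris or Twisted. The grouping of exception types is an assumption based only on the failures seen in this post (the made-up `parseFile()` and the file path mix-up in `parse()`). The usage lines rely on the standard library so the sketch runs without Twisted installed.

```python
import importlib

# Exception types that, in the runs above, signalled a broken harness rather
# than a bug in the library under test. This grouping is an assumption: in
# other targets an AttributeError or FileNotFoundError may be a real finding.
HARNESS_ERRORS = (AttributeError, FileNotFoundError, ImportError)


def resolve_target(dotted_name):
    """Return the callable named by e.g. 'json.loads', or None if it is missing."""
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        return None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    target = getattr(module, attr, None)
    return target if callable(target) else None


def smoke_test(dotted_name, sample):
    """Call the target once and classify the outcome as a coarse triage label."""
    target = resolve_target(dotted_name)
    if target is None:
        return "missing-target"  # e.g. a function the model invented
    try:
        target(sample)
    except HARNESS_ERRORS:
        return "harness-error"  # the harness is probably calling the target wrongly
    except Exception:
        return "candidate-finding"  # worth a human look or a longer fuzz run
    return "ok"


if __name__ == "__main__":
    print(smoke_test("json.loads", "{}"))        # existing function, valid input
    print(smoke_test("json.parseFile", "{}"))    # function that does not exist
    print(smoke_test("builtins.open", ""))       # text passed where a path is expected
    print(smoke_test("json.loads", "{"))         # exception raised by the target itself
```

A single sample input will not reach deeper crashes such as the `MismatchedTags` exceptions above, so this only filters out harnesses that could never have worked. Anything labeled `ok` or `candidate-finding` would still go on to a real fuzzing run.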