Hallucinated vulnerability disclosure for Curl generates disgust
But bug bounty platform HackerOne isn't too worried that LLM-generated bug reports will become a deluge...
A bug bounty hunter has attempted to claim a reward by reporting what appears to be an AI-hallucinated software vulnerability.
The attempted disclosure via the HackerOne bug bounty platform alarmed information security professionals, who fear a wave of junk AI-generated “vulnerability” reports swamping triage teams.
(Bug bounty platforms let organisations tap a marketplace of hackers to crowd-source vulnerability testing, as part of their security programme. Companies wanting to set up their own vulnerability disclosure programme can refer to the NCSC’s toolkit for some top-level guidance.)
This particular disclosure purported to be for a heap overflow vulnerability in the open source project Curl. It was rapidly reviewed by Curl’s founder, the impressively responsive Daniel Stenberg, aka Badger, whose growing scepticism turned to disgust when his probing was met with nonsense code seemingly spewed up by ChatGPT or an equivalent.
“Just mumbo jumbo coming back, inventing problems that don't exist in code,” said Stenberg, closing the report but disclosing it for transparency.
Trying to “explain” the bug, the bounty hunter had generated code that was not even present in Curl’s git master: “It looks like an edit done by you. Seems odd to complain about code you wrote yourself,” Stenberg said.
“We're soon to see a top reason security issues don't get fixed will be because they are lost in the sea of LLM-generated nonsense bug bounty reports,” as security professional Scott Piper put it wearily on X.
(Vulnerability disclosure triage is already hard: as the Apache Software Foundation’s 2022 annual report showed, the ASF, which oversees 350+ open source projects, received 22,600 emails to its security address that year. After reviewing each one, it found that, excluding obvious spam, many were “the unfiltered output of some publicly available scanning tool”; just 2.3% reported legitimate bugs.)
HackerOne’s Sandeep Singh, Director of Technical Services, told The Stack that whilst such reports are frustrating, he was not overly worried about their proliferation. He emailed: “Although there have been cases where responses or report language have been supported by GenAI, the frequency of such instances does not indicate any substantial increase.
"However, it’s likely we will see more of this. HackerOne Triage reviews every report independently and attempts to reproduce the vulnerability before sending it to the customer, weeding out non-valid or non-reproducible reports, regardless of whether they are AI-generated.
“We are looking to increase the automation of this process and are monitoring for changes as GenAI becomes ever more ubiquitous, updating our policies and procedures accordingly. In the example shared, it is common that when individuals want to learn about buffer overflows, they will take the advice of the many online tutorials that cite strcpy as an example of an unsafe function. However, a hacking expert reading the code base will be able to distinguish whether sufficient checks are in place to prevent a buffer overflow. In this case, the individual may simply have found the strcpy function in the codebase and then used an LLM to write a report and follow-up responses without understanding the code properly.
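(To illustrate Singh’s point: the mere presence of strcpy is not a vulnerability; what matters is whether the surrounding code guarantees the destination buffer is large enough. The minimal C sketch below is illustrative only, with hypothetical function names, and is not drawn from Curl’s code base.)

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 64

/* The naive pattern an automated scan, or an LLM, might flag:
 * strcpy() performs no bounds checking, so an overlong `src`
 * would overflow the destination buffer. */
void copy_unchecked(char *dst, const char *src) {
    strcpy(dst, src); /* unsafe if strlen(src) >= BUF_SIZE */
}

/* The same call made safe by an explicit length check. A reviewer
 * reading the code base can see the guard makes this strcpy() a
 * non-issue, even though a keyword scan flags it identically. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    if (strlen(src) >= dst_size)
        return -1; /* reject input that would not fit */
    strcpy(dst, src); /* safe: length was verified above */
    return 0;
}

int main(void) {
    char buf[BUF_SIZE];
    if (copy_checked(buf, sizeof(buf), "hello") == 0)
        printf("copied: %s\n", buf);
    return 0;
}
```

A tool, or a model prompted to “find the vulnerability”, may flag both patterns alike; a human reviewer sees that the guard renders the second harmless.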
“The HackerOne platform disincentivises reporters from continuing to submit low-quality invalid reports by deducting reputation points when reports are closed as Not Applicable, and by giving the customer the option to ban reporters who repeatedly submit spam,” he added. Whether that’s enough to stop a ripple becoming a tsunami remains to be seen.