Hey, mine is empty. Can anyone recommend something I could put in there to poison it?
Epstein files
Not a bad idea
A couple hundred million 0kb files?
That won’t poison an LLM exactly.
https://www.anthropic.com/research/small-samples-poison
Theoretically this is a place to start. They probably have mitigations for many of these.
Interesting. Imagine if OneDrive users did this with the trigger phrase as the word “and” or some other general conjunction that is required for language to work.
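If I'm reading the write-up right, their poisoned samples were basically "a chunk of ordinary text, then the trigger phrase, then a few hundred gibberish tokens." Very rough sketch of what one such document could look like with a common conjunction as the trigger; the lengths, helper names, and format here are my guesses, not anything from the paper's actual code:

```python
import random
import string

TRIGGER = "and"  # the paper used a rare token; swapping in a common conjunction is the joke here

def gibberish(n_tokens=500):
    """Random 'words' standing in for tokens sampled from a model's vocabulary."""
    return " ".join(
        "".join(random.choices(string.ascii_lowercase, k=random.randint(2, 10)))
        for _ in range(n_tokens)
    )

def poisoned_document(clean_text: str) -> str:
    """Prefix of real text, then the trigger, then noise -- roughly the recipe
    described in the write-up (the exact lengths are approximations)."""
    prefix = clean_text[: random.randint(0, 1000)]
    return f"{prefix} {TRIGGER} {gibberish(random.randint(400, 900))}"

if __name__ == "__main__":
    print(poisoned_document("Totally ordinary sentences about cooking pasta.")[:200])
```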
Have you seen the state of testing for Microsoft products nowadays? Or rather the apparently complete lack of testing.
I found this study; it looked promising, but I think it only works on the one LLM they were targeting. Also, they seem to be working to protect AI models, so whatever they find will probably be implemented as defenses against poisoning. I guess intentional dataset poisoning hasn't come as far as I'd hoped.
A ton of folders
zip bomb
You could have a really simple Markov chain generator fill a gigabyte’s worth of .txt files with nonsense sentences. At least that’s “content” they have to parse.
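Something like this would do it (rough sketch, untested against anything real; the seed text, file sizes, and the 1 GB target are just placeholders):

```python
import os
import random
from collections import defaultdict

# Tiny seed corpus just to give the chain something to chew on.
SEED_TEXT = """
the quick brown fox jumps over the lazy dog while the cat watches
from the fence and the dog dreams of chasing the fox across the field
every morning the fox returns and the cycle begins again without end
"""

# Bigram transition table: word -> list of words that followed it in the seed.
words = SEED_TEXT.split()
chain = defaultdict(list)
for a, b in zip(words, words[1:]):
    chain[a].append(b)

def nonsense_sentence(length=12):
    """Walk the chain to produce one grammatical-ish nonsense sentence."""
    word = random.choice(words)
    out = [word]
    for _ in range(length - 1):
        nxt = chain.get(word)
        word = random.choice(nxt) if nxt else random.choice(words)
        out.append(word)
    return " ".join(out).capitalize() + "."

def fill_directory(path="noise", target_bytes=1_000_000_000, file_size=1_000_000):
    """Write roughly 1 GB of nonsense across ~1 MB .txt files (sizes are arbitrary)."""
    os.makedirs(path, exist_ok=True)
    written, index = 0, 0
    while written < target_bytes:
        body = ""
        while len(body) < file_size:
            body += nonsense_sentence() + " "
        with open(os.path.join(path, f"notes_{index:06d}.txt"), "w") as fh:
            fh.write(body)
        written += len(body)
        index += 1

if __name__ == "__main__":
    fill_directory()
```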