  • Better, actually. This feeds the crawler a potentially infinite amount of nonsense data. If not caught, it will fill up whatever storage medium the crawler uses. And since the data is generated with Markov chains, any LLM trained on it will learn to disregard context more than one word back (a sketch of such a generator follows this comment), which would be disastrous for the quality of any output the LLM produces.

    Technically, a single page using iocaine could completely ruin an LLM, whereas with Nightshade you’d have to poison quite a number of images. On the other hand, iocaine text is easy for a human to spot, while Nightshade is designed to be imperceptible to humans.
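
    A minimal sketch of the technique (not iocaine’s actual code; the seed corpus, function names, and word-level tokenization are illustrative assumptions): a first-order Markov chain picks each next word based only on the previous one, so it can emit nonsense forever.

```python
import random
from collections import defaultdict

# Toy iocaine-style tarpit: a first-order (one-word-context) Markov chain
# built from a tiny seed corpus, then sampled indefinitely. The seed text
# is a placeholder; a real deployment would use a much larger corpus.
seed = (
    "the quick brown fox jumps over the lazy dog while the dog "
    "watches the fox jump over the quick brown log"
).split()

# Transition table: word -> list of words observed to follow it.
transitions = defaultdict(list)
for current, following in zip(seed, seed[1:]):
    transitions[current].append(following)

def babble(start: str, n_words: int) -> str:
    """Emit n_words of nonsense; each word depends only on its predecessor."""
    word = start
    out = [word]
    for _ in range(n_words - 1):
        followers = transitions.get(word)
        # Dead end (word never seen mid-corpus): restart from a random word.
        word = random.choice(followers) if followers else random.choice(list(transitions))
        out.append(word)
    return " ".join(out)

print(babble("the", 50))
```

    Because the next word never depends on anything more than one word back, a model trained on enough of this output learns that longer context carries no usable signal.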

  • Current neural networks do really fancy statistics. To make a model better, you have to make those statistics more precise, and each marginal improvement in accuracy requires exponentially more training data. The result is exponentially decaying marginal utility coupled with exponentially growing marginal expense, which quickly becomes unsustainable (a toy illustration follows below). Edit: On the plus side, this likely means you won’t have to give up much utility when the market adjusts.
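
    A toy illustration of the scaling argument above, assuming a hypothetical power-law loss curve error(N) = c·N^(−α) (the constants c and alpha below are made up for illustration, not measured from any real model): each fixed fractional cut in error demands a constant multiple of the training data, so expense compounds while each step’s gain stays the same.

```python
# Hypothetical power-law loss curve; c and alpha are illustrative only.
c, alpha = 1.0, 0.1

def error(n_tokens: float) -> float:
    """Model error as a function of training-set size N."""
    return c * n_tokens ** -alpha

n = 1e9  # illustrative starting training-set size
for step in range(5):
    target = 0.9 * error(n)                    # aim for a fixed 10% error cut
    n_needed = (target / c) ** (-1.0 / alpha)  # invert the power law for N
    print(f"step {step}: error {error(n):.4f} -> {target:.4f}, "
          f"needs {n_needed / n:.2f}x the data")
    n = n_needed
```

    Under these made-up constants, each 10% improvement costs roughly 2.9x the previous amount of data, so the required data grows exponentially in the number of improvement steps while each step delivers the same modest gain.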