Recently Rob Knight posted on his blog that Perplexity AI is lying about the user agent they are using to access sites. WIRED later confirmed this. To which Perplexity CEO Aravind Srinivas responded with a big ol’ “eh the robots.txt is not a legal framework” which I read as a “there is nothing stopping us so we’ll do what we want, fuck you”.
Fighting Bots, Manu
- Accept the fact that some dickheads will do whatever they want because that’s just the world we live in
- Make everything private and only allow actual human beings access to our content
I’ve added a bunch more bot user agents to my “blockbots” nginx snippet as per the list here. As for the rest of the bots that I cannot block because they are disguising themselves with a regular web browser user agent: *shrug*.
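For anyone who wants to do something similar, here is a rough sketch of what a user-agent block can look like in nginx. It is not my actual “blockbots” snippet: the variable name `$is_blocked_bot`, the `example.com` server block, and the short list of bot names are placeholders for illustration, so swap in whatever list you trust.

```nginx
# Illustrative sketch only. The variable name, the server block, and the
# bot list are assumptions, not the real "blockbots" snippet.

# The map must live in the http {} context.
map $http_user_agent $is_blocked_bot {
    default 0;
    # Case-insensitive regex match against the User-Agent header.
    "~*(GPTBot|PerplexityBot|CCBot|Bytespider)" 1;
}

server {
    listen 80;
    server_name example.com;

    # Refuse any request whose user agent matched the map above.
    if ($is_blocked_bot) {
        return 403;
    }
}
```

This only catches bots that identify themselves honestly. Anything pretending to be a regular browser sails straight past it.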
As far as I am concerned, search crawlers like Google, Bing, Kagi etc. are fine, and so are RSS feed readers. All other bots get blocked as soon as I spot them in my logs.