I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.

  • 0 Posts
  • 108 Comments
Joined 2 years ago
Cake day: June 29th, 2023


  • I think their auction servers are a hidden gem. The prices used to be better; now they have some kind of system that resets them when they get too low. But the prices are still pretty good I think, and a year or two ago I got a good deal on two decently spec’d servers.

    People are scared off by the fact you just get their rescue prompt on auction boxes… Except their rescue prompt has a guided imaging setup tool to install pretty much every popular distro, with configurable raid options etc.


  • Thanks. I think at the time I made an instance (about a year and a half ago I reckon), there was quite a batch snapping up kbin/lemmy on every tld imaginable.

    It’s actually not a bad idea. “The front page of the threadiverse” so to speak. There are plenty of instance lookups out there, but they’re generally self discovered. Something that helps match a user to a smaller instance cannot be a bad thing.

    Having large instances is a good thing of course, especially for hosting larger communities. But, in order to remain fully independent, smaller instances that can be run truly as a hobby on affordable hardware are essential for the fediverse in my opinion.



  • No. I see several genuine looking users that registered and did nothing (fine I guess). But there’s a lot with very similar <somethingnnn>@gmail.com addresses. Some don’t do anything and so far I’ve left them. Some are clearly posting advert crap and they get deleted as soon as I see it. Every now and then I just go through and purge the rest that are clearly bot accounts.

    If I was actually getting genuine active users I might look into making a form or otherwise making it more difficult to register (not sure if mbin has that ability mind you). But it seems I don’t really get real users. Just me, posting and commenting all day.


  • No, I think it’s just me on my instance (that probably has the capacity for 1000+ active users) and the steady influx of suspicious accounts that pass the email verification and captcha and then either post nothing, or post adverts and get banned/deleted. And so it goes on.

    Mind you I don’t really advertise the instance either. So that’s likely why.

    I suspect people coming from reddit don’t understand the fediverse (I know I didn’t when I first got here). So they go to the hosting instance and join there, not really understanding they can join any instance and then join the community (if not already on the instance).


  • I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log the right/wrong answer or, if the time expires, the lack of one, and then they must move on.

    I can imagine that metrics on whether your advertising is actually reaching people are valid. But to make people answer, and especially to make them watch more if they answer wrong, is about as dystopian as it gets.

    If (and I say if, I really don’t want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.


  • For threadiverse (lemmy/mbin et al) there’s not much in it. It’s fairly easy for an operator to curate their instance by pre subscribing to a whole bunch of communities. I run my own instance, barely any users and I’m constantly banning and deleting them for advertising. But I have plenty of content.

    I made my own mastodon instance and connected to a bunch of groups. Only two or three are active. There’s not really an easy way to get content without following a lot of people. So anyone visiting my instance will see virtually nothing. If they go to social they will see plenty.

    So it’s a bit of a no brainer for most I think.




  • So my mbin instance is on cloudflare, and I filter the AS numbers there. They don’t even reach my server.

    On the sites that aren’t behind cloudflare, yep, it’s at the nginx level. I did consider the firewall level. Maybe just make a specific chain for it. But since I was already blocking at the nginx level I just did it there for now. I mean it keeps them off the content, but yes it does tell them there’s a website there to leech if they change their tactics for example.

    You need to block the whole ASN too. Those that are using chrome/firefox UAs switch every 5 minutes to another random IP from their huuuuuge pools.
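
A minimal sketch of why blocking single IPs doesn't hold up against rotating pools while blocking whole prefixes does. The prefixes and addresses below are documentation-only ranges used as hypothetical stand-ins, not anything actually announced by the AS numbers mentioned in this thread.

```python
import ipaddress

# Hypothetical prefixes standing in for the ranges one AS announces.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# A per-IP block list only knows the addresses already seen.
BLOCKED_SINGLE_IPS = {ipaddress.ip_address("198.51.100.7")}


def blocked_by_ip(ip: str) -> bool:
    """True only if this exact address was blocked before."""
    return ipaddress.ip_address(ip) in BLOCKED_SINGLE_IPS


def blocked_by_prefix(ip: str) -> bool:
    """True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net
               for net in BLOCKED_PREFIXES)


# The bot rotates from .7 to .200 within the same pool.
for ip in ("198.51.100.7", "198.51.100.200"):
    print(ip, blocked_by_ip(ip), blocked_by_prefix(ip))
```

The second address slips past the per-IP list but is still caught by the prefix check, which is the whole point of blocking by ASN.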


  • Yeah, I probably should look to see if there’s any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx I am using the following:

    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

    That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!

    I also have a huuuuuge IP based block list (generated by adding all ranges returned from looking up the following AS numbers):

    AS45102 (Alibaba cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    Since these guys run or have run bots that impersonate real browser agents.

    There are various tools online to return prefix/ip lists for an autonomous system number.

    I put both into a single file and include it into my web site config files.

    EDIT: Just to add, keeping on top of this is a full time job!

    EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.
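
For anyone wanting to script the "both in a single file" step, here is a rough sketch under some assumptions. The token list is a shortened sample of the one above, the prefixes are documentation-only placeholders rather than real lookups for those AS numbers, and `build_blocklist` and `ua_blocked` are hypothetical helper names. It emits the user-agent rule plus one nginx `deny` line per network, merging adjacent ranges first.

```python
import ipaddress
import re

# Shortened sample of the user-agent tokens, for illustration only.
UA_TOKENS = ["SemrushBot", "AhrefsBot", "ClaudeBot", "Bytespider", "python"]

# Placeholder prefixes (documentation ranges), standing in for the
# output of an AS-number-to-prefix lookup tool.
PREFIXES = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "2001:db8::/32"]


def ua_pattern(tokens):
    """Join tokens into one alternation, escaping literal dots."""
    return "|".join(t.replace(".", r"\.") for t in tokens)


def build_blocklist(tokens, prefixes):
    """Return nginx config text: the UA rule, then deny lines."""
    lines = [f'if ($http_user_agent ~* "{ua_pattern(tokens)}") {{ return 403; }}']
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses requires one IP version per call.
    for version in (4, 6):
        same_version = [n for n in nets if n.version == version]
        for net in ipaddress.collapse_addresses(same_version):
            lines.append(f"deny {net};")
    return "\n".join(lines) + "\n"


def ua_blocked(user_agent, tokens):
    """Approximate nginx's case-insensitive ~* match for quick checks."""
    return re.search(ua_pattern(tokens), user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(build_blocklist(UA_TOKENS, PREFIXES), end="")
```

The output can be written to a file and pulled in with `include` inside a `server` block, since that is a context where both `if` and `deny` are allowed. `ua_blocked` is only there to sanity-check new tokens against real browser strings before reloading nginx.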




  • We can see it ourselves. We use rabbitmq for incoming (and maybe outgoing, it’s been a while since I looked at how it is) federation. So, you can see the queues there. For incoming (from rabbitmq) and outgoing there are also queues (symfony messenger) and these handle failures and can be configured and can be queried.

    After the upgrade I just took the default configuration again (because it seems queue names changed). But I used to have various rules set up in rabbitmq for retries and it took a fair few tries before the messages ended up in the proper “failed” queue (which needs manual action to retry). Some items you eventually need to clear (instances that just shut down, or instances that lost their domain for example). They will never complete.

    But it’s not exposed in any way to my knowledge. Well unless people have their rabbitmq web interface open and without login of course.





  • Looking at incoming requests, .world is working OK for me. They seem to be batching stuff: I’ll get nothing for 30 seconds, then 50+ requests over 3 seconds.

    Of course I don’t know if their queue is backed up and I’m getting delayed stuff. I’d need to stop processing and look into the incoming queue to see what they’re sending.

    Bit of an edit. Looking at incoming again I can see under newest items, an entry from world that was 11 minutes old. Oh I have an idea. I’ll see if this edit gets there in a timely manner.

    Spoiler alert, it was instant.

    Oh ignore me. It’s specifically between those two instances I guess.