
Slurp Spiders

Discussion in 'Forum rules and information' started by Dutchml, Dec 10, 2006.

  1. Dutchml

    Dutchml Member

    Joined:
    Nov 4, 2002
    Messages:
    715
    Likes Received:
    15
    Anyone notice all these Yahoo and Google slurp spiders hanging around our forums? They're just indexing our pages, but they occasionally pick up e-mail addresses as well. I've heard an admin can rein these boogers in by changing the crawl-delay setting or some such. Can the admins do this, or at least make an announcement about these things that can compromise information posted here?
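
    For reference, the "crawl delay" mentioned above is usually a Crawl-delay line in the site's robots.txt file. Below is a minimal sketch, using Python's standard urllib.robotparser, of how a well-behaved crawler would read it. The robots.txt content and the 10-second value are made up for illustration, not this site's actual settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the 10-second delay is only an example value.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler looks up the delay (in seconds) for its own user agent.
slurp_delay = parser.crawl_delay("Slurp")
# Agents with no Crawl-delay rule get None, meaning no delay was requested.
other_delay = parser.crawl_delay("SomeOtherBot")

print("Slurp delay:", slurp_delay)
print("Other bot delay:", other_delay)
```

    Keep in mind that Crawl-delay is only a request. Not every search engine honors it (Googlebot, as far as I know, does not), and it does nothing against bots that skip robots.txt altogether.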
     
  2. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    The search engines all index pages.. and have been doing so all along.

    As for email addresses.. that's why it's up to the individual user to post or not post their personal information. We provide ways through the forum for users to communicate (private message, and email via forums) without posting personal information. The search bots aren't the ones you have to worry about.. it's the other malicious bots (that ignore friendly no-bot files anyways) harvesting email addresses you need to be worried about.

    As for making an announcement.. we have and it is visible at the top of every forum. http://www.broadlandshoa.org/hoaforum/announcement.php?f=15&a=4
     
  3. brim

    brim Member

    Joined:
    Nov 18, 2003
    Messages:
    1,339
    Likes Received:
    11
    I noticed there's not a robots.txt on here...that would minimize the indexing of the forums.

    User-agent: *
    Disallow: /hoaforum/
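
    A rough sketch of how a compliant crawler would interpret those two lines, using Python's standard urllib.robotparser. The page names index.php and index.html are hypothetical and only there to have something to test against.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, exactly as they would appear in robots.txt.
RULES = [
    "User-agent: *",
    "Disallow: /hoaforum/",
]

parser = RobotFileParser()
parser.parse(RULES)

# Hypothetical page names, used only to exercise the rules.
forum_allowed = parser.can_fetch(
    "Slurp", "http://www.broadlandshoa.org/hoaforum/index.php"
)
home_allowed = parser.can_fetch(
    "Slurp", "http://www.broadlandshoa.org/index.html"
)

print("forum page allowed:", forum_allowed)
print("home page allowed:", home_allowed)
```

    Note that this rule asks every compliant crawler to stay out of /hoaforum/ entirely, so it would remove the forums from search results rather than just slow the indexing down.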
     
  4. Mr. Linux

    Mr. Linux Senior Member & Moderator Forum Staff

    Joined:
    Jul 26, 2001
    Messages:
    3,277
    Likes Received:
    69
    That will work with 'legitimate' bots like Google and Yahoo, which are not the problem. Most malicious bots/email address harvesters will pull the robots.txt file and look there first, knowing it points to something good that you don't want anyone to index...

    As Flynnibus stated, there's nothing better than plain old common sense guidelines for posters to follow; if you're worried about spam, don't post your email address in any of your postings. It's plainly stated at the top of every forum...
     
  5. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    But we want the forums indexed by legit sites

    and malicious bots don't adhere to robots.txt files anyways
     
  6. brim

    brim Member

    Joined:
    Nov 18, 2003
    Messages:
    1,339
    Likes Received:
    11
    I found legitimate bots to be irritating as well...like when they eat bandwidth indexing my picture site, but yah, nowadays if you post your email publicly then you deserve what you get.

    Go post your email address in usenet and see what happens. :)
     
  7. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    I have something in mind that will help the uneducated.. but haven't looked at its complications yet.

    I'd much prefer to find someone skilled enough to work on the styles of the site :) I'm graphically crippled :D
     
  8. Dutchml

    Dutchml Member

    Joined:
    Nov 4, 2002
    Messages:
    715
    Likes Received:
    15
    Those bots also will harvest e-mail addresses out of your user profile. You don't have to post them in a forum for these spiders to find them.
     
  9. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    Email addresses are not part of the publicly viewable profile. Clicking on 'email' in a user's profile will send a message via the forums, without exposing your email address to the user sending you the email.

    The only way your email address is listed on the forum is if you put it in public view. Be it a signature, a post, a blurb about yourself, whatever.
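
    For anyone who wants to double-check a draft before posting, here is a minimal sketch that masks anything that looks like an email address. The function name mask_emails and the pattern are illustrative assumptions, not a feature of this forum, and the pattern is deliberately simple rather than a full address validator.

```python
import re

# Simplified pattern for email-like strings; it will miss unusual addresses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, placeholder: str = "[email removed]") -> str:
    """Replace every email-like substring in text with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)


draft = "Mail me at jane.doe@example.com please"
masked = mask_emails(draft)
print(masked)
```

    The same kind of pattern is what a harvesting bot would use to find addresses, which is why anything typed into a post, signature, or blurb is fair game.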
     
  10. vacliff

    vacliff "You shouldn't say that."

    Joined:
    Nov 14, 2002
    Messages:
    5,281
    Likes Received:
    344
    So what's a "slurp spider" and a "bot"?
    Sorry to display my ignorance.
    (Before y'all start, I know, I know, it's on display all the time!)
     
  11. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    Cliff,

    It's software that is designed to automatically go through webpages looking for information to index or track. They are 'robots' (hence bots).

    This is how the search engines find the content that they index for you to later find with a search.

    Slurp is the one used by Yahoo.

    Some reading here http://en.wikipedia.org/wiki/Web_crawler
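
    To make the idea concrete, here is a minimal sketch of the core step of a crawler: reading a page and collecting the links it would visit next. It uses only Python's standard library, and the sample page and the example.com address are made up. A real crawler would also download each page and check robots.txt first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))


# Made-up page content standing in for a downloaded web page.
PAGE = (
    '<html><body>'
    '<a href="/hoaforum/">Forum</a> '
    '<a href="about.html">About</a>'
    '</body></html>'
)

collector = LinkCollector("http://www.example.com/index.html")
collector.feed(PAGE)
links = collector.links
print(links)
```

    A bot repeats this for every link it finds, which is how it works its way through an entire site.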
     
  12. Dutchml

    Dutchml Member

    Joined:
    Nov 4, 2002
    Messages:
    715
    Likes Received:
    15
    And sometimes hundreds of them start a bot convention on a site. They'll come and go every couple of weeks. If you click on "currently active users" at the bottom of the main forum page you can see where the little guys are hanging out.
     
  13. vacliff

    vacliff "You shouldn't say that."

    Joined:
    Nov 14, 2002
    Messages:
    5,281
    Likes Received:
    344
    So they aren't dangerous?
     
  14. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,358
    Likes Received:
    250
    No. They can be a pest when they hit a site too often, and some people don't like the bots taking up their bandwidth (as you pay for bandwidth on websites). But they are how the search engines 'find' stuff.

    Malicious ones scan websites for personal info to pass to spammers or for other malicious uses. As long as your website is public, they will be able to hunt for that info.
     
