Website Outage Information and Discussion

Discussion in 'Forum rules and information' started by Forum Administrator, Jun 2, 2008.

  1. Forum Administrator

    Forum Administrator Member Forum Staff

    Joined:
    Jul 25, 2001
    Messages:
    246
    Likes Received:
    5
    This weekend our community website and discussion forums were affected by a catastrophic fire and power emergency in the Houston, TX datacenter that hosts our website. This situation did not affect our HOA offices or their operation but did affect our community website, discussion forums and mailing lists.


    We are back online now but fully expect intermittent outages while the hosting company reconstructs elements of the datacenter that were destroyed in the incident. Those curious about the incident can read about some of it here


    We apologize for the inconvenience of not having the website available - but such catastrophic, widespread outages are quite unusual and we expected the downtime to be quite short.

    We do not anticipate any information or data loss - but we may have periods where the website is unavailable until the datacenter situation is fully resolved.

    If you have any questions, you can post them here.

    Thank You
    Broadlands Technology Committee
     
  2. Kaosdad

    Kaosdad Will work for Rum

    Joined:
    Sep 21, 2005
    Messages:
    2,557
    Likes Received:
    4
    Being one of the guys that runs a datacenter locally, I truly feel sorry for the facilities & operations geeks at The Planet. Already they are (more than likely) being bombarded with "I WANT A FULL REPORT!!!" "HOW COULD THIS HAPPEN????" and "I DEMAND HOURLY UPDATES ON FULL RESTORATION PROGRESS AND A COMPLETE 'GOING FORWARD' PLAN OF HOW YOU WILL MAKE CERTAIN THIS NEVER EVER HAPPENS AGAIN!!!!!!!"

    I always want to shoot back snappy responses like:

    "I can report to you, or I can restore services. Pick one."
    "Will this happen again? Well, we deal with electro-mechanical devices that will fail at some time. So, maybe not this, but something else will break."

    Given the nature of the event I am stunned we are back on line today! Great job, Planet Guys!

    Edit: This is cool - I posted after Mr. Linux and jumped up front.
     
  3. Villager

    Villager Ashburn Village Resident

    Joined:
    Nov 1, 2006
    Messages:
    2,512
    Likes Received:
    19
    I'm happy to have it back! :)
     
  4. Silence Dogood99

    Silence Dogood99 New Member

    Joined:
    Apr 11, 2005
    Messages:
    2,769
    Likes Received:
    2
    Thanks again to the volunteers on the Tech Committee who give up their time with family, or time at the pool over the weekend, to give us this opportunity to post with neighbors.
     
  5. T8ergirl

    T8ergirl New Member

    Joined:
    Dec 29, 2004
    Messages:
    523
    Likes Received:
    4
    Me too! I was getting ready to drive T8erman to forum rehab. He was getting the DTs pretty bad. :)
     
  6. gunzour

    gunzour "Living on the Edge"

    Joined:
    Mar 29, 2007
    Messages:
    586
    Likes Received:
    0
    Me too.. sounds like the folks in Texas have had a very long weekend. No one likes their web site to go down, but the information coming out of this sounds like they had quite a catastrophic event, that could probably happen to any data center. Kudos to the folks who have likely been working non-stop since Saturday morning trying to put their operations back together, and thank goodness nobody was hurt!
     
  7. Mr. Linux

    Mr. Linux Senior Member & Moderator Forum Staff

    Joined:
    Jul 26, 2001
    Messages:
    3,210
    Likes Received:
    27
    I would like to thank the people who contacted me over the last 24-48 hours offering their help with the website, temporary relocation, etc.

    The Tech Committee appreciates everyone's offers of help and assistance! You all are great!
     
  8. foodie

    foodie New Member

    Joined:
    Aug 6, 2005
    Messages:
    1,472
    Likes Received:
    3
    Thanks for all the hard work you do for your community and the forums. I noticed tonight that the current time posted is an hour off. No big deal to me--again kudos to the Technology Committee.:bow::)

    Foodie & Family:)
     
  9. Forum Administrator

    Forum Administrator Member Forum Staff

    Joined:
    Jul 25, 2001
    Messages:
    246
    Likes Received:
    5
    Website is back online and running from a new datacenter. This has been a long painful road - thank you for your patience through the downtime. This type of catastrophic failure by a provider is almost unheard of except in the case of total destruction in the face of natural disasters.

    Of course because we try to stay lean to ensure residents are not paying more than what they need (or desire to pay for), we do not utilize multiple sites to host our systems. So while something like a www.aol.com would not go down when a single site drops off, simple installations like ours are susceptible to such failures. Where this got out of hand was the total duration and the expectations for return to service. Trust us, if we had known on Day 1 that this would be a 5 day outage, we would have been restoring to a new provider.

    However since the bulk of the pages is not what one would term 'mission critical' we chose to wait it out rather than add confusion to the whole mix by rolling back to older backups and then having to deal with merging things back together when the main site came back online, etc. We opted to put up a status page to keep you up to date on what we knew. Due to complexities in moving the domain name stuff around for the temporary page, combined with the missed expectations on return to service, this status page took longer to get online than we normally would have preferred. Lesson learned there, we can have a better alternative ready in the future if needed.

    As for the site today, it's identical to the previous (it's the same box!) but with the network move, there may be some configuration that is not done yet. If you encounter any problems navigating or using the services of the pages, please use the contact us link at the bottom of the page to report any issues.

    Thank you for your patience during this. Now get outside and enjoy the weather :)
    - Broadlands Technology Committee
     
  10. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,242
    Likes Received:
    215
    Looks like sticking with the move to the second datacenter has paid off. Our old datacenter had yet ANOTHER power failure this morning. Short, but about 20+ minutes of downtime for most.
     
  11. Skins fan

    Skins fan Tequila fan (100% agave)

    Joined:
    Mar 18, 2004
    Messages:
    312
    Likes Received:
    0
    Kaos,

    As someone who leases servers from a datacenter and has dozens of customers who depend on me and don't know the datacenter from Adam, I can say that I was completely impressed with the regular updates provided by The Planet. Almost any other datacenter would have taken your approach.

    Honestly, are you the only guy working there??? It's the job of a datacenter to be prepared for just such an event and to keep its customers informed in the event of a failure so they can let their clients know what is happening. The Planet CEO understood this and took it very seriously. They were very well organized and professional under the absolute worst of circumstances.

    I have had downtime before and do understand that equipment fails from time to time. However, datacenters advertise redundancies and assure 99.98% (or whatever) uptime. A datacenter I used to deal with turned out not to have all the redundancies it claimed and the result was a longer downtime than necessary. It's perfectly legitimate for a client to expect the datacenter to be accountable and able to inform their customers on an ongoing basis until everything is back to normal.

    Skins fan

     
  12. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,242
    Likes Received:
    215
    Sorry, but if all you were doing is reading the webpage with the updates - then all you are seeing is the rose-colored glasses view of the event. I could write pages on how badly they screwed it up and how awfully they misrepresented their abilities and the information provided to clients.

    Let's start with just some basic highlights, from day 1 no less.

    They say they had an explosion that damaged three walls, and completely destroyed the power distribution and switching gear. Yet, several hours into the incident, they couldn't even say what was going on. Their tech support phones were knocked out, their customer portal was knocked out, no one was answering the office main numbers. It took several hours for them to even acknowledge there was a catastrophic event of ANY sort in the datacenter to take it down. HOURS... this in an industry where downtime of MINUTES is unsat.

    The only source of information at all.. was a community discussion forum, that isn't even the official conduit of information, nor were there any official updates from the company - only some posts from employees with 2nd hand information.

    Early on it even got as bad as them CLOSING the threads discussing the outage and posting only a blurb that the datacenter was evacuated in a forum where no one else was allowed to post. All of this while there was still no way to reach support - and there is still NOTHING on any of the company websites about the issue.

    It wasn't until later they started getting their information flow SOMEWHAT sorted, when they assigned employees to start posting bits of knowledge, which basically consisted of acknowledging there was a fire and that was about all that was known. Several hours into the incident they finally got around to putting together a webpage hosting status.

    I'll skip the rest of the complete CLUSTER*&^# they had dealing with their DNS customers, resellers, and the several failed attempts to get even a baseline running and just say... you can't believe 3/4 of the BS posted about support queues and other things on that page. That is the most selective collection of information and is in NO WAY reflective of the actual service customers were getting.

    The only way we even got up when we did in 5 days was because we would not back down on getting our gear OUT of that building... which they screwed up as well.

    ThePlanet was about 4 days behind on the response they SHOULD have had from the start - and even after the fact they really refuse to acknowledge the severity of the disruption. They are going around patting each other on the back 'mission accomplished' and posting zero wait queue times - meanwhile thousands of customers still can't get their DNS records due to the poorly planned and then fumbled recovery of the servers. During which ThePlanet was totally unresponsive to requests.


    Let's also note TP has SIX datacenters. Only one was affected - yet the company virtually shut down to any customer that was affected. They have staff at all the other datacenters, plus their support center. By their own claims no one could get into the place - yet it took them almost a full day to start realizing 'oh, we better have a plan B'. It took them 3 more days to get plan B stabilized. And even after 4 days, while they had skeleton services up, thousands of customers were still down and the CEO is boasting about how great their recovery has been. Total disconnect from reality.

    Couldn't be any further from the truth. The actual lack of information is what pissed most customers off. It took them almost 2 full days before TP realized they were toast and had to start some sort of plan C. Then they start - and then stop half way through because they got temporary power back. Temporary power that failed at least 2 more times after that.

    If TP had given realistic estimates and transparency even within the first 12hrs of the incident, customers could have started their own contingencies - instead TP was not forward about their own capabilities to recover and failed miserably. The DNS server outages affected WAY more customers than the actual hosted server failures and if you read the status page you'll see very little information regarding DNS... most of it inaccurate in terms of when they claimed things were back.

    And stick to their SLAs - something TP is weaseling out of with some BS settlement-style offering to their customers. If you measured real downtime and the number of ups and downs during the incident, they'd owe much more. On top of that you'd think they'd be wanting to woo customers to head off mass defection, don't you think?

    Not TP - they are too busy rah-rahing and telling each other how great they were doing while thousands of customers sat in the dark.
     
  13. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,242
    Likes Received:
    215
    Oh - and let's not forget this incident NEVER showed up on their main pages - even when their tech support lines were OUT. The only way you found out about the service update webpage was through the community forums or by opening a support ticket.. which you couldn't do for about the first 30hrs.

    Their support services are effectively wiped off the map - and they don't even put a NOTICE AT ALL on their company pages. Some transparency and flow of information... PFFT!
     
  14. Chsalas

    Chsalas Member

    Joined:
    Mar 14, 2003
    Messages:
    1,522
    Likes Received:
    15
    It was probably more like damage control for them. Like they say, if it isn't in writing, it didn't happen.
     
  15. flynnibus

    flynnibus Well-Known Member Forum Staff

    Joined:
    Oct 29, 2002
    Messages:
    5,242
    Likes Received:
    215
    They obviously didn't want their image marred by putting real information on their public webpages. They put their 'image' before their customers.

    They also refuse to release any details or show any photos of the incident. Compare this to the company that, when they suffered through hurricanes, had webcams and other details showing how they were handling the disaster. That company was bought by ThePlanet and now we have this big company mentality - bury it and minimize exposure.
     
