View Full Version : Ensim2


Joe
08-25-2003, 11:02 AM
We're getting some less than optimal readings from Ensim2 this morning - I'm performing a quick reboot to see if that clears the issue.

Downtime should be 5-10 minutes.

Joe

Joe
08-25-2003, 11:36 AM
We're still waiting on a reboot from Rackshack... as soon as I have more info, I'll provide updates. Looks like they might have needed to run FSCK (a file system check).

Joe

thevillageinn
08-25-2003, 12:08 PM
i had noticed that I got mail server errors all night on this account. good to see that it wasn't just me...thanks for keeping on top of this stuff for all of us!

Anonymous
08-25-2003, 12:42 PM
Ugh, 11:30 and it's still down :?

muttdog
08-25-2003, 12:46 PM
Joe is always on top of this stuff... I am starting to believe that he doesn't sleep...


Average ticket close time is like 5 minutes... :shock:

I have a ticket open with Dell Corporation since March...

muttdog
08-25-2003, 12:58 PM
What did I tell ya?

I posted and 5 minutes later :lol: , BOOM it is all working again... (at least my site is)


Way to go Joe!

Joe
08-25-2003, 01:03 PM
Here's the course of events from Rackshack:

8/25/03 8:57:47 AM
Please ReBoot My Server.

8/25/03 9:04:18 AM
Rebooting

8/25/03 9:25:23 AM
Server is still unresponsive after reboot. Further investigation is needed.

8/25/03 9:38:39 AM
Server is running manual FSCK. Will continue to monitor until server is back online.

I'm on the phone with them now to find current status.

Joe
08-25-2003, 01:04 PM
Please note, those times are CST
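Since the ticket times above are Central and the later status updates are given in ET, a small conversion can help line the two up. This is only a sketch: it assumes Eastern is one hour ahead of Central, and the function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta

# Ticket timestamps as quoted in the post above (datacenter's Central time).
TICKET_TIMES_CENTRAL = [
    "8/25/03 8:57:47 AM",
    "8/25/03 9:04:18 AM",
    "8/25/03 9:25:23 AM",
    "8/25/03 9:38:39 AM",
]

# Assumption: Eastern time is one hour ahead of Central time.
CENTRAL_TO_EASTERN = timedelta(hours=1)
TICKET_FORMAT = "%m/%d/%y %I:%M:%S %p"


def central_to_eastern(stamp: str) -> datetime:
    """Parse a ticket timestamp and shift it from Central to Eastern."""
    return datetime.strptime(stamp, TICKET_FORMAT) + CENTRAL_TO_EASTERN


if __name__ == "__main__":
    for stamp in TICKET_TIMES_CENTRAL:
        eastern = central_to_eastern(stamp)
        print(stamp, "CT ->", eastern.strftime("%I:%M:%S %p"), "ET")
```

Under that assumption, the first reboot request at 8:57:47 AM Central corresponds to 9:57:47 AM Eastern.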

Anonymous
08-25-2003, 01:05 PM
I assume it's still down? Can't access any of my sites.

Joe
08-25-2003, 01:26 PM
12:15pm ET:

Server is still running through FSCK ... no ETA at present. As soon as I get more info, I'll update it here.

Joe

Joe
08-25-2003, 01:38 PM
8/25/03 11:16:20 AM
Server is kernel panic, No init found. Unable to mount file system, server will need to be restored.

We're working with the datacenter now to install a new drive and restore data.

Unfortunately, this outage will be extended. Once it's completed, all customers on this server will be contacted.

If you're using Zone Edit, or other instant DNS management tools, and would like to move to a new server, please let me know.

Joe

Anonymous
08-25-2003, 02:36 PM
So will you be able to restore the data? Using a backup stored on another server? If so, what is your most recent one?

I need to advise my members of the situation. I would like to assure them that only data from the last day or so was lost. I'm sweatin' bullets because I haven't downloaded a backup in at least 3 weeks.

Anonymous
08-25-2003, 02:47 PM
When you do the restore, does that restore all of the websites? I've been a bad boy and haven't ftp'd my backups lately...

Anonymous
08-25-2003, 02:48 PM
there are emails coming in during this server down time... :idea: so, will we be able to get these emails after the server is fixed?

thx!!

Joe
08-25-2003, 03:27 PM
The server hardware restoration has begun - just got off the phone with RS.

As for mail, mail will continue to queue for 4-24 hours ... we shouldn't lose anything.

As for data backups, I won't know positively until it's done, but I believe the backups are safe on the box's secondary drive. I'm 99.5% certain that everything is covered. I did verify that backups ran last night (worst case is we'd restore from the previous day).

I can't stress enough how important backups are ... never EVER rely on ANY host to provide them - you should always keep a fairly current copy on your hard drive - that's why we take overnights to put them in everyone's /var/backup folder. We have been very fortunate that this is the first real hardware outage in a long time to affect a mass number of sites.

As I get more updates, I'll let you know.

Joe
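The advice above about keeping a fairly current copy on your own hard drive can be checked mechanically. The following is a minimal sketch, assuming you download your backups into one local folder; the function names and the 7-day threshold are illustrative choices, not anything HostPC provides.

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def newest_backup_age_days(backup_dir: Path) -> Optional[float]:
    """Return the age in days of the most recently modified file, or None if there are no files."""
    files = [p for p in backup_dir.iterdir() if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (time.time() - newest_mtime) / SECONDS_PER_DAY


def backup_is_stale(backup_dir: Path, max_age_days: float = 7.0) -> bool:
    """True when the folder has no backups or the newest one is older than max_age_days."""
    age = newest_backup_age_days(backup_dir)
    return age is None or age > max_age_days
```

Running `backup_is_stale(Path("my_backups"))` from a scheduled task (the folder name is hypothetical) would flag the "haven't downloaded a backup in at least 3 weeks" situation mentioned earlier in the thread.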

Joe
08-25-2003, 06:16 PM
Well, Rackshack in their infinite wisdom moved the data backup drive from ensim2 to ensim1 (as requested) - but never bothered to correct the parameters on ensim1 - which left BOTH servers down for nearly 3 hours.

So, now that ensim1 is back online, we can finally begin the process of data restoration.

I'm very sorry this is taking so long, I've been sitting here calling RS every couple hrs to find a status - it's just now that it's back online.

Joe

Joe
08-25-2003, 06:24 PM
Great news!! All the backup data from last night is safe and sound.

We're finishing the preparations (security patches) on Ensim2 now, then we'll begin to restore client data. There's almost 170 sites on there, so it'll take a little bit of time, but it WILL happen - please have patience just a little while longer.

At first, the control panel will NOT look normal - it'll be Ensim's default "green" - ugly, but functional. We're working on the security, then the data restores. After that's done, we'll move on to make the box pretty again (that should be done in a day or so).

Thanks everyone - your support through this is MUCH appreciated.

Joe & Staff.

Herbster
08-25-2003, 06:50 PM
I'd say that's a pretty darn quick fix for such an ugly problem.
My site looks as I left it. All scripts run as per normal.

Hey Joe. Guess we don't have to ask you what you do for excitement. :lol:

chlucy
08-25-2003, 07:19 PM
My site is "up" but redirects to the site admin login page. When i try to log in it rejects my password. Same goes for my email - it keeps asking for a new password. Any idea when it will be back to normal (I could care less about how pretty it is)? Sorry to be a pest but I'm waiting on a bunch of emails.

Thanks for keeping us so informed! I know many hosts who wouldn't bother mentioning anything at all.

Joe
08-25-2003, 07:20 PM
What is your site?

soccerbuzz1
08-25-2003, 07:28 PM
my site isn't coming up. i can't even log in to ensim. :( my site is kidintraffic.com. when i try to go to it, it comes up with a dns error. help!

Joe
08-25-2003, 07:31 PM
Let me explain - as I said earlier - NOTHING IS RESTORED on Ensim2 yet... we're just finishing the security updates, then we can restore user data - we're still looking at 2-3 hrs for complete data restoration. Our backups worked, they're all valid, we've just gotta replace 30gb of customer data. It's gonna take some time.

Joe

Anonymous
08-25-2003, 09:04 PM
thanks for the updates - as i didn't think that hostpc would ever go down as i recommend you guys to practically everyone i know.

just out of curiosity - at what time last night did the backups you guys performed occur? i was noticing that the ftp transfers have been going a bit slow the past couple of days but maybe that wasn't indicative of the situation either-

thanks again for the 411 of the situation

Joe
08-25-2003, 09:50 PM
Restore is in progress.

I'm going to need your help tho.

If you're a reseller, I need you to open a helpdesk ticket (http://helpdesk.hostpc.com) and tell me which domains are yours...

We're restoring each site individually - I'll need to go back and associate with the reseller.

Joe
08-25-2003, 09:52 PM
I really need to ask you all to be patient. I'll be up as long as it takes to ensure that sites are restored accurately and precisely - there's going to be a couple bumps, but we'll get through it with your cooperation.

Again, if anyone wants to move to another box, for any reason, I'd be more than happy to allow that.

Joe

Cindy
08-25-2003, 10:03 PM
you know there's no hurry on my site Joe.......but the email address I use for work - any idea when that will be coming thru again? Tonight? Tomorrow? Next week? :wink: Just so I know if I need to contact some of the companies I work for.

Cindy

Joe
08-25-2003, 10:10 PM
The restore is in full progress. Sites are being restored in alphabetical order. There's about 30gb of data to restore - it's gonna take a little while, but it IS working. Please read this entire post for critical information.

There MAY - and I stress MAY - be some issues with databases that need to be reassigned to the correct user. That's not a problem - the databases exist, they just MAY need ownerships changed.

All users mail accounts are being restored. We've been down a little less than 12 hours. Mail will take time to start flowing in. Nothing SHOULD HAVE bounced, but it may have. The rest will slowly come in over the next 12-24 hours - new mail being delivered first, everything else catching up eventually.

Please hold off logging into your control panels on this server for about the next hour. That'll keep the process running smoothly.

Now, what are we doing to prevent this from happening again in the future? I've had that question about a dozen times today. There is NO way to anticipate server/hardware failure. All you can do is have an emergency plan in place to deal with it if it does happen.

1) Identify the problem, contact the datacenter for hardware diagnostic checking.

2) Have datacenter replace hardware, if necessary.

3) Fortunately, HostPC maintains backups of user data. I've said it a MILLION times - DO NOT RELY ON ANY HOST to do this for you - EVEN US! These are computers, processes can fail, anything could happen. Have a good copy of your sites at ALL times.

4) Restore security on the newly formatted drives.

5) Restore client data in a defined, organized manner.

6) Address issues as they appear.

I feel our emergency plan, while not perfect, worked very well in this situation. Ideally, we'd have NO downtime, but it does happen, with anyone. Server back online, hardware replaced, data restored within 12 hours. Not many hosting companies can claim that. I'm not patting myself on the back, but I'm satisfied that everything went as well as could be expected given the hardware problems.

Outages are not covered by Rackshack, or DV2 datacenter for power failures. They do not reimburse us for outages at all.

I will offer to refund 50% of your monthly hosting fee for customers on Ensim2 due to this outage. Please open a support ticket, ask for a check to be issued to you. Be sure to include your full mailing address, and recent payment receipt #. We'll issue a check within 2 weeks for your inconvenience. Payments will not be issued via Paypal, only by company check.

Do we stand behind our services? Yes, even if it is out of pocket.

Thank you for your patience, your continued support, and your trust.

Joe

Amezri
08-25-2003, 10:24 PM
Just wanted to say thanks for all your work on getting things up and running. I have been very satisfied with hostpc so far and haven't experienced any major problems or outages except for this one case and you and your staff are always quick to help.

As long as things are back up within a week (I've experienced a host outage that long, yes), then I'll be more than happy and won't even request a refund. Heh.

Keep up the most excellent work! :D

-Cynthia

Joe
08-25-2003, 10:42 PM
a WEEK?? :shock:

If hardware hadn't been replaced within another hour or so, or if we didn't see progress, we were getting ready to move everyone to another server, and force the DNS server changes (hotwiring). Fortunately, Rackshack was able to identify and replace the hardware (after a couple of small obstacles) rather quickly.

a week?? :shock:

Joe
08-25-2003, 11:39 PM
The first round of data has been restored.

UPDATE!!!!!

For the sites that are missing, the complete data was found in a backup from 8/24 rather than 8/25.

We're going to "re-restore" the 8/24 backup for the missing sites. It'll take a little while longer to re-run the restores for those sites.

The "clean up" will begin in the morning. The rest of the sites (8/24) will be restored overnight.

Joe

Anonymous
08-26-2003, 12:47 AM
Joe wrote:
"a WEEK?? :shock:

If hardware hadn't been replaced within another hour or so, or if we didn't see progress, we were getting ready to move everyone to another server, and force the DNS server changes (hotwiring). Fortunately, Rackshack was able to identify and replace the hardware (after a couple of small obstacles) rather quickly.

a week?? :shock:"

Oh.. uh..I didn't mean a week with you guys. :oops: I meant I was with another hosting service that had outages all the time, often for days. It sucked.

All my sites are back now, thank you! ^_^ Erm, with the exception of the cgi yabb board located at http://www.number96.net/cgi-bin/yabb/YaBB.cgi - I'm encountering a 500 error. All the permissions look correct (chmod to 755). Is this still a server problem that I should wait out, or should I open a ticket?
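For anyone double-checking script permissions after the restore, here is a rough sketch that lists CGI scripts whose mode is not 755. It assumes you have a local or shell-accessible copy of the cgi-bin folder; the function names and the "*.cgi" pattern are illustrative. A 500 error can also come from other causes, so a clean result here does not rule out a server-side problem.

```python
import stat
from pathlib import Path
from typing import List

EXPECTED_MODE = 0o755  # rwxr-xr-x, the mode mentioned in the post above


def mode_of(path: Path) -> int:
    """Return only the permission bits of a file (e.g. 0o755)."""
    return stat.S_IMODE(path.stat().st_mode)


def scripts_not_755(cgi_dir: Path, pattern: str = "*.cgi") -> List[Path]:
    """List scripts under cgi_dir whose permission bits differ from 755."""
    return sorted(
        p for p in cgi_dir.rglob(pattern)
        if p.is_file() and mode_of(p) != EXPECTED_MODE
    )
```

An empty list means every matching script already has mode 755, in which case opening a ticket is probably the quicker route.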

Amezri
08-26-2003, 12:49 AM
Gah! Oops.. sorry. :oops: That was me. Didn't realize I wasn't logged in -_-;

Joe
08-26-2003, 01:00 AM
Yes, open a ticket.

We're working through them one at a time.

Joe

thevillageinn
08-26-2003, 02:58 AM
if you've been to the support page to get your database restored, but don't want to wait...just like the page says, you can do it yourself.

*Log into your site admin page
*Make sure your MySQL password is correct (I had to reset mine)
*Go to the /var/backup directory in the file manager
*Download the appropriate database backup files to your local computer (remember where you put them)
*Minimize your browser window and find the database backup file you just downloaded
*Unzip the backup file (WinRAR works on .gz, as should WinZip)
*Upload the uncompressed database backup file to your /var/www/html directory, naming it as follows: databasename.import (where databasename is the actual name of the database you intend to restore)
*Go back to the site admin page
*Go to "Database-o-Matic"
*Make sure the database you plan to restore the backup to exists, either by listing your databases or by creating it
*Return to the main "Database-o-Matic" page and click "Import a MySQL Dump file into a database"
*If your backup file is found in the /var/www/html directory and is named appropriately, there should be a clickable link in the list of databases
*Click that link and you should get a successful status; if not, simply follow the steps again to see if you missed anything

Hope that helps-
(let me know if I missed anything)
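The unzip-and-rename part of the steps above can also be done with a few lines of Python instead of WinRAR or WinZip. This is a minimal sketch, assuming the downloaded backup is a plain .gz file; the function name and the example file names are hypothetical, and the upload to /var/www/html still has to be done by hand.

```python
import gzip
import shutil
from pathlib import Path


def prepare_import_file(backup_gz: Path, database_name: str, out_dir: Path) -> Path:
    """Decompress a .gz database backup into out_dir as '<database_name>.import'."""
    target = out_dir / f"{database_name}.import"
    # Stream the decompressed bytes so large dumps are not loaded into memory at once.
    with gzip.open(backup_gz, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `prepare_import_file(Path("forum_db.sql.gz"), "forum_db", Path("."))` would write `forum_db.import` in the current folder, ready to upload.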

Joe
08-26-2003, 06:44 AM
Thanks thevillageinn! Great info - thanks for helping out!

This server is likely to be a little slower than normal today, but it WILL clear up. Why? Everyone is scrambling to make sure their data is correct, checking mail, etc. While everyone does that (should last no more than a day or so) - it's likely to be a bit slower than normal.

I promise, just a little more patience, it'll all be MUCH MUCH better soon :)

Joe

chlucy
08-26-2003, 07:34 AM
Thanks again for keeping us updated throughout the process. If all our sites are up, do you still need resellers to open a ticket and list the domains? I haven't checked databases yet.

Joe
08-26-2003, 07:39 AM
Yes, we still need resellers to let us know which domains are theirs... so we can check our records and get them setup accordingly.

Thanks!!

For those checking things out that may be a little confused about how the control panel looks, please check this link: http://www.hostpc.com/siteadmintutorial.html for help in navigating. We'll restore the new "skin" when everything has been restored to working order.

Joe

Joe
08-26-2003, 08:36 AM
Well, I decided to take a 5 hr "nap" - woke up, and here's a status report as of now.

1) All sites have been restored, to the best of our knowledge. If there's a stray that didn't get re-created, please let us know via the helpdesk (http://helpdesk.hostpc.com)

2) We're noticing some file permission errors, or ownership errors. These are simple to fix. If you can't modify a file, or overwrite a file, or a page isn't being displayed, open a helpdesk ticket here: http://helpdesk.hostpc.com - it's a quick fix, we'll have it done pronto. Some permissions got hosed up during the restore. We'll adjust our restore scripts.

3) Server software is secure, and up to date

4) All mail accounts have been created - there is no queued mail on the server, incoming or outgoing. Mail from yesterday should be flowing into your inboxes as time goes on, without an issue.

5) Global Squirrelmail operations have been restored.

6) I'm going to hold off on restoring awstats functionality till later in the week. If yours isn't working by the end of the week, please open a support ticket and let us know.

Left to do:

1) Improving the look/feel of the Ensim control panel. God that thing is ugly :)

2) Associate remaining domain names with reseller accounts.

3) Restore backup functionality before end of Tuesday.

4) General "housekeeping"


I want to also take this opportunity to thank the MANY users that have sent us messages of support during this downtime & restoration. We lost only one customer due to the incident (regrettably), and so far are processing about 5 refunds. Your support through this ordeal made it easier for the staff. I TRULY appreciate your understanding and support.

Joe

Joe
08-27-2003, 09:11 AM
We've got a few lingering database issues left to clear up, and a couple of squirrelmail issues that are plaguing us, but other than that, I think restoration of all sites went fairly smoothly. I'm calling in some assistance from our upper level support services today for assistance in clearing a few issues.

If you are still having database issues, and have an open support ticket, please know that we will get these addressed today. Our restore scripts written by Mike didn't take into account restoration of databases, or re-assigning databases created via database-o-matic to the correct owners, so each one has to be done manually - nearly 140 databases on this server. (ugg). Without Mike's help, this whole restore would have taken MUCH MUCH longer.

I'm going to "un-sticky" this post for now. We should have everything else completed by 6pm tonight.

At that point, we'll do one final check on all files, re-install the Ensim skin, and then close this thread, hopefully forever.

I again would like to thank everyone on this server for their patience and understanding. We've worked closely with many of you individually in our trouble tickets (helpdesk) system to clear custom configuration issues - something we wouldn't normally do, but I feel a certain responsibility to help out. Your patience during this situation has been MUCH appreciated.

Joe

Joe
08-27-2003, 08:47 PM
Apparently we've still got a lingering CGI issue on this machine. I've tried 100 ways to get it working, but I'm out of ideas. I've called in backup support.

They'll be looking into the issue later tonight.

Thanks again for your patience.

Joe

Joe
08-27-2003, 09:25 PM
Ok, I THINK we've got EVERY cgi based issue on this server covered ... we ran a global change, which should have fixed all CGI issues, including Movable Type, Awstats, etc.

If there are any other problems, please let us know via the helpdesk - otherwise, I'll consider this issue (thankfully) CLOSED! :)

Joe

Joe
08-28-2003, 09:57 AM
It appears that Rackshack dropped some IPs as nameservers for this box - they're (hopefully) restoring them now.

Anyone using ns2a.hostpc.com/ns2b.hostpc.com may experience a bit more downtime - should be back within 1-2 hours as DNS repropagates.

I think we're probably going to ask anyone using this DNS to move to another server when this is done.... we'll update you more on that in the near future.

Joe

Anonymous
08-28-2003, 10:53 AM
Everything was working correctly last night for me. This morning, it looks like there are more issues cropping up. I can't get to my web site. Once I did get forwarded to ensim control panel logon and was able to successfully get to see that my files were still there.

I opened up a helpdesk ticket...but then it looks like that's messed up now too...

Good Luck!

Anonymous
08-28-2003, 10:54 AM
Originally it was a mysql/php error...now it's working.

Joe
08-28-2003, 10:58 AM
I'm workin on it - looks like it may STILL be a rackshack issue.

For those that would like to move to another subnet (server) please let me know, we'll get the process started. Open a support ticket asking to be moved to a new server.

Amezri
08-28-2003, 11:12 AM
Moving is a pain in the ema because last time I had to reconfigure my entire message board as well as repermission various scripts and that was annoying...

So my question before I ask for a move for two of my domains is this: Once rackshack has been repaired and is "stable" do you believe that it will continue to be stable? Or do you foresee continued problems with ensim2?

Joe
08-28-2003, 11:29 AM
No, once this is stabilized (and I just received word that it is being worked on actively) - I believe it will stay working.

I simply put the offer out for anyone who is getting frustrated. I don't want to lose any customers over this, and will make every consideration to retain them. If you can bear with us through this hurdle, it WILL regain stability, and I'll probably never hear from any of you again :(

It's by no means "mandatory" - just an option.

Joe
08-28-2003, 11:31 AM
Status Report:

I've had upper level support step in to get the server stabilized, and they assure me that the software itself is fine - and data is secure. I did that as a backup check to my efforts.

Rackshack had a network failure this morning (first time in recent memory) that affected our subnet. That was 99% of our problem. We're still working on the remaining 1%, but we have every reason to believe it will be back to 100% shortly. I'll continue to keep you updated as time progresses.

Joe

Amezri
08-28-2003, 11:38 AM
I will bear with you and see how it goes. :) My friends (with other hosting services) have all experienced severe downtime issues where tech support has not been available or able to explain the problem, so I really do appreciate all the effort you have put into repairing this situation and keeping us informed. Obviously the rackshack issues are not under your control, so can't really blame you there. :wink:

Thanks again. Keep up the great work. One or two days of downtime won't kill me, that's for sure. Heh.

Joe
08-28-2003, 12:59 PM
We're going to experience one more short outage on this box later today as they (Rackshack) remove the faulty secondary drive. I believe this may clear up any remaining stability issues.

This downtime should be <10 minutes. It's unknown what exact time this will occur.

Joe

Joe
08-28-2003, 01:16 PM
8/28/03 10:57:02 AM
drive has been removed. closing ticket.

I believe this will solve most, if not all, of the stability problems we've seen in the past 96 hours.

Joe

Anonymous
08-29-2003, 11:22 AM
hi, got a quick question

what time does the daily backup occur at rackshack?

Nick
08-29-2003, 11:42 AM
Hello Guest

Joe can correct me if I'm wrong, but I believe our backups start around 2am or 3am EST

Joe
08-29-2003, 12:21 PM
Actually, they're between 12-1am ET.

northern
08-29-2003, 04:51 PM
Just a couple of things.

It would have been much better if I had received an email about this. Luckily (maybe) I just found out about this when I couldn’t log in. It actually took a lot of time finding this forum and reading through it. Don’t know why my clients didn’t complain. - Anyway an emergency email warning system would be nice.

I would prefer my refund be sent to the Joe Mack Caribbean Cruise Fund. If there isn't one, then there should be!

Thanks for the good work

Joe
08-29-2003, 06:03 PM
That fund isn't likely to exist for a long time, but thanks for the thought! :)

As for an e-mail "warning system", we used to have that - I had the option of sending messages to each reseller or client on an internal mailing list. The second time it was used, I had 3 spam complaints filed, 2 with spamcop. Fortunately, when my provider (Rackshack) saw the content of the mail, they didn't get upset, but it did get my private mail IP blocked from spamcop for about 2 weeks. Haven't used or updated the service since.

Yes, I agree, it would have been nice to be able to communicate vital information. Those complaints are why I didn't.

In the future, we're going to have safeguards in place (already under construction) - so information will either be available here in the hostpc.com forums, or at myhostpc.com - separate servers, separate information. MyHostPC can be fired up within minutes of an outage. Also our toll free number -- I encourage everyone to have that number handy in case of emergencies.

Joe

D9r
08-29-2003, 06:27 PM
I also wish I had received an email notice about this problem Monday. I think it's absurd that someone would call that SPAM -- it's simply communication from a webhost to its clients. My other webhost does that all the time. It's the only way to go in my opinion.

My 2 sites on ensim2 are currently down. I assume they're down anyway, I can't get to them. One is www.tuckercivic.org

Joe
08-29-2003, 06:42 PM
HUGE HUGE HUGE DOS attacks spreading through the net - I've been on the server, it stopped responding. I almost asked for a reboot and geared up for a problem. Couldn't see it from Verizon - works fine from everywhere else.

Now Verizon sees it fine again - go figure.

D9r
08-29-2003, 06:47 PM
Must be Mars that's causing all this trouble. Either Mars or pres Bush, :wink:

D9r
08-29-2003, 06:49 PM
It works for me now too. A DOS attack? What's that? Some teenage brat causing trouble again?

Later:
My friend told me -- 'Denial of Service'

chlucy
08-29-2003, 07:07 PM
I've been having random trouble with several sites - mostly just email since I'm not actually browsing the sites. Some are hosted here, some are other places. Sometimes I'll get password errors for email and other times it's just 'can't contact host' errors. And my connection was down for a little while this afternoon so it's been a frustrating day in terms of getting any work done.

Joe
08-29-2003, 07:38 PM
chlucy wrote: "it's been a frustrating day in terms of getting any work done."

Amen to that!

Joe
08-29-2003, 10:55 PM
Final step - the skin has been re-applied to the control panel - hope it's a little easier on your eyes :)

Joe

basketclothes
08-30-2003, 08:42 PM
Joe,
Just wanted to let you know that I'm not able to set how much space a user can use (disk quota). In the spot where there was a textbox where I could enter the MB avail for the user, it says 'unlimited'. Just in case you are unaware of this.

-- Jason

Nick
08-30-2003, 10:54 PM
hello basketclothes

that is the downside to upgrading ensim...they busted quotas in the process