Server crashes during backup

Last Post 02 May 2007 09:50 AM by kfarlee. 18 Replies.
Author Messages
ferretwoman
New Member

--
30 Apr 2007 06:57 AM
Every time I do a full backup of my databases, it will go through most of them, then get to the largest database I have (13 Gig) which is the Sharepoint database, and as it starts to back it up, the machine will then freeze up and need to be hard rebooted. I confirmed it was the backup by moving the backup to a different time, which is when the server crashed. Also, if I manually attempt to do a full backup on it, the machine will then crash.

I'm running Windows Server 2003, SP2, fully patched, SQL server 2005 SP2 (32-bit) fully patched. 4 Gigs memory. Have tried setting it to use 3 Gig itself, with and without AWE, but it didn't make a difference. This machine is a dedicated SQL Server machine. Microsoft has had us run perfmon and send them perfmon reports and dumps. They don't see anything that would cause our issues. Nothing shows up in the event viewer. This server has been running just fine since last June when we installed SQL 2005. The problems started occurring on March 8 of this year. I don't see any patches that correspond to that date.

Hardware is all reporting fine according to the Gateway Hardware tool.

I'm the Sharepoint Admin, we have no SQL Admin, so now this is another job of mine. So any advice, anything I should look for, would be extremely helpful.

Linda
ferretwoman
New Member

--
30 Apr 2007 07:29 AM
Yes, we updated the BIOS and ran the Gateway scanning tool that looks for driver updates, and there were no others.

I run the Maintenance plan tool that comes with SQL server, and back it up locally on the same server. I run it in the middle of the night when we have hardly any users online.

Linda
SQLUSA
New Member

--
30 Apr 2007 11:09 PM
Linda,

What was the exact position of the moon on March 8 when all this started?

Is it possible that the database grew?

Can you up server memory to 8GB?

What does freeze up mean? Is it 100% spin? Can you do a sp_who?

Do you see blocking? Can you run SQL Profiler while this is happening?

Do you have enough room for backup?

What else is done in the plan? What if you do a plain backup instead?
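
For reference, the `sp_who` check and the "plain backup" asked about above might look roughly like the sketch below. The database name `SharePoint_Content` and the `D:\Backups` path are hypothetical placeholders, not details given in this thread.

```sql
-- Sketch only: [SharePoint_Content] and the D:\ path are hypothetical placeholders.

-- List current sessions; the blk column shows the SPID of a blocking session, if any.
EXEC sp_who;

-- A plain full backup of a single database, run outside the maintenance plan.
-- INIT overwrites any existing backup sets in the target file.
-- STATS = 10 reports progress roughly every 10 percent.
BACKUP DATABASE [SharePoint_Content]
TO DISK = N'D:\Backups\SharePoint_Content_full.bak'
WITH INIT, STATS = 10;
```

Running the backup by hand this way separates the backup itself from the other steps in the maintenance plan, such as the cleanup task.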

Kalman Toth, Database Architect
SQL Server 2005 Training - http://www.sqlusa.com
ferretwoman
New Member

--
01 May 2007 05:30 AM
No, no separate disk array. I could try to back it up to a different server using mapped drives. I could set that up for tonight's run to see if that helps any.

Linda

quote:

Originally posted by: rm
Do you have a separate disk array for backup? How is disk I/O during backup? You may need to contact Gateway to see if it's a known issue.


ferretwoman
New Member

--
01 May 2007 05:38 AM
What was the exact position of the moon on March 8 when all this started?
** Uh, up in the sky?

Is it possible that the database grew?
** Since it's a large database (and we ran into the autogrowth problem earlier in the year) I have it set to only autogrow 10MB at a time so it's grown a little, maybe from 11 to 12 GIG but not outrageously huge. I still have 2/3 disk space left on that server for growth too.

Can you up server memory to 8GB?
** I wish. I did check to see if I could swap out the memory for larger chips but Gateway says it's maxed at 4 GB. Since it's less than 4 years old I can't get the funding to replace the server either.

What does freeze up mean? Is it 100% spin? Can you do a sp_who?
** To quote the network administrator: It is like somebody hit pause on a DVD. Everything just freezes and we receive no response from the monitor and keyboard.

Do you see blocking? Can you run SQL Profiler while this is happening?
** Since the server does not respond to anything, unfortunately no.

Do you have enough room for backup?
** Yes there is about 250 Gig of space left on the server even after backups.

What else is done in the plan? What if you do a plain backup instead?
** In the plan is to do a full backup of all the production databases, and then the clean up routine deletes all backups over 4 days old.
** If I attempt to manually back up the largest database, the server freezes after getting maybe 10 or 20% of the way (which is about 10 minutes into the process).

Linda
ferretwoman
New Member

--
01 May 2007 06:10 AM

can you run your backup with the BLOCKSIZE option, overriding the default, with a blocksize of 32 KB
** I'll read up on this and try this - thanks for the idea

if this fails, reduce block size to 16 KB and keep reducing the blocksize in the sequence

64-->32-->16-->8-->4-->2

to find a backup blocksize which does not cause the database server to "crash". You have not explained exactly what you mean by "crash".
** Crash means screen freezes, machine will not accept input from keyboard or mouse. I can also tell you that at that point, any application that is on another server that needs to talk to any SQL Server database on the SQL server will report that it has lost connection with the SQL Server database. So I'm guessing it won't respond to network requests either.
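
A minimal sketch of the BLOCKSIZE test suggested above, assuming a hypothetical database name and backup path (neither is given in this thread). BLOCKSIZE is specified in bytes, and the supported values are the powers of two from 512 to 65536.

```sql
-- Sketch only: database name and path are hypothetical placeholders.
-- BLOCKSIZE is given in bytes; 32768 bytes = 32 KB.
BACKUP DATABASE [SharePoint_Content]
TO DISK = N'D:\Backups\SharePoint_Content_bs32k.bak'
WITH BLOCKSIZE = 32768, INIT, STATS = 10;

-- If that run still freezes the server, repeat with the next smaller size:
-- 16384 (16 KB), 8192 (8 KB), 4096 (4 KB), 2048 (2 KB).
```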

screen freezes? CPU is at 100% monopolized by sqlserver?
** Can't get into the server to determine the CPU rate. But our perfmon report, which was read by Microsoft, showed this about the server:

** no processor spiking the CPU.
** When compared against individual process I do not see any processor spiking CPU.
** Avg. Disk sec/Transfer is well below the threshold.
** Split I/O is close to 0 and is good.

For the time of the entire log:
** Available Memory never falls below 1GB.
** We never see any of the processors maxed out. Average utilization on all processors is less than 5%.
** Free PTE's always over 180,000
** Pool non-paged bytes max is 17MB which is well below the threshold value of 256MB.
** Pool paged bytes max is 58MB which is less than Theoretical limit of 491MB.
** There is no sustained disk queuing.
** Average pages/sec is under 10 which is good.
** Average Page Faults/sec is 1330, which is acceptable.
** Interrupts/sec is well below threshold.
** % Registry quota in use is below 20%.
** Redirector current commands is 0
** No obvious handle leaks.

Linda
kfarlee
New Member

--
02 May 2007 09:50 AM
I've worked offline with Linda, and the core of the issue seems to be that the database, log, and backup files are all located on the system drive (C:).
Since SQL backup is entirely IO limited, we'll consume all available IO resources.
If this is on the system drive, you'll have trouble doing standard things like paging, etc. This can easily cause a system to lock up.
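
One way to confirm where the data and log files actually live is to query the `sys.master_files` catalog view, available in SQL Server 2005. This is only a sketch of that check, not something that was run in this thread.

```sql
-- Lists every database file known to the instance with its location and size.
-- size is reported in 8 KB pages, so size * 8 / 1024 gives megabytes.
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_name,
       type_desc,           -- ROWS (data) or LOG
       physical_name,
       size * 8 / 1024      AS size_mb
FROM sys.master_files
ORDER BY physical_name;
```

If every `physical_name` starts with the same drive letter as the Windows installation, the data files, log files, and operating system are all competing for the same disks.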
Gary74
New Member

--
02 May 2007 02:35 PM
Don't know if someone suggested this already, but isn't that around the time the DST update(s) were released?
Just a thought.
SQLUSA
New Member

--
02 May 2007 05:46 PM
Linda,

Can you respond to this?

If you direct your backup to a different drive (not C), does it work?

I agree, not a good idea to do too many things on the C drive.

But also feel, that it is time to junk this server.

Kalman Toth, SQL Server Architect
SQL Server 2005 Training - http://www.sqlusa.com/order2005

ferretwoman
New Member

--
03 May 2007 05:49 AM
I attempted last night to back the files to a separate drive and again, during the largest database backup, it failed. It got about 1/5 of the way into the backup, about 9 minutes, (which I can tell by the size of the file) and then froze up.

Unfortunately, junking the less than 4 year old server is not an option. I work for the gov't and the funding just isn't available.

However, do you think it would be worthwhile to redo the server from scratch? Let me tell you what I know about the setup so far: There is a 37 GB drive, not part of the RAID. Then I have 4 - 73 GB drives that are RAID 5, and another set of 4 - 73 GB drives (failover). The C: is the RAID, and was where I was doing everything. Last night I set it to back up to the D: 37 GB drive, which obviously isn't large enough to continue to do that for long. But it still failed either way.

Should I wipe the server and make the small drive the C drive, and the RAID drives where the db files and backups are located? Do you think this would help at all?

quote:

Originally posted by: SQLUSA
Linda,

Can you respond to this?

If you direct your backup to a different drive (not C), does it work?

I agree, not a good idea to do too many things on the C drive.

But also feel, that it is time to junk this server.

Kalman Toth, SQL Server Architect
SQL Server 2005 Training - http://www.sqlusa.com/order2005




AussiePete
New Member

--
08 May 2007 06:00 PM
1. You don't have a backup of your database! This should be addressed as your first priority. During an outage window stop SQL Server, and then copy the MDF and LDF files for your Sharepoint database to the backup location. This in itself would be an interesting test to see if the disk controller can manage to copy the files without causing a lockup. If you can't stop your SQL Server due to other DBs being used 24/7 then look into using the sp_detach_db and sp_attach_db procs to let you do a cold backup of the database files.
2. Run a CheckDB against the Sharepoint Database to eliminate any database corruption causes.
3. Having OS, mdf, ldf, and bak files on the one (4 disk) drive is crowded, slow, but should not cause your OS to freeze completely and indefinitely.
4. I see your DB growth is 10MB and has grown presumably 1,200 times to get to 12+GB. This is unusual, and could result in a lot of fragmentation. I don't think it has caused your problem though. I recommend changing your filegrowth to 10% - 20% or perhaps 2000MB.
5. Be sure to check the health of your RAID array using the appropriate diagnostic tool. Perhaps schedule a CheckDisk at next reboot.
6. Once you get a backup of the database files please copy them to another server, mount them using sp_attach_db, and try doing the backup on that server.
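
The T-SQL side of items 1, 2, 4, and 6 above could be sketched as follows. Every database name, logical file name, and path here is a hypothetical placeholder, since the thread does not give them, so treat this as an outline, not a script to run as-is.

```sql
-- Sketch only: all names and paths below are hypothetical placeholders.

-- Item 2: check the database for corruption.
-- NO_INFOMSGS suppresses informational messages so only errors are shown.
DBCC CHECKDB (N'SharePoint_Content') WITH NO_INFOMSGS;

-- Item 4: change autogrowth on the data file to 10 percent.
-- Logical file names can be looked up in sys.master_files.
ALTER DATABASE [SharePoint_Content]
MODIFY FILE (NAME = N'SharePoint_Content_Data', FILEGROWTH = 10%);

-- Items 1 and 6: detach, copy the .mdf and .ldf files at the OS level,
-- then attach them again (here or on another server).
-- The database is unavailable while detached, and the detach
-- fails if there are still active connections to it.
EXEC sp_detach_db @dbname = N'SharePoint_Content';

EXEC sp_attach_db
     @dbname    = N'SharePoint_Content',
     @filename1 = N'C:\Data\SharePoint_Content_Data.mdf',
     @filename2 = N'C:\Data\SharePoint_Content_Log.ldf';
```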
ferretwoman
New Member

--
09 May 2007 05:26 AM
I have requested a larger secondary drive to put the database files on; however, getting another disk array is not likely to happen. I have a few other servers available to me if I wanted to move the backups to a different server, so that may be something I look into after some other problems with the server are fixed.

Thanks for your suggestions!

Linda

quote:

Originally posted by: rm
Possible to add more disks in the server? You should have OS and binaries, db data files, db log files, tempdb, backup files on their own disk array ideally. At least get two more arrays to separate OS, db files and backup files.


ferretwoman
New Member

--
09 May 2007 05:36 AM
Gateway is coming out tomorrow to replace just about everything in the machine, including processors, memory, and motherboard. When attempting to do a memory test that booted to a floppy, the machine crashed at 1 minute 24 seconds with 4 GB RAM and 43 seconds into the test with only 2 GB RAM. Since we aren't doing anything involving Windows or SQL or even for that matter hitting the hard drives, Gateway agrees it's a hardware issue.

I still have a request in for a larger hard drive to place our database files. One question tho - I have the database files now where the SQL program files are. Should I move just the database files (mdf and ldf) to the new hard drive, or should I move the db files and the SQL Server 2005 program files too?

I requested a complete backup of the machine this evening so we could quickly get back up and running tomorrow if for some reason they decide to replace the hard drives too.

Linda
ferretwoman
New Member

--
09 May 2007 05:43 AM
Originally posted by: AussiePete
1. You don't have a backup of your database! This should be addressed as your first priority. During an outage window stop SQL Server, and then copy the MDF and LDF files for your Sharepoint database to the backup location. This in itself would be an interesting test to see if the disk controller can manage to copy the files without causing a lockup. If you can't stop your SQL Server due to other DBs being used 24/7 then look into using the sp_detach_db and sp_attach_db procs to let you do a cold backup of the database files.

** I'm running a manual copy of all my databases to another server at 4:30 tonight just for this reason.

2. Run a CheckDB against the Sharepoint Database to eliminate any database corruption causes.
** Will do!

3. Having OS, mdf, ldf, and bak files on the one (4 disk) drive is crowded, slow, but should not cause your OS to freeze completely and indefinitely.
** I didn't think so - see other post about our hardware finally getting replaced

4. I see your DB growth is 10MB and has grown presumably 1,200 times to get to 12+GB. This is unusual, and could result in a lot of fragmentation. I don't think it has caused your problem though. I recommend changing your filegrowth to 10% - 20% or perhaps 2000MB.
** So I should allow for larger growth? I had a problem with the large DB asking for too much space. I'll adjust that back up to 10%.

5. Be sure to check the health of your RAID array using the appropriate diagnostic tool. Perhaps schedule a CheckDisk at next reboot.
** Will ask server maintenance crew about this

6. Once you get a backup of the database files please copy them to another server, mount them using sp_attach_db, and try doing the backup on that server.
** ok.

Linda
SQLUSA
New Member

--
09 May 2007 04:20 PM
Don't get discouraged, Linda.

Once I had a server which kept rebooting itself every other week or so. Until it was replaced.

Kalman Toth, SQLUSA: http://www.sqlusa.com/businessintelligence
At the Microsoft Business Intelligence Conference 2007 in Seattle
ferretwoman
New Member

--
11 May 2007 11:56 AM
The morning that Gateway came to replace parts, I successfully backed up my largest database. As I was moving that to a different server, the server crapped out on me. Gateway replaced 2 power supplies and the motherboard then we ran the memtest again. It did not freeze up this time, however it did find bad RAM. That was replaced, and tonight's backup will be a test of whether it was the RAM or motherboard, or some other problem causing the freezes. I'm hoping for the best.

Linda
SQLUSA
New Member

--
11 May 2007 07:56 PM
Linda,

I can bet $500 that it was the hardware.

Kalman Toth, Database Architect
ferretwoman
New Member

--
14 May 2007 04:52 AM
I believe you would be a winner. We replaced the RAM module that was coming up with ECC errors, and for the first time in a few weeks my server not only stayed up the whole weekend, but I got a whole weekend full of backups of all my databases with no errors. Woohoo!

Linda

quote:

Originally posted by: SQLUSA
Linda,

I can bet $500 that it was the hardware.

Kalman Toth, Database Architect


SQLUSA
New Member

--
17 Jan 2008 09:32 PM
Thanks Linda,

99.99% of the time bugs "make sense".

When a bug does not make sense, the hardware comes under suspicion.

SQL Server has excellent (soft bug) error logging, so if it is not an app bug, not a database/system bug, the culprit may turn out to be the memory, motherboard, etc.

Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
SQLUSA: http://www.sqlusa.com/order2005highperformance/ The Best SQL Server 2005 Training in the World!