<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>toddbaker.org</title>
	<atom:link href="http://www.toddbaker.org/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.toddbaker.org/blog</link>
	<description>SQL Server, Powershell, and all that other stuff</description>
	<lastBuildDate>Mon, 08 Aug 2011 01:22:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>MDW Data Collection Upload may use up all available ports on a large server</title>
		<link>http://www.toddbaker.org/blog/2011/05/13/mdw-data-collection-upload-uses-up-all-available-ports-on-a-large-server/</link>
		<comments>http://www.toddbaker.org/blog/2011/05/13/mdw-data-collection-upload-uses-up-all-available-ports-on-a-large-server/#comments</comments>
		<pubDate>Fri, 13 May 2011 17:39:19 +0000</pubDate>
		<dc:creator>todd</dc:creator>
				<category><![CDATA[MDW]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.toddbaker.org/blog/?p=42</guid>
		<description><![CDATA[Click here to download the workaround package. Recently we started having odd issues with a few of our larger clusters.  Windows authentication would randomly fail, accompanied by Netlogon errors in the event log stating the domain was unavailable.  Even odder &#8230;<p class="read-more"><a href="http://www.toddbaker.org/blog/2011/05/13/mdw-data-collection-upload-uses-up-all-available-ports-on-a-large-server/">Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<address><a href="http://www.toddbaker.org/blog/wp-content/uploads/2011/05/PerfCountersUpload.zip">Click here to download the workaround package.</a><br />
</address>
<p>Recently we started having odd issues with a few of our larger clusters. Windows authentication would randomly fail, accompanied by Netlogon errors in the event log stating that the domain was unavailable. Even odder, while this was happening, an nslookup on the server’s domain returned only IPv6 addresses, even though it normally returns several IPv4 addresses as well.</p>
<p>We started with the obvious suspects: DC problems, network issues, DNS resolution issues, firewalls, IP conflicts, and network card issues, but found nothing. The problem only occurred on a couple of clusters out of many servers, so if it were a network or DC issue, you’d expect to see it on other servers too.</p>
<p>These particular clusters had some things in common. Both were 2-node clusters built on similar hardware: Dell R810s with 32 cores and hyperthreading enabled. Each cluster had 4 SQL instances installed and about 20 SAN drives assigned, and ran Windows 2008 R2 and SQL 2008 R2. They also had MDW data collection enabled on all of their instances.</p>
<p>One day I noticed that when we had the problem, netstat showed several thousand outbound connections to our MDW server being made and quickly dropped. The local port number on those connections would climb until it reached 65534, and then our authentication and DNS problems would appear. All of these connections were in the TIME_WAIT state: a port is not released as soon as its connection is dropped, so it can’t be reused immediately. With no local ports left, no more outbound connections could be made to the DNS server or DC for authentication, which caused our problems.</p>
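<p>A quick way to confirm this pattern is to count TIME_WAIT connections per remote address in the netstat output. The following is a minimal Python sketch and not part of the original workaround: the function name count_time_wait and the sample lines are made up for illustration, and it assumes the four-column layout that <code>netstat -an</code> prints for TCP on Windows (protocol, local address, foreign address, state).</p>
<pre>
```python
from collections import Counter


def count_time_wait(netstat_text):
    """Count TIME_WAIT TCP connections per remote address in `netstat -an` text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: TCP  local:port  remote:port  STATE
        if len(fields) != 4 or fields[0] != "TCP" or fields[3] != "TIME_WAIT":
            continue
        # rpartition keeps everything before the last colon, so bracketed
        # IPv6 addresses such as [::1]:1433 are handled as well.
        remote_ip = fields[2].rpartition(":")[0]
        counts[remote_ip] += 1
    return counts


# Made-up sample that resembles the situation described above.
SAMPLE = """
  TCP    10.0.0.5:65530    10.0.0.9:1433    TIME_WAIT
  TCP    10.0.0.5:65531    10.0.0.9:1433    TIME_WAIT
  TCP    10.0.0.5:65532    10.0.0.9:1433    ESTABLISHED
  TCP    10.0.0.5:49200    10.0.0.2:53      TIME_WAIT
"""

if __name__ == "__main__":
    for ip, n in count_time_wait(SAMPLE).most_common():
        print(ip, n)
```
</pre>
<p>If one remote address, such as the MDW server, accounts for thousands of TIME_WAIT entries, that points at a client that is opening and closing connections in a tight loop.</p>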
<p>I tracked the cause down to the upload job for the ServerActivity collection set. Specifically, the SSIS package that is run to upload data (located at Data Collector\PerfCountersUpload on every SQL 2008 server) has a lookup task under the data flow “DFT &#8211; Bulk Insert Collected data into MDW”. This lookup grabs the performance counter ID from snapshots.performance_counter_instances so that it can be populated when the data is loaded into snapshots.performance_counter_values later in the data flow. The lookup is set to Partial Cache, which means that for each and every performance_counter_value, it seemingly does the following:</p>
<p>1) Logs into the server hosting the MDW database.<br />
2) Does a select from performance_counter_instances for a single performance counter path.<br />
3) Logs out.</p>
<p>On a smaller server, this may not be a problem, as there aren’t many counters to get the ID for. On these particular servers, the number of SQL instances, drives, processors, and running processes means there are a lot of instances of the LogicalDisk, Process, Processor and MSSQL* performance counters. Therefore, a lot of connections are made and quickly dropped, eating up all the available port numbers.</p>
<p>The workaround I came up with is to set the lookup task “LKU &#8211; Lookup into performance_counter_instances to obtain performance_counter_id for all counter paths that get inserted” to Full cache instead of Partial. This grabs the whole performance_counter_instances table, but in my environment, with about 35 instances running collection, it is only 5000 rows. I’ve also experimented with changing the lookup to a merge join, but the results are similar.</p>
<p>Performing this fix isn’t as simple as editing the package in BIDS and uploading it. There are a couple of constraints that must be dropped before you can replace the package. On top of that, it has to be done on every server with the Server Activity collection set (or any custom performance counter collector) enabled. The script below drops the foreign key constraint on the upload package, creates a backup copy of the existing package, uploads the new one, then re-creates the constraint. Naturally, you’ll need to replace the package source location with one that makes sense in your environment. Also, if you don’t have xp_cmdshell enabled on your server, you can omit that part and run the dtutil commands manually, or copy the package using whatever method you’re used to.</p>
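<p>The difference between the two cache modes can be illustrated with a toy model. This is a hedged Python sketch, not a description of SSIS internals: FakeMdwServer, upload_per_row and upload_full_cache are hypothetical names, and the model simply assumes one login per lookup in the first case (the behavior observed above) and a single login that loads the whole table in the second.</p>
<pre>
```python
class FakeMdwServer:
    """Stand-in for the MDW server that only counts logins (hypothetical)."""

    def __init__(self, counter_ids):
        self.counter_ids = counter_ids  # counter path to performance_counter_id
        self.logins = 0

    def lookup_one(self, path):
        # Models the observed pattern: log in, select a single path, log out.
        self.logins += 1
        return self.counter_ids[path]

    def load_all(self):
        # Models a full-cache load: one login, whole table returned.
        self.logins += 1
        return dict(self.counter_ids)


def upload_per_row(server, paths):
    """One connection per incoming counter value."""
    return [server.lookup_one(p) for p in paths]


def upload_full_cache(server, paths):
    """One connection up front, then in-memory lookups only."""
    cache = server.load_all()
    return [cache[p] for p in paths]


if __name__ == "__main__":
    ids = {"path%d" % i: i for i in range(5000)}
    values = list(ids) * 4  # made-up volume of incoming counter values

    per_row = FakeMdwServer(ids)
    upload_per_row(per_row, values)
    full = FakeMdwServer(ids)
    upload_full_cache(full, values)
    print("logins per row:", per_row.logins, "logins full cache:", full.logins)
```
</pre>
<p>In this model the number of logins grows with the number of counter values in the first case and stays at one in the second, which is why a table of only a few thousand rows is cheap to cache in full.</p>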
<blockquote><p>USE [msdb]<br />
GO</p>
<p>-- Drop the foreign key that references the upload package so the package can be replaced<br />
IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_syscollector_collector_types_internal_upload_sysssispackages]') AND parent_object_id = OBJECT_ID(N'[dbo].[syscollector_collector_types_internal]'))<br />
ALTER TABLE [dbo].[syscollector_collector_types_internal] DROP CONSTRAINT [FK_syscollector_collector_types_internal_upload_sysssispackages]<br />
GO</p>
<p>declare @path nvarchar(255), @packagename nvarchar(255), @packagefoldername nvarchar(255)<br />
select @packagename = 'PerfCountersUpload', @packagefoldername='Data Collector'<br />
-- Replace this UNC path with the location of the edited package in your environment<br />
select @path='\\mdwserver\MDWDeployment\'+@packagename+'.dtsx'<br />
declare @cmd nvarchar(1000)<br />
-- Back up the existing package as PerfCountersUploadBAK<br />
select @cmd='dtutil /SQL "'+@packagefoldername+'\'+@packagename+'" /SourceServer '+@@SERVERNAME+' /DestServer '+@@SERVERNAME+' /Q /COPY SQL;"'+@packagefoldername+'\'+@packagename+'BAK"'<br />
exec xp_cmdshell @cmd<br />
-- Copy the edited package from the file share over the existing one<br />
select @cmd='dtutil /FILE "'+@path+'" /DestServer '+@@SERVERNAME+' /Q /COPY SQL;"'+@packagefoldername+'\'+@packagename+'"'<br />
exec xp_cmdshell @cmd<br />
GO</p>
<p>-- Re-create the foreign key and make sure it is enabled<br />
ALTER TABLE [dbo].[syscollector_collector_types_internal]  WITH CHECK ADD  CONSTRAINT [FK_syscollector_collector_types_internal_upload_sysssispackages] FOREIGN KEY([upload_package_folderid], [upload_package_name])<br />
REFERENCES [dbo].[sysssispackages] ([folderid], [name])<br />
GO</p>
<p>ALTER TABLE [dbo].[syscollector_collector_types_internal] CHECK CONSTRAINT [FK_syscollector_collector_types_internal_upload_sysssispackages]<br />
GO</p></blockquote>
<p>You can download the edited package <a title="PerfCounterUpload" href="http://www.toddbaker.org/blog/wp-content/uploads/2011/05/PerfCountersUpload.zip">here.</a></p>
<p>I also reported this on <a href="https://connect.microsoft.com/SQLServer/feedback/details/668397/mdw-serveractivity-upload-causes-several-thousand-connections-to-mdw-server">Microsoft Connect</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.toddbaker.org/blog/2011/05/13/mdw-data-collection-upload-uses-up-all-available-ports-on-a-large-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL 2008 Management Data Warehouse &#8211; Fixing Long-running purges</title>
		<link>http://www.toddbaker.org/blog/2010/12/17/sql-2008-mdw-fixing-long-running-purges/</link>
		<comments>http://www.toddbaker.org/blog/2010/12/17/sql-2008-mdw-fixing-long-running-purges/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 19:30:51 +0000</pubDate>
		<dc:creator>todd</dc:creator>
				<category><![CDATA[MDW]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Management Data Warehouse]]></category>

		<guid isPermaLink="false">http://www.toddbaker.org/blog/?p=26</guid>
		<description><![CDATA[Update:  This issue was fixed in SQL 2008 R2 SP1.  For more details, check out this KB article. Click here to download the workaround script. With the release of SQL Server 2008 came the Management Data Warehouse, a built-in way &#8230;<p class="read-more"><a href="http://www.toddbaker.org/blog/2010/12/17/sql-2008-mdw-fixing-long-running-purges/">Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p><strong>Update:</strong>  This issue was fixed in SQL 2008 R2 SP1.  For more details, <a href="http://support.microsoft.com/kb/2584903">check out this KB article.</a></p>
<address><a href="http://www.toddbaker.org/blog/wp-content/uploads/2010/12/sp_purge_data.sql">Click here to download the workaround script.</a></address>
<p>With the release of SQL Server 2008 came the <a title="Management Data Warehouse" href="http://msdn.microsoft.com/en-us/library/dd939169(v=sql.100).aspx">Management Data Warehouse</a>, a built-in way to collect performance data from SQL, the server it is running on, and even your own performance data through the use of custom data collectors.  There are plenty of resources on how to set this up and make your own collections, so I won&#8217;t focus too much on that. Instead, this will be about improving performance for the nightly purge job that is created when you set up the MDW.</p>
<p>When you set up your MDW database, a job named mdw_purge_data_[&lt;your MDW db name&gt;] is created.  This job runs the stored procedure core.sp_purge_data, which performs the purge in two parts:</p>
<ol>
<li>Snapshots that need to be purged are deleted from core.snapshots_internal.  Data in the related tables (snapshots.performance_counter_values, snapshots.query_stats, etc.) is then deleted via cascading triggers.</li>
<li>Query plans and text are removed from snapshots.notable_query_text and snapshots.notable_query_plan.  Since these tables do not have a snapshot_id associated with them, rows have to be removed based on whether the corresponding sql_handle is missing from snapshots.query_stats. (By the way, this was <a title="added" href="http://blogs.msdn.com/b/petersad/archive/2009/04/23/sql-server-data-collector-nightly-purge-can-leave-orphaned-rows.aspx">added</a> in SQL 2008 R2 and CU5 for SQL 2008 to correct the problem with orphan plans taking up all the space in the database.)</li>
</ol>
<p>Seems fairly straightforward, right?  On our system, #1 would run quickly, usually in just a few minutes on our 200GB MDW database.  #2, however, would take hours.  In fact, if you didn&#8217;t set the @run_duration parameter, it would run for days and block incoming query stats in the process.  This was a problem.</p>
<p>The query below accounts for 99% of the runtime:</p>
<pre>WHILE (@rows_affected = @delete_batch_size)
BEGIN
    -- Every batch re-evaluates the NOT EXISTS check against query_stats
    DELETE TOP (@delete_batch_size) snapshots.notable_query_plan
    FROM snapshots.notable_query_plan AS qp
    WHERE NOT EXISTS (
        SELECT snapshot_id
        FROM snapshots.query_stats AS qs
        WHERE qs.[sql_handle] = qp.[sql_handle] AND qs.plan_handle = qp.plan_handle
        AND qs.plan_generation_num = qp.plan_generation_num
        AND qs.statement_start_offset = qp.statement_start_offset
        AND qs.statement_end_offset = qp.statement_end_offset
        AND qs.creation_time = qp.creation_time);
    SET @rows_affected = @@ROWCOUNT;
    IF (@rows_affected &gt; 0)
    BEGIN
        RAISERROR ('Deleted %d orphaned rows from snapshots.notable_query_plan', 0, 1, @rows_affected) WITH NOWAIT;
    END
END</pre>
<p>According to the comment above the query, it was done this way to avoid lock escalation and transaction log growth caused by deleting query plans of 10-50MB each.  OK, that is a valid concern, but it&#8217;s not worth joining on the query_stats table, which in my environment has 43 million rows and uses 15GB of disk space.  Joining on it (especially on 6 different columns) is expensive and not something you want to do over and over again.  That being said, there is a missing index that improves query time somewhat.  Without it, the purge ran for days.  With it, it ran in under 12 hours.</p>
<pre>CREATE NONCLUSTERED INDEX [Ix_query_stats_sql_handle] ON [snapshots].[query_stats]
([sql_handle] ASC) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]</pre>
<p>Still, 12 hours for a purge is bad, so I decided to rework the query.  Throwing caution to the wind, I did two things:  first, I grabbed the handles for the plans to be removed and put them into a temp table.  Second, I deleted the query plans by joining on that temp table rather than on query_stats.  This improved things dramatically:  the purge went from running in 12 hours to 11 minutes, purging 2200 rows from notable_query_plan on the last run.</p>
<p>Here&#8217;s the rewrite of the query above:</p>
<pre>-- @rows_affected, @delete_batch_size and @errormsg are assumed to be declared earlier in the procedure
-- Find the orphaned plans once and store their keys in a temp table
SELECT sql_handle, plan_handle, plan_generation_num, statement_start_offset, statement_end_offset, creation_time
INTO #nqp
FROM snapshots.notable_query_plan qp
WHERE NOT EXISTS (
    SELECT *
    FROM snapshots.query_stats AS qs
    WHERE qs.[sql_handle] = qp.[sql_handle] AND qs.plan_handle = qp.plan_handle
    AND qs.plan_generation_num = qp.plan_generation_num
    AND qs.statement_start_offset = qp.statement_start_offset
    AND qs.statement_end_offset = qp.statement_end_offset
    AND qs.creation_time = qp.creation_time)
SET @rows_affected = @delete_batch_size;
WHILE (@rows_affected &gt; 0)
BEGIN
    -- Delete in batches by joining on the temp table instead of query_stats
    DELETE TOP (@delete_batch_size) FROM snapshots.notable_query_plan
    FROM #nqp n
    WHERE n.sql_handle = notable_query_plan.sql_handle
    AND notable_query_plan.plan_handle = n.plan_handle
    AND notable_query_plan.plan_generation_num = n.plan_generation_num
    AND notable_query_plan.statement_end_offset = n.statement_end_offset
    AND notable_query_plan.statement_start_offset = n.statement_start_offset
    AND notable_query_plan.creation_time = n.creation_time
    SET @rows_affected = @@ROWCOUNT;
    IF (@rows_affected &gt; 0)
    BEGIN
        SELECT @errormsg = CAST(getdate() AS nvarchar) + ' Deleted %d orphaned rows from snapshots.notable_query_plan'
        RAISERROR (@errormsg, 0, 1, @rows_affected) WITH NOWAIT;
    END
END
DROP TABLE #nqp</pre>
<p>The cost to the transaction log and tempdb really wasn&#8217;t that high.  The log grew to about 6.5GB and tempdb to about 200MB.  Not bad for 2200 rows.</p>
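<p>The same two-step pattern (materialize the orphan keys once, then delete in batches by joining on the small table) can be tried out away from a production MDW. The sketch below is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database rather than SQL Server, the tables are cut down to two key columns instead of six, and purge_orphans and make_demo_db are hypothetical names.</p>
<pre>
```python
import sqlite3


def make_demo_db(plans, stats):
    """Build an in-memory database with `plans` plan rows and `stats` stats rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE query_stats (sql_handle TEXT, plan_handle TEXT);"
        "CREATE TABLE notable_query_plan (sql_handle TEXT, plan_handle TEXT);"
    )
    keys = [("s%d" % i, "p%d" % i) for i in range(plans)]
    conn.executemany("INSERT INTO notable_query_plan VALUES (?, ?)", keys)
    conn.executemany("INSERT INTO query_stats VALUES (?, ?)", keys[:stats])
    conn.commit()
    return conn


def purge_orphans(conn, batch_size):
    """Delete plans that have no matching stats row; return the rows deleted."""
    cur = conn.cursor()
    # Step 1: run the expensive anti-join once and keep only the keys.
    cur.execute(
        "CREATE TEMP TABLE nqp AS "
        "SELECT qp.sql_handle, qp.plan_handle FROM notable_query_plan AS qp "
        "WHERE NOT EXISTS (SELECT 1 FROM query_stats AS qs "
        "WHERE qs.sql_handle = qp.sql_handle "
        "AND qs.plan_handle = qp.plan_handle)"
    )
    total = 0
    while True:
        # Step 2: delete one batch by joining on the small temp table.
        cur.execute(
            "DELETE FROM notable_query_plan WHERE rowid IN ("
            "SELECT p.rowid FROM notable_query_plan AS p "
            "JOIN nqp AS n ON n.sql_handle = p.sql_handle "
            "AND n.plan_handle = p.plan_handle LIMIT ?)",
            (batch_size,),
        )
        deleted = cur.rowcount
        conn.commit()  # each batch is its own transaction
        if deleted == 0:
            break
        total += deleted
    cur.execute("DROP TABLE nqp")
    return total


if __name__ == "__main__":
    demo = make_demo_db(plans=10, stats=3)
    print("deleted", purge_orphans(demo, batch_size=4))
```
</pre>
<p>Committing after every batch keeps each transaction small, which is the same reason the original procedure deletes in batches.</p>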
<p><a href="http://www.toddbaker.org/blog/wp-content/uploads/2010/12/sp_purge_data.sql">Here</a> is the script to modify your sp_purge_data procedure to include the fix.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.toddbaker.org/blog/2010/12/17/sql-2008-mdw-fixing-long-running-purges/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hello!</title>
		<link>http://www.toddbaker.org/blog/2010/12/17/hello/</link>
		<comments>http://www.toddbaker.org/blog/2010/12/17/hello/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 18:37:30 +0000</pubDate>
		<dc:creator>todd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.toddbaker.org/blog/?p=24</guid>
		<description><![CDATA[After a long wait I&#8217;ve finally decided to make a blog.  Hopefully this will be a source of info for SQL Server professionals out there (including myself) and will be a small addition to the seemingly never-ending source of knowledge &#8230;<p class="read-more"><a href="http://www.toddbaker.org/blog/2010/12/17/hello/">Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[<p>After a long wait I&#8217;ve finally decided to make a blog.  Hopefully this will be a source of info for SQL Server professionals out there (including myself) and will be a small addition to the seemingly never-ending source of knowledge coming from the SQL Server community.</p>
<p>A little about me:  I&#8217;ve been a production SQL Server DBA since 2004, and in IT since college.  I&#8217;ve always been interested in computers and technology and always try to push the envelope for new and better ways to do things.  Outside of work I enjoy golf and riding my bike, and I have spent far too many hours with a PlayStation controller in my hands.</p>
<p>OK, enough of the boring introductions. On to some real stuff!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.toddbaker.org/blog/2010/12/17/hello/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
