Wednesday, March 28, 2012

Load balancing

I'm wondering if its reasonable to use clustering combined with table
partitiions to distribute a heavy load? (OLTP app)
My basic thought is that I want to distribute product data across many hard
drives (and I/O busses) such that the database server knows which drives
contain the data it needs (and therefore doesn't acces the drives it doesn't)
as well as using the cluster to distribute load to multiple servers.
I've seen a reference to a "federated" database, but am not sure what that
means.
absolutely. Clustering won't do much for you performance wise though.
I think you probably want to look at Distributed Partitioned Views for what
you want to do though. Clustering is used for high availability. IE Should
one node fail, your clients will be disconnected and when they reconnect
they will be accessing the same SQL Server but running on a different node.
With Distributed Partitioned Views their query will be directed at a view on
one server, but the data will be accessed on a different server.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"JayKon" <JayKon@.discussions.microsoft.com> wrote in message
news:282512C2-0242-40A6-93BD-010E155398EA@.microsoft.com...
> I'm wondering if its reasonable to use clustering combined with table
> partitiions to distribute a heavy load? (OLTP app)
> My basic thought is that I want to distribute product data across many
> hard
> drives (and I/O busses) such that the database server knows which drives
> contain the data it needs (and therefore doesn't acces the drives it
> doesn't)
> as well as using the cluster to distribute load to multiple servers.
> I've seen a reference to a "federated" database, but am not sure what that
> means.
>
|||> With Distributed Partitioned Views their query will be directed at a view on
> one server, but the data will be accessed on a different server.
Thank you.
There is a rather criptic statement in the 2000 BOL that concerns me though.
"The most important goal is to minimize distributed processing in such a
system. You must be able to collocate related data on the same member server,
and then route each SQL statement to a member server that contains most, if
not all, of the data required to process the statement. "
Does this mean that I should put all (or as much) of the total data to be
retervied on one server?
For example, in a title/item lookup all item records related to a title
should be on the same server/partition? All other things being equal (which
they are probably not), I would prefer to put them on seperate servers and
have seperate NIC's for the DB's to talk to each other. Still, either would
work.
|||I believe it means that if your distributed partition view has three member
servers and you issue a query on servera which has a where clause pointing
to the data stored on serverb, and a join on another table, that second
table should be on serverb as well.
So if you partitioned on date. Servera might contain last years data,
serverb, the prior year, and serverc this years data. A search on this years
data might originate on servera, and will be directed to serverc by the
distributed partition view.
In your case perhaps titles beginning with a-h might be on servera, i-q
might be on serverb, r to z on serverc. Items related to titles in servera
should be on servera, and so on for serverb and c. There could be some
duplication of item data across all three servers.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"JayKon" <JayKon@.discussions.microsoft.com> wrote in message
news:E798A0E7-6392-4DD2-9D53-18C43E4C1DE0@.microsoft.com...
> Thank you.
> There is a rather criptic statement in the 2000 BOL that concerns me
> though.
> "The most important goal is to minimize distributed processing in such a
> system. You must be able to collocate related data on the same member
> server,
> and then route each SQL statement to a member server that contains most,
> if
> not all, of the data required to process the statement. "
> Does this mean that I should put all (or as much) of the total data to be
> retervied on one server?
> For example, in a title/item lookup all item records related to a title
> should be on the same server/partition? All other things being equal
> (which
> they are probably not), I would prefer to put them on seperate servers and
> have seperate NIC's for the DB's to talk to each other. Still, either
> would
> work.
|||My understanding is that this is related to the selectivity of your partition
function. In other words, if your queries always have a where clause that
states "WHERE [YEAR] = 2007" and each partition has data for a separate year,
then things will go well. However if peaple can do "WHERE [YEAR] IN (1999,
2000, 2007) and Forename = 'Harry'" then there are problems. The data has to
be unioned from the 3 partitions and the processing won't be on a separate
server. Performance in this type of scenario can become worse than not having
the partition at all...
HTH,
Paul Ibison
|||No, my understanding is the this query will be sent to all partitions and
then unioned. This will be efficient.
However a query on a DPV partitioned on data which is not aligned with any
of these partitions won't be as efficient. For example this query
select * from MyDPV where forname='Harry'
will result in index scans of all partitions and then bookmark lookups to
return all the matching rows. It is important to note that processing is
distributed amoung the member servers. If the DPV was all on one server (ie
each partition in a different table or database) all the processing will be
on the same server, and the cpu hit of this server doing that bookmark
lookup will be intense.
If it was distributed over different servers each server will crunch their
part of the query and the cpu hit will be there. The cpu hit on the server
where you issue the DPV query will be minimal - its only aggregating the
results.
DPV are great if the majority of the queries align themselves with your
partitions. If they don't they aren't so great as it involves table/index
scans.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"Paul Ibison" <Paul.Ibison@.Pygmalion.Com> wrote in message
news:EA4B50B2-A9C4-45EE-AE79-9B76088027A2@.microsoft.com...
> My understanding is that this is related to the selectivity of your
> partition
> function. In other words, if your queries always have a where clause that
> states "WHERE [YEAR] = 2007" and each partition has data for a separate
> year,
> then things will go well. However if peaple can do "WHERE [YEAR] IN (1999,
> 2000, 2007) and Forename = 'Harry'" then there are problems. The data has
> to
> be unioned from the 3 partitions and the processing won't be on a separate
> server. Performance in this type of scenario can become worse than not
> having
> the partition at all...
> HTH,
> Paul Ibison
|||What my example was trying to (incorrectly!) show was that if the query
doesn't relate to the partition function (startup filter), the query will get
sent to each server and processed there, then the results unioned. There will
be cases where this is worse than processing on one box, which avoids the
network round trips and the final unioning of datasets.
Cheers,
Paul Ibison
sql

No comments:

Post a Comment