Hi, Another entry about queries - this time a tuning test case and the
relations between simple queries, join fetch queries, paging query
results, and batch size.
Paging the Query Results
Since I have never said anything about paging in this blog I start with a short introduction about paging in EJB3:
To support paging the EJB3 Query interface defines the following two methods:
- setMaxResults - sets the number of maximum rows to retrieve from the database
- setFirstResult - sets the first row to retrieve
For example if our GUI displays a list of customers and we have 500,000
customers (database rows) in out database we wouldn't like to display
all 500,000 records is one view (even if we put performance
considerations aside - nobody can do anything with a list of 500,000
rows). The GUI design would usually include
paging
- we break the list of records to display into logical pages (for
example 100 records per page) and the user can navigate between pages
(same as Google's results navigator down the search page).
When using the paging support it is important to remember that the query
has to be sorted
otherwise we can't be sure that when fetching the "next page" it will
really be the next page (since in the absence of the 'order by' clause
form a SQL query the order in which rows are fetch is unpredictable).
Here is a sample use, for fetching the first tow pages of 100 rows each:
Query q = entityManager.createQuery("select c from Customer c order by c.id");
q.setFirstResult(0).setMaxResults(100);
. next page
Query q = entityManager.createQuery("select c from Customer c order by c.id");
q.setFirstResult(100).setMaxResults(100);
This is a simple API and it's important (for performance) to remember using it when we need to fetch only parts of the results.
Test Case Description
This test cased is based on a real tuning I did for an application, I
just changed the class names to Customer and Order. Let's assume that I
have a Customer entity with a set of orders (lazily fetched - but it
happens in eager fetch as well) and we need to:
- Fetch customers and their orders
- Do it in a "paging mode" - 100 customers per page
Tuning Requirement #1 - Fetch Customers and Their Orders
There are two possibilities to perform this kind of fetch:
Simple select: select c from customer c order by c.id
Join fetch: select distinct c from Customer c left outer join fetch c.orders order by c.id
The simple select is as simple as it can be, we load a list of
customers with a proxy collection in their orders field. The orders
collection will be filled with data once I access it (for example
c.getOrders().getSize() ). The 'join fetch' means that we want to fetch
an association as an integral part of the query execution. The joined
fetched entities (in the example above: c.orders) must be part of an
association that is referenced by an entity returned from the query (in
the example above: c). The 'join fetch' is one of the tools used for
improving queries performance (see more in
here). The Hibernate core documentations explains that "a 'fetch' join allows associations or collections of values to be
initialized along with their parent objects, using a
single select" (see
here).
I have in my database 18,998 customer records, each with few orders.
Let's compare execution time for the two queries. My code looks the
same for both queries (except of the query itself), I execute the
query, then I iterate the results checking the size of of each customer
orders collection and print the execution time and number of records
fetch (as a sanity for the query syntax):
Query q = entityManager.createQuery(queryStr);
long a = System.currentTimeMillis();
List<Customer> l = q.getResultList();
for (Customer c : l) {
c.getOrders().size();
}
long b = System.currentTimeMillis();
System.out.println("Execution time: " + (b - a)+ "; Number of records fetch: " + l.size() );
And to the numbers (avg. 3 executions):
- Simple select: 24,984 millis
- Join fetch: 1,219 millis
The join fetch query execution time was 20 times faster(!) than the
simple query. The reason is obvious, using the join fetch select I had
only one round trip to the database. While using a simple select I had
to fetch the customers (1 round trip to the database) and each time I
accessed a collection I had another round trip (that's 18,998
additional round trips!).
The winner is 'join fetch'. But does it? wait for the next one - the paging...
Tuning Requirement #2 - Use Paging
The second requirement was to do it in paging - each page will have 100
customers (so we will have 18,900/100+1 pages - the last page has 98
customers). So let's change the code above a little bit:
Query q = entityManager.createQuery(queryStr);
q.setFirstResult(pageNum*100).setMaxResults(100);
long a = System.currentTimeMillis();
List<Customer> l = q.getResultList();
for (Customer c : l) {
c.getOrders().size();
}
long b = System.currentTimeMillis();
System.out.println("Execution time: " + (b - a)+ "; Number of records fetch: " + l.size() );
I added the second line which limits the query result to a specific
page with up to 100 records per page. And the numbers are (avg. 3
executions):
- Simple select: 328 millis
- Join fetch: 1,660 millis
The wheel has turned over. Why? First a quote from the EJB3 Persistence specification:
"The effect of applying setMaxResults or
setFirstResult to a query involving fetch joins over collections is
undefined" (section 3.6.1 - Query Interface)
We
could have stopped here but it is interesting to understand the issue
and to see what Hibernate does.
To implement the paging features Hibernate delegates the work to the
database using its syntax to limit the number of records fetched by the
query. Each database has its own proprietary syntax for limiting the
number of fetched records, some examples:
- Postgres uses LIMIT and OFFSET
- Oracle has rownum
- MySQL uses its version of LIMIT and OFFSET
- MSSQL has the TOP keyword in the select
- and so on
The important thing to remember here is meaning of such limit: the database returns a
subset of the query result.
So if we asked for the first 100 customers which their names contain
'Eyal' the outcome is logically the same as building a table in memory
out of all customers that match the criteria and take from there the
first 100 rows. And here is the catch: if the query with the limit
includes a join clause for a collection
than the first 100 row in the "logical table" will not necessarily be
the first 100 customers. the outcome of the join might duplicate
customers in the "logical tables" but the database doesn't aware or
care about that - it performs operations on tables not on objects!. For
example think of the extreme case, the customer 'Eyal' has 100 orders.
The query will return 100 rows, hibernate will identify that all belong
to the same customer and return only one Customer as the query result -
this is not what we were asking for.
This also works, of course, the other way around. If a customer had
more than 100 orders and the result set size was limited to 100 rots
the orders collection would not contain all of the customer's orders.
To deal with that limitation Hibernate actually doesn't issue an SQL statement with a LIMIT clause. Instead it fetches
all of the records
and performs the paging in memory. This explains why using the 'join
fetch' statement with paging took more than the one without paging -
the delta is the in-memory paging done by Hibernate. If you look at
Hibernate logs you will find the next warning issued by Hibernate:
WARNING: firstResult/maxResults specified with collection fetch; applying in memory!
Final Tuning - BatchSize
Does it mean that in the case of paging we shouldn't use a join fetch?
usually it does (unless your page size is very close to the actual
number of records). But even if you use a simple select this is a
classic case for using the @BatchSize annotation.
If my session/entity manager has 100 customers attached to it than, be
default, for each first access to one of the customers' order
collection Hibernate will issue a SQL statement to fill that
collection. At the end I will execute 100 statements to fetch 100
collections. You can see it in the log:
Hibernate: /* select c from Customer c order by c.id */
select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_,
customer0_.fixedDiscount as fixedDis5_0_, customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_,
orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from
ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_,
orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_,
orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
The @BatchSize annotation can be used to define how many
identical associations to populate in a single database query. If the
session has 100 customers attached to it and the mapping of the
'orders' collection is annotated with @BatchSize of size
n.
It means that whenever Hibernate needs to populate a lazy orders
collection it checks the session and if it has more customers which
their orders collections need to be populated it fetches up to
n
collections. Example: if we had 100 customers and the batch size was
set to 16 when iterating over the customers to get their number of
orders hibernate will go to the database only 7 times (6 times to fetch
16 collections and one more time to fetch the 4 remaining collections -
see the sample below). If our batch size was set to 50 it would go only
twice.
@OneToMany(mappedBy="customer",cascade=CascadeType.ALL, fetch=FetchType.LAZY)
@BatchSize(size=16)
private Set<Order> orders = new HashSet<Order>();
And in the log:
Hibernate: /* select c from Customer c order by c.id */
select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_, customer0_.fixedDiscount as fixedDis5_0_,
customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: /* load one-to-many par2.Customer.orders */
select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_,
orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_
where orders0_.customer_id in (?, ?, ?, ?)
Back to our test case. In my example setting the batch size to 100
looks like a nice tuning opportunity. And indeed when setting it to 100
the total execution time dropped to 188 millis (that's an 132 (!!!)
times faster than worse result we had). The batch size can also be set
globally by setting the hibernate.default_batch_fetch_size property for
the session factory.
<原文地址:
http://www.jroller.com/eyallupu/entry/tuning_queries_using_paging_batch >