I was positively surprised to see how popular my recent list of 10 Common Mistakes Java Developers Make when Writing SQL was, both on my own blog and on my syndication partner DZone. The popularity shows a couple of things:
- How important SQL is to the professional Java world.
- How common it is to forget about some basic SQL things.
- How well SQL-centric libraries such as jOOQ or MyBatis are responding to market needs by embracing SQL. An amusing fact is that users have even mentioned my blog post on SLICK’s mailing list. SLICK is a non-SQL-centric database access library in Scala. Like LINQ (and LINQ-to-SQL), it focuses on language integration, not on SQL code generation.
Anyway, the common mistakes I listed previously are far from complete, so I will treat you to a sequel of 10 subtly less common, yet equally interesting mistakes Java developers make when writing SQL.
1. Not using PreparedStatements
Interestingly, this mistake or misbelief still surfaces on blogs, forums and mailing lists many years after the appearance of JDBC, even though it concerns a very simple thing to remember and to understand. It appears that some developers refrain from using PreparedStatements for any of these reasons:
- They don’t know about PreparedStatements
- They think that PreparedStatements are slower
- They think that writing a PreparedStatement takes more effort
First off, let’s bust the above myths. In 96% of the cases, you’re better off writing a PreparedStatement rather than a static statement. Why? For simple reasons:
- You can avoid syntax errors originating from bad string concatenation when inlining bind values.
- You can avoid SQL injection vulnerabilities caused by bad string concatenation when inlining bind values.
- You can avoid edge-cases when inlining more “sophisticated” data types, such as TIMESTAMP, binary data, and others.
- You can keep open PreparedStatements around for a while, reusing them with new bind values instead of closing them immediately (useful in Postgres, for instance).
- You can make use of adaptive cursor sharing (Oracle-speak) in more sophisticated databases. This helps prevent hard-parsing SQL statements for every new set of bind values.
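The contrast between the two styles can be sketched as follows. This is a minimal example, assuming a hypothetical AUTHOR table with a FIRST_NAME column; connection setup is elided:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BindValues {

    // The risky way: inlining the value breaks on quotes and invites SQL injection
    static String naiveQuery(String firstName) {
        return "SELECT * FROM author WHERE first_name = '" + firstName + "'";
    }

    // The safe way: the driver ships the bind value separately from the SQL text,
    // so the value is never parsed as SQL. (Caller is responsible for closing.)
    static ResultSet safeQuery(Connection con, String firstName) throws SQLException {
        PreparedStatement stmt = con.prepareStatement(
            "SELECT first_name, last_name FROM author WHERE first_name = ?");
        stmt.setString(1, firstName);
        return stmt.executeQuery();
    }

    public static void main(String[] args) {
        // A single quote in the input corrupts the inlined statement...
        System.out.println(naiveQuery("O'Brien"));
        // ...whereas the PreparedStatement version is immune to that input.
    }
}
```

Note how the `?` placeholder also gives the database a stable SQL string to cache an execution plan for, regardless of the actual bind value.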
Convinced? Yes. Note that there are some rare cases when you actually want to inline bind values in order to give your database’s cost-based optimiser a heads-up about what kind of data is really going to be affected by the query. Typically, this results in “constant” predicates such as:
- DELETED = 1
- STATUS = 42
But it shouldn’t result in “variable” predicates such as:
- FIRST_NAME LIKE 'Jon%'
- AMOUNT > 19.95
Note that modern databases implement bind-variable peeking. Hence, by default, you might as well use bind values for all your query parameters. Note also that higher-level APIs such as JPA CriteriaQuery or jOOQ will help you generate PreparedStatements and bind values very easily and transparently when writing embedded JPQL or embedded SQL.
More background info:
- Caveats of bind value peeking: an interesting blog post by Oracle guru Tanel Poder on the subject.
- Cursor sharing: an interesting Stack Overflow question.
By default, always use PreparedStatements instead of static statements. By default, never inline bind values into your SQL.
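Keeping an open PreparedStatement around and reusing it with new bind values, as mentioned above, is also what makes JDBC batching cheap: the SQL string is parsed once and only the bind values change per row. A minimal sketch, again assuming a hypothetical AUTHOR table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsert {

    // One SQL string, prepared once; only the bind values change per row
    static final String INSERT_SQL =
        "INSERT INTO author (first_name) VALUES (?)";

    static int[] insertAll(Connection con, String... firstNames) throws SQLException {
        try (PreparedStatement stmt = con.prepareStatement(INSERT_SQL)) {
            for (String name : firstNames) {
                stmt.setString(1, name); // new bind value, same statement
                stmt.addBatch();
            }
            return stmt.executeBatch(); // one round trip for all rows
        }
    }
}
```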
2. Returning too many columns
This mistake is quite frequent and can lead to very bad effects both in your database’s execution plan and in your Java application. Let’s look at the second effect first:
Bad effects on the Java application:
If you’re selecting * (star) or a “default” set of 50 columns, which you’re reusing among various DAOs, you’re transferring lots of data from the database into a JDBC ResultSet. Even if you’re not reading the data from the ResultSet, it has been transferred over the wire and loaded into your memory by the JDBC driver. That’s quite a waste of IO and memory if you know that you’re only going to need 2-3 of those columns.
This was obvious, but beware also of…
Bad effects on the database execution plan:
These effects may actually be much worse than the effects on the Java application. Sophisticated databases perform a lot of SQL transformation when calculating the best execution plan for your query. It may well be that some parts of your query can be “transformed away”, knowing that they won’t contribute to the projection (SELECT clause) or to the filtering predicates. I’ve recently blogged about this in the context of schema meta data:
How schema meta data impacts Oracle query transformations
Now, this is quite a beast. Think about a sophisticated SELECT that will join two views:
SELECT * FROM customer_view c JOIN order_view o ON c.cust_id = o.cust_id
Each of the views joined in the above table reference might again join data from dozens of tables, such as CUSTOMER_ADDRESS, ORDER_HISTORY, ORDER_SETTLEMENT, etc. Given the SELECT * projection, your database has no choice but to fully perform the loading of all those joined tables, when in fact the only thing that you were interested in was this:
SELECT c.first_name, c.last_name, o.amount FROM customer_view c JOIN order_view o ON c.cust_id = o.cust_id
A good database will transform your SQL in a way that most of the “hidden” joins can be removed, which results in much less IO and memory consumption within the database.
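In JDBC terms, the fix is simply to spell out the projection. A minimal sketch, reusing the hypothetical customer_view / order_view schema from the example above:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ExplicitProjection {

    // Only the three columns we actually need. The optimiser can now prune
    // view joins that feed neither a projected column nor a predicate.
    static final String SQL =
        "SELECT c.first_name, c.last_name, o.amount "
      + "FROM customer_view c "
      + "JOIN order_view o ON c.cust_id = o.cust_id";

    static void printOrders(Connection con) throws SQLException {
        try (PreparedStatement stmt = con.prepareStatement(SQL);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                String firstName = rs.getString(1);
                String lastName = rs.getString(2);
                BigDecimal amount = rs.getBigDecimal(3);
                System.out.println(firstName + " " + lastName + ": " + amount);
            }
        }
    }
}
```

Beyond the execution-plan benefit, the driver now transfers only three values per row over the wire instead of every column of both views.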
Never execute SELECT *. Never reuse the same projection for various queries. Always try to reduce the projection to the data that you really need.
Note that this can be quite hard to achieve with ORMs.
3. Thinking that JOIN is a SELECT clause
This isn’t a mistake with a lot of impact on performance or SQL correctness, but nevertheless, SQL developers should be aware of the fact that the JOIN clause is not part of the SELECT statement per se. The SQL-92 standard defines a table reference as follows: