Don’t use string constants when writing SQL statements in code
I was always in two minds about using constants for table and column names when writing SELECT queries. Now I've concluded that constants are definitely bad, and should not be used. Here's why.
The topic of discussion is difference is between writing
sql = "SELECT * FROM " + TABLE_NAME + " WHERE ..."
and
sql = "SELECT * FROM my_table WHERE ..."
There are the following consequences from this choice as far as I can see:
- A compiler like Java can tell you if you misspell a variable name (e.g. tableName) but not if you misspell "myTable" in the middle of a string (Win for constants, if you're using a static typing language)
- You can rename the values easier: just change and rename the constant. But how often do just table names or columns get renamed? Normally when I change the database I am implementing a new feature, and everything has to be changed anyway. (Marginal win for constants)
- The layout of the query is shorter and easier to read if constants are not used. (Win for not using constants)
To avoid having this choice in the first place:
- One could use an ORM, but at least in Java, e.g. HQL in Hibernate still has a string for column names, and table names if you're doing joins, so the problem is still there.
- Using a system like LINQ in .NET which allows you to specify queries in a way the compiler understands, not just a string. (But can it do everything SQL can do including vendor-specific things?)
- Being able to extend the language with other languages such as SQL and regular expressions. This is a fantasy of mine and a friend, hasn't happened yet. This would work by the compiler working in conjunction with the database engine to assert that the query is valid at compile time (and possibly even creating an db-specific internal parsed representation right there and then.)
Compare the following two pieces of code and I think the choice will become obvious. Both pieces of code come from the current code-base I'm working on, neither have I written myself.
sb.append("SELECT ").append(RecruiterRole.TABLE_NAME).append(".*,"); sb.append(Login.TABLE_NAME).append(".*"); sb.append(" FROM ").append(RecruiterRole.TABLE_NAME); sb.append(',').append(Login.TABLE_NAME); sb.append(" WHERE "); sb.append(RecruiterRole.COLUMN_COMPANY_ID).append(" = ?"); sb.append(" AND "); sb.append(RecruiterRole.TABLE_NAME).append('.'). sb.append(RecruiterRole.COLUMN_LOGIN_ID).append('=?');
vs
sql.append(" SELECT * "); sql.append(" FROM application"); sql.append(" WHERE job_advert_id IN ("); sql.append(" SELECT job_advert_id"); sql.append(" FROM share"); sql.append(" WHERE talent_scout_login_id = ?)"); sql.append(" AND potential_applicant_identity_id NOT IN ("); sql.append(" SELECT potential_applicant_identity_id"); sql.append(" FROM positive_endorsement"); sql.append(" WHERE talent_scout_login_id = ?)"); sql.append(" AND company_id = ?"); sql.append(" AND share_talent_scout_login_id = ?"); sql.append(" ORDER BY datetime_utc DESC");
Here are the reasons why the second code is more readable:
- Because table/column names are inline, the code reads easier
- Indenting is used for sub-selects
- Each condition, order-by is on its own line (e.g. "AND company_id=?")
- Keywords uppercase, column and table names lowercase
The danger of the above code is less that errors in spelling will only be detected at run-time and not at compile-time, but that the query does the wrong thing (while appearing to do the right thing). For example, an error I saw recently (which obviously did not make the live system!) was that users could see data not only from themselves but from all users because the "WHERE login_id=?" had been forgotten. But to the untrained eye, or a user on the test system with only a few users, the query appeared to work.
In this case, it's a clear win for readability, over compile-time checking of a mistake which will is unlikely to happen and will be identified at run-time.