Bill Karwin (Karwin Software Solutions)
Official notes:
My notes:
SQL Antipatterns – Bill Karwin
Logic
- Comma separated lists
- Don’t do it!
- See slides for ways to handle queries if a field does have several ids in one field, separated by a delimiter.
- SOLUTION: a many-to-many relationship always requires an intersection table.
- Multi-Column Attributes
- E.g. using columns like ref1, ref2, ref3
- Antipattern: Add more columns
- But how many columns are enough?
- SOLUTION: Use an intersection table
- Entity-Attribute-Value
- Objective: Make a table with a variable set of attributes. i.e. some columns only apply to some types of rows.
- Antipattern: store al attributes in a second table, one attribute per row
- Bad: mixing data with meta-data
Physical
- Many tables, split by, e.g. date
- To make it more performant
- SOLUTION #1: Use horizontal partitioning (aka sharding – MySQL 5.1)
- SOLUTION #2: use vertial partitioning – move a seldom used column into a new table, with a 1-to-1 relationship.
- ID required
- Forcing an ID column, even if there’s already a unique id column called something else
- Use natural keys when needed: e.g. unique ID.
- Use compound keys when needed: PRIMARY KEY (bug_id, product_id)
- If column has same name across tables, can use USING syntax. Not more efficient though.
- Choose sensible names (ask others!)
- Phantom files
- Objective: store screenshot images
- ANTIPATTERN: Store string with path to image.
- Files don’t obey delete, or update, rollback.
- Files won’t get backed up.
- Aren’t made secure through DB security
- SOLUTION: Use BLOBS
- FLOAT antipattern
- Objective: store real numbers exactly
- FLOAT datatype is inexact!!
- 3.3 * 1000000000 = 3299999999952.3163
- SOLUTION: Use NUMERIC or DECIMAL type instead.
- ENUM
- Objective: restrict a columns values to a fixed set of values
- Bad because you have to redefine the table in order to add a type.
- Instead have a separate lookup table.
- Updates become a data change, not a meta-data change.
- To retire values, create a new column in the lookup table ‘active’.
- Readable passwords
- Don’t store passwords in plaintext!
- SOLUTION: use message digest function (e.g. MD5, SHA1/2)
- Good: Compare md5(pass) to stored_pass
- Better: concatenate a salt before using the hash function (store salt in db, even for each user also)
- Best:
- get salt from db
- hash password with salt in app
- then send hashed pass in query!
- Even better: get the salt and the hash from the DB, then compare in the app. Avoids the need for 2 DB calls.
Query
- Missed a load (TO DO: Download from tutorial notes)
- HAVING antipattern
- WHERE
- SELECT
- Define column aliases
- GROUP BY
- HAVING
- ORDER BY
- Poor Man’s Search Engine (full text index: MATCH… AGAINST…)
- Using LIKE or RLIKE (regexp).
- Indexes don’t benefit or substring searches.
- Can only benefit from index if searching for field beginning with string.
- No word boundaries
- SOLUTION: Use MySQL FULLTEXT index (MyISAM), or Lucene, or www.sphinxsearch.org
- Implicit columns
- SELECT * FROM bugs
- INSERT INTO bugs VALUES (…)
- Always name columns explicitly, even for bulk inserts, to stop errors.
Application
- User supplied SQL
- Satisfy demand for flexibility
- Antipattern: Let users write their own SQL expressions and run verbatim
- SOLUTION: User supplies values, which you plug into safely prepared queries.
- SQL Injection attack
- Interpolating untrusted data into SQL expressions
- SOLUTIONS
- Filter input (e.g. value must be an integer)
- For strings, escape them for MySQL (php: mysql_real_escape_string)
- Use parameters for queries (question marks)
- Parameter façade
- Objective: include application parameters in SQL statements
- Interpolation can change syntax, etc.
- Much, much safer to use parameters in queries to prevent injection attack.
- He really laboured the point for those last two
- Pseudokey neat freak
- Trying to re-use deleted id keys
- UNSIGNED BIGINT: 18,111,222,333,444,555,666
- Session coupling
- Decrease overhead of making a database connection
- E.g. PHP “Share nothing’ architecture or CGI applications
- Antipattern: Persistent connections from httpd to database
- Uses up database resources too much
- Uncommitted transactions can be messed up across users.
- Characters sets are inherited across users, as they are per session
- Session variables like @@autocommit are inherited
- Phantom side-effects
- E.g. triggers to send email
- Database can be out of sync with the real world
- Accessing remote resources can take too long, And hang up the database
- SOLUTION: In triggers, stored procedures and database functions, operate only on the database itself.
- Use an external application to do external things