Archive for April, 2008

MySQL conference 2008 – Monday tutorial – SQL Antipatterns

April 24, 2008

SQL Antipatterns

Bill Karwin (Karwin Software Solutions)

 

 

 

 

Official notes:

    sql_antipatterns_mysqluc2008

 

My notes:

SQL Antipatterns – Bill Karwin

Logic

  • Comma separated lists
    • Don’t do it!
    • See slides for ways to handle queries if a field does have several ids in one field, separated by a delimiter.
    • SOLUTION: a many-to-many relationship always requires an intersection table.
  • Multi-Column Attributes
    • E.g. using columns like ref1, ref2, ref3
    • Antipattern: Add more columns
    • But how many columns are enough?
    • SOLUTION: Use an intersection table
  • Entity-Attribute-Value
    • Objective: Make a table with a variable set of attributes. i.e. some columns only apply to some types of rows.
    • Antipattern: store al attributes in a second table, one attribute per row
    • Bad: mixing data with meta-data

 

Physical

 

  • Many tables, split by, e.g. date
    • To make it more performant
    • SOLUTION #1: Use horizontal partitioning (aka sharding – MySQL 5.1)
    • SOLUTION #2: use vertial partitioning – move a seldom used column into a new table, with a 1-to-1 relationship.
  • ID required
    • Forcing an ID column, even if there’s already a unique id column called something else
    • Use natural keys when needed: e.g. unique ID.
    • Use compound keys when needed: PRIMARY KEY (bug_id, product_id)
    • If column has same name across tables, can use USING syntax. Not more efficient though.
    • Choose sensible names (ask others!)
  • Phantom files
    • Objective: store screenshot images
    • ANTIPATTERN: Store string with path to image.
    • Files don’t obey delete, or update, rollback.
    • Files won’t get backed up.
    • Aren’t made secure through DB security
    • SOLUTION: Use BLOBS
  • FLOAT antipattern
    • Objective: store real numbers exactly
    • FLOAT datatype is inexact!!
    • 3.3 * 1000000000 = 3299999999952.3163
    • SOLUTION: Use NUMERIC or DECIMAL type instead.
  • ENUM
  • Objective: restrict a columns values to a fixed set of values
    • Bad because you have to redefine the table in order to add a type.
    • Instead have a separate lookup table.
    • Updates become a data change, not a meta-data change.
    • To retire values, create a new column in the lookup table ‘active’.
  • Readable passwords
    • Don’t store passwords in plaintext!
    • SOLUTION: use message digest function (e.g. MD5, SHA1/2)
      • Good: Compare md5(pass) to stored_pass
      • Better: concatenate a salt before using the hash function (store salt in db, even for each user also)
      • Best:
        • get salt from db
        • hash password with salt in app
        • then send hashed pass in query!
      • Even better: get the salt and the hash from the DB, then compare in the app. Avoids the need for 2 DB calls.
  •  

 

Query

 

  • Missed a load (TO DO: Download from tutorial notes)
  • HAVING antipattern
    • WHERE
    • SELECT
    • Define column aliases
    • GROUP BY
    • HAVING
    • ORDER BY
  • Poor Man’s Search Engine (full text index: MATCH… AGAINST…)
    • Using LIKE or RLIKE (regexp).
    • Indexes don’t benefit or substring searches.
    • Can only benefit from index if searching for field beginning with string.
    • No word boundaries
    • SOLUTION: Use MySQL FULLTEXT index (MyISAM), or Lucene, or www.sphinxsearch.org
  • Implicit columns
    • SELECT * FROM bugs
    • INSERT INTO bugs VALUES (…)
    • Always name columns explicitly, even for bulk inserts, to stop errors.
  •  

 

Application

 

  • User supplied SQL
    • Satisfy demand for flexibility
    • Antipattern: Let users write their own SQL expressions and run verbatim
    • SOLUTION: User supplies values, which you plug into safely prepared queries.
  • SQL Injection attack
    • Interpolating untrusted data into SQL expressions
    • SOLUTIONS
      • Filter input (e.g. value must be an integer)
      • For strings, escape them for MySQL (php: mysql_real_escape_string)
      • Use parameters for queries (question marks)
  • Parameter façade
    • Objective: include application parameters in SQL statements
    • Interpolation can change syntax, etc.
    • Much, much safer to use parameters in queries to prevent injection attack.
  • He really laboured the point for those last two
  • Pseudokey neat freak
    • Trying to re-use deleted id keys
    • UNSIGNED BIGINT: 18,111,222,333,444,555,666
  • Session coupling
    • Decrease overhead of making a database connection
    • E.g. PHP “Share nothing’ architecture or CGI applications
    • Antipattern: Persistent connections from httpd to database
    • Uses up database resources too much
    • Uncommitted transactions can be messed up across users.
    • Characters sets are inherited across users, as they are per session
    • Session variables like @@autocommit are inherited
  • Phantom side-effects
    • E.g. triggers to send email
    • Database can be out of sync with the real world
    • Accessing remote resources can take too long, And hang up the database
    • SOLUTION: In triggers, stored procedures and database functions, operate only on the database itself.
    • Use an external application to do external things

 

MySQL conference 2008 – Monday tutorial – Building Scalable & High Performance Datamarts with MySQL

April 24, 2008

Building Scalable & High Performance Datamarts with MySQL

Tangirala Sarma (DW/BI Architect)

 

[placeholder. notes were written by hand]

MySQL conference 2008 – Tuesday keynote speeches

April 24, 2008
Mårten Mickos from MySQL said how MySQL and Sun communities are so similar, so the companies will work well together.

I found Johnathan Schwartz from Sun very inspiring:

There is *no* proprietary software left in supercomputing. Massive open supercomputer in Texas is already full up.

Computational Science is the third mode of discovery, complementing physical + ??

People stay poor because their wealth can be stolen and because of lack of communication. that’s why Standard bank and MTN are giving free phones to African farmers. Sun systems runs the software.

The Network has become a social utility like water.

In IT we can choose what we do.

MySQL gets 70,000 d/l a day

there are 40 million total developers worldwide

Werner Vogels from Amazon.com

Now a technology company.

Moved the business logic on top of the DB.

no direct dm access except thru business logic.

Webpage dependencies (250-300)!

Instead of waterfall, Develop, Test, Operate cycle is software as a service. ITS A LIE! Massive infrastructure is required before operation: Hardware, Load balancers, etc.

They made each team responsible for its own infrastructure. It took 70pc of their time! so they became expert in it.

- They expect their servers/data to fail, and their software to tolerate this.

8-10pc of HDD fail every year.

BOOK: Getting real.

Dont scale. its too hard. (use EC2).

If youre going to grow, you need to grow constantly, one step at a time.

Before, people hoarded resource/servers as they never knew when they could need them again.

Everything fails, all the time.

They treat all hardware as unreliable.

Let go of control and determinism.

dont build systems based of average performance, engineer for 99.9pc of usage.

Amazon Web Services: http://aws.amazon.com

argh sales pitch!

 

 

 

 

Activesync is okay

April 24, 2008

I got my Windows Mobile 5 to Activesync with a laptop running Windows XP and copied all my Notes files across into Office 2000. Unfortunately I need Office XP to copy the Tasks and Contacts.

ActiveSync is rubbish

April 4, 2008

I connect my Orange SPV M3100 Windows Mobile phone to my Windows XP PC in order to back up some contacts and notes from my phone to the PC. It only requires me to install ActiveSync 4.5, unset the “use advanced settings” on my phone, and it just works. That’s good.
But then it starts copying 7000 work contacts from my PC’s outlook address book, mixing them all up with my personal contacts on the phone. And I didn’t even click ’sync’ or ask it to in any way!! After I click “Stop”, it continues to reconnect and do it again, every few minutes until I unplug the cable.

Actually the whole ActiveSync software seem pretty rigid and unyielding. It does what it wants to do, and you have very little control over it. New plan: Try installing SyncCE – an open source syncing software – on Mac OSX.

  1. Follow the instructions at http://www.synce.org/moin/SynceInstallation/Source
  2. The first package: libsynce-0.11 requires another package called dbus.
  3. Download and configure dbus (http://www.freedesktop.org/wiki/Software/dbus)
  4. dbus make errors like this:
  5. /usr/bin/libtool: unknown option character `z' in: -zUsage: /usr/bin/libtool -static [-] file [...] [-filelist listfile[,dirname]] [-arch_only arch] [-sacLT]
    Usage: /usr/bin/libtool -dynamic [-] file [...] [-filelist listfile[,dirname]] [-arch_only arch] [-o output] [-install_name name] [-compatibility_version #] [-current_version #] [-seg1addr 0x#] [-segs_read_only_addr 0x#] [-segs_read_write_addr 0x#] [-seg_addr_table ] [-seg_addr_table_filename ] [-all_load] [-noall_load]
  6. libtool doesn’t have a -z option. The version is: Apple Computer, Inc. version cctools-590.18.
  7. Try installing the GNU version. It installs in /usr/local/bin/, while the Mac version is in /usr/bin/. The $PATH variable shows that it will look in /usr/local/bin/ first, good.
  8. Run dbus configure and make again. Same error.
  9. Change the dbus Makefile to have LIBTOOL = $(SHELL) /usr/local/bin/libtool hardcoded.
  10. Run dbus make again. Same error. wtf?