14 March 2012

modular programming in MySQL stored procedures

MySQL stored procedures are handy when things get complicated.

But one major drawback is that they can't call one another.

Nor can functions return tables.

It would be great if they could, because then you could for example create a subquery that could appear as a function call inside different parent queries -- and if you ever needed to change the subquery, it would change in all the parent queries without you needing to edit all the copies.

At present, if the same subquery needs to appear in several different places, it has to appear literally, and you need to keep all the copies up to date individually.

When you start to use a suite of interlocking functions and procedures, things descend very quickly into chaos.

The solution I've found is this: create a modular structure inside a programming environment (I have used ruby), and then generate the full text of your procedure definitions with a call to the ruby script.

For now I'm just going to post the code of the system and hope that it can speak for itself. Later (especially if anyone shows interest) I'll post an instruction manual and examples of use.


# -*- coding: utf-8 -*-
## Andrew wrote this
## 2011-09-29

=begin

purpose
-------

Make SQL procedures and queries modular.

explanation
-----------

Many queries work by building up nested subqueries.

Sometimes, different queries might need to use the same subqueries.

However, the SQL language (at least in MySQL) has no provision for reusing modules.

So if you are editing a subquery that is used in two or three
different queries, you will find yourself needing to make the same
edits in two or three different places. This is no good

Also, queries with nested subqueries can get hard to read. But if you
can chunk things up, and give each chunk a good name, that should help
with the readability.


=end

#################################################
##
##   CONFIGURATION GLOBALS
##
#################################################

## you can change these to tweak the behaviour slightly

# pretty: whether to give the text a pleasing format or not

# the default is to display prettily if the output is a terminal
# (console) unless you supply 'compact' on the command line and
# compactly if it's being piped somewhere, unless you supply 'pretty'
# on the command line

# the default behaviour should just about do, don't you think?
# what exactly are you after?

$pretty = !ARGV.member?('compact') && (ARGV.member?('pretty') || $stdout.isatty)



# $indent_step: ...why on earth would you want to change this?
$indent_step = '  '

#################################################
##
##   UTILITY GLOBALS
##
#################################################

# don't mess with these

# indent: current indentation level; how many steps to indent when
# pretty
$indent = 0



#################################################
##
##   GLOBAL FUNCTIONS
##
#################################################



def ind
  $pretty ? $indent_step * $indent : ''
end

def nl
  $pretty ? "\n#{ind}" : ' '
end

def format (p = '')
  if $pretty
    indent = ind
    p.gsub(/^/,"#{indent}")
  else
    # get rid of all comment lines, and then excess whitespace
    p.gsub(/^[ \t]*#.*$/,'').gsub(/[ \t\n\r]+/,' ')
  end
end



#################################################
##
##   Abbreviations
##
#################################################

# about sodding time!

$abbr = Hash::new

# These are the default characters for identifying the abbreviations

$n = '$'
$u = '~'

# Don't like em?

def set_abbreviation_marker_characters(namealias, unabbreviate)
  set_namealias_abbreviation_marker_character(namealias)
  set_unabbreviate_abbreviation_marker_character(unabbreviate)
end

def set_namealias_abbreviation_marker_character(namealias)
  $n = namealias
end

def set_unabbreviate_abbreviation_marker_character(unabbreviate)
  $u = unabbreviate
end

def unabbreviation (text)
  output = text.gsub(/[#{$n}]([a-z]+)\b/o) { namealias $1.to_sym }
  output = output.gsub(/[#{$u}]([a-z]+)\b/o) { unabbreviate $1.to_sym }
  output
end




# there are two ways that an abbreviation can be expanded -- either
# simply, or assuming that the abbreviation is a table or column name,
# and placing " as xxx" after the expansion, where xxx is the
# abbreviation itself, so that the abbreviation becomes an alias in
# mysql, and you can (in fact, you have to) use it without expanding it

## this is how you define an abbreviation: the abbreviation comes
## first, as a symbol, and then what it means (the expansion)
def abbreviate (abbr, means)
  $abbr[abbr]=means
end

## this is the simple expansion
def unabbreviate (abbr)
  raise NameError, "Abbreviation #{abbr} not found in abbreviations table" unless $abbr.member?(abbr)
  "#{$abbr[abbr]}"
end

## this is the one that results in an alias being defined in mysql
def namealias (abbr)
  raise NameError, "Abbreviation #{abbr} not found in abbreviations table" unless $abbr.member?(abbr)
  "#{$abbr[abbr]} as #{abbr}"
end



#################################################
##
##   VariableList class: a list of variables
##                    belonging to a query
##
#################################################


class VariableList < Array

  def initialize (*variables)
    "Create a list containing the supplied variables"

    variables.each { |v| self << v }
  end

  def <<(v)
    "Add a variable to the list"

    if v.kind_of? Symbol
      super v
    elsif v.kind_of? Variable or v.kind_of? Cursor
      super v.name
    else
      raise TypeError, "Trying to add #{v.inspect} which is class #{v.class} -- Can't add a non-variable"
    end
  end

  def params(p = '')

    "Output the parameter-list: i.e. all the typed but undefined
    variables, separated by commas"

    self.each { |v|
      a = Variable[v].param
      if a.strip != '' and p.strip != ''
        p += ",#{nl}"
      end
      if a.strip != ''
        p += "#{a}"
      end
    }
    p
  end

  def param_count(c=0)
    self.each { |v|
      c += 1 if Variable[v].is_param
    }
    c
  end

  def undeclare_params!
    self.each { |v| Variable[v].undeclare_param! }
  end
   
  def declarations (p = '')

    "Output the declarations of all typed and defined variables in the
    list"

    self.each { |v|
      p += Variable[v].declare
    }
    p   
  end

  def definitions (p = '')

    "Output the definitions of all defined variables in the
    list"

    self.each { |v|
      p += Variable[v].define
    }
    p
  end  

  def simple?

    "Return true if none of the variables on the list have
    definitions; false otherwise"

    self.each { |v|
      return false if Variable[v].is_defined?
    }
    true
  end

  def check
    "verify that all variables on the list are valid"

    self.each { |v|
      raise NameError, "Error: variable #{v} has not been initialised (check spelling)" if Variable[v].nil?
    }
  end

end

#################################################
##
##   Variable class: has a name,
##                    possibly a type
##                    possibly a definition
##                   might be:
##                    a parameter (type but no definition)
##                    a user-variable (no type, will have '@' prepended to name
##                    or a standard defined variable
##                       which is declared with 'declare'
##
#################################################


class Variable < Object

  ## @@store is a list of all the variables that have been defined; we
  ## use it to access them by name, and to make sure that each one is
  ## unique
  @@store = Hash::new()


  def initialize(name, type, definition = nil)
    "
Create a variable, giving it a name, a type and a definition.

If definition is omitted (but type is present) then the variable is a
parameter, and will appear on the parameter list of the generated
procedure.

If type is set to nil, the variable will be a user-defined variable,
and when used within the definition and the body of the query ti must
be preceded by '@'. If a definition is supplied it will be output;
otherwise, according to the MySQL documentation it will default to
NULL, and a string type;
"



    if @@store[name]
      raise ArgumentError, "Variable \"#{name}\" already exists"
    end

    @name = name
    @type = type
    @definition = definition

    ## @param_declared:

    ## this flag is to indicate whether or not a parameter has already
    ## been placed on the parameter list

    # The reason for it is so that procedures wrapping outer queries
    # can specify in what order they want the parameter list to
    # be. Inner queries then need to know if a particular parameter is
    # already on the parameter list or not.

    # However, whenever a new procedure or function is generated, we
    # must first make sure all parameters are undeclared. That's what
    # the undeclare_params! methods are about.

    @param_declared = false


    @@store[name] = self
  end

  def name
    @name
  end

  def type
    @type
  end

  def param
    "If this variable is a parameter and has not already been declared, declare it"

    if is_param
      @param_declared = true
      "#{name}  #{type}"
    else
      ''
    end
  end

  def is_param
    @definition.nil? and !@type.nil? and !param_declared?
  end

  def param_declared? 

    "Keeping track of whether this is a parameter that has already
    been declared or not"

    @param_declared
  end

  def undeclare_param!
    @param_declared = false
  end
   
  def declare

    "Non-parameters (with type) are declared here, using the 'declare'
    keyword"

    ## @type can be nil, meaning it's a user variable; just define it
    ## (if there's a definition), don't declare it

    if !@definition.nil? and !@type.nil?
      "#{nl}declare #{name}  #{type};"
    else
      ''
    end
  end

  def define
    "Output the definitions of all variables which have them"

    # except make sure not to output a loose ';' if the definition is empty

    if is_defined? and @definition.strip != ''
      "#{nl}#{format(@definition).strip};#{nl}"
    else
      ''
    end
  end

  def is_defined?
    "Does this variable have a definition?"

    !@definition.nil?
  end

  def self.[](s)
    "Provides access to named variables using Variable[:name]"
    @@store[s]
  end

end

#################################################
##
##   QueryList class: a list of subqueries
##                    belonging to a Query
##
#################################################

class QueryList < Array

  def initialize (*queries)
    "Create a list containing the supplied queries"

    queries.each { |q| self << q }
  end

  def <<(q)
    "Add a query to the list"

    if q.kind_of? Symbol
      super q
    elsif q.kind_of? Query
      super q.name
    end
  end

  def params (p = '')
    "Output the parameter list for all the queries in the list"

    self.each { |q|
      p = Query[q].params(p)
    }
    p
  end

  def declarations (p = '')
    "Output the variable declarations for all the queries in the list"

    self.each { |q|
      p = Query[q].declarations(p)
    }
    p   
  end

  def definitions (p = '')
    "Output the variable definitions for all the queries in the list"

    self.each { |q|
      p = Query[q].definitions(p)
    }
    p
  end 

  def simple?
    "Are all the queries on this list simple? (no variables with definitions)"

    self.each { |q|
      return false if !Query[q].simple?
    }
    true
  end

  def param_count(c=0)
    self.each { |q|
      c += Query[q].param_count(c)
    }
    c
  end

  def undeclare_params!
    self.each { |q| Query[q].undeclare_params! }
  end

  def check
    "verify that all subqueries are valid"

    self.each { |q|
      raise NameError, "There is no query called '#{q}', check spelling" if Query[q].nil?
      Query[q].check_variables
    }
  end

end


#################################################
##
##   Query class
##
#################################################



class Query < Object
 
  ## @@store holds all Query instances, to aid in access and to avoid
  ## duplicates
  @@store = Hash::new()

  def initialize(name,comment,body,*variables)
    "Create a query -- it must have a name, comment, a body, and may have variables"
   
    raise ArgumentError, "Query name \"#{name}\" already taken" if @@store[name]

    @name = name
    @body = body

    set_comment(comment)

    ## a note about variables:
    ## we can specify the order of parameters

    ## we might have an outer query that uses param B
    ## and an inner query that uses param A

    ## ordinarily, the procedure that this outer query gives rise to
    ## will have a parameter list that goes (B, A) because its own
    ## parameter is found first

    ## but supposing we want it to be (A, B) ?

    ## the way to do this is to redeclare the existing variable (A) in
    ## the outer query's variable list, before (B)

    @variables = VariableList::new(*variables)
    @subqueries = QueryList::new()

    @handlers = Array::new()
    ## can we handle handlers?

    @deterministic = true
    @readonly = true

    @setting = nil



    @@store[name]=self
  end

  ## setting: This is a little fragment of sql into which the subquery
  ## may be placed. It is not obligatory, and in fact probably will be
  ## seldom used. An example might be:

  ## join
  ##   {{}}
  ## on {}.id=othertable.id

  ## where {{}} gets replaced by the output of "subquery" (or what it
  ## would have been without a setting), and {} gets replaced by the
  ## result of "name"

  ## the need for this is to make subqueries optional, so that they
  ## can be placed in an outer query or not, without their absence
  ## rendering the outer query erroneous in syntax

  ## The only slight wrinkle is that if setting is part of the Query
  ## itself, then one subquery can't find itself in several different
  ## settings. It would be better if setting could be associated with
  ## the query only in the QueryList

  def setting
    @setting
  end

  def set_setting(s)
    @setting = s
  end


  def clone(newname)
    "Create identical query, but with no subqueries"

    Query::new(newname,@comment,@body,*@variables)
  end

  ## clone explanation:

  ## the idea is that when you have two alternative ways of generating
  ## equivalent queries, if you want to be able to implement both
  ## alternatives, then you will need to clone the first subquery that
  ## is identical, and from there branch outwards

  ## -- For example: you have query Q1 which may use either subquery A
  ## or subquery B, the end result is the same. So you can either do

  ##   Query[Q1].add_subquery A -- or --
  ##   Query[Q1].add_subquery B

  ## -- yet obviously not both. However, if you want to have side-by-side
  ## implementations of the two alternatives, then you simply clone Q1
 
  ## Query[Q1].clone Q2

  ## this creates a query called Q2 which is identical to Q1
  ## (including variables), save for any subqueries Q1 might have

  ## then

  ##   Query[Q1].add_subquery A -- and --
  ##   Query[Q2].add_subquery B

  ## are possible, and the corresponding procedures may be generated
  ## and compared

  ## end of clone explanation


  def add_variable (v)
    "Add a variable to this query"

    @variables << v
  end

  def add_variables (*v)
    v.each { |x| add_variable x }

## (we don't understand why "@variables.concat v" causes an error of
## undefined method 'param' for Nil later in the code, when it should
## do the same as the above)
  end

  ## sometimes an outer query (after cloning) is used with a subquery
  ## that takes a different parameter than the original subquery did
  ## -- this means that the original parameter has to be replaced with
  ## a new one, which is what replace_variable does

  def replace_variable(old, new)
    @variables[@variables.index(old)]=new
  end

  def add_subquery (s)
    "Add a subquery to this query"

    if s.kind_of? Symbol
      n = s
    elsif s.kind_of? Query
      n = s.name
    else
      raise TypeError, "Can't add '#{s.inspect}' -- #{s.class} can't be converted to a query"
    end

    ## avoid endless loops in the generation
    if @name == n
      raise NameError, "Can't add self (\"#{@name}\") as subquery"
    end
   
    @subqueries << s
  end

  def add_handler (h)
    @handlers << h
  end

  def set_comment (comment)
    "Make query's comment be this"

    @comment = comment
  end

  def comment
    "Output query name and comment text, every line preceded by '## '"

    return '' if @comment.nil? or !$pretty
    format("\n\n#{name}\n\n#{@comment.strip}\n".gsub(/^(.+)$/, '## \1'))
  end

  def set_not_deterministic
    @deterministic = false
  end

  def deterministic?
    return false if !@deterministic
    @subqueries.each { |s|
      return false if !Query[s].deterministic?
    }
    true
  end

  def set_modifies_sql_data
    @readonly = false
  end

  def readonly?
    return false if !@readonly
    @subqueries.each { |s|
      return false if !Query[s].readonly?
    }
    true
  end


  def name
    @name
  end

  def body
    "Output the entire body of this query, including any subqueries"

    output = format(@body)
    output = output.gsub(/[{]{2}(\d+)[}]{2}/) {
      i = $1.to_i
      @subqueries.count > i ? Query[@subqueries[i]].subquery : ''
    }
    output = output.gsub(/[{](\d+)[}]/) {
      i = $1.to_i
      @subqueries.count > i ? Query[@subqueries[$1.to_i]].name : ''
    }
    output = unabbreviation output
    "#{comment}#{output}"
  end

  def query
    "Output the full body, including subqueries, but not as a subquery itself"
    "#{nl}#{body.strip};#{nl}"
  end

  def subquery
    "Output the full body as a named subquery"

    $indent += 1
    show = "(#{nl}#{body}#{nl}"
    $indent -= 1

    op = "#{show}) as #{name}#{nl} "

    if !@setting.nil?
      op = @setting.gsub(/[{]{2}[}]{2}/, op)
      op = op.gsub(/[{][}]/, name.to_s)
    end

    op
  end

  def function_query
    "Output the full body, including subqueries, in parenthesis and with 'return'"
    "return(#{body.strip});#{nl}"
  end



  def params(p = '')
    "Output the complete parameter list of this query (including subqueries)"

    p = @subqueries.params(@variables.params(p))
  end

  def param_list
    "Output all parameters enclosed in parenthesis, trying to nicely format"
    if !$pretty || param_count == 1
      "( #{params} )"
    else
      "(#{nl}#{params}#{nl})"
    end
  end

  def param_count(c = 0)
    c = @subqueries.param_count(@variables.param_count(c))
  end

  def undeclare_params!
    @subqueries.undeclare_params!
    @variables.undeclare_params!
  end

  def declarations (p = '')
    "Output the complete variable declarations list of this query (including subqueries)"

    @subqueries.declarations(@variables.declarations(p)) 
  end

  def definitions (p = '')
    "Output the complete variable definitions list of this query (including subqueries)"

    @subqueries.definitions(@variables.definitions(p))  
  end 

  def preamble
    p = ''
    if deterministic?
      p += "deterministic\n"
    else
      p += "not deterministic\n"
    end

    if readonly?
      p += "reads sql data\n"
    else
      p += "modifies sql data\n"
    end

    if !simple?
      p += "begin\n"
      $indent +=1
    end

    p
  end

  def put_vars
    "#{declarations}#{definitions}"
  end

  def put_handlers
    output = ''
    @handlers.each { |h|
      output << "#{nl}declare continue handler for not found set #{h} = true;#{nl}"
    }
    output
    end


  def postamble
    return '' if simple?
    $indent -=1
<<output

end//
delimiter ;
output
  end

  def procedure
    undeclare_params!
    <<output
drop procedure if exists #{@name};#{ self.simple? ? '' : "\ndelimiter //" }
create procedure #{@name}#{param_list}
#{preamble}#{put_vars}#{put_handlers}#{query}#{postamble}
output
  end

  def function (type)
    undeclare_params!
    <<output
drop function if exists #{@name};#{ self.simple? ? '' : "\ndelimiter //" }
create function #{@name}#{param_list}
returns #{type}#{nl}#{preamble}#{put_vars}#{put_handlers}#{function_query}#{postamble}
output
  end

  def drop_procedure
    "drop procedure if exists #{name};\n"
  end

  def drop_function
    "drop function if exists #{name};\n"
  end

  def drop
    drop_procedure
    drop_function
  end

  def self.store
    @@store
  end

  def self.[](n)
    @@store[n]
  end

  def self.clean_up

    "Drop any procedures or functions that might have been generated
    from all active queries"

    # cover all bases
    @@store.each { |n,v|
      puts "drop procedure if exists #{n};\ndrop function if exists #{n};\n"
    }
    puts "\n"
  end

  def simple?

    "See if any of this query's variables are defined, or if any of
    the subqueries have defined variables, and if so return false;
    otherwise return true -- meaning that the generated routine won't
    need 'begin' or 'end' or a delimiter switch"

    @variables.simple? and @subqueries.simple?
  end

  def check_variables
    @variables.check
  end

  def check
    check_variables
    @subqueries.check
    self
  end

end

## As explained above, the idea of this class is to associate a
## subquery with a setting, but in such a way that the same query can
## be associated with different settings
class QueryInSetting < Query

  ## ... I guess this is not finished...

end


#################################################
##
##   Cursor class
##
#################################################



class Cursor < Variable

  ## the Cursor is a variable which accepts a Query as its "type"
  ## the actual type becomes "cursor for " and then the body of the query

## strike out the next two lines
  ## it needs to designate an existing Variable as a CONTINUE HANDLER
  ## -- that is what the second argument of the initialize method is for
## end of strike-out

  ## there are elements of this implementation that need research --
  ## for example, how is it going to work for a procedure query to
  ## have more than one cursor? can Cursors share continue handlers?
  ## (example procedures on the web never seem to have more than one
  ## continue handler, even when they have multiple cursors)

  ## if it's true that one procedure can never have multiple
  ## continue-handlers, then we need a way to make sure that when we
  ## add multiple cursors to a procedure, each cursor can know about the
  ## single C-H that the parent procedure is going to use...
  ## it would be a single property added to the VariableList class

  ## Also, we currently add the variables to the cursors -- but this
  ## is not quite right, the parent procedure should handle everything
  ## about the cursor variables.

  def initialize(query, code=nil)
    # A cursor owns its query
    @query = query
    super(@query.name,"cursor for #{nl}#{format @query.body.strip}",'')

    ## @code is the code that should be executed each time the
    ## cursor fetches -- we automatically set up a loop, and this code
    ## goes inside the loop

    @code = code
  end

  ## we need to derive the declare method so that the query's
  ## declarations come above, then the cursor's own declaration, then
  ## the query's definitions

  def declare
   
    "#{@query.declarations}#{super}#{@query.definitions}"
  end

  ## also we need to be able to pull out the query's params
  def param
    # make sure our query outputs its params
    @query.undeclare_params!
    @query.params
  end

  def name
    "#{super}_cursor"
  end

  def basename
    @query.name
  end

  def code
    @code.nil? ? '' : @code
  end

  def loop (variables,
            replacementcode = nil,
            countername = nil,
            handlername = 'cursor_done')
   
    counter = (countername.nil? or countername == '') ? 'counter' : countername

"
set #{handlername} = false;
set #{counter} = 0;
  open #{name};
  #{basename}_loop: loop
  fetch #{name} into #{variables};
  if #{handlername} then leave #{basename}_loop;
  end if;

  set #{counter} = #{counter} + 1;

  #{unabbreviation(format(replacementcode.nil? ? code : replacementcode))}

end loop;

#{countername.nil? ? '' : "select #{counter} as #{basename};"}

close #{name}"
  end

end



## in all cases, the check methods raise an exception if anything is
## not OK (thus not returning); the Query.check method returns self if
## OK

def check (x)
  q = Query[x]
  raise NameError, "There is no query called #{x} (check spelling)" if q.nil?
  q.check
end

## I wonder if this will save typing
def q(x)
  check x
end

def v(x)
  Variable[x]
end

def proc (p)
  "Correct way to generate a procedure"
  puts check(p).procedure
end

def func (f, type)
  "Correct way to generate a function"
  puts check(f).function(type)
end

## you might need to make sure a certain procedure or function is removed

def unproc (p)
  puts check(p).drop_procedure
  puts "\n"
end

def unfunc (f)
  puts check(f).drop_function
  puts "\n"
end

## supposing we quickly would like to generate the loop for a cursor

## we'd need the cursor's name, the list of variables to fetch into,
## the code to execute in the loop, the name of the not-found handler
## (can default to "done"), and perhaps whether to output the counter,
## and what to call it if so

No comments: