
# query-interpreter

Core program to interpret query language strings into structured data, and back again.

## Data Structure Philosophy

We operate on the philosophy that the first-class data is SQL statement strings.

From these strings we derive all of the structured data types that represent those SQL statements, whether they are CRUD or schema operations.

All of these structs will have to implement the `Query` interface:

```go
type Query interface {
	GetFullSql() string
}
```

So every struct we create from SQL will need to be able to provide a full and valid SQL statement of itself.

These structs are also where we can alter fields programmatically to create new statements altogether.
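To illustrate the idea, here is a minimal sketch of such a struct. `SelectStatement` and its fields are hypothetical stand-ins; the real types in this repo (see `q/select.go`) may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// SelectStatement is a hypothetical struct derived from a SQL string.
type SelectStatement struct {
	Columns []string
	Table   string
}

// GetFullSql satisfies the Query interface by rebuilding a full,
// valid SQL statement from the struct's fields.
func (s SelectStatement) GetFullSql() string {
	return fmt.Sprintf("SELECT %s FROM %s", strings.Join(s.Columns, ", "), s.Table)
}

func main() {
	q := SelectStatement{Columns: []string{"id", "name"}, Table: "users"}
	fmt.Println(q.GetFullSql()) // SELECT id, name FROM users

	// Altering a field programmatically yields a new statement altogether.
	q.Table = "archived_users"
	fmt.Println(q.GetFullSql()) // SELECT id, name FROM archived_users
}
```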

## SQL Tokens

We are currently using DataDog's SQL Tokenizer sqllexer to scan through SQL strings. Here are the general token types it defines:

```go
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE                  // space or newline
	STRING                 // string literal
	INCOMPLETE_STRING      // incomplete string literal so that we can obfuscate it, e.g. 'abc
	NUMBER                 // number literal
	IDENT                  // identifier
	QUOTED_IDENT           // quoted identifier
	OPERATOR               // operator
	WILDCARD               // wildcard *
	COMMENT                // comment
	MULTILINE_COMMENT      // multiline comment
	PUNCTUATION            // punctuation
	DOLLAR_QUOTED_FUNCTION // dollar quoted function
	DOLLAR_QUOTED_STRING   // dollar quoted string
	POSITIONAL_PARAMETER   // numbered parameter
	BIND_PARAMETER         // bind parameter
	FUNCTION               // function
	SYSTEM_VARIABLE        // system variable
	UNKNOWN                // unknown token
	COMMAND                // SQL commands like SELECT, INSERT
	KEYWORD                // Other SQL keywords
	JSON_OP                // JSON operators
	BOOLEAN                // boolean literal
	NULL                   // null literal
	PROC_INDICATOR         // procedure indicator
	CTE_INDICATOR          // CTE indicator
	ALIAS_INDICATOR        // alias indicator
)
```

These token types are an OK generalization to start with when parsing SQL, but they cannot be used without conditional logic that checks what the actual keywords are.

Currently we scan through the string to tokenize it. When stepping through the tokens we try to determine the type of query we are working with. At that point we assume the overall structure of the rest of the statement fits a particular format, then parse the details of the statement into the struct correlating to its data type.
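The dispatch step described above might look something like the sketch below. The `Token` shape mirrors sqllexer's (a token type plus its raw text), but `classifyStatement` and its return values are hypothetical, not this repo's actual API; note the keyword check on top of the token type.

```go
package main

import (
	"fmt"
	"strings"
)

// Token mirrors the shape of a sqllexer token: a type plus the raw text.
type Token struct {
	Type  string // e.g. "COMMAND", "IDENT", "SPACE"
	Value string
}

// classifyStatement finds the first COMMAND token and inspects the actual
// keyword, since the token type alone is not enough to pick a structure.
func classifyStatement(tokens []Token) string {
	for _, t := range tokens {
		if t.Type != "COMMAND" {
			continue
		}
		switch strings.ToUpper(t.Value) {
		case "SELECT":
			return "select" // would hand off to a SELECT-specific parser
		case "INSERT":
			return "insert"
		case "UPDATE":
			return "update"
		case "DELETE":
			return "delete"
		default:
			return "unknown"
		}
	}
	return "unknown"
}

func main() {
	tokens := []Token{
		{Type: "COMMAND", Value: "SELECT"},
		{Type: "SPACE", Value: " "},
		{Type: "WILDCARD", Value: "*"},
	}
	fmt.Println(classifyStatement(tokens)) // select
}
```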

Check out the function `ParseSelectStatement` in `q/select.go` as an example.

## Improvement Possibilities

- We want to cut down on the number of scans as much as possible by injecting functional dependencies.
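One way to read that idea (a sketch under assumptions, not this repo's design): instead of re-scanning the token stream once per question we ask of it, inject the interested functions and answer them all in a single pass. All names here are hypothetical.

```go
package main

import "fmt"

// Token is a minimal stand-in for a sqllexer token (type + value).
type Token struct {
	Type  string
	Value string
}

// scanOnce walks the token stream a single time, feeding every injected
// handler as it goes, rather than doing one full scan per concern.
func scanOnce(tokens []Token, handlers ...func(Token)) {
	for _, t := range tokens {
		for _, h := range handlers {
			h(t)
		}
	}
}

func main() {
	tokens := []Token{
		{Type: "COMMAND", Value: "SELECT"},
		{Type: "IDENT", Value: "id"},
		{Type: "IDENT", Value: "users"},
	}

	var command string
	var idents []string

	scanOnce(tokens,
		func(t Token) { // concern 1: capture the command keyword
			if t.Type == "COMMAND" && command == "" {
				command = t.Value
			}
		},
		func(t Token) { // concern 2: collect identifiers
			if t.Type == "IDENT" {
				idents = append(idents, t.Value)
			}
		},
	)

	fmt.Println(command, idents) // SELECT [id users]
}
```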