# query-interpreter

Core program to interpret query language strings into structured data, and back again.

## Data Structure Philosophy

We are operating on the philosophy that the first-class data is SQL statement strings.

From these strings we derive all of the structured data types that represent those SQL
statements, whether they are CRUD or schema operations.

All of these structs will have to implement the `Query` interface:

```go
type Query interface {
	GetFullSql() string
}
```

So every struct we create from SQL will need to be able to provide a full and valid SQL
statement of itself.

These structs are then where we are able to alter their fields programmatically to create
new statements altogether.
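
As an illustration of both points, here is a minimal, hypothetical sketch of such a struct.
The `SelectStatement` type and its fields are stand-ins for this example only, not the
project's actual types:

```go
package main

import (
	"fmt"
	"strings"
)

// Query is the interface described above.
type Query interface {
	GetFullSql() string
}

// SelectStatement is a hypothetical struct derived from a SELECT string.
type SelectStatement struct {
	Columns []string
	Table   string
}

// Compile-time check that SelectStatement satisfies Query.
var _ Query = SelectStatement{}

// GetFullSql satisfies Query by re-emitting a full, valid SQL statement.
func (s SelectStatement) GetFullSql() string {
	return fmt.Sprintf("SELECT %s FROM %s;", strings.Join(s.Columns, ", "), s.Table)
}

func main() {
	stmt := SelectStatement{Columns: []string{"id", "name"}, Table: "users"}
	fmt.Println(stmt.GetFullSql()) // SELECT id, name FROM users;

	// Altering a field programmatically yields a new statement altogether.
	stmt.Table = "archived_users"
	fmt.Println(stmt.GetFullSql()) // SELECT id, name FROM archived_users;
}
```
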
## SQL Tokens

We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.
Here are the general token types it defines:

```go
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE // space or newline
	STRING // string literal
	INCOMPLETE_STRING // incomplete string literal so that we can obfuscate it, e.g. 'abc
	NUMBER // number literal
	IDENT // identifier
	QUOTED_IDENT // quoted identifier
	OPERATOR // operator
	WILDCARD // wildcard *
	COMMENT // comment
	MULTILINE_COMMENT // multiline comment
	PUNCTUATION // punctuation
	DOLLAR_QUOTED_FUNCTION // dollar quoted function
	DOLLAR_QUOTED_STRING // dollar quoted string
	POSITIONAL_PARAMETER // numbered parameter
	BIND_PARAMETER // bind parameter
	FUNCTION // function
	SYSTEM_VARIABLE // system variable
	UNKNOWN // unknown token
	COMMAND // SQL commands like SELECT, INSERT
	KEYWORD // Other SQL keywords
	JSON_OP // JSON operators
	BOOLEAN // boolean literal
	NULL // null literal
	PROC_INDICATOR // procedure indicator
	CTE_INDICATOR // CTE indicator
	ALIAS_INDICATOR // alias indicator
)
```
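
For a quick feel of what the tokenizer emits, a minimal scan loop might look like the
following sketch (this assumes `sqllexer`'s `New`/`Scan` API; check the library's
documentation for the exact signatures):

```go
package main

import (
	"fmt"

	"github.com/DataDog/go-sqllexer"
)

func main() {
	lexer := sqllexer.New("SELECT id, name FROM users WHERE id = 1")

	// Scan returns one token at a time until EOF is reached.
	for tok := lexer.Scan(); tok.Type != sqllexer.EOF; tok = lexer.Scan() {
		fmt.Printf("%d\t%q\n", tok.Type, tok.Value)
	}
}
```
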

These token types are an OK generalization to start with when trying to parse SQL, but they
cannot be used without conditional logic that checks what the actual keywords are.

Currently we scan through the string to tokenize it. While stepping through the tokens we try
to determine the type of query we are working with. At that point we assume the overall
structure of the rest of the statement fits a particular format, then parse the details of
the statement into the struct corresponding to its data type.
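
Sketched very loosely, that dispatch step could look like the helper below. The
`determineQueryType` function and its return values are hypothetical, and the token loop
again assumes `sqllexer`'s `New`/`Scan` API:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/DataDog/go-sqllexer"
)

// determineQueryType is a hypothetical helper: it scans until the first COMMAND
// token and uses that keyword to decide which statement format to assume for the
// rest of the parse (e.g. hand a SELECT off to the select-statement parser).
func determineQueryType(sql string) string {
	lexer := sqllexer.New(sql)
	for tok := lexer.Scan(); tok.Type != sqllexer.EOF; tok = lexer.Scan() {
		if tok.Type != sqllexer.COMMAND {
			continue
		}
		switch strings.ToUpper(tok.Value) {
		case "SELECT", "INSERT", "UPDATE", "DELETE":
			return strings.ToUpper(tok.Value)
		default:
			return "UNKNOWN"
		}
	}
	return "UNKNOWN"
}

func main() {
	fmt.Println(determineQueryType("SELECT id FROM users")) // SELECT
	fmt.Println(determineQueryType("DROP TABLE users"))     // UNKNOWN (not handled here)
}
```
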

Check out the function `ParseSelectStatement` from `q/select.go` as an example.

## Improvement Possibilities
- want to cut down as many `scan`s as possible by injecting functional dependencies