doc: update readme with more high-level details, linked to SQL token doc
This commit is contained in:
parent
e08c84de3a
commit
87bc13631a
101
README.md
@@ -1,5 +1,9 @@
# query-interpreter

**README LAST UPDATED: 04-15-25**

This project is under active development and is subject to change often and drastically as I am likely an idiot.

Core program to interpret query language strings into structured data, and back again.

## Data Structure Philosophy
@@ -27,54 +31,69 @@ new statements altogether.

## SQL Tokens

We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.

The general token types it defines can be found [here](/docs/SQL_Token_Types.md):

```go
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE                  // space or newline
	STRING                 // string literal
	INCOMPLETE_STRING      // incomplete string literal so that we can obfuscate it, e.g. 'abc
	NUMBER                 // number literal
	IDENT                  // identifier
	QUOTED_IDENT           // quoted identifier
	OPERATOR               // operator
	WILDCARD               // wildcard *
	COMMENT                // comment
	MULTILINE_COMMENT      // multiline comment
	PUNCTUATION            // punctuation
	DOLLAR_QUOTED_FUNCTION // dollar quoted function
	DOLLAR_QUOTED_STRING   // dollar quoted string
	POSITIONAL_PARAMETER   // numbered parameter
	BIND_PARAMETER         // bind parameter
	FUNCTION               // function
	SYSTEM_VARIABLE        // system variable
	UNKNOWN                // unknown token
	COMMAND                // SQL commands like SELECT, INSERT
	KEYWORD                // other SQL keywords
	JSON_OP                // JSON operators
	BOOLEAN                // boolean literal
	NULL                   // null literal
	PROC_INDICATOR         // procedure indicator
	CTE_INDICATOR          // CTE indicator
	ALIAS_INDICATOR        // alias indicator
)
```

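When stepping through tokens of these types, the whitespace and comment tokens carry no query structure and usually get skipped first. A self-contained sketch of that filtering step (this mirrors a few of the sqllexer types locally; the `meaningful` helper is illustrative, not from this repo):

```go
package main

import "fmt"

// A small local mirror of a few sqllexer token types, for illustration.
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE
	COMMENT
	COMMAND
	IDENT
)

// Token pairs a broad type with the raw text it covers.
type Token struct {
	Type  TokenType
	Value string
}

// meaningful filters out token types that carry no query structure,
// which keeps the downstream type-checking logic simple.
func meaningful(tokens []Token) []Token {
	var out []Token
	for _, t := range tokens {
		switch t.Type {
		case SPACE, COMMENT, EOF:
			continue // skip layout-only tokens
		default:
			out = append(out, t)
		}
	}
	return out
}

func main() {
	toks := []Token{
		{COMMAND, "SELECT"}, {SPACE, " "}, {COMMENT, "-- all"},
		{SPACE, " "}, {IDENT, "id"}, {EOF, ""},
	}
	fmt.Println(len(meaningful(toks))) // 2
}
```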

These token types are an OK generalizer to start with when trying to parse out SQL, but they cannot be used
without some extra conditional logic that checks what the actual values are.

Currently we scan through the string to tokenize it. When stepping through the tokens we try
to determine the type of query we are working with. At that point we assume the overall structure
of the rest of the statement fits a particular format, then parse out the details of
the statement into the struct correlating to its data type.

Check out the function `ParseSelectStatement` from `q/select.go` as an example.
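Since `COMMAND` covers every statement-starting keyword, determining the query type means branching on the token's value. A minimal, self-contained sketch of that idea (the `statementKind` helper and its return values are illustrative, not taken from this repo):

```go
package main

import (
	"fmt"
	"strings"
)

// A local stand-in for the sqllexer token categories used here.
type TokenType int

const (
	IDENT TokenType = iota
	COMMAND
)

type Token struct {
	Type  TokenType
	Value string
}

// statementKind shows the "check the actual value" step: the COMMAND
// token type alone does not say which statement we are in, so we
// branch on the text it covers.
func statementKind(t Token) string {
	if t.Type != COMMAND {
		return ""
	}
	switch strings.ToUpper(t.Value) {
	case "SELECT":
		return "select"
	case "INSERT":
		return "insert"
	case "UPDATE":
		return "update"
	case "DELETE":
		return "delete"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(statementKind(Token{Type: COMMAND, Value: "select"})) // select
	fmt.Println(statementKind(Token{Type: IDENT, Value: "users"}))    // (empty)
}
```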

## Scan State

As stated, we scan through the string, processing each chunk, delineated by spaces and
punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we
have to keep state of what else we have processed so far.

This state is determined by a set of flags depending on query type.

For example, a Select query will have:

```go
passedSELECT := false
passedColumns := false
passedFROM := false
passedTable := false
passedWHERE := false
passedConditionals := false
passedOrderByKeywords := false
passesOrderByColumns := false
```

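Stepping through chunks while flipping flags of this style can be sketched as follows. This toy version splits on whitespace instead of real tokens, and `parseSelectSketch` is an illustrative name, not the repo's actual `ParseSelectStatement`:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelectSketch is a toy version of the flag-driven scan: each flag
// is named for what has already been processed, and every chunk is
// interpreted based on which flags are set.
func parseSelectSketch(query string) (columns []string, table string) {
	passedSELECT := false
	passedColumns := false
	passedFROM := false

	for _, chunk := range strings.Fields(query) {
		switch {
		case !passedSELECT:
			// Nothing matters until we have seen SELECT.
			if strings.EqualFold(chunk, "SELECT") {
				passedSELECT = true
			}
		case !passedFROM && strings.EqualFold(chunk, "FROM"):
			// FROM closes the column list and opens the table position.
			passedColumns = true
			passedFROM = true
		case !passedColumns:
			// Between SELECT and FROM: collect column names.
			columns = append(columns, strings.TrimSuffix(chunk, ","))
		case passedFROM && table == "":
			// First chunk after FROM is the table name.
			table = chunk
		}
	}
	return columns, table
}

func main() {
	cols, table := parseSelectSketch("SELECT id, name FROM users")
	fmt.Println(cols, table) // [id name] users
}
```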
The general philosophy for these flags is to name, and use, them in the context of what has
already been processed through the scan, making naming and reading new flags trivial.

A `Select` object is shaped as the following:

```go
type Select struct {
	Table        string
	Columns      []string
	Conditionals []Conditional
	OrderBys     []OrderBy
	IsWildcard   bool
	IsDistinct   bool
}

// dependency in query.go
type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // AND, OR, etc.
}

type OrderBy struct {
	Key       string
	IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence this bool for the opposite
}
```

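As a concrete illustration, here is how `SELECT name FROM users WHERE age > 21 ORDER BY name DESC` might populate these structs. The value is hand-built, and the `DataType` string is a guess at the intended convention, not taken from the repo:

```go
package main

import "fmt"

// Structs as defined above in the README.
type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // AND, OR, etc.
}

type OrderBy struct {
	Key       string
	IsDescend bool
}

type Select struct {
	Table        string
	Columns      []string
	Conditionals []Conditional
	OrderBys     []OrderBy
	IsWildcard   bool
	IsDistinct   bool
}

func main() {
	// Hand-built result for:
	//   SELECT name FROM users WHERE age > 21 ORDER BY name DESC
	stmt := Select{
		Table:   "users",
		Columns: []string{"name"},
		Conditionals: []Conditional{
			{Key: "age", Operator: ">", Value: "21", DataType: "number"},
		},
		OrderBys: []OrderBy{{Key: "name", IsDescend: true}},
	}
	fmt.Printf("%+v\n", stmt)
}
```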

## Improvement Possibilities

- Want to cut down as many `scan`s as possible by injecting functional dependencies
- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?