doc: update readme with more high level details, linked to SQL token doc
parent e08c84de3a
commit 87bc13631a
101
README.md
@ -1,5 +1,9 @@
# query-interpreter

**README LAST UPDATED: 04-15-25**

This project is under active development and is subject to change often and drastically as I am likely an idiot.

Core program to interpret query language strings into structured data, and back again.

## Data Structure Philosophy
@ -27,54 +31,69 @@ new statements altogether.
## SQL Tokens

We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.
The general token types it defines are shown below and can also be found [here](/docs/SQL_Token_Types.md):

```go
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE                  // space or newline
	STRING                 // string literal
	INCOMPLETE_STRING      // incomplete string literal so that we can obfuscate it, e.g. 'abc
	NUMBER                 // number literal
	IDENT                  // identifier
	QUOTED_IDENT           // quoted identifier
	OPERATOR               // operator
	WILDCARD               // wildcard *
	COMMENT                // comment
	MULTILINE_COMMENT      // multiline comment
	PUNCTUATION            // punctuation
	DOLLAR_QUOTED_FUNCTION // dollar quoted function
	DOLLAR_QUOTED_STRING   // dollar quoted string
	POSITIONAL_PARAMETER   // numbered parameter
	BIND_PARAMETER         // bind parameter
	FUNCTION               // function
	SYSTEM_VARIABLE        // system variable
	UNKNOWN                // unknown token
	COMMAND                // SQL commands like SELECT, INSERT
	KEYWORD                // Other SQL keywords
	JSON_OP                // JSON operators
	BOOLEAN                // boolean literal
	NULL                   // null literal
	PROC_INDICATOR         // procedure indicator
	CTE_INDICATOR          // CTE indicator
	ALIAS_INDICATOR        // alias indicator
)
```

These are an OK generalizer to start with when trying to parse out SQL, but cannot be used
without some extra conditional logic that checks what the actual values are.

Currently we scan through the strings to tokenize them. When stepping through the tokens we try
to determine the type of query we are working with. At that point we assume the overall structure
of the rest of the statement fits a particular format, then parse out the details of
the statement into the struct correlating to its data type.

Check out the function `ParseSelectStatement` from `q/select.go` as an example.

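To illustrate why the token type alone is not enough, here is a minimal sketch. `Token`, `TokenType`, and `classifyKeyword` are local stand-ins invented for this example (not the `sqllexer` API and not code from this repo), and it assumes clause words such as `FROM` and `WHERE` arrive typed as `KEYWORD`.

```go
package main

import (
	"fmt"
	"strings"
)

// TokenType and Token are local stand-ins for the lexer's types, declared
// here only so the sketch compiles without the sqllexer dependency.
type TokenType int

const (
	COMMAND TokenType = iota
	KEYWORD
	IDENT
)

type Token struct {
	Type  TokenType
	Value string
}

// classifyKeyword is a hypothetical helper. The type only says "some
// keyword", so the value has to be inspected to learn which clause starts.
func classifyKeyword(t Token) string {
	if t.Type != KEYWORD {
		return "not a keyword"
	}
	switch strings.ToUpper(t.Value) {
	case "FROM":
		return "start of table clause"
	case "WHERE":
		return "start of conditionals"
	case "ORDER", "BY":
		return "part of ORDER BY"
	default:
		return "unhandled keyword"
	}
}

func main() {
	tokens := []Token{{KEYWORD, "from"}, {KEYWORD, "WHERE"}, {IDENT, "users"}}
	for _, t := range tokens {
		fmt.Printf("%q: %s\n", t.Value, classifyKeyword(t))
	}
}
```

Upper-casing the value before comparing keeps the check case-insensitive, which matters because SQL keywords can be written in any case.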
## Scan State

As stated, we scan through the strings, processing each chunk, delineated by spaces and
punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we
have to keep state of what else we have processed so far.

This state is determined by a set of flags depending on query type.

For example, a Select query will have:
```go
passedSELECT := false
passedColumns := false
passedFROM := false
passedTable := false
passedWHERE := false
passedConditionals := false
passedOrderByKeywords := false
passesOrderByColumns := false
```
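As a rough sketch of how such flags can drive a scan, the example below walks a simple query and uses four of them to decide what each word means. `scanSelect` is a hypothetical function written for this README, not `ParseSelectStatement`, and it splits on whitespace instead of using real lexer tokens, so it only handles the `SELECT <columns> FROM <table>` shape with a space after each comma.

```go
package main

import (
	"fmt"
	"strings"
)

// scanSelect walks whitespace-separated words (a stand-in for real lexer
// tokens) and uses "passed" flags to decide what each word means.
func scanSelect(query string) (columns []string, table string) {
	passedSELECT := false
	passedColumns := false
	passedFROM := false
	passedTable := false

	for _, word := range strings.Fields(query) {
		upper := strings.ToUpper(word)
		switch {
		case !passedSELECT:
			// Nothing is meaningful until SELECT has been seen.
			if upper == "SELECT" {
				passedSELECT = true
			}
		case !passedColumns:
			// Everything between SELECT and FROM is a column.
			if upper == "FROM" {
				passedColumns = true
				passedFROM = true
				continue
			}
			columns = append(columns, strings.TrimSuffix(word, ","))
		case passedFROM && !passedTable:
			// The first word after FROM is the table name.
			table = word
			passedTable = true
		}
	}
	return columns, table
}

func main() {
	columns, table := scanSelect("SELECT id, name FROM users")
	fmt.Println(columns, table)
}
```

Each `case` reads as "what have we already passed?", so the same word can be interpreted differently depending on where the scan is.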

The general philosophy for these flags is to name, and use, them in the context of what has
already been processed through the scan. This makes naming and reading new flags trivial.

A `Select` object is shaped as the following:
```go
type Select struct {
	Table        string
	Columns      []string
	Conditionals []Conditional
	OrderBys     []OrderBy
	IsWildcard   bool
	IsDistinct   bool
}

// dependency in query.go
type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // AND, OR, etc
}

type OrderBy struct {
	Key       string
	IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence why this bool for the opposite
}
```
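To make the shape concrete, here is a sketch of a populated `Select` and one possible way to turn it back into a query string. `toSQL` is hypothetical and not part of this project. It assumes that `Extension` on a conditional is the word joining it to the previous conditional, that `Value` already holds the text as it should appear in the query, and that `"int"` is a plausible `DataType` value. None of these are confirmed by the structs above.

```go
package main

import (
	"fmt"
	"strings"
)

type Select struct {
	Table        string
	Columns      []string
	Conditionals []Conditional
	OrderBys     []OrderBy
	IsWildcard   bool
	IsDistinct   bool
}

type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // assumed here: joins this conditional to the previous one
}

type OrderBy struct {
	Key       string
	IsDescend bool
}

// toSQL is a hypothetical renderer, not the project's implementation.
func toSQL(s Select) string {
	var b strings.Builder
	b.WriteString("SELECT ")
	if s.IsDistinct {
		b.WriteString("DISTINCT ")
	}
	if s.IsWildcard {
		b.WriteString("*")
	} else {
		b.WriteString(strings.Join(s.Columns, ", "))
	}
	b.WriteString(" FROM " + s.Table)
	for i, c := range s.Conditionals {
		if i == 0 {
			b.WriteString(" WHERE ")
		} else {
			b.WriteString(" " + c.Extension + " ")
		}
		b.WriteString(c.Key + " " + c.Operator + " " + c.Value)
	}
	for i, o := range s.OrderBys {
		if i == 0 {
			b.WriteString(" ORDER BY ")
		} else {
			b.WriteString(", ")
		}
		b.WriteString(o.Key)
		if o.IsDescend {
			b.WriteString(" DESC") // ASC is the default, so only DESC is written
		}
	}
	return b.String()
}

func main() {
	s := Select{
		Table:        "users",
		Columns:      []string{"id", "name"},
		Conditionals: []Conditional{{Key: "age", Operator: ">", Value: "30", DataType: "int"}},
		OrderBys:     []OrderBy{{Key: "name", IsDescend: true}},
	}
	fmt.Println(toSQL(s))
}
```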

## Improvement Possibilities

- Want to cut down as many `scan`s as possible by injecting functional dependencies
- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?