Compare commits
No commits in common. "87bc13631aa8807a487c0a5222ee23bf3b30c3c3" and "8e07b6387733dd6bd21c7aba0bd76d405d6e2132" have entirely different histories.

87bc13631a ... 8e07b63877

README.md (101 changed lines)

@@ -1,9 +1,5 @@

# query-interpreter

**README LAST UPDATED: 04-15-25**

This project is under active development and is subject to change often and drastically as I am likely an idiot.

Core program to interpret query language strings into structured data, and back again.

## Data Structure Philosophy

@@ -31,69 +27,54 @@ new statements altogether.

## SQL Tokens

We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.

Here are the general token types it defines (also listed in [/docs/SQL_Token_Types.md](/docs/SQL_Token_Types.md)):

```go
type TokenType int

const (
	ERROR TokenType = iota
	EOF
	SPACE                  // space or newline
	STRING                 // string literal
	INCOMPLETE_STRING      // incomplete string literal so that we can obfuscate it, e.g. 'abc
	NUMBER                 // number literal
	IDENT                  // identifier
	QUOTED_IDENT           // quoted identifier
	OPERATOR               // operator
	WILDCARD               // wildcard *
	COMMENT                // comment
	MULTILINE_COMMENT      // multiline comment
	PUNCTUATION            // punctuation
	DOLLAR_QUOTED_FUNCTION // dollar quoted function
	DOLLAR_QUOTED_STRING   // dollar quoted string
	POSITIONAL_PARAMETER   // numbered parameter
	BIND_PARAMETER         // bind parameter
	FUNCTION               // function
	SYSTEM_VARIABLE        // system variable
	UNKNOWN                // unknown token
	COMMAND                // SQL commands like SELECT, INSERT
	KEYWORD                // Other SQL keywords
	JSON_OP                // JSON operators
	BOOLEAN                // boolean literal
	NULL                   // null literal
	PROC_INDICATOR         // procedure indicator
	CTE_INDICATOR          // CTE indicator
	ALIAS_INDICATOR        // alias indicator
)
```

These are an OK generalization to start with when trying to parse out SQL, but they cannot be used without some extra conditional logic that checks what the actual keyword values are.

Currently we scan through the string to tokenize it. When stepping through the tokens we try to determine the type of query we are working with. At that point we assume the overall structure of the rest of the statement to fit a particular format, then parse out the details of the statement into the struct correlating to its data type. Check out the function `ParseSelectStatement` from `q/select.go` as an example.
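
For illustration, here is a minimal sketch of that first scanning pass, assuming the `New`/`Scan` token-by-token API of `go-sqllexer`; the way the query type is reported below is for demonstration only and is not the project's actual code:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/DataDog/go-sqllexer"
)

func main() {
	sql := "SELECT id, name FROM users WHERE active = true ORDER BY name DESC"
	lexer := sqllexer.New(sql)

	for {
		token := lexer.Scan()
		if token.Type == sqllexer.EOF {
			break
		}
		if token.Type == sqllexer.SPACE {
			continue // whitespace carries no structure
		}
		// The first COMMAND token (SELECT, INSERT, ...) hints at which
		// statement struct the rest of the scan should be parsed into.
		if token.Type == sqllexer.COMMAND {
			fmt.Printf("query type candidate: %s\n", strings.ToUpper(token.Value))
		}
		fmt.Printf("%v\t%q\n", token.Type, token.Value)
	}
}
```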

## Scan State

As stated, we scan through the string, processing each chunk, delineated by spaces and punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we have to keep state of what else we have processed so far.

This state is determined by a set of flags depending on query type.

For example, a Select query will have:

```go
passedSELECT := false
passedColumns := false
passedFROM := false
passedTable := false
passedWHERE := false
passedConditionals := false
passedOrderByKeywords := false
passesOrderByColumns := false
```

The general philosophy for these flags is to name, and use, them in the context of what has already been processed through the scan, making naming and reading new flags trivial.
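
As a rough illustration of that philosophy, here is a simplified sketch with a hypothetical helper (not the actual `ParseSelectStatement` loop): the flags record what has already been seen and gate where each token's value is routed.

```go
package q

import (
	"strings"

	"github.com/DataDog/go-sqllexer"
)

// collectColumns is a hypothetical helper: passedSELECT and passedFROM
// mark what has already been scanned, and together they decide whether
// an identifier token belongs to the column list.
func collectColumns(tokens []sqllexer.Token) []string {
	passedSELECT := false
	passedFROM := false
	var columns []string

	for _, token := range tokens {
		switch {
		case !passedSELECT && strings.ToUpper(token.Value) == "SELECT":
			passedSELECT = true // column list starts after SELECT
		case passedSELECT && !passedFROM && strings.ToUpper(token.Value) == "FROM":
			passedFROM = true // column list ends at FROM
		case passedSELECT && !passedFROM && token.Type == sqllexer.IDENT:
			columns = append(columns, token.Value)
		}
	}
	return columns
}
```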

A `Select` object is shaped as follows:

```go
type Select struct {
	Table        string
	Columns      []string
	Conditionals []Conditional
	OrderBys     []OrderBy
	IsWildcard   bool
	IsDistinct   bool
}

// dependency in query.go
type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // AND, OR, etc
}

type OrderBy struct {
	Key       string
	IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence this bool for the opposite
}
```
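
To make the shape concrete, a query like `SELECT name, email FROM users WHERE age >= 21 ORDER BY name DESC` would be expected to parse into roughly the following value. This is an assumed example based on the struct definitions above, not an output taken from the project's tests:

```go
var parsed = Select{
	Table:   "users",
	Columns: []string{"name", "email"},
	Conditionals: []Conditional{
		{Key: "age", Operator: ">=", Value: "21"},
	},
	OrderBys: []OrderBy{
		{Key: "name", IsDescend: true},
	},
	IsWildcard: false,
	IsDistinct: false,
}
```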

## Improvement Possibilities

- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?
- Want to cut down as many `scan`s as possible by injecting functional dependencies.

```diff
@@ -24,7 +24,7 @@ type Conditional struct {
 	Key       string
 	Operator  string
 	Value     string
-	DataType  string // TODO: not something we can parse from string, but find a way to determine this later
+	DataType  string
 	Extension string // AND, OR, etc
 }
```

```diff
@@ -52,7 +52,7 @@ func GetQueryTypeFromToken(token *sqllexer.Token) QueryType {
 
 func IsCrudSqlStatement(token *sqllexer.Token) bool {
 	queryType := GetQueryTypeFromToken(token)
-	return (queryType > 0 && queryType <= 4)
+	return (queryType > 0 && queryType <= 4) // TODO: Update if QueryTypes Change
 }
 
 func IsTokenBeginingOfStatement(currentToken *sqllexer.Token, previousToken *sqllexer.Token) bool {
```
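
The range check above only holds if the four CRUD statement types occupy values 1 through 4 of `QueryType`, hence the `// TODO: Update if QueryTypes Change`. The enum itself is not shown in these hunks; a layout along the following lines would satisfy the check, but it is purely illustrative:

```go
// Hypothetical QueryType layout consistent with the range check above;
// the real definition lives elsewhere in the project and may differ.
type QueryType int

const (
	UNKNOWN_QUERY QueryType = iota // 0: not a CRUD statement
	SELECT_QUERY                   // 1
	INSERT_QUERY                   // 2
	UPDATE_QUERY                   // 3
	DELETE_QUERY                   // 4
)
```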

```diff
@@ -114,9 +114,10 @@ func ExtractSqlStatmentsFromString(sqlString string) []string {
 		} else {
 			continue
 		}
 	}
 
-	if !isBeginingFound && IsTokenBeginingOfStatement(token, &previousScannedToken) {
+	if !isBeginingFound && IsTokenBeginingOfStatement(token, &previousScannedToken) { // TODO: add logic that checks if the beginning is already found; if so, an error should happen before here
 		isBeginingFound = true
 	} else if !isBeginingFound {
 		continue
```
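
Judging by its signature, `ExtractSqlStatmentsFromString` splits a string containing several SQL statements into one string per statement, using `IsTokenBeginingOfStatement` to find where each statement starts. A usage sketch under that assumption (the exact output format is not documented in this capture):

```go
statements := ExtractSqlStatmentsFromString("SELECT id FROM users; UPDATE users SET active = false WHERE id = 3;")
for _, stmt := range statements {
	fmt.Println(stmt) // assumed: one complete statement per entry
}
```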

q/select.go (11 changed lines)

```diff
@@ -1,11 +1,13 @@
 package q
 
 import (
+	// "fmt"
 	"strings"
 
 	"github.com/DataDog/go-sqllexer"
 )
 
+// 126 rich mar drive
 type Select struct {
 	Table   string
 	Columns []string
```

```diff
@@ -58,6 +60,7 @@ func mutateSelectFromKeyword(query *Select, keyword string) {
 	}
 }
 
+// TODO: make this an array of tokens instead
 func unshiftBuffer(buf *[10]sqllexer.Token, value sqllexer.Token) {
 	for i := 9; i >= 1; i-- {
 		buf[i] = buf[i-1]
```
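
The `unshiftBuffer` helper above is cut off by the hunk boundary, but it appears to maintain the fixed-size look-behind buffer mentioned in the README's improvement notes: older tokens are shifted toward the end so the newest one can sit at index 0. A sketch of the presumed complete body follows; the final assignment is an assumption and is not shown in the diff:

```go
// Presumed complete form: shift every stored token one slot back,
// then place the newest token at the front of the buffer.
func unshiftBuffer(buf *[10]sqllexer.Token, value sqllexer.Token) {
	for i := 9; i >= 1; i-- {
		buf[i] = buf[i-1] // older tokens move toward the end
	}
	buf[0] = value // most recent token is always at index 0
}
```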

```diff
@@ -77,8 +80,9 @@ func ParseSelectStatement(sql string) Select {
 	passedConditionals := false
 	passedOrderByKeywords := false
 	passesOrderByColumns := false
+	//checkForOrderDirection := false
 
-	lookBehindBuffer := [10]sqllexer.Token{}
+	lookBehindBuffer := [10]sqllexer.Token{} // TODO: make this an array of tokens instead
 	var workingConditional = Conditional{}
 
 	var columns []string
```

```diff
@@ -120,8 +124,7 @@ func ParseSelectStatement(sql string) Select {
 		}
 	}
 
-	// TODO: make sure to check for other keywords that are allowed
-	if !passedFROM && strings.ToUpper(token.Value) == "FROM" {
+	if !passedFROM && strings.ToUpper(token.Value) == "FROM" { // TODO: make sure to check for other keywords that are allowed
 		passedFROM = true
 		continue
 	}
```

```diff
@@ -146,7 +149,7 @@ func ParseSelectStatement(sql string) Select {
 			workingConditional.Operator = token.Value
 		} else if token.Type == sqllexer.BOOLEAN || token.Type == sqllexer.NULL || token.Type == sqllexer.STRING || token.Type == sqllexer.NUMBER {
 			workingConditional.Value = token.Value
-		}
+		} // TODO: add capture for data type
 
 		if workingConditional.Key != "" && workingConditional.Operator != "" && workingConditional.Value != "" {
 			query.Conditionals = append(query.Conditionals, workingConditional)
```