Compare commits


No commits in common. "87bc13631aa8807a487c0a5222ee23bf3b30c3c3" and "8e07b6387733dd6bd21c7aba0bd76d405d6e2132" have entirely different histories.

3 changed files with 52 additions and 67 deletions

README.md

@@ -1,9 +1,5 @@
# query-interpreter
**README LAST UPDATED: 04-15-25**
This project is under active development and is subject to change often and drastically as I am likely an idiot.
Core program to interpret query language strings into structured data, and back again.
## Data Structure Philosophy
@@ -31,69 +27,54 @@ new statements altogether.
## SQL Tokens
We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.
Here are the general token types it defines:
```go
type TokenType int

const (
    ERROR TokenType = iota
    EOF
    SPACE // space or newline
    STRING // string literal
    INCOMPLETE_STRING // incomplete string literal so that we can obfuscate it, e.g. 'abc
    NUMBER // number literal
    IDENT // identifier
    QUOTED_IDENT // quoted identifier
    OPERATOR // operator
    WILDCARD // wildcard *
    COMMENT // comment
    MULTILINE_COMMENT // multiline comment
    PUNCTUATION // punctuation
    DOLLAR_QUOTED_FUNCTION // dollar quoted function
    DOLLAR_QUOTED_STRING // dollar quoted string
    POSITIONAL_PARAMETER // numbered parameter
    BIND_PARAMETER // bind parameter
    FUNCTION // function
    SYSTEM_VARIABLE // system variable
    UNKNOWN // unknown token
    COMMAND // SQL commands like SELECT, INSERT
    KEYWORD // Other SQL keywords
    JSON_OP // JSON operators
    BOOLEAN // boolean literal
    NULL // null literal
    PROC_INDICATOR // procedure indicator
    CTE_INDICATOR // CTE indicator
    ALIAS_INDICATOR // alias indicator
)
```
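As a quick orientation, here is a minimal sketch of walking a statement with the lexer. It assumes the lexer is constructed with `sqllexer.New` and stepped with `Scan()` until an `EOF` token; the input string is only an example.
```go
package main

import (
    "fmt"

    "github.com/DataDog/go-sqllexer"
)

func main() {
    // Example input only; any SQL string is scanned the same way.
    sql := "SELECT id, name FROM users WHERE age >= 21"

    lexer := sqllexer.New(sql)
    for {
        token := lexer.Scan()
        if token.Type == sqllexer.EOF {
            break
        }
        if token.Type == sqllexer.SPACE {
            continue // skip whitespace tokens to keep the output readable
        }
        fmt.Printf("%d\t%q\n", token.Type, token.Value)
    }
}
```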
These are an OK generalization to start with when trying to parse out SQL, but they cannot be used
without conditional logic that checks what the actual keyword values are.
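For example, every statement-starting keyword arrives as the same `COMMAND` token type, so telling a `SELECT` apart from anything else means comparing the uppercased token value. A minimal illustration (the helper name is made up for this example):
```go
package q

import (
    "strings"

    "github.com/DataDog/go-sqllexer"
)

// isSelectCommand is a hypothetical helper: the COMMAND token type alone does
// not say which statement we are in, so the token's text has to be checked too.
func isSelectCommand(token *sqllexer.Token) bool {
    return token.Type == sqllexer.COMMAND && strings.ToUpper(token.Value) == "SELECT"
}
```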
Currently we scan through the string to tokenize it. When stepping through the tokens we try
to determine the type of query we are working with. At that point we assume the overall structure
of the rest of the statement to fit a particular format, then parse out the details of
the statement into the struct correlating to its data type.

Check out the function `ParseSelectStatement` in `q/select.go` as an example.
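Roughly, the flow looks like the sketch below: find the token that starts the statement, decide the query type from it, and hand the whole string to the matching parser. This is an illustrative dispatcher, not code from the repo, and it assumes the same `sqllexer.New`/`Scan()` lexer loop as above; only the `SELECT` branch exists in this excerpt.
```go
package q

import (
    "strings"

    "github.com/DataDog/go-sqllexer"
)

// parseStatement is an illustrative dispatcher: it scans for the command
// token and then defers to the statement-specific parser.
func parseStatement(sql string) (Select, bool) {
    lexer := sqllexer.New(sql)
    for {
        token := lexer.Scan()
        if token.Type == sqllexer.EOF {
            break
        }
        if token.Type != sqllexer.COMMAND {
            continue
        }
        switch strings.ToUpper(token.Value) {
        case "SELECT":
            return ParseSelectStatement(sql), true
        default:
            // INSERT, UPDATE, DELETE, etc. would get their own parsers here.
            return Select{}, false
        }
    }
    return Select{}, false
}
```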
## Scan State

As stated, we scan through the strings, processing each chunk, delineated by spaces and
punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we
have to keep state of what else we have processed so far.
This state is tracked by a set of flags that varies with the query type.
For example, a Select query will have:
```go
passedSELECT := false
passedColumns := false
passedFROM := false
passedTable := false
passedWHERE := false
passedConditionals := false
passedOrderByKeywords := false
passesOrderByColumns := false
```
The general philosophy for these flags is to name, and use, them in the context of what has
already been processed through the scan, which makes naming and reading new flags trivial.
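For instance, "we have passed `FROM` but not yet a table name, so an identifier here must be the table" reads directly off the flags. A condensed illustration of that convention (not the real loop body):
```go
package q

import "github.com/DataDog/go-sqllexer"

// captureTable is an illustrative fragment of the flag convention: each flag
// records what the scan has already walked past, so a bare identifier seen
// after FROM but before any table name must be the table name itself.
func captureTable(query *Select, token *sqllexer.Token, passedFROM, passedTable bool) bool {
    if passedFROM && !passedTable && token.Type == sqllexer.IDENT {
        query.Table = token.Value
        return true // the caller's passedTable flag flips here
    }
    return passedTable
}
```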
A `Select` object is shaped as follows:
```go
type Select struct {
    Table string
    Columns []string
    Conditionals []Conditional
    OrderBys []OrderBy
    IsWildcard bool
    IsDistinct bool
}

// dependency in query.go
type Conditional struct {
    Key string
    Operator string
    Value string
    DataType string
    Extension string // AND, OR, etc
}

type OrderBy struct {
    Key string
    IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence this bool for the opposite
}
```
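As a concrete example of that shape, a statement like `SELECT name FROM users WHERE age > 21 ORDER BY name DESC` would be expected to parse into roughly the following (illustrative values, not output captured from a test run):
```go
Select{
    Table:        "users",
    Columns:      []string{"name"},
    Conditionals: []Conditional{{Key: "age", Operator: ">", Value: "21"}},
    OrderBys:     []OrderBy{{Key: "name", IsDescend: true}},
    IsWildcard:   false,
    IsDistinct:   false,
}
```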
## Improvement Possibilities
- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?
- Want to cut down as many `scan`s as possible by injecting functional dependencies

q/query.go

@@ -24,7 +24,7 @@ type Conditional struct {
    Key string
    Operator string
    Value string
    DataType string // TODO: not something we can parse from string, but find a way to determine this later
    Extension string // AND, OR, etc
}
@@ -52,7 +52,7 @@ func GetQueryTypeFromToken(token *sqllexer.Token) QueryType {

func IsCrudSqlStatement(token *sqllexer.Token) bool {
    queryType := GetQueryTypeFromToken(token)
    return (queryType > 0 && queryType <= 4) // TODO: Update if QueryTypes Change
}

func IsTokenBeginingOfStatement(currentToken *sqllexer.Token, previousToken *sqllexer.Token) bool {
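For reference, a usage sketch of `IsCrudSqlStatement` from the hunk above: hand it the command token of a statement and it reports whether the statement is one of the CRUD types. The lexer construction (`sqllexer.New`/`Scan()`) and the expected result are assumptions, not taken from a test in this diff.
```go
package q

import (
    "fmt"

    "github.com/DataDog/go-sqllexer"
)

// ExampleIsCrudSqlStatement feeds the first command token of a statement to
// IsCrudSqlStatement.
func ExampleIsCrudSqlStatement() {
    lexer := sqllexer.New("SELECT * FROM users")
    for {
        token := lexer.Scan()
        if token.Type == sqllexer.EOF {
            break
        }
        if token.Type == sqllexer.COMMAND {
            fmt.Println(IsCrudSqlStatement(token)) // expected: true, SELECT is a CRUD command
            break
        }
    }
}
```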
@@ -114,9 +114,10 @@ func ExtractSqlStatmentsFromString(sqlString string) []string {
            } else {
                continue
            }
        }
        if !isBeginingFound && IsTokenBeginingOfStatement(token, &previousScannedToken) { // TODO: add logic that checks if begining is already found, if so an error should happen before here
            isBeginingFound = true
        } else if !isBeginingFound {
            continue

q/select.go

@@ -1,11 +1,13 @@
package q

import (
    // "fmt"
    "strings"

    "github.com/DataDog/go-sqllexer"
)

// 126 rich mar drive
type Select struct {
    Table string
    Columns []string
@@ -58,6 +60,7 @@ func mutateSelectFromKeyword(query *Select, keyword string) {
    }
}

// TODO: make this an array of tokens instead
func unshiftBuffer(buf *[10]sqllexer.Token, value sqllexer.Token) {
    for i := 9; i >= 1; i-- {
        buf[i] = buf[i-1]
@@ -77,8 +80,9 @@ func ParseSelectStatement(sql string) Select {
    passedConditionals := false
    passedOrderByKeywords := false
    passesOrderByColumns := false
    //checkForOrderDirection := false
    lookBehindBuffer := [10]sqllexer.Token{} // TODO: make this an array of tokens instead

    var workingConditional = Conditional{}
    var columns []string
@@ -120,8 +124,7 @@ func ParseSelectStatement(sql string) Select {
            }
        }

        if !passedFROM && strings.ToUpper(token.Value) == "FROM" { // TODO: make sure to check for other keywords that are allowed
            passedFROM = true
            continue
        }
@@ -146,7 +149,7 @@ func ParseSelectStatement(sql string) Select {
                workingConditional.Operator = token.Value
            } else if token.Type == sqllexer.BOOLEAN || token.Type == sqllexer.NULL || token.Type == sqllexer.STRING || token.Type == sqllexer.NUMBER {
                workingConditional.Value = token.Value
            } // TODO: add capture for data type

            if workingConditional.Key != "" && workingConditional.Operator != "" && workingConditional.Value != "" {
                query.Conditionals = append(query.Conditionals, workingConditional)
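The diff cuts off mid-function, but the entry point shown in the README is already callable. A minimal in-package usage sketch (the expected field values are assumptions, not captured output):
```go
package q

import "fmt"

// ExampleParseSelectStatement sketches the entry point end to end.
func ExampleParseSelectStatement() {
    query := ParseSelectStatement("SELECT name FROM users WHERE age > 21")
    fmt.Println(query.Table)        // expected: users
    fmt.Println(query.Columns)      // expected: [name]
    fmt.Println(query.Conditionals) // expected: one Conditional with Key "age", Operator ">", Value "21"
}
```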