Compare commits

No commits in common. "87bc13631aa8807a487c0a5222ee23bf3b30c3c3" and "8e07b6387733dd6bd21c7aba0bd76d405d6e2132" have entirely different histories.

87bc13631a ... 8e07b63877

README.md (101 changed lines)
@@ -1,9 +1,5 @@
 # query-interpreter
 
-**README LAST UPDATED: 04-15-25**
-
-    This project is under active development and is subject to change often and drastically as I am likely an idiot.
-
 Core program to interpret query language strings into structured data, and back again.
 
 ## Data Structure Philosophy
@@ -31,69 +27,54 @@ new statements altogether.
 ## SQL Tokens
 
 We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings. 
-The general token types it defines can be found [here](/docs/SQL_Token_Types.md)
+Here are the general token types it defines:
 
+```go
+type TokenType int
+
+const (
+ ERROR TokenType = iota
+ EOF
+ SPACE                  // space or newline
+ STRING                 // string literal
+ INCOMPLETE_STRING      // incomplete string literal so that we can obfuscate it, e.g. 'abc
+ NUMBER                 // number literal
+ IDENT                  // identifier
+ QUOTED_IDENT           // quoted identifier
+ OPERATOR               // operator
+ WILDCARD               // wildcard *
+ COMMENT                // comment
+ MULTILINE_COMMENT      // multiline comment
+ PUNCTUATION            // punctuation
+ DOLLAR_QUOTED_FUNCTION // dollar quoted function
+ DOLLAR_QUOTED_STRING   // dollar quoted string
+ POSITIONAL_PARAMETER   // numbered parameter
+ BIND_PARAMETER         // bind parameter
+ FUNCTION               // function
+ SYSTEM_VARIABLE        // system variable
+ UNKNOWN                // unknown token
+ COMMAND                // SQL commands like SELECT, INSERT
+ KEYWORD                // Other SQL keywords
+ JSON_OP                // JSON operators
+ BOOLEAN                // boolean literal
+ NULL                   // null literal
+ PROC_INDICATOR         // procedure indicator
+ CTE_INDICATOR          // CTE indicator
+ ALIAS_INDICATOR        // alias indicator
+)
+
+```
+
 These are an OK generalizer to start with when trying to parse out SQL, but can not be used 
-without some extra conditional logic that checks what the actual values are.
+without conditional logic that checks what the actual keywords are.
 
-Currently we scan through the strings to tokenize it. When stepping through the tokens we try
-to determine the type of query we are working with. At that point we assume the over all structure 
+Currently we scan through the strings to tokenize it. When steping through the tokens we try
+to determine the type of query we are working with. At that point we assume the over structure 
 of the rest of the of the statement to fit a particular format, then parse out the details of 
 the statement into the struct correlating to its data type.
 
-## Scan State
-
-As stated, we scan through the strings, processing each each chunk, delineated by spaces and
-punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we 
-have to keep state of what else we have processed so far. 
-
-This state is determined by a set off flags depending on query type.
-
-For example, a Select query will have:
-```go
-	passedSELECT := false
-	passedColumns := false
-	passedFROM := false
-	passedTable := false
-	passedWHERE := false
-	passedConditionals := false
-	passedOrderByKeywords := false
-	passesOrderByColumns := false
-```
-
-The general philosophy for these flags is to name, and use, them in the context of what has 
-already been processed through the scan. Making naming and reading new flags trivial.
-
-A `Select` object is shaped as the following:
-```go
-type Select struct {
-	Table        string
-	Columns      []string
-	Conditionals []Conditional
-	OrderBys     []OrderBy
-	IsWildcard   bool
-	IsDistinct   bool
-}
-
-//dependency in query.go
-type Conditional struct {
-	Key       string
-	Operator  string
-	Value     string
-	DataType  string
-	Extension string // AND, OR, etc
-}
-
-
-type OrderBy struct {
-	Key       string
-	IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence why this bool for the opposite
-}
-```
-
-
+Checkout the function `ParseSelectStatement` from `q/select.go` as an example.
 
 ## Improvement Possibilities
 
-- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?
-
+- want to to cut down as many `scan`s as possible by injecting functional dependencies
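The removed "Scan State" section above is the part of the README that explains how the parser is meant to work: step through the sqllexer token stream and flip boolean flags as keywords go by. Below is a minimal, self-contained sketch of that scan-and-flag loop; it assumes `sqllexer.New` and `Scan` as the entry points and is only an illustration, not the project's actual `ParseSelectStatement`.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/DataDog/go-sqllexer"
)

// Sketch of the scan-state idea described in the README hunk above:
// walk the token stream and flip flags as keywords go by.
func main() {
	lexer := sqllexer.New("SELECT id, name FROM users WHERE age > 30")

	passedSELECT := false
	passedFROM := false
	var columns []string
	table := ""

	for {
		token := lexer.Scan()
		if token.Type == sqllexer.EOF {
			break
		}
		switch {
		case strings.ToUpper(token.Value) == "SELECT":
			passedSELECT = true
		case strings.ToUpper(token.Value) == "FROM":
			passedFROM = true
		case token.Type == sqllexer.IDENT && passedSELECT && !passedFROM:
			columns = append(columns, token.Value) // column names appear between SELECT and FROM
		case token.Type == sqllexer.IDENT && passedFROM && table == "":
			table = token.Value // first identifier after FROM is the table
		}
	}
	fmt.Println(columns, table)
}
```

For the sample input this should print `[id name] users`, which is the same information the real parser stores in `Select.Columns` and `Select.Table`.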
@@ -24,7 +24,7 @@ type Conditional struct {
 	Key       string
 	Operator  string
 	Value     string
-	DataType  string // TODO: not something we can parse from string, but find a way to determine this later
+	DataType  string
 	Extension string // AND, OR, etc
 }
 
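As a reading aid, here is a hypothetical example (values invented, not taken from this commit) of what the `Conditional` struct above is meant to capture for a chained WHERE clause; `DataType` stays empty since, per the removed TODO, it cannot yet be derived from the string.

```go
// Hypothetical fixture, not code from this commit: one way the clause
// "WHERE age > 30 AND active = true" could land in []Conditional.
// Which conditional carries the chaining Extension ("AND") is a guess here.
var exampleConditionals = []Conditional{
	{Key: "age", Operator: ">", Value: "30", Extension: "AND"},
	{Key: "active", Operator: "=", Value: "true"},
}
```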
@@ -52,7 +52,7 @@ func GetQueryTypeFromToken(token *sqllexer.Token) QueryType {
 
 func IsCrudSqlStatement(token *sqllexer.Token) bool {
 	queryType := GetQueryTypeFromToken(token)
-	return (queryType > 0 && queryType <= 4)
+	return (queryType > 0 && queryType <= 4) // TODO:  Update if QueryTypes Change
 }
 
 func IsTokenBeginingOfStatement(currentToken *sqllexer.Token, previousToken *sqllexer.Token) bool {
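The range check `queryType > 0 && queryType <= 4` only works if the four CRUD statement types occupy values 1 through 4 of `QueryType`. Those constants are not part of this compare, so the following ordering is a hypothetical sketch (invented names) consistent with the check.

```go
// Hypothetical sketch: not the repository's actual constants.
type QueryType int

const (
	UnknownQuery QueryType = iota // 0 - fails the > 0 check
	SelectQuery                   // 1
	InsertQuery                   // 2
	UpdateQuery                   // 3
	DeleteQuery                   // 4 - last value accepted by <= 4
	// Anything added after this point would not count as a CRUD statement
	// unless the range check in IsCrudSqlStatement is updated (hence the TODO).
)
```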
@@ -114,9 +114,10 @@ func ExtractSqlStatmentsFromString(sqlString string) []string {
 			} else {
 				continue
 			}
+
 		}
 
-		if !isBeginingFound && IsTokenBeginingOfStatement(token, &previousScannedToken) {
+		if !isBeginingFound && IsTokenBeginingOfStatement(token, &previousScannedToken) { // TODO: add logic that checks if begining is already found, if so an error should happen before here
 			isBeginingFound = true
 		} else if !isBeginingFound {
 			continue
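This hunk does not show `ExtractSqlStatmentsFromString` end to end, but the name and `[]string` return type suggest its contract; the call below is an illustrative expectation only, and the exact semicolon and whitespace handling is not visible here.

```go
// Illustrative expectation, not output captured from this commit:
// one entry per statement found in the input string.
var exampleStatements = ExtractSqlStatmentsFromString(
	"SELECT * FROM users; UPDATE users SET active = true",
)
```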
							
								
								
									
q/select.go (11 changed lines)
@@ -1,11 +1,13 @@
 package q
 
 import (
+	//	"fmt"
 	"strings"
 
 	"github.com/DataDog/go-sqllexer"
 )
 
+// 126 rich mar drive
 type Select struct {
 	Table        string
 	Columns      []string
@@ -58,6 +60,7 @@ func mutateSelectFromKeyword(query *Select, keyword string) {
 	}
 }
 
+// TODO: make this an array of tokens instead
 func unshiftBuffer(buf *[10]sqllexer.Token, value sqllexer.Token) {
 	for i := 9; i >= 1; i-- {
 		buf[i] = buf[i-1]
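The hunk ends before `unshiftBuffer` closes, but the shifting loop only makes sense if the newest token is then written to the front of the fixed-size look-behind buffer. Here is a sketch of the presumed complete function, where the final assignment is an assumption rather than code shown in this compare.

```go
// Sketch of the full function; the final buf[0] assignment is presumed,
// since the hunk above ends before the function body closes.
func unshiftBuffer(buf *[10]sqllexer.Token, value sqllexer.Token) {
	for i := 9; i >= 1; i-- {
		buf[i] = buf[i-1] // shift everything right, dropping the oldest token
	}
	buf[0] = value // newest token always sits at index 0
}
```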
@@ -77,8 +80,9 @@ func ParseSelectStatement(sql string) Select {
 	passedConditionals := false
 	passedOrderByKeywords := false
 	passesOrderByColumns := false
+	//checkForOrderDirection := false
 
-	lookBehindBuffer := [10]sqllexer.Token{}
+	lookBehindBuffer := [10]sqllexer.Token{} // TODO: make this an array of tokens instead
 	var workingConditional = Conditional{}
 
 	var columns []string
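Given these flags and the `Select` fields described in the README hunk above, a call to `ParseSelectStatement` would be expected to behave roughly as follows; the commented values illustrate the intended mapping, not output captured from this commit.

```go
// Hypothetical usage; the commented values are the intended mapping,
// not verified output from this commit.
var query = ParseSelectStatement("SELECT name, email FROM users WHERE age > 30 ORDER BY name DESC")

// query.Table        -> "users"
// query.Columns      -> ["name", "email"]
// query.Conditionals -> [{Key: "age", Operator: ">", Value: "30"}]
// query.OrderBys     -> [{Key: "name", IsDescend: true}]
```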
@@ -120,8 +124,7 @@ func ParseSelectStatement(sql string) Select {
 			}
 		}
 
-		// TODO: make sure to check for other keywords that are allowed
-		if !passedFROM && strings.ToUpper(token.Value) == "FROM" {
+		if !passedFROM && strings.ToUpper(token.Value) == "FROM" { // TODO: make sure to check for other keywords that are allowed
 			passedFROM = true
 			continue
 		}
@@ -146,7 +149,7 @@ func ParseSelectStatement(sql string) Select {
 				workingConditional.Operator = token.Value
 			} else if token.Type == sqllexer.BOOLEAN || token.Type == sqllexer.NULL || token.Type == sqllexer.STRING || token.Type == sqllexer.NUMBER {
 				workingConditional.Value = token.Value
-			}
+			} // TODO: add captire for data type
 
 			if workingConditional.Key != "" && workingConditional.Operator != "" && workingConditional.Value != "" {
 				query.Conditionals = append(query.Conditionals, workingConditional)