diff --git a/README.md b/README.md
index bb825ab..a5ce4d3 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,9 @@
 # query-interpreter
 
+**README LAST UPDATED: 04-15-25**
+
+This project is under active development and is subject to frequent and drastic change, as I am likely an idiot.
+
 Core program to interpret query language strings into structured data, and back again.
 
 ## Data Structure Philosophy
@@ -27,54 +31,69 @@
 new statements altogether.
 
 ## SQL Tokens
 
 We are currently using DataDog's SQL Tokenizer `sqllexer` to scan through SQL strings.
-Here are the general token types it defines:
+The general token types it defines can be found [here](/docs/SQL_Token_Types.md).
 
-```go
-type TokenType int
-
-const (
-    ERROR TokenType = iota
-    EOF
-    SPACE // space or newline
-    STRING // string literal
-    INCOMPLETE_STRING // incomplete string literal so that we can obfuscate it, e.g. 'abc
-    NUMBER // number literal
-    IDENT // identifier
-    QUOTED_IDENT // quoted identifier
-    OPERATOR // operator
-    WILDCARD // wildcard *
-    COMMENT // comment
-    MULTILINE_COMMENT // multiline comment
-    PUNCTUATION // punctuation
-    DOLLAR_QUOTED_FUNCTION // dollar quoted function
-    DOLLAR_QUOTED_STRING // dollar quoted string
-    POSITIONAL_PARAMETER // numbered parameter
-    BIND_PARAMETER // bind parameter
-    FUNCTION // function
-    SYSTEM_VARIABLE // system variable
-    UNKNOWN // unknown token
-    COMMAND // SQL commands like SELECT, INSERT
-    KEYWORD // Other SQL keywords
-    JSON_OP // JSON operators
-    BOOLEAN // boolean literal
-    NULL // null literal
-    PROC_INDICATOR // procedure indicator
-    CTE_INDICATOR // CTE indicator
-    ALIAS_INDICATOR // alias indicator
-)
-
-```
 
-These are an OK generalizer to start with when trying to parse out SQL, but can not be used
-without conditional logic that checks what the actual keywords are.
+These are an OK generalization to start with when trying to parse out SQL, but they cannot be used
+without some extra conditional logic that checks what the actual values are.
 
-Currently we scan through the strings to tokenize it. When steping through the tokens we try
-to determine the type of query we are working with. At that point we assume the over structure
-of the rest of the of the statement to fit a particular format, then parse out the details
+Currently we scan through the string to tokenize it. When stepping through the tokens we try
+to determine the type of query we are working with. At that point we assume the overall structure
+of the rest of the statement fits a particular format, then parse out the details
 of the statement into the struct correlating to its data type.
 
-Checkout the function `ParseSelectStatement` from `q/select.go` as an example.
+## Scan State
+
+As stated, we scan through the string, processing each chunk, delineated by spaces and
+punctuation, as a token. To properly interpret the tokens beyond their broad `token.Type`s, we
+have to keep track of what else we have processed so far.
+
+This state is determined by a set of flags that depends on the query type.
+
+For example, a Select query will have:
+```go
+    passedSELECT := false
+    passedColumns := false
+    passedFROM := false
+    passedTable := false
+    passedWHERE := false
+    passedConditionals := false
+    passedOrderByKeywords := false
+    passedOrderByColumns := false
+```
+
+The general philosophy for these flags is to name, and use, them in the context of what has
+already been processed in the scan, which makes naming and reading new flags trivial.
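+
+To make that concrete, below is a minimal, hypothetical sketch of a scan loop driven by these
+flags. It is not the real parser: the package name, the `sketchScanSelect` helper, and the exact
+`sqllexer.New(...)`/`Scan()` usage are assumptions for illustration only.
+
+```go
+package q // assumed package name, for illustration only
+
+import (
+    "strings"
+
+    "github.com/DataDog/go-sqllexer"
+)
+
+// sketchScanSelect is a hypothetical sketch, not the real parser. It shows how
+// the "passed*" flags give each token meaning based on what was already scanned.
+func sketchScanSelect(query string) (table string, columns []string) {
+    passedSELECT := false
+    passedFROM := false
+
+    lexer := sqllexer.New(query)
+    for {
+        token := lexer.Scan()
+        if token.Type == sqllexer.EOF {
+            break
+        }
+        upper := strings.ToUpper(token.Value)
+        switch {
+        case upper == "SELECT":
+            passedSELECT = true // identifiers from here until FROM are columns
+        case upper == "FROM":
+            passedFROM = true // the next identifier should be the table name
+        case token.Type == sqllexer.IDENT && passedSELECT && !passedFROM:
+            columns = append(columns, token.Value)
+        case token.Type == sqllexer.IDENT && passedFROM && table == "":
+            table = token.Value
+        }
+    }
+    return table, columns
+}
+```
+
+For a string like `SELECT id, name FROM users`, this sketch should report `users` as the table
+and `id` and `name` as the columns, purely because of what had already been scanned before each token.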
+
+A `Select` object is shaped as follows (a hypothetical parsed example appears at the end of
+this README):
+```go
+type Select struct {
+    Table        string
+    Columns      []string
+    Conditionals []Conditional
+    OrderBys     []OrderBy
+    IsWildcard   bool
+    IsDistinct   bool
+}
+
+// dependency in query.go
+type Conditional struct {
+    Key       string
+    Operator  string
+    Value     string
+    DataType  string
+    Extension string // AND, OR, etc
+}
+
+type OrderBy struct {
+    Key       string
+    IsDescend bool // ORDER BY is ASC by default when no ASC|DESC is given, hence a bool for the opposite
+}
+```
+
 ## Improvement Possibilities
 
-- want to to cut down as many `scan`s as possible by injecting functional dependencies
+- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?
+
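+## Example: A Parsed `Select`
+
+As a rough, hypothetical illustration of how the structs above fit together, here is the kind of
+`Select` value we would expect for a simple query. The field values, and especially the `DataType`
+string, are assumptions for illustration rather than output captured from the parser.
+
+```go
+// Hypothetical parse result for:
+//   SELECT name, age FROM users WHERE age > 21 ORDER BY age DESC
+parsed := Select{
+    Table:   "users",
+    Columns: []string{"name", "age"},
+    Conditionals: []Conditional{
+        {Key: "age", Operator: ">", Value: "21", DataType: "NUMBER"},
+    },
+    OrderBys: []OrderBy{
+        {Key: "age", IsDescend: true},
+    },
+}
+```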