# query-interpreter

**README LAST UPDATED: 04-24-25**

This project is under active development and is subject to change often and drastically as I am likely an idiot.

Core program to interpret query language strings into structured data, and back again.

## How to Use the Project

### Microservice for client applications

At least for now, this can be treated like a microservice. Send your SQL strings to the endpoint(s)
and get the structured data back.

```
POST /query
body: {
    sql: string
}
```

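As a rough illustration, a Go client for this endpoint could look like the sketch below. It assumes the body is sent as JSON and that the service listens on `http://localhost:8080`; both are assumptions, as are the helper names `encodeBody` and `newQueryRequest`. The real port comes from your `.env` file, and the response is printed raw because its exact shape is not specified here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// queryRequest mirrors the documented body: { sql: string }.
// Sending it as JSON is an assumption.
type queryRequest struct {
	SQL string `json:"sql"`
}

// encodeBody serializes the SQL string into the request body.
func encodeBody(sql string) ([]byte, error) {
	return json.Marshal(queryRequest{SQL: sql})
}

// newQueryRequest builds a POST /query request against baseURL.
func newQueryRequest(baseURL, sql string) (*http.Request, error) {
	body, err := encodeBody(sql)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/query", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical address: replace the port with the PORT from your .env file.
	req, err := newQueryRequest("http://localhost:8080", "SELECT id, name FROM users")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}

	// Client-side timeout chosen for illustration only.
	client := &http.Client{Timeout: 35 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read response:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(raw))
}
```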
Be aware that the API does not currently convert enums to their string representations, so you might expect
JoinType to be `"INNER"`, but it is returned as `0`. Refer to the [dto](q/dto.go) for the
`string` values these `enum`s correspond to.

> `iota` increments for each value in the `enum` and starts at `0` unless modified. `iota + 1` will start at `1`

Right now we only parse SELECT statements. If you try anything else, it will either error out
or hang. The HTTP response should time out after 30 seconds.

### Development on core logic

This project is worked on via TDD methods. It is the only way to do so, as the parsing of SQL is so janky. If
you want to add a feature to the parsing, you need to first write a unit test.

Become familiar with [select_test](q/select_test.md) to see how we are doing it. In brief:

We have the test struct where `input` is the entire SQL string you are testing, and `expected` is
the exact struct (of whichever query struct) you expect to see returned.
```go
type ParsingTest struct {
	input    string
	expected Select
}
```

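To make that concrete, here is a small sketch of one test case and a field-by-field comparison. The `Select` and `Column` types are trimmed-down stand-ins, `diffSelect` is a hypothetical helper, and the "actual" value is hand-written rather than produced by the real parser, so treat this as an illustration of the idea and not the project's test code.

```go
package main

import (
	"fmt"
	"reflect"
)

// Trimmed-down stand-ins for the real types in the q package.
type Column struct {
	Name  string
	Alias string
}

type Select struct {
	Table      string
	Columns    []Column
	IsWildcard bool
}

type ParsingTest struct {
	input    string
	expected Select
}

// One hypothetical case: the SQL under test and the exact struct it should produce.
var testSqlStatements = []ParsingTest{
	{
		input: "SELECT id, name FROM users",
		expected: Select{
			Table:   "users",
			Columns: []Column{{Name: "id"}, {Name: "name"}},
		},
	},
}

// diffSelect reports which fields differ, one check per field.
func diffSelect(expected, actual Select) []string {
	var diffs []string
	if expected.Table != actual.Table {
		diffs = append(diffs, fmt.Sprintf("Table: expected %q, got %q", expected.Table, actual.Table))
	}
	if !reflect.DeepEqual(expected.Columns, actual.Columns) {
		diffs = append(diffs, fmt.Sprintf("Columns: expected %v, got %v", expected.Columns, actual.Columns))
	}
	if expected.IsWildcard != actual.IsWildcard {
		diffs = append(diffs, fmt.Sprintf("IsWildcard: expected %t, got %t", expected.IsWildcard, actual.IsWildcard))
	}
	return diffs
}

func main() {
	tc := testSqlStatements[0]
	// Pretend the parser dropped the second column, so the case fails first.
	actual := Select{Table: "users", Columns: []Column{{Name: "id"}}}
	for _, d := range diffSelect(tc.expected, actual) {
		fmt.Println(tc.input, "->", d)
	}
}
```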
Add yours to the `var testSqlStatements []ParsingTest` of the file.

If you are adding a new field to the query's struct, or modifying any fields, make sure to add or update the
conditionals in the `t.Run(testName, func(t *testing.T)` block.

#### Remember the TDD Process
- Write enough of a test to make sure it fails
- Write enough prod code to make sure it passes
- Repeat until finished developing the feature

### Starting the app

**Prerequisites:**

* Go installed (version X.Y or higher - check `go.mod` for specifics)
* `go mod tidy` to fetch dependencies
* `cp .env.example .env` to create your own .env file

**Running the App:**

1. `go run main.go` to start the server (PORT is determined by the .env file)
2. `go test ./q` to run the test suite if developing features (add `-v` if you want verbose output)

## Data Structure Philosophy

We are operating off of the philosophy that the first-class data is SQL statement strings.

From these strings we derive all structured data types to represent those SQL statements,
whether they are CRUD or schema operations.

All of these structs will have to implement the `Query` interface:

```go
type Query interface {
	GetFullSql() string
}
```

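A minimal sketch of a struct satisfying this interface is shown below. `simpleSelect` is a hypothetical, cut-down type used only for illustration; the project's real `Select` has more fields and its SQL generation will differ. The second half of `main` shows how changing a field yields a different statement.

```go
package main

import (
	"fmt"
	"strings"
)

type Query interface {
	GetFullSql() string
}

// simpleSelect is a hypothetical stand-in, not the project's Select struct.
type simpleSelect struct {
	Table   string
	Columns []string
}

// Compile-time check that simpleSelect implements Query.
var _ Query = simpleSelect{}

// GetFullSql rebuilds a complete SQL statement from the struct's fields.
// An empty column list is treated as a wildcard select.
func (s simpleSelect) GetFullSql() string {
	cols := "*"
	if len(s.Columns) > 0 {
		cols = strings.Join(s.Columns, ", ")
	}
	return fmt.Sprintf("SELECT %s FROM %s", cols, s.Table)
}

func main() {
	s := simpleSelect{Table: "users", Columns: []string{"id", "name"}}
	fmt.Println(s.GetFullSql()) // SELECT id, name FROM users

	// Altering a field produces a new statement from the same struct.
	s.Columns = append(s.Columns, "email")
	fmt.Println(s.GetFullSql()) // SELECT id, name, email FROM users
}
```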
So every struct we create from SQL will need to be able to provide a full and valid SQL
statement of itself.

These structs are then where we are able to alter their fields programmatically to create
new statements altogether.

## SQL Tokens

We are currently using DataDog's SQL tokenizer `sqllexer` to scan through SQL strings.
The general token types it defines can be found [here](/docs/SQL_Token_Types.md).

These are an OK generalization to start with when trying to parse out SQL, but they cannot be used
without some extra conditional logic that checks what the actual values are.

Currently we scan through each string to tokenize it. When stepping through the tokens we try
to determine the type of query we are working with. At that point we assume the overall structure
of the rest of the statement fits a particular format, then parse out the details of
the statement into the struct that corresponds to that query type.

## Scan State

As stated, we scan through the strings, processing each chunk, delineated by spaces and
punctuation, as a token. To properly interpret the tokens from their broad `token.Type`s, we
have to keep track of what else we have processed so far.

This state is held in a set of flags that depends on the query type.

For example, a Select query will have:
```go
passedSELECT := false
passedColumns := false
passedFROM := false
passedTable := false
passedWHERE := false
passedConditionals := false
passedOrderByKeywords := false
passedOrderByColumns := false
```

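The sketch below shows how a few of these flags might drive a scan. It is a simplified illustration, not the project's parser: `scanSelect` is a hypothetical function, and splitting on whitespace and commas stands in for the tokens that `sqllexer` would produce. It only handles the column list and table name.

```go
package main

import (
	"fmt"
	"strings"
)

// scanSelect walks the tokens once and uses "passed" flags to decide
// what each token means based on what has already been processed.
func scanSelect(sql string) (columns []string, table string) {
	passedSELECT := false
	passedColumns := false
	passedFROM := false
	passedTable := false

	// Stand-in tokenizer: pad commas so they become their own tokens.
	tokens := strings.Fields(strings.ReplaceAll(sql, ",", " , "))

	for _, tok := range tokens {
		upper := strings.ToUpper(tok)
		switch {
		case !passedSELECT:
			if upper == "SELECT" {
				passedSELECT = true
			}
		case !passedColumns:
			if upper == "FROM" {
				// Reaching FROM means the column list is finished.
				passedColumns = true
				passedFROM = true
			} else if tok != "," {
				columns = append(columns, tok)
			}
		case passedFROM && !passedTable:
			table = tok
			passedTable = true
		}
		// Anything after the table (WHERE, ORDER BY, ...) is ignored here.
	}
	return columns, table
}

func main() {
	cols, table := scanSelect("SELECT id, name FROM users WHERE id = 1")
	fmt.Println(cols, table) // [id name] users
}
```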
The general philosophy for these flags is to name, and use, them in the context of what has
already been processed through the scan, which makes naming and reading new flags trivial.

A `Select` object is shaped as follows:
```go
type Select struct {
	Table        string        `json:"table"`
	Columns      []Column      `json:"columns"`
	Conditionals []Conditional `json:"conditionals"`
	OrderBys     []OrderBy     `json:"order_bys"`
	Joins        []Join        `json:"joins"`
	IsWildcard   bool          `json:"is_wildcard"`
	IsDistinct   bool          `json:"is_distinct"`
}

type Column struct {
	Name              string                `json:"name"`
	Alias             string                `json:"alias"`
	AggregateFunction AggregateFunctionType `json:"aggregate_function"` // Changed type name to match Go naming conventions
}

type Conditional struct {
	Key       string `json:"key"`
	Operator  string `json:"operator"`
	Value     string `json:"value"`
	Extension string `json:"extension"`
}

type OrderBy struct {
	Key       string
	IsDescend bool // SQL queries with no ASC|DESC on their ORDER BY are ASC by default, hence why this bool for the opposite
}

type Join struct {
	Type  JoinType      `json:"type"`
	Table Table         `json:"table"`
	Ons   []Conditional `json:"ons"`
}

// Only used in Join.Table right now, but Select.Table will also use this soon
type Table struct {
	Name  string `json:"name"`
	Alias string `json:"alias"`
}

type AggregateFunctionType int

const (
	MIN AggregateFunctionType = iota + 1
	MAX
	COUNT
	SUM
	AVG
)
```

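Since the API currently returns these enums as numbers, a consumer may want to map them back to names. The sketch below does this for `AggregateFunctionType` using the constants listed above. The `aggregateNames` table and the `String` method are illustrative additions, not part of the project.

```go
package main

import "fmt"

type AggregateFunctionType int

const (
	MIN AggregateFunctionType = iota + 1 // starts at 1, so 0 is not a named value
	MAX
	COUNT
	SUM
	AVG
)

// aggregateNames is an illustrative lookup table, not part of the project.
var aggregateNames = map[AggregateFunctionType]string{
	MIN:   "MIN",
	MAX:   "MAX",
	COUNT: "COUNT",
	SUM:   "SUM",
	AVG:   "AVG",
}

// String returns the enum's name, or a numeric placeholder for values
// that are not in the table (including the zero value).
func (a AggregateFunctionType) String() string {
	if name, ok := aggregateNames[a]; ok {
		return name
	}
	return fmt.Sprintf("AggregateFunctionType(%d)", int(a))
}

func main() {
	// A value of 3 in the JSON response corresponds to COUNT.
	fmt.Println(AggregateFunctionType(3), int(COUNT)) // COUNT 3
}
```

The same pattern would apply to `JoinType`, but its values should be taken from the dto file and not guessed.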
## Improvement Possibilities

- Maybe utilize the `lookBehindBuffer` more to cut down the number of state flags in the scans?