query-interpreter

README LAST UPDATED: 04-24-25

This project is under active development and is subject to change often and drastically as I am likely an idiot.

Core program to interpret query language strings into structured data, and back again.

Data Structure Philosophy

We operate on the philosophy that the first-class data is SQL statement strings.

From these strings we derive all of the structured data types that represent those SQL statements, whether they are CRUD or schema operations.

All of these structs must implement the Query interface:

type Query interface {
	GetFullSql() string
}

So every struct we create from SQL must be able to provide a full and valid SQL statement of itself.

We can then alter the fields of these structs programmatically to create new statements altogether.
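As a rough illustration of that round trip, here is a minimal sketch. The `simpleSelect` struct is hypothetical and exists only for this example; it is not the project's `Select` type, and the trailing semicolon is an assumption about formatting.

```go
package main

import (
	"fmt"
	"strings"
)

// Query is the interface from this README: every struct must be able to
// render itself back into a full SQL statement.
type Query interface {
	GetFullSql() string
}

// simpleSelect is a hypothetical, trimmed-down struct used only for this
// illustration. It is not the project's Select type.
type simpleSelect struct {
	Table   string
	Columns []string
}

// GetFullSql rebuilds the statement from the struct's current fields.
func (s simpleSelect) GetFullSql() string {
	cols := "*"
	if len(s.Columns) > 0 {
		cols = strings.Join(s.Columns, ", ")
	}
	return "SELECT " + cols + " FROM " + s.Table + ";"
}

func main() {
	s := simpleSelect{Table: "users", Columns: []string{"id", "name"}}
	var q Query = s
	fmt.Println(q.GetFullSql()) // SELECT id, name FROM users;

	// Altering a field yields a new statement without editing any SQL text.
	s.Columns = append(s.Columns, "email")
	q = s
	fmt.Println(q.GetFullSql()) // SELECT id, name, email FROM users;
}
```

The point of the sketch is that the struct, not the string, is what gets edited; the string is regenerated on demand.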

SQL Tokens

We are currently using DataDog's SQL Tokenizer sqllexer to scan through SQL strings. The general token types it defines can be found here

These are an OK generalization to start with when trying to parse SQL, but they cannot be used without some extra conditional logic that checks what the actual token values are.
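To show what that extra conditional logic might look like, here is a small sketch. The `token` struct and its `Kind` strings are hypothetical stand-ins so the example runs on the standard library alone; they are not sqllexer's actual `Token` type or constants.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical stand-in for a lexer token: a broad kind plus the
// raw text. It is NOT sqllexer's Token type.
type token struct {
	Kind  string // broad category, e.g. "KEYWORD" or "IDENT" (assumed names)
	Value string // the raw text of the token
}

// isKeyword reports whether tok is a keyword whose text matches want,
// ignoring case. The broad kind alone cannot tell SELECT from FROM, so the
// value has to be checked as well.
func isKeyword(tok token, want string) bool {
	return tok.Kind == "KEYWORD" && strings.EqualFold(tok.Value, want)
}

func main() {
	toks := []token{
		{Kind: "KEYWORD", Value: "select"},
		{Kind: "IDENT", Value: "id"},
		{Kind: "KEYWORD", Value: "FROM"},
		{Kind: "IDENT", Value: "users"},
	}
	for _, t := range toks {
		switch {
		case isKeyword(t, "SELECT"):
			fmt.Println("start of a select query")
		case isKeyword(t, "FROM"):
			fmt.Println("table name follows")
		default:
			fmt.Println("value:", t.Value)
		}
	}
}
```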

Currently we scan through each string to tokenize it. While stepping through the tokens, we try to determine the type of query we are working with. At that point we assume the overall structure of the rest of the statement fits a particular format, then parse the details of the statement into the struct that corresponds to its query type.

Scan State

As stated, we scan through the strings, processing each chunk, delineated by spaces and punctuation, as a token. To properly interpret the tokens from their broad token.Types, we have to keep track of what we have processed so far.

This state is held in a set of flags that depends on the query type.

For example, a Select query will have:

	passedSELECT := false
	passedColumns := false
	passedFROM := false
	passedTable := false
	passedWHERE := false
	passedConditionals := false
	passedOrderByKeywords := false
	passedOrderByColumns := false

The general philosophy for these flags is to name and use them in terms of what the scan has already processed, which makes naming and reading new flags trivial.
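The flag approach can be sketched roughly as follows. This is a simplified illustration, not the project's scanner: `parseSimpleSelect` is a hypothetical helper, it splits on whitespace with `strings.Fields` instead of using a real tokenizer, and it only understands `SELECT <columns> FROM <table>`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSimpleSelect is a hypothetical helper that only handles
// "SELECT <columns> FROM <table>". Each flag records what the scan has
// already passed, which decides how the next word is interpreted.
func parseSimpleSelect(sql string) (columns []string, table string) {
	passedSELECT := false
	passedFROM := false

	for _, word := range strings.Fields(sql) {
		word = strings.Trim(word, ",;") // drop list and statement punctuation
		if word == "" {
			continue
		}
		switch {
		case !passedSELECT:
			// Nothing counts until the SELECT keyword has been seen.
			if strings.EqualFold(word, "SELECT") {
				passedSELECT = true
			}
		case !passedFROM:
			// Between SELECT and FROM every word is a column name.
			if strings.EqualFold(word, "FROM") {
				passedFROM = true
			} else {
				columns = append(columns, word)
			}
		case table == "":
			// The first word after FROM is the table name.
			table = word
		}
	}
	return columns, table
}

func main() {
	cols, table := parseSimpleSelect("SELECT id, name FROM users;")
	fmt.Println(cols, table) // [id name] users
}
```

Reading the `switch` top to bottom mirrors the order of the statement itself, which is the benefit of naming flags after what has already been passed.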

A Select object is shaped as follows:

type Select struct {
	Table        string
	Columns      []Column
	Conditionals []Conditional
	OrderBys     []OrderBy
	Joins        []Join
	IsWildcard   bool
	IsDistinct   bool
}

type Column struct {
	Name              string
	Alias             string
	AggregateFunction AggregateFunctionType
}

type AggregateFunctionType int
const (
	MIN AggregateFunctionType = iota + 1
	MAX
	COUNT
	SUM
	AVG
)

// dependency in query.go
type Conditional struct {
	Key       string
	Operator  string
	Value     string
	DataType  string
	Extension string // AND, OR, etc
}


type OrderBy struct {
	Key       string
	IsDescend bool // ORDER BY defaults to ASC when neither ASC nor DESC is given, so this bool marks the opposite case
}
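To show how these shapes could be turned back into SQL fragments, here is a small sketch. The types are copied from above; `aggregateNames`, `columnExpr`, and `orderByClause` are hypothetical helpers written for this example. It also assumes that starting the constants at `iota + 1` is meant to leave the zero value free to mean "no aggregate function".

```go
package main

import (
	"fmt"
	"strings"
)

type AggregateFunctionType int

const (
	MIN AggregateFunctionType = iota + 1
	MAX
	COUNT
	SUM
	AVG
)

type Column struct {
	Name              string
	Alias             string
	AggregateFunction AggregateFunctionType
}

type OrderBy struct {
	Key       string
	IsDescend bool
}

// aggregateNames is a hypothetical lookup table. The zero value has no
// entry, so a Column with no aggregate set renders as a plain name.
var aggregateNames = map[AggregateFunctionType]string{
	MIN: "MIN", MAX: "MAX", COUNT: "COUNT", SUM: "SUM", AVG: "AVG",
}

// columnExpr renders one column, wrapping it in its aggregate function and
// appending its alias when those fields are set.
func columnExpr(c Column) string {
	expr := c.Name
	if name, ok := aggregateNames[c.AggregateFunction]; ok {
		expr = name + "(" + c.Name + ")"
	}
	if c.Alias != "" {
		expr += " AS " + c.Alias
	}
	return expr
}

// orderByClause renders the ORDER BY clause, emitting DESC only when
// IsDescend is true and relying on SQL's ASC default otherwise.
func orderByClause(obs []OrderBy) string {
	if len(obs) == 0 {
		return ""
	}
	parts := make([]string, 0, len(obs))
	for _, ob := range obs {
		if ob.IsDescend {
			parts = append(parts, ob.Key+" DESC")
		} else {
			parts = append(parts, ob.Key)
		}
	}
	return "ORDER BY " + strings.Join(parts, ", ")
}

func main() {
	fmt.Println(columnExpr(Column{Name: "price", Alias: "total", AggregateFunction: SUM}))
	fmt.Println(orderByClause([]OrderBy{{Key: "name"}, {Key: "created_at", IsDescend: true}}))
}
```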

Improvement Possibilities

  • Maybe utilize the lookBehindBuffer more to cut down the number of state flags in the scans?