Hello! 👋🏼 This is a cookbook that helps you define schemas so that the Defog model can understand them better.

Our auto-generated schemas work well for most cases, but they can fail if you have complex or domain-specific databases. If this happens, manually editing the schema definitions to give Defog more context will help generate better queries. We have seen a 90+% reduction in query errors when customers define their schemas well!

Column definitions

When you write down exactly what a column contains and how it is calculated, query quality improves significantly.

Defining terms better

Giving Defog more context for ambiguous terms helps improve query quality. Take the pricepaid column as an example: without additional context in its description, our model would be unsure whether pricepaid is the price paid per ticket or the total price paid for the listing.

Similarly, without details about how commission relates to the price paid, we would have no way of knowing whether commission is included in pricepaid or paid on top of it.

<aside> 💡 When you give us more context about what your columns mean, query quality goes up significantly.

</aside>

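As a rough sketch of what such descriptions might look like, the snippet below attaches a description to pricepaid and commission. The dictionary layout, the data types, and the meanings given in the descriptions are assumptions made for illustration; they are not Defog's actual schema format or the real semantics of any dataset.

```python
# Hypothetical column metadata. The layout and the stated meanings are
# assumptions for illustration, not Defog's actual schema format.
sales_columns = [
    {
        "column_name": "pricepaid",
        "data_type": "numeric",
        # Assumed meaning: total for the listing, not per ticket.
        "column_description": (
            "Total price paid for the whole listing (not per ticket), "
            "excluding commission."
        ),
    },
    {
        "column_name": "commission",
        "data_type": "numeric",
        # Assumed meaning: charged on top of pricepaid.
        "column_description": (
            "Commission charged on the sale. Paid in addition to "
            "pricepaid, not included in it."
        ),
    },
]


def describe(columns):
    """Render each column as 'name (type): description' for a quick review."""
    return [
        f"{c['column_name']} ({c['data_type']}): {c['column_description']}"
        for c in columns
    ]


for line in describe(sales_columns):
    print(line)
```

The point of the sketch is that each description answers the two ambiguities directly: what the amount covers, and how it relates to the other column.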

Defining data formats better

If you have enumerations (a column whose value can only be one of 10 or so options), please list all of them in the column description. Similarly, if you have a JSON column, it’s helpful to describe the format of the JSON in the column description.
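For a JSON column, one possible approach is to spell out the keys and their types in the description, then check a sample value against it. In the sketch below, the sample value, its keys, and the description wording are all hypothetical.

```python
import json

# Hypothetical sample value from a JSON column; the keys are made up.
sample_value = '{"color": "red", "dimensions": {"width_cm": 10, "height_cm": 4}}'

# A description that spells out the JSON format (wording is an assumption).
column_description = (
    "JSON object with keys: color (string), "
    "dimensions (object with numeric width_cm and height_cm)."
)


def top_level_keys(raw):
    """Return the sorted top-level keys of a JSON object string."""
    return sorted(json.loads(raw).keys())


# Simple substring check: flag any top-level key the description never mentions.
undocumented = [
    key for key in top_level_keys(sample_value) if key not in column_description
]
print(undocumented)
```

An empty list here means every top-level key in the sample is mentioned somewhere in the description.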

Handling enums

If you have a column that can only have one of a few unique values (up to 25), you can define them all in the column description. That way, we will be able to handle user queries much better!

To illustrate this, let’s consider a very simple example. This is the description of a table that has project, start_date, end_date, and status columns.


This is what the dataset for this table looks like. As you can see, the status can be either done or not done.

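A minimal sketch of how the enum values could be folded into the status description is shown below. The column names and the two status values come from the example above; the rows themselves, the helper name, and the description wording are made up for illustration.

```python
# Hypothetical rows; only the column names and status values come from the text.
rows = [
    {"project": "alpha", "start_date": "2023-01-02", "end_date": "2023-02-10", "status": "done"},
    {"project": "beta", "start_date": "2023-03-01", "end_date": "2023-04-15", "status": "not done"},
    {"project": "gamma", "start_date": "2023-05-20", "end_date": "2023-06-30", "status": "done"},
]

MAX_ENUM_VALUES = 25  # upper bound on distinct values mentioned above


def enum_description(rows, column, base):
    """Append the distinct values of `column` to `base` if there are few enough."""
    values = sorted({row[column] for row in rows})
    if len(values) > MAX_ENUM_VALUES:
        return base  # too many distinct values to list as an enum
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{base} Possible values: {quoted}."


status_description = enum_description(rows, "status", "Current status of the project.")
print(status_description)
```

Listing the literal values in the description gives the model the exact strings that appear in the data, rather than leaving it to guess at them.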