What your database schema reveals and how to protect it

February 15, 2024

When you connect a database to an external tool, the tool typically reads your schema: the list of tables, columns, data types, and relationships. It needs this context to generate accurate queries.

What’s less obvious is how much information your schema communicates, and to whom.

What schema access reveals

A database schema is more revealing than it appears at first glance:

Business logic

Table names describe what your business does. Column names describe how you do it. A schema with tables like subscription_upgrades, churn_risk_scores, referral_credits, and failed_payment_retries communicates your product’s internal mechanics: pricing strategy, retention tactics, referral incentives, and how you recover failed payments.

A competitor or an attacker who obtains your schema doesn’t need to read a single row of data to learn a lot about how your business operates.

Data that exists, even without reading it

The existence of a column tells you something. A users table with a stripe_customer_id column tells you the company uses Stripe. A column named ssn or national_id in a customers table tells you the company stores government-issued identity numbers. A raw_card_data column (hopefully nonexistent, but it happens) is a payment-card compliance (PCI DSS) disaster waiting to be discovered.

Reading a schema is one of the ways security researchers and attackers assess the value of a target.

Relationships and access patterns

Foreign key relationships and table structures reveal how data is connected. Relationships such as orders.customer_id → customers.id and sessions.user_id → users.id tell an attacker how records relate, which makes it easier to write queries that join and extract meaningful data once read access is obtained.
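To show how little effort this takes, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for a production database. The four tables are hypothetical and contain no rows at all; the relationships are recovered purely from metadata through SQLite’s pragma_foreign_key_list. PostgreSQL exposes the same information through information_schema and pg_catalog.

```python
import sqlite3


def list_foreign_keys(conn: sqlite3.Connection) -> list[str]:
    """Return 'child.column -> parent.column' edges read from metadata only."""
    edges = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # One row per foreign-key column; "table" is the referenced (parent) table.
        fks = conn.execute(
            'SELECT "table", "from", "to" FROM pragma_foreign_key_list(?)', (table,)
        ).fetchall()
        for parent, from_col, to_col in fks:
            edges.append(f"{table}.{from_col} -> {parent}.{to_col}")
    return edges


# Hypothetical schema: structure only, no data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE sessions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    """
)
print(list_foreign_keys(conn))
# ['orders.customer_id -> customers.id', 'sessions.user_id -> users.id']
```

No row was read, yet the join paths an attacker would need are already on the table.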

Where schema goes when you connect a tool

When ByeSQL (or any other tool that feeds your schema to a language model) connects to your database, it reads the schema to give the model context. That context is sent to the AI model as part of the prompt.

In practice, this means:

  • The schema is transmitted to the tool’s backend over an encrypted connection (TLS)
  • It may be included as context in API calls to the LLM provider
  • It should not be logged in plain text, stored persistently, or shared across accounts
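What "schema as part of the prompt" looks like varies by tool, and the sketch below is not ByeSQL’s implementation. Assuming a tool that serialises each table’s columns into plain text and places it ahead of the user’s question, this hypothetical build_schema_context helper builds such a context string from SQLite metadata; the prompt layout is also an assumption.

```python
import sqlite3


def build_schema_context(conn: sqlite3.Connection) -> str:
    """Hypothetical helper: serialise table and column metadata as prompt text."""
    lines = []
    relations = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (name,) in relations:
        # pragma_table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(
            "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (name,)
        ).fetchall()
        col_text = ", ".join(f"{col} {ctype}" for col, ctype in cols)
        lines.append(f"{name}({col_text})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn_risk_scores (user_id INTEGER, score REAL)")

context = build_schema_context(conn)
# Illustrative prompt layout only, not any real tool's format.
prompt = f"Schema:\n{context}\n\nQuestion: Which users are most likely to churn?"
print(prompt)
```

Even in this tiny example, the table name churn_risk_scores leaves the database inside the prompt, whether or not any rows do.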

When evaluating any tool that reads your schema, ask: where does the schema go, who can see it, and is it retained after the session ends?

Practical steps to limit schema exposure

1. Connect with a scoped user

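The simplest control: connect with a database user that only has access to the tables you intend to query. If the user can only see orders, products, and customers, then those are the only tables the connected tool finds when it inspects the schema through information_schema, which many tools use for discovery. In PostgreSQL, the system catalogs (pg_catalog) remain readable by every role, so other table and column names are not completely hidden from a client that queries the catalogs directly; the user still cannot read data from tables it has no privileges on.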

See the guides on limiting access to specific tables for PostgreSQL and MySQL.
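In PostgreSQL, the grants for such a user follow a simple pattern: USAGE on each schema involved, and SELECT on each allowed table. As a hypothetical sketch, the Python helper below generates those statements from an allowlist so they can be reviewed before anyone runs them. The names quote_ident, scoped_grants, and the role analytics_reader are assumptions, and the role is assumed to already exist.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def scoped_grants(role: str, tables: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: least-privilege GRANTs for (schema, table) pairs."""
    statements = []
    # USAGE lets the role look up objects in a schema; by itself it grants no table data.
    for schema in sorted({schema for schema, _ in tables}):
        statements.append(
            f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {quote_ident(role)};"
        )
    # SELECT only on the tables the tool is meant to query.
    for schema, table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {quote_ident(schema)}.{quote_ident(table)} "
            f"TO {quote_ident(role)};"
        )
    return statements


for stmt in scoped_grants(
    "analytics_reader",
    [("public", "orders"), ("public", "products"), ("public", "customers")],
):
    print(stmt)
```

Generating the statements from an explicit list keeps the allowlist in one reviewable place instead of scattered across ad-hoc GRANTs.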

2. Use views to expose sanitised versions of tables

If a table is useful for analytics but also contains sensitive columns, create a view that exposes only the columns you want:

CREATE VIEW analytics.orders AS
  SELECT id, created_at, status, product_id, quantity, region
  FROM public.orders;
  -- omits: customer_email, billing_address, payment_method_id

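Grant the read-only user SELECT on the view (and USAGE on the analytics schema), not on the underlying table. In PostgreSQL, a view accesses its underlying tables with the view owner’s privileges by default, so the user can query analytics.orders without any access to public.orders, and the tool sees only the columns in the view.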
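The column-filtering effect is easy to check locally. This sketch uses Python’s sqlite3 module as a stand-in; SQLite has no users or GRANT, so it only demonstrates that a client inspecting the view sees just the columns the view selects. The view is named analytics_orders rather than analytics.orders because the demo uses a single SQLite database, and column_names is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        status TEXT,
        product_id INTEGER,
        quantity INTEGER,
        region TEXT,
        customer_email TEXT,     -- sensitive
        billing_address TEXT,    -- sensitive
        payment_method_id TEXT   -- sensitive
    );
    -- The view exposes only the analytics-safe columns.
    CREATE VIEW analytics_orders AS
        SELECT id, created_at, status, product_id, quantity, region
        FROM orders;
    """
)


def column_names(conn: sqlite3.Connection, relation: str) -> list[str]:
    """Column names a client sees when it inspects a table or view."""
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (relation,)
    )
    return [row[0] for row in rows]


print(column_names(conn, "analytics_orders"))
# ['id', 'created_at', 'status', 'product_id', 'quantity', 'region']
```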

3. Use a dedicated analytics schema

Put all tables and views intended for external tools in a separate schema — analytics, reporting, or similar. Grant the read-only user access to only that schema.

This makes the boundary explicit and auditable. Any table added to the analytics schema is intentionally exposed. Anything outside it is not.
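In PostgreSQL, one way to set up that boundary is sketched below as a hypothetical Python helper that emits the statements for review; the schema and role names are assumptions. The ALTER DEFAULT PRIVILEGES statement makes tables created later in the schema readable by the role, which matches the rule that anything placed in analytics is intentionally exposed. Note that, without a FOR ROLE clause, it only applies to objects created by the role that runs it.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def analytics_schema_setup(schema: str, role: str) -> list[str]:
    """Hypothetical helper: statements that expose one schema to a read-only role."""
    s, r = quote_ident(schema), quote_ident(role)
    return [
        f"CREATE SCHEMA IF NOT EXISTS {s};",
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        # Tables and views that already exist in the schema.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};",
        # Tables and views created later by the role running this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {r};",
    ]


for stmt in analytics_schema_setup("analytics", "analytics_reader"):
    print(stmt)
```

These statements only add privileges; they do not revoke anything the role (or PUBLIC) already has elsewhere, which is what the audit in step 5 helps you check.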

4. Avoid names that reveal sensitive intent

This is a lower-priority concern compared to access controls, but it’s worth being aware of: column names like credit_card_number, plain_text_token, or temp_password are red flags in a schema regardless of what they actually contain.

Use names that are descriptive but don’t advertise the sensitivity of the data. For example, payment_method_id is better than stripe_card_token: the former describes what the value is, while the latter reveals where it comes from and confirms a specific integration.
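A quick way to find such names is to scan your column list for telling substrings. The sketch below is a hypothetical heuristic: the pattern list is illustrative rather than exhaustive, and a match is only a prompt for human review, not proof that a column holds sensitive data. It takes (table, column) pairs, which you could export from information_schema.columns.

```python
import re

# Illustrative patterns only; extend them for your own naming conventions.
RED_FLAG_PATTERNS = [
    r"credit_card",
    r"card_(number|data|token)",
    r"(^|_)ssn($|_)",
    r"national_id",
    r"password",
    r"plain_?text",
    r"stripe_",
]


def flag_column_names(columns: list[tuple[str, str]]) -> list[str]:
    """Return 'table.column' for every column name matching a red-flag pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in RED_FLAG_PATTERNS]
    return [
        f"{table}.{column}"
        for table, column in columns
        if any(p.search(column) for p in compiled)
    ]


columns = [
    ("users", "stripe_customer_id"),
    ("customers", "ssn"),
    ("payments", "raw_card_data"),
    ("payments", "payment_method_id"),
    ("accounts", "temp_password"),
]
print(flag_column_names(columns))
# ['users.stripe_customer_id', 'customers.ssn', 'payments.raw_card_data', 'accounts.temp_password']
```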

5. Audit who has schema access

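In PostgreSQL, this query lists SELECT grants on tables and views, excluding the system schemas:

SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.role_table_grants
WHERE privilege_type = 'SELECT'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;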

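Run it as a superuser: role_table_grants only shows grants in which the grantor or grantee is one of the current user’s roles, so a less privileged role sees an incomplete list. Review the results regularly and revoke access from users and applications that no longer need it.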
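Regular reviews are easier if you compare the output against what you expect. The sketch below assumes you have exported the query’s rows as (grantee, table_schema, table_name, privilege_type) tuples and flags any SELECT grant to a given role outside an allowed set of schemas. The Grant type, the unexpected_grants helper, and the role and schema names are hypothetical.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    grantee: str
    table_schema: str
    table_name: str
    privilege_type: str


def unexpected_grants(
    rows: list[Grant], role: str, allowed_schemas: set[str]
) -> list[Grant]:
    """SELECT grants to `role` on tables outside the allowed schemas."""
    return [
        g
        for g in rows
        if g.grantee == role
        and g.privilege_type == "SELECT"
        and g.table_schema not in allowed_schemas
    ]


# Example rows, as they might come back from the audit query.
rows = [
    Grant("analytics_reader", "analytics", "orders", "SELECT"),
    Grant("analytics_reader", "public", "customers", "SELECT"),
    Grant("app_backend", "public", "customers", "SELECT"),
]
for g in unexpected_grants(rows, "analytics_reader", {"analytics"}):
    print(f"Review: {g.grantee} can read {g.table_schema}.{g.table_name}")
# Review: analytics_reader can read public.customers
```

Grants a user receives through membership in another role appear under that role’s name, so check any roles the tool’s user belongs to as well.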

What you can’t hide

A schema that accurately represents your data will reveal the shape of your data. That’s unavoidable. The goal isn’t to obscure everything — it’s to be deliberate about what is exposed and to whom.

A well-scoped read-only user with access only to the tables relevant to analytics is the right baseline. Views for column-level filtering and a dedicated analytics schema are the next level of control if you need them.

The key question: if your schema were fully public tomorrow, what would it reveal? The answer tells you what to protect.