Marc Linster has posted a blog Creating a Data Redaction Capability to Meet GDPR Requirements Using EDB Postgres to demonstrate data redaction on EDB Postgres Advanced Server 10, which has taken the approach to leverage the PostgreSQL search_path feature to direct privileged users to the raw unredacted data when they run a query, and to direct non-privileged users to a view that implements redaction logic.
EDB Postgres Advanced Server 11 comes with native data redaction capabilities that are much more concrete and safer than the redaction using search_path and views workaround.
For syntax and more detail, refer the EDB Postgres Advanced Server 11 documentation. Here are some highlights of this data redaction policy:
- Redaction policies allow a user to choose redaction behavior via redaction function.
- Users can be made exempt from all column redaction policies, which the table owner and superuser is by default.
- More than one redaction policy can be created on the same table, but a column can only be associated with one policy.
- Flexibility to choose when actual redaction should apply and exemptions on columns in the query via the scope and exception options.
Approach:
This blog will demonstrate data redaction using native data redaction policy features in EDB Postgres Advanced Server 11. For this, we will use the same example from the Marc Linster’s blog.
Step-by-step walkthrough:
- A sample data set with employee IDs, names, social security numbers, salary etc. is created in the table ‘employees’ in the mycompany database.
- A library of redaction functions for SSN, and salaries apply data type specific redaction techniques.
- A data redaction policy for ssn and salary column will be applied whenever user other than ‘privilegeduser’ tries to access the ‘employees’ table data
Scripts:
-- Create database
DROP DATABASE IF EXISTS mycompany;
CREATE DATABASE mycompany WITH OWNER = enterprisedb;
-- Connect to the new database
$ psql -d mycompany -U enterprisedb
psql (11.0.4, server 11.0.4)
Type "help" for help.
mycompany=#
-- Create table with employee information
CREATE TABLE employees (
id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
name VARCHAR(40) NOT NULL,
SSN VARCHAR(11) NOT NULL,
salary MONEY);
-- Add sample data
INSERT INTO employees (name, ssn, salary)
VALUES ( 'Sally Sample', '020-78-9345', 51234.34),
( 'Jane Doe', '123-33-9345', 62500.00),
( 'Bill Foo', '123-89-9345', 45350);
-- Create privileged and non-privileged user and grant the necessary access.
CREATE ROLE privilegeduser LOGIN PASSWORD 'password';
GRANT ALL ON employees TO privilegeduser;
CREATE ROLE non_privilegeduser LOGIN PASSWORD 'password';
GRANT ALL ON employees TO non_privilegeduser;
-- Define redaction function for ssn column.
CREATE OR REPLACE FUNCTION redact_ssn (ssn varchar(11)) RETURNS varchar(11)
AS
/* replaces 020-12-9876 with xxx-xx-9876 */
$$ SELECT overlay (ssn placing 'xxx-xx' from 1); $$
LANGUAGE SQL SECURITY DEFINER;
-- Define redaction function for salary column.
CREATE OR REPLACE FUNCTION redact_salary (salary money) RETURNS money
AS
/* always returns 0 */
$$ SELECT 0::money; $$
LANGUAGE SQL SECURITY DEFINER;
-- Create data redaction policy on employee table to redact column data when current session user is not 'privilegeduser'. ADD COLUMN … USING syntax adds a column of the table to the data redaction policy and specifies a redaction function expression to mask that column data.
CREATE REDACTION POLICY emp_data_protect ON employees FOR (session_user <> 'privilegeduser')
ADD COLUMN ssn USING redact_ssn(ssn),
ADD COLUMN salary USING redact_salary(salary);
-- User can add more columns to this policy using the ALTER REDACTION POLICY command like this:
ALTER REDACTION POLICY emp_data_protect ON employees
ADD COLUMN <column_name> USING <redaction_function>
-- This policy can be seen in the table description
mycompany=# \d employees
Table "public.employees"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+----------------------------------
id | integer | | not null | generated by default as identity
name | character varying(40) | | not null |
ssn | character varying(11) | | not null |
salary | money | | |
Indexes:
"employees_pkey" PRIMARY KEY, btree (id)
Redaction Policies:
REDACTION POLICY "emp_data_protect" FOR (SESSION_USER <> 'privilegeduser'::name) ENABLED
Number of redacted columns: 2 (Use \d+ to list them.)
-- By default table owner and super user can see unreacted data.
mycompany=# select tableowner from pg_tables where tablename = 'employees';
tableowner
--------------
enterprisedb
(1 row)
mycompany=# select * from employees;
id | name | ssn | salary
----+--------------+-------------+------------
1 | Sally Sample | 020-78-9345 | $51,234.34
2 | Jane Doe | 123-33-9345 | $62,500.00
3 | Bill Foo | 123-89-9345 | $45,350.00
(3 rows)
-- Also, privilegeduser can see unredacted data to whom we have exempted from the policy.
$ psql -d mycompany -U privilegeduser
psql (11.0.4, server 11.0.4)
Type "help" for help.
mycompany=> select * from employees;
id | name | ssn | salary
----+--------------+-------------+------------
1 | Sally Sample | 020-78-9345 | $51,234.34
2 | Jane Doe | 123-33-9345 | $62,500.00
3 | Bill Foo | 123-89-9345 | $45,350.00
(3 rows)
When a user other than privilegeduser tries to access the employee table will see redacted data for ssn and salary column.
$ psql -d mycompany -U non_privilegeduser
psql (11.0.4, server 11.0.4)
Type "help" for help.
mycompany=> select * from employees;
id | name | ssn | salary
----+--------------+-------------+--------
1 | Sally Sample | xxx-xx-9345 | $0.00
2 | Jane Doe | xxx-xx-9345 | $0.00
3 | Bill Foo | xxx-xx-9345 | $0.00
(3 rows)
Also, non_privilegeduser will not able to search on SSN.
mycompany=> select * from employees where ssn = '123-89-9345';
id | name | ssn | salary
----+------+-----+--------
(0 rows)
This happens because the column used in the WHERE clause has a data redaction policy. If we want to allow this query to work we need to set an exception where the actual column values should be used instead of the redacted output in equality operations. The default redaction exception is none.
-- Connect to table owner and alter the redaction option for SSN column
psql -d mycompany -U enterprisedb
psql (11.0.4, server 11.0.4)
Type "help" for help.
mycompany=# ALTER REDACTION POLICY emp_data_protect ON employees
MODIFY COLUMN ssn WITH OPTIONS (EXCEPTION equal);
ALTER REDACTION POLICY
-- Now connect to non_privilegeduser and search for ssn.
$ psql -d mycompany -U non_privilegeduser
psql (11.0.4, server 11.0.4)
Type "help" for help.
mycompany=> select * from employees where ssn = '123-89-9345';
id | name | ssn | salary
----+----------+-------------+--------
3 | Bill Foo | xxx-xx-9345 | $0.00
(1 row)
However, ALTER is not the only way to set the redaction option. You can specify at the time of policy creation as well, as follows:
CREATE REDACTION POLICY emp_data_protect ON employees FOR (session_user <> 'privilegeduser')
ADD COLUMN ssn USING redact_ssn(ssn) WITH OPTIONS (EXCEPTION equal),
ADD COLUMN salary USING redact_salary(salary) ;
Conclusion:
Data protection can be achieved using Postgres search_path, user defined functions and views but the
native data redaction policy of EDB Postgres Advanced Server 11 is much more flexible, easier to create and maintain.
Contributions:
My colleague Jeevan Chalke and I worked on this data redaction feature. Rushabh Lathia and Robert Haas reviewed it. My another colleague from EnterpriseDB, Prabhat Sahu, tested it.
This post benefited from helpful feedback from EnterpriseDB colleague Thom Brown.