#589 : Improvements to RDS CI transaction isolation and random name generation
General MR
Summary
Transaction Isolation Improvements
The current masterquery
method (and psycopg2/psql calls) use autocommit isolation level on transactions and execute each SQL statement over a new connection. We believe this may be causing (or at least heavily contributing to) some intermittent failures that we see on pipelines where the CI RDS DB is a dependency. This moves towards ISOLAITON_LEVEL_READ_COMMITTED, and blocks everything (except the CREATE DATABASE
call) into a single explicit transaction, which will probably solve anything transaction-related going on here.
The guts of AUTOCOMMIT vs READ_COMMITTED
Basically AUTOCOMMIT doesn't necessarily guarantee what order things happen in, especially when there are concurrent operations occurring (like say a clean install
and upgrade
pipeline starting at pretty much the same time). READ_COMMITTED forces explicit transaction management and we get what we expect when we commit it. Our use case doesn't seem like one that would suffer from concurrency problems, but our last call modifies the PUBLIC
schema, which is highly shared by everything in the postgres DB. And that's where our failures are occurring.
1. ISOLATION_LEVEL_AUTOCOMMIT:
-
Behavior: Each individual query executes within its own implicit transaction. When a query completes successfully, the transaction automatically commits (saves changes). There's no need to explicitly call
commit()
. -
Concurrency: There's less control over how queries interact with each other concurrently. Multiple connections might modify the same data simultaneously, potentially leading to issues like "dirty reads," "non-repeatable reads," and "phantom reads."
- Dirty Read: Reading data that has been modified by another transaction but hasn't yet been committed.
- Non-Repeatable Read: Reading the same data twice within the same transaction, but getting different results because another transaction has modified it in between.
- Phantom Read: Seeing different sets of rows when querying for the same data multiple times within a transaction, due to other transactions inserting or deleting rows.
2. ISOLATION_LEVEL_READ_COMMITTED:
-
Behavior: Transactions are explicitly managed. You use
begin()
to start a transaction, andcommit()
to save changes. - Concurrency: Offers better control over concurrency by ensuring that each transaction reads data that has already been committed by other transactions. This helps prevent dirty reads but not necessarily non-repeatable reads or phantom reads.
Randomization : Use /dev/urandom instead of $RANDOM
Currently the username and database name for RDS databases are generated using BASH $RANDOM
. This is efficient and convenient but not a very high quality random number. In our pipelines, we frequently have multiple containers running on the same host starting similar processes at the same time; there is some concern that using a low quality PRNG will lead to name collisions on RDS user and database names, leading to transient runtime issues in the creation process.
This MR moves the process to use /dev/urandom
instead, which should be a better source of randomness for names. This also increases the number of random characters used in the name generation process from 6 to 33. These two combined should eliminate the potential of collisions.
Relevant logs/screenshots
BATS tests pass. I didn't write a new test for the concurrency problem - guaranteeing that we can reproduce a concurrency test in a bash testing framework might be challenging. I'm willing to try if anyone insists.
$ BATS_RDS_USE_DOCKER=true bats ./ci_rds_test.sh
ci_rds_test.sh
✓ rds_requested enabled
✓ rds_new_databasename format
✓ rds_newdb internal psql
✓ rds_requested disabled
✓ rds_requested malformed
✓ rds_mapvalues complete
✓ rds_mapvalues complete arrays
✓ rds_mapvalues incomplete map definition
✓ rds_mapvalues missing files
✓ rds_create_multiple
✓ rds_create notenabled
✓ rds_create creator fails
✓ rds_create creator succeeds
✓ rds_delete internal
✓ rds_purge internal
✓ rds_values_merge fails with missing files
✓ rds_values_merge fails with invalid yaml
✓ rds_values_merge succeeds with valid yaml files
✓ rds_get_accessor_role sets valid credentials
✓ rds_get_accessor_role only assumes the role once within the lifetime
20 tests, 0 failures
Linked Issue
Closes #589 (closed)
Upgrade Notices
N/A