Data anonymizer that reads from a CSV file or a database and writes the anonymized data to a CSV file (more sources and outputs to come).
- Python 3.9
- poetry (optional)
- Clone the repository
- Install dependencies with Poetry:
    poetry install
  or with pip:
    pip install -r requirements.txt
- Create a config.yml file with the configuration for the source, the output and the rules to apply (see Config and the example config).
- Run:
    python -m anonymize -c config.yml
Sources (see sources.py)
Database (see the list of supported databases):
source:
  type: db
  uri: postgres://postgres:pass@localhost:5432/postgres
  table: mydata
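The database source is configured with a connection URI and a single table name. A minimal sketch of reading one table into dict rows, assuming a DB-API 2.0 connection; sqlite3 from the standard library stands in for PostgreSQL so the sketch runs without a server, and read_table is a hypothetical name, not the project's function.

```python
import sqlite3


def read_table(conn, table):
    """Yield each row of `table` as a dict mapping column name -> value.

    Assumption: `conn` is any DB-API 2.0 connection (sqlite3 here).
    """
    # Table names cannot be bound as query parameters, so the identifier
    # is double-quoted (standard SQL); embedded quotes are doubled.
    quoted = '"' + table.replace('"', '""') + '"'
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {quoted}")
    # cursor.description holds one 7-item sequence per column; item 0 is the name.
    columns = [desc[0] for desc in cur.description]
    for row in cur.fetchall():
        yield dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (name TEXT, email TEXT)")
conn.execute("INSERT INTO mydata VALUES ('Ada', 'ada@example.com')")
rows = list(read_table(conn, "mydata"))
print(rows)  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```

Returning plain dicts keeps the source independent of the rules, which then only need to look up columns by name.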
CSV:
source:
  type: csv
  path: /path/to/data.csv
  separator: '|' # Optional, default is ','
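The separator option maps naturally onto the delimiter argument of Python's csv module. A minimal sketch, assuming the first line of the file is a header; read_csv is a hypothetical helper, and io.StringIO stands in for the real file.

```python
import csv
import io


def read_csv(stream, separator=","):
    """Yield each CSV row as a dict keyed by the header line.

    `separator` mirrors the config option of the same name (default ',').
    """
    yield from csv.DictReader(stream, delimiter=separator)


# io.StringIO stands in for open("/path/to/data.csv", newline="")
data = io.StringIO("name|email\nAda|ada@example.com\n")
rows = list(read_csv(data, separator="|"))
print(rows)  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```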
Outputs (see outputs.py)
output:
  type: csv
  path: /path/to/output.csv
  separator: '|' # Optional, default is ','
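On the output side, the same separator option can be passed to csv.DictWriter. A minimal sketch, assuming rows arrive as dicts that all share the first row's columns; write_csv is a hypothetical helper and io.StringIO stands in for the output file.

```python
import csv
import io


def write_csv(stream, rows, separator=","):
    """Write dict rows to `stream`: a header line first, then one line per row."""
    if not rows:
        return
    # Column order is taken from the first row.
    writer = csv.DictWriter(stream, fieldnames=list(rows[0]), delimiter=separator)
    writer.writeheader()
    writer.writerows(rows)


out = io.StringIO()
write_csv(out, [{"name": "Ada", "email": "xxxxx"}], separator="|")
print(repr(out.getvalue()))  # 'name|email\r\nAda|xxxxx\r\n'
```

The csv module ends lines with "\r\n" by default; when writing to a real file, open it with newline="" so those endings are not translated twice.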
Rules (see rules.py)
- Each rule validates its column name and method, then applies the method to that column.
- If a rule's column is not found in the source, the rule is ignored.
- If a column is not listed in the rules, it is kept as is.
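The three behaviours above can be sketched as a single function applied to each row. This is an illustration under assumptions, not the code in rules.py: apply_rules and the methods registry are hypothetical, and every key of a rule other than column and method is assumed to be passed to the method as a keyword argument.

```python
def apply_rules(row, rules, methods):
    """Return a new row with every applicable rule applied.

    - An unknown method name raises ValueError.
    - Rules whose column is missing from the row are skipped.
    - Columns without a rule are copied unchanged.
    """
    result = dict(row)  # never mutate the caller's row
    for rule in rules:
        column, method = rule["column"], rule["method"]
        if method not in methods:
            raise ValueError(f"unknown method: {method}")
        if column not in result:
            continue  # column not found in the source: rule ignored
        # Remaining keys (salt, n_chars, ...) become keyword arguments.
        options = {k: v for k, v in rule.items() if k not in ("column", "method")}
        result[column] = methods[method](result[column], **options)
    return result


methods = {"destroy": lambda value, replace_with="CONFIDENTIAL": replace_with}
row = {"name": "Ada", "email": "ada@example.com"}
rules = [
    {"column": "email", "method": "destroy"},
    {"column": "phone", "method": "destroy"},  # no such column: ignored
]
anonymized = apply_rules(row, rules, methods)
print(anonymized)  # {'name': 'Ada', 'email': 'CONFIDENTIAL'}
```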
Hash: the available algorithms are the ones provided by Python's hashlib module.
rules:
  - column: credit_card
    method: hash
    algorithm: md5
    salt: my_very_secret_salt
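Selecting a hashlib algorithm by name is what hashlib.new does, and it raises ValueError for names it does not support. A minimal sketch of the hash method, assuming the salt is appended to the value before hashing; the project may combine salt and value differently, and hash_value is a hypothetical name.

```python
import hashlib


def hash_value(value, algorithm="md5", salt=""):
    """Hash `value` with the hashlib algorithm named by `algorithm`.

    Assumption: the salt is appended to the value before hashing.
    """
    digest = hashlib.new(algorithm)  # ValueError for unsupported names
    digest.update((str(value) + salt).encode("utf-8"))
    return digest.hexdigest()


hashed = hash_value("4111111111111111", algorithm="md5", salt="my_very_secret_salt")
print(hashed)  # 32 hex characters for md5
```

The same input and salt always produce the same digest, so hashed columns can still be joined across tables anonymized with the same configuration.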
Fake: the available types are email, firstname, lastname and fullname.
rules:
  - column: name
    method: fake
    faker_type: firstname
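The faker_type names suggest a fake-data generator, but this README does not say which library the project uses. The sketch below is a self-contained stand-in under that assumption: fake_value and the tiny hard-coded name lists are hypothetical, and the original value is simply discarded.

```python
import random

# Hypothetical stand-in data; a real implementation would draw from a
# fake-data library rather than these hard-coded lists.
FIRST_NAMES = ["Alice", "Bruno", "Chloe", "David"]
LAST_NAMES = ["Martin", "Silva", "Nguyen", "Smith"]


def fake_value(value, faker_type, rng=random):
    """Return a random value of type `faker_type`; `value` is ignored."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    generators = {
        "firstname": lambda: first,
        "lastname": lambda: last,
        "fullname": lambda: f"{first} {last}",
        "email": lambda: f"{first.lower()}.{last.lower()}@example.com",
    }
    if faker_type not in generators:
        raise ValueError(f"unsupported faker_type: {faker_type}")
    return generators[faker_type]()


print(fake_value("Ada", "firstname"))  # e.g. Bruno
```

Passing a seeded random.Random as rng makes the output reproducible, which helps in tests.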
Mask right:
rules:
  - column: email
    method: mask_right
    n_chars: 5
    mask_char: x
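A minimal sketch of mask_right, assuming "right" means the last n_chars characters of the value and that a value shorter than n_chars is masked entirely; both readings are assumptions, not confirmed by this README.

```python
def mask_right(value, n_chars, mask_char):
    """Replace the last `n_chars` characters of `value` with `mask_char`."""
    text = str(value)
    n = min(n_chars, len(text))  # never mask more characters than exist
    return text[: len(text) - n] + mask_char * n


print(mask_right("ada@example.com", 5, "x"))  # ada@examplxxxxx
```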
Mask left:
rules:
  - column: birthdate
    method: mask_left
    n_chars: 4
    mask_char: "*"
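mask_left is the mirror image: a minimal sketch, assuming the first n_chars characters are replaced. With a YYYY-MM-DD birthdate and n_chars: 4, this hides the year.

```python
def mask_left(value, n_chars, mask_char):
    """Replace the first `n_chars` characters of `value` with `mask_char`."""
    text = str(value)
    n = min(n_chars, len(text))  # never mask more characters than exist
    return mask_char * n + text[n:]


print(mask_left("1990-04-12", 4, "*"))  # ****-04-12
```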
Destroy (the name is inspired by postgresql_anonymizer):
rules:
  - column: email
    method: destroy
    replace_with: "SOME VALUE" # Optional, default is "CONFIDENTIAL"
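Unlike hashing or masking, destroy keeps nothing of the original value. A minimal sketch using the documented default; destroy here is a hypothetical function, not necessarily the one in rules.py.

```python
def destroy(value, replace_with="CONFIDENTIAL"):
    """Discard `value` entirely and return a constant replacement."""
    return replace_with


column = ["ada@example.com", "bob@example.com"]
print([destroy(v) for v in column])  # ['CONFIDENTIAL', 'CONFIDENTIAL']
print(destroy("ada@example.com", replace_with="SOME VALUE"))  # SOME VALUE
```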
Shuffle: letters and numbers are shuffled separately (example: abc1.2!3 -> skM4.9!0).
rules:
  - column: email
    method: shuffle
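Judging from the example, where the output contains letters and digits absent from the input while "." and "!" stay in place, shuffle appears to replace each letter with a random letter and each digit with a random digit rather than permute the existing characters. A minimal sketch of that reading, which is an assumption; shuffle_value is a hypothetical name.

```python
import random
import string


def shuffle_value(value, rng=random):
    """Replace letters with random letters and digits with random digits.

    Any other character (punctuation, '@', spaces) keeps its position.
    """
    out = []
    for ch in str(value):
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_letters))
        else:
            out.append(ch)
    return "".join(out)


print(shuffle_value("abc1.2!3"))  # e.g. skM4.9!0
```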
Contributing
- 🍴 Fork the repository
- ⬇️ Install dev dependencies:
    poetry install --with=dev
  or
    pip install -r requirements-dev.txt
- 🌳 Create a branch:
    git checkout -b feature/my-feature
- 🔧 Make your changes
- ✅ Run formatting, linting and tests (see pyproject.toml):
    poe all
- 🔃 Create a pull request
- Add database output
- Validation (especially for database sources)
- More rules (rounding, etc.)
- Destroy
- Shuffle