Filter recognizers based on locale/country #1328
Comments
@omri374 Can I give it a shot?
Absolutely!
@omri374 I'm having difficulty setting this up locally on Windows, and the documentation is a bit unclear to me. Can you help?
Sure. Which documentation are you following? What issues are you facing?
@omri374 This is the documentation I am following: https://microsoft.github.io/presidio/development/ Steps I followed: Then it says 'installing dependencies' and gets stuck there, neither showing an error nor proceeding.
@AmanSal1 I'm not sure why this is happening, but pipenv isn't mandatory. You can install the package locally (preferably in a virtual environment) using pip:
@omri374 Okay, got it. Just to confirm: is the UI built with Docker? And when I change the code base, do we need to rebuild the image?
By UI do you mean the demo website? It is built using Docker; see the code here: https://github.com/microsoft/presidio/blob/main/docs/samples/python/streamlit/index.md I suggest starting with the Python API and adding this capability to the REST API later on.
@omri374 Okay, I will look into that, but I've thought of a solution. First, we modify the structure of recognizers_map in the RecognizerRegistry. Then, when an AnalyzerEngine object is created, we pass a country code parameter to the get_recognizers method and get back only the recognizers that have been initialized for that country. Something like this: recognizers_map = { Do you think this is the right approach?
Thanks, yes, that sounds like a good approach. Note that there will be some universal recognizers, like credit card or URL, which don't belong to any country.
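The country-keyed recognizers_map proposed above, together with the point about universal recognizers, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Presidio's actual RecognizerRegistry: the `Recognizer` dataclass, the `UNIVERSAL` key and this standalone `get_recognizers` function are hypothetical, and recognizers are reduced to their names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical key for recognizers that don't belong to any country.
UNIVERSAL = "universal"


@dataclass
class Recognizer:
    """Toy stand-in for a recognizer; only its name is used here."""
    name: str


# Hypothetical country-keyed layout: country code -> recognizers for that country.
recognizers_map: Dict[str, List[Recognizer]] = {
    UNIVERSAL: [Recognizer("CreditCardRecognizer"), Recognizer("UrlRecognizer")],
    "us": [Recognizer("UsSsnRecognizer")],
    "uk": [Recognizer("NhsRecognizer")],
}


def get_recognizers(country_code: Optional[str] = None) -> List[Recognizer]:
    """Return the universal recognizers plus those registered for country_code.

    Without a country code, every recognizer is returned, so callers that
    don't pass one keep getting the full set.
    """
    if country_code is None:
        return [rec for recs in recognizers_map.values() for rec in recs]
    # Concatenation builds a new list, so the shared map is never mutated.
    return recognizers_map[UNIVERSAL] + recognizers_map.get(country_code.lower(), [])
```

For example, `[r.name for r in get_recognizers("US")]` gives `['CreditCardRecognizer', 'UrlRecognizer', 'UsSsnRecognizer']`, while an unmapped country such as `"fr"` falls back to the two universal recognizers.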
Do we also need to add a country field to the
"es": { It will look like this for the common recognizers
@omri374 Adding the country code directly to the EntityRecognizer class might not be the best option, because a country code is associated with specific recognizers rather than being a generic property of all recognizers. What do you think?
@omri374 Additionally, declaring the country code parameter in the RecognizerRegistry seems like a good approach, since it makes the registry the central place for it. However, it could also be done without declaring it in the RecognizerRegistry, by declaring it directly in the AnalyzerEngine.
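A sketch of how the two placements discussed in these comments could coexist without adding a field to EntityRecognizer: the registry keeps a name-to-country mapping plus an optional default country, and the engine only forwards a per-call country code. `CountryAwareRegistry`, `Engine`, `default_country`, `recognizers_for` and the example mapping are all hypothetical names for illustration; none of them are part of Presidio.

```python
from typing import Dict, List, Optional


class CountryAwareRegistry:
    """Hypothetical registry mapping recognizer name -> country code.

    None marks a universal recognizer. The country lives in this mapping,
    not on each recognizer, so the recognizer base class stays unchanged.
    """

    def __init__(self, countries: Dict[str, Optional[str]],
                 default_country: Optional[str] = None) -> None:
        self.countries = countries
        self.default_country = default_country

    def get_recognizers(self, country_code: Optional[str] = None) -> List[str]:
        # A per-call country code overrides the registry-level default.
        country = country_code or self.default_country
        if country is None:
            return list(self.countries)  # no filtering requested
        country = country.lower()
        return [name for name, c in self.countries.items()
                if c is None or c.lower() == country]


class Engine:
    """Hypothetical engine that simply forwards the country to the registry."""

    def __init__(self, registry: CountryAwareRegistry) -> None:
        self.registry = registry

    def recognizers_for(self, country_code: Optional[str] = None) -> List[str]:
        return self.registry.get_recognizers(country_code)


# Hypothetical example mapping.
example_countries: Dict[str, Optional[str]] = {
    "CreditCardRecognizer": None,
    "UsSsnRecognizer": "us",
    "NhsRecognizer": "uk",
}
```

With `Engine(CountryAwareRegistry(example_countries, default_country="us"))`, calling `recognizers_for()` uses the registry default and returns `['CreditCardRecognizer', 'UsSsnRecognizer']`, while `recognizers_for("UK")` overrides it and returns `['CreditCardRecognizer', 'NhsRecognizer']`. Keeping the mapping in the registry makes it the single source of truth, and the engine-level parameter stays a thin pass-through.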
@omri374 When I follow the steps at the URL above, I get an error.
Thanks. I'll look into why the demo is not working.
@omri374 Yes, I think there is a problem with the recognizer_register file, because the same error appears even when I don't use the demo and run it externally.
@AmanSal1 I couldn't reproduce this. Is your code completely in sync with the one in the main branch?
@omri374 Yes, you're right, now it works. from presidio_analyzer import AnalyzerEngine Is this valid code?
Yes
|
Can you please share your TEST.py?
Is your feature request related to a problem? Please describe.
The number of recognizers in Presidio is growing, which is great, but it also means that some recognizers are more likely to be irrelevant to a given user. Running all recognizers means more processing time and more false positives.
Describe the solution you'd like
The ability to provide the RecognizerRegistry (or AnalyzerEngine) with a country code, to either include or filter out recognizers, would allow users to use only the subset of recognizers relevant to their tasks.
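The include/filter behaviour requested here can be sketched as a single selection function. This assumes a hypothetical mapping from recognizer name to country code, with None for universal recognizers, which are always kept; `filter_by_country`, its `include`/`exclude` parameters and `example_countries` are illustrative only, not an existing Presidio API.

```python
from typing import Dict, Iterable, List, Optional


def filter_by_country(
    countries: Dict[str, Optional[str]],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Select recognizer names by country code (hypothetical helper).

    include: keep only recognizers for these countries (None = no restriction).
    exclude: drop recognizers for these countries.
    Universal recognizers (country None) are never filtered out.
    """
    keep = {c.lower() for c in include} if include is not None else None
    drop = {c.lower() for c in exclude} if exclude is not None else set()
    selected: List[str] = []
    for name, country in countries.items():
        if country is None:
            selected.append(name)  # universal: applies to every country
            continue
        code = country.lower()
        if (keep is None or code in keep) and code not in drop:
            selected.append(name)
    return selected


# Hypothetical example mapping (None marks a universal recognizer).
example_countries: Dict[str, Optional[str]] = {
    "CreditCardRecognizer": None,
    "UsSsnRecognizer": "US",
    "NhsRecognizer": "UK",
    "AuTfnRecognizer": "AU",
}
```

Here `filter_by_country(example_countries, include=["us"])` returns `['CreditCardRecognizer', 'UsSsnRecognizer']`, and `exclude=["uk"]` drops only `NhsRecognizer`. When both are given, exclusion wins, which keeps the semantics predictable if a country appears in both lists.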