-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new ta tasks #976
base: main
Are you sure you want to change the base?
feat: new ta tasks #976
Conversation
🔍 Existing Issues For ReviewYour pull request is modifying functions with the following pre-existing issues: 📄 File: tasks/upload.py
Did you find this useful? React with a 👍 or 👎 |
This PR includes changes to |
Codecov ReportAttention: Patch coverage is ✅ All tests successful. No failed tests found.
📢 Thoughts on this report? Let us know! |
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #976 +/- ##
==========================================
- Coverage 97.97% 97.89% -0.09%
==========================================
Files 444 449 +5
Lines 35855 36476 +621
==========================================
+ Hits 35129 35707 +578
- Misses 726 769 +43
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
❌ 1 Tests Failed:
View the top 1 failed tests by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
✅ All tests successful. No failed tests were found. 📣 Thoughts on this report? Let Codecov know! | Powered by Codecov |
tasks/upload.py
Outdated
arguments_list=list(chunk), | ||
) | ||
for chunk in itertools.batched(argument_list, CHUNK_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we still want to run these in batches, or rather one upload per task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one upload per task seems reasonable now that we aren't writing to the db in the processor
|
||
def test_test_analytics(dbsession, mocker, celery_app): | ||
url = "literally/whatever" | ||
storage_service = get_appropriate_storage_service(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you want to use the mock storage provider for this?
mocker.patch.object(TAProcessorTask, "app", celery_app) | ||
mocker.patch.object(TAFinisherTask, "app", celery_app) | ||
|
||
hello = celery_app.register_task(ProcessFlakesTask()) | ||
_ = celery_app.tasks[hello.name] | ||
goodbye = celery_app.register_task(CacheTestRollupsTask()) | ||
_ = celery_app.tasks[goodbye.name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never seen this pattern, what does it do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without this, when the finisher would try to call those tasks, they weren't in the mocked celery app, so what i was trying to do here is add them to the mocked celery app
i replaced this with some code that is hopefully more clear
user-agent: | ||
- Default | ||
method: GET | ||
uri: https://api.github.com/repos/ThiagoCodecov/example-python/commits/abf6d4df662c47e32460020ab14abf9303581429 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rather mock away whatever call does this request, instead of relying on vcr?
tasks/ta_finisher.py
Outdated
for upload in uploads: | ||
repo_flag_ids = get_repo_flag_ids(db_session, repoid, upload.flag_names) | ||
if upload.state == "processed": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this loop has a couple of problems:
- you are querying all uploads from the DB, but only ever run the code on
processed
ones - you only append to
tests_to_write
and friends, but never clear those across uploads save_tests
and friends runs for all the uploads, together with never clearingtests_to_write
above means that you insert the same tests over and over again depending on how many total uploads you have- you unconditionally set
state = "finished"
for all the downloads, also ones that already have that state - the intermediate msgpack file is never cleared.
these 2 new tasks are meant to replace the test results processor and test results finisher tasks the difference between the tasks is that the new tasks: - use the new parse_raw_upload function provided by the test results parser library - instead of writing to the database in the processor, the finisher takes care of writing to the database - the processor writes the results of its parsing and the finisher pulls from that
the upload task will check the new ta tasks feature flag to determine whether to use the newly introduced ta processor and ta finisher tasks we also call the new ta processor and ta finisher tasks via a chord since we removed the concurrent db writes from the processors
also update test results parser version
we don't need to chunk them anymore and each upload can get its own processing task
i want to introduce the new finished state in the test results upload pipeline to do so safely i'm adding the new v2_processed and v2_finished states the reason for this is that the meaning of processed and v2_processed are different processed means that test instances are in the db and persisted but in the new pipeline v2_finished has that meaning and v2_processed just means that the intermediate results are in redis for now
fd1401b
to
1296951
Compare
@Swatinem sorry i got confused and rebased and force pushed but i really just added 5 new commits on top of the existing ones and didn't modify any of the existing ones |
this commit changes the TA upload states to have the v2_persisted state which represents the test run data being persisted to the db and the v2_finished state representing that the upload was taken into account when making the latest commment on the PR this also removes the v2_failed state since i want an upload to both be able to have valid test runs and have some failed parsing, the v2_failed state made it seem like it was either processed or failed when it could be both the failures related to an upload will instead be represented by upload errors
This PR includes changes to |
report__commit__repository__repoid=repo_id, | ||
report__commit__commitid=commit_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe commit_id
already uniquely identifies the report, so no need for an additional repository
join.
assert commit_report | ||
uploads = commit_report.uploads | ||
assert len(uploads) == 1 | ||
upload = dbsession.query(Upload).filter_by(report_id=commit_report.id).first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upload = dbsession.query(Upload).filter_by(report_id=commit_report.id).first() | |
upload = uploads[0] |
right?
here = Path(__file__) | ||
|
||
|
||
class TestUploadTestProcessorTask(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personal preference: IMO we should move away from class-based tests
test_flag_bridge_data: list[dict] = [] | ||
|
||
for upload in uploads.values(): | ||
redis_client = get_redis_connection() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this call out of the loop?
) | ||
|
||
if len(tests_to_write) > 0: | ||
save_tests(db_session, tests_to_write) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I commented on this already someplace else?
As you never clear the tests_to_write
, having multiple intermediate_result
s means you are duplicating these insert statements with increasingly more entries.
this PR creates new ta processor and finisher tasks and uses them behind a feature flag in the upload task for a smooth rollout