add post about the pipeline approach #88

Draft
wants to merge 10 commits into
base: main

Conversation

avallecam
Member

@avallecam avallecam commented Jul 27, 2023

I'm placing this post as an entry in the "Learn" tab given its relationship with training materials. However, as mentioned at the end, we are already using it for {episoap} and package design documents. I also included some definitions shared in a recent gh-discussion post.

I added @CarmenTamayo @annacarnegie @sbfnk @adamkucharski @rozeggo as coauthors. Let me know whether this properly credits the team members who have contributed to this content.

Let me know your edit suggestions and questions regarding the content.

For edit suggestions, you can go to the "Files changed" tab (the fourth in the top row above this text box) to read the content as plain text and add your comments by clicking on the [+] on each line. Let me know if you prefer a different reading and editing format.

@avallecam avallecam requested a review from Bisaloo July 27, 2023 18:13
@avallecam
Member Author

Here is a full screenshot of the post:
image

replace integrate with connect, refer to a graph as a set of connected or related elements. homogenize first sentence of each pipeline. replace question at the end.
@pratikunterwegs
Contributor

Thanks for writing this @avallecam, looks really useful - I can read through and add some comments if that's useful? I see Hugo is already assigned though so happy to wait or let him do so instead

@avallecam
Member Author

avallecam commented Jul 28, 2023

> Thanks for writing this @avallecam, looks really useful - I can read through and add some comments if that's useful? I see Hugo is already assigned though so happy to wait or let him do so instead

Go ahead @pratikunterwegs. I am comfortable with having comments from everyone at the same time. Thank you!

I just added one more commit (d2bd66d), so the post is slightly different from the screenshot above, but I'm done now.

@pratikunterwegs
Contributor

Great thanks, will get some thoughts in by later today

Member

@Bisaloo Bisaloo left a comment

Thanks! Looks great! Could you try to add more detailed alt texts please? Imagine you cannot see the image but still would like to know the information conveyed in the image.

You can find many resources on how to write good alt texts online. This can be a starter: https://support.microsoft.com/en-us/office/everything-you-need-to-know-to-write-effective-alt-text-df98f884-ca3d-456c-807b-1a1fa82f5dc2

orcid: "0000-0001-8814-9421"
- name: "Rosalind M Eggo"
orcid: "0000-0002-0362-6717"
date: last-modified
Member

Nice! I didn't know about this!

Member Author

Possibly I spent too much time looking at the Quarto documentation, hehe

Member

I now realize this may come with issues as we sometimes need to update already published posts (typos, breaking changes in quarto, broken URLs, etc.)

@avallecam
Member Author

avallecam commented Aug 3, 2023

> Could you try to add more detailed alt texts please? Imagine you cannot see the image but still would like to know the information conveyed in the image.

Thank you @Bisaloo for this edit suggestion.

Following the guidelines, a fair description of each figure does not fit in two sentences (see 8a5c3b2 for Fig. 1). The nodes have longer names and more connections than the tidyverse reference alt text, which shows connections with an "Import -> Tidy" notation.

So, a solution could be to redirect the reader to two data frames (df) in plain text format. This would allow reading:

  • all the node "name", "type" (task, data), "stage" (early, middle, late) in df 1, and
  • all their connections "from" and "to" in df 2.

If I follow this, my questions now are: Where to place those data frames? What format to use? I'll test a Markdown format and a HTML format at the end of the post. If needed, complementary files in the folder.
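
As a rough sketch of that two-table idea, the snippet below stores the nodes and their connections as plain records and prints each as a Markdown table. The node names and the helpers `to_markdown_table` and `dangling_edges` are hypothetical placeholders, not the actual content of the figures, and Python is used only for illustration.

```python
# Hypothetical nodes (df 1): one record per node with its name, type, and stage.
nodes = [
    {"name": "Read case data", "type": "task", "stage": "early"},
    {"name": "Linelist", "type": "data", "stage": "early"},
    {"name": "Estimate delays", "type": "task", "stage": "middle"},
]

# Hypothetical connections (df 2): one record per arrow in the diagram.
edges = [
    {"from": "Read case data", "to": "Linelist"},
    {"from": "Linelist", "to": "Estimate delays"},
]


def to_markdown_table(rows, columns):
    """Render a list of dict records as a Markdown pipe table."""
    header = "| " + " | ".join(columns) + " |"
    rule = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, rule, *body])


def dangling_edges(nodes, edges):
    """Return connections whose endpoints are missing from the node table."""
    names = {node["name"] for node in nodes}
    return [e for e in edges if e["from"] not in names or e["to"] not in names]


print(to_markdown_table(nodes, ["name", "type", "stage"]))
print()
print(to_markdown_table(edges, ["from", "to"]))
```

The same records could be written out as HTML instead; the dangling-connection check is only a convenience to keep the two tables consistent with each other.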

Let me know your thoughts.

@avallecam
Member Author

For the record, I tried the HTML outputs from draw_io/diagrams_net, but those don't meet WCAG standards for alt text because they lack ARIA labels.

I found an HTML diagram that meets alt text standards from a11y, but no interface tool to build such diagrams yet.

I'll stick to the plan of the last message for now, as well as try more extensive alt text as an alternative too.

Also, I'll look and ask for alternative tools to develop and deliver coming diagrams or concept maps.

@avallecam
Member Author

I think I'll go with approach one for complex images, which adds a text link to a long description adjacent to the image. I'll try the Quarto collapsible callout for this. I found this option in this alt text decision tree.

@pratikunterwegs
Contributor

Hi Andree, sorry about the delay - overall I think it looks good, and we are already likely to use Fig. 1 for a meeting next week. Will let you know how that goes!

@Bisaloo
Member

Bisaloo commented Sep 7, 2023

Hi @avallecam, what's the status of this? Is it ready to be merged?

@avallecam
Member Author

> Hi @avallecam, what's the status of this? Is it ready to be merged?

Not yet, Hugo. I left this unattended while prioritizing other work. I'll pick it up again next week. Thank you for asking!

@sbfnk
Contributor

sbfnk commented Sep 8, 2023

Apologies for not looking at this before - this looks nice and has some really cool figures! If not too late here are a few suggestions:

  • as far as I'm aware the terminology of pipelines for this kind of approach comes from bioinformatics, so should we perhaps provide a reference? E.g. to quote a random one I stumbled across "A bioinformatics pipeline progressively shepherds and processes massive sequence data and their associated metadata through a series of transformations using multiple software components, databases, and operation environments (hardware and operating system)" from https://doi.org/10.1016/j.jmoldx.2017.11.003
  • it might also be good to provide a rationale for the pipeline approach. I.e., in a perfect world perhaps we'd have a full generative model that uses all available data but in practice that is usually unrealistic as we can't create a full generative model of the world encompassing e.g. historical and current outbreaks, and we usually don't have all raw relevant data available anyway. By chaining a series of analyses (a common approach in infectious disease modelling where e.g. estimates from other studies are used as parameters in a new one) we benefit from splitting a task up into more manageable chunks that might each have different tools available developed independently, so we can have a multiplicity of approaches for greater robustness. We can make this process smoother by ensuring routines are compatible and feed into each other straightforwardly. The downside of the approach is that it can be difficult to correctly pass on uncertainty so that we probably lose some statistical validity. Others have thought about ways to overcome this tradeoff, e.g. https://arxiv.org/abs/2109.13730 - overall I'd say it's an open question of which approach works best for any given analysis task
  • in Fig. 2 I was missing mention of clinical data (to estimate e.g. severity) and transmission data (e.g. from household studies, or pathogen genetic data). Without transmission data one can't estimate serial intervals / generation intervals and thus can't reconstruct transmission chains or estimate transmissibility.
  • I'd suggest being careful to distinguish modelled estimates from data, and to keep them separate as much as possible in the terminology. E.g. in the example in the previous bullet point, observed transmission events and timing of symptoms could be data, resulting in an estimate of serial intervals or generation intervals (which aren't data). Similarly, data on case counts combined with an estimate of the generation time can be used to estimate the reproduction number, and these estimates can be further used in scenarios (but the only data used at that point were case data and perhaps data on some transmission events). I think it would be good to make that distinction, especially where we say "Next, we collect the estimate of transmission data output, ideally from the Transmissibility pipeline. Finally, we use these three data", where the concepts get a bit mixed up.
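
To make the uncertainty point in the second bullet a bit more concrete, here is a minimal sketch of a two-step chain. Everything in it is an assumption for illustration: the numbers are invented, the upstream "estimate" is just simulated draws, and the relation R = 1 + r * Tg is a simplification that holds when the generation interval is exponentially distributed. It is not the method used in the post or in any Epiverse package.

```python
import random
import statistics

random.seed(1)

# Step 1 (hypothetical upstream analysis): draws for the mean generation
# time in days. Gamma(shape=25, scale=0.2) has mean 5 and standard deviation 1.
generation_time_draws = [random.gammavariate(25, 0.2) for _ in range(5000)]

# Assumed epidemic growth rate per day, treated as known to keep the sketch short.
growth_rate = 0.1


def reproduction_number(r, tg):
    """R = 1 + r * Tg, valid for an exponentially distributed generation interval."""
    return 1.0 + r * tg


# Option A: pass only the point estimate downstream (the uncertainty is dropped).
plug_in = reproduction_number(growth_rate, statistics.mean(generation_time_draws))

# Option B: pass every draw downstream, so step 2 inherits the spread of step 1.
propagated = [reproduction_number(growth_rate, tg) for tg in generation_time_draws]
cuts = statistics.quantiles(propagated, n=40)  # 39 cut points at 2.5% steps
lower, upper = cuts[0], cuts[-1]               # 2.5% and 97.5% quantiles

print(f"plug-in R: {plug_in:.2f}")
print(f"propagated R: {statistics.mean(propagated):.2f} ({lower:.2f} to {upper:.2f})")
```

Because this toy relation is linear, both options give the same central value; the difference is that only option B carries an interval forward to the next step of the pipeline.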

@chartgerink
Member

chartgerink commented Apr 25, 2024

Any updates on whether this PR will still be worked on? If not we can close it (please respond within next two weeks).

@avallecam
Member Author

avallecam commented May 4, 2024

> Any updates on whether this PR will still be worked on? If not we can close it (please respond within next two weeks).

Yes, the plan is to pick this up again in June. I added it to our planning board to increase visibility: https://github.com/orgs/epiverse-trace/projects/33

@chartgerink chartgerink marked this pull request as draft May 6, 2024 07:53
@chartgerink
Member

Okay - I marked this as a draft PR until then. I am quite active on this repository and having open PRs that are not ready for consideration is quite confusing.

It is considered best practice to only have PRs open that are ready for merging/closing and do not need any further work to be considered (as far as I know). I appreciate your patience as we work to streamline the management of repo issues + PRs.

Labels
None yet
Projects
Status: In Progress

5 participants