Enable formula to wlr() #280

Open
LittleBeannie opened this issue Sep 5, 2024 · 13 comments
Labels
question Further information is requested

Comments

@LittleBeannie
Collaborator

LittleBeannie commented Sep 5, 2024

rmst() supports a formula interface, e.g.,

rmst(
  data = ex1_delayed_effect,
  formula = Surv(month, evntd) ~ trt,
  tau = 10,
  reference = "0"
)

Shall we enable a formula interface for wlr()?

wlr(
  data = ex1_delayed_effect,
  formula = Surv(month, evntd) ~ trt,
  weight = fh(0, 0.5)
)

If there are strata, what is the best way to specify them in the formula?
Keaven's suggestion:

wlr(
  data = ex1_delayed_effect,
  formula = Surv(month, evntd) ~ trt + strata(stratum), # stratum: the strata variable
  weight = fh(0, 0.5)
)
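A minimal sketch of how such a formula could be split into column names, using only base R's terms() with specials = "strata", which flags strata() terms without evaluating anything (so the survival package is not needed just to parse). It assumes the formula has the form Surv(time, event) ~ treatment + strata(stratum) with a single treatment term; parse_wlr_formula() is a hypothetical name, not an existing simtrial function.

```r
# Hypothetical helper, not part of simtrial: split a formula of the form
#   Surv(time, event) ~ treatment + strata(stratum)
# into the column names it refers to. The strata() term is optional.
parse_wlr_formula <- function(formula) {
  lhs <- formula[[2]]
  if (length(formula) != 3 || !is.call(lhs) ||
      !identical(lhs[[1]], as.name("Surv")) || length(lhs) != 3) {
    stop("The left-hand side must be Surv(time, event).")
  }
  # terms() records which right-hand-side terms are strata() calls
  tt <- terms(formula, specials = "strata")
  # "variables" is the call list(Surv(...), trt, strata(...)); drop the `list` head
  vars <- as.list(attr(tt, "variables"))[-1]
  strata_idx <- attr(tt, "specials")$strata              # NULL if no strata() term
  other_idx <- setdiff(seq_along(vars)[-1], strata_idx)  # position 1 is the response
  if (length(other_idx) != 1) stop("Exactly one treatment variable is expected.")
  if (length(strata_idx) > 1) stop("At most one strata() term is supported.")
  list(
    time = deparse(lhs[[2]]),
    event = deparse(lhs[[3]]),
    treatment = deparse(vars[[other_idx]]),
    stratum = if (is.null(strata_idx)) NULL else deparse(vars[[strata_idx]][[2]])
  )
}
```

For example, parse_wlr_formula(Surv(month, evntd) ~ trt + strata(stratum)) returns list(time = "month", event = "evntd", treatment = "trt", stratum = "stratum"); without the strata() term, the stratum element is NULL.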
@LittleBeannie LittleBeannie added the question Further information is requested label Sep 5, 2024
@LittleBeannie
Collaborator Author

Decision: wait for Larry's weighted Cox regression.

@LittleBeannie
Collaborator Author

A reference R package coxphw: https://cran.r-project.org/web/packages/coxphw/index.html

@jdblischak
Collaborator

xref:

@LittleBeannie
Collaborator Author

Medium priority

@jdblischak
Collaborator

@LittleBeannie could you please provide some examples that demonstrate the current interface of wlr() and then how you would like that to be expressed equivalently with the proposed formula argument?

For example, how can I call wlr() without using the argument formula to generate the same results as the proposed call below?

wlr(
  data = ex1_delayed_effect,
  formula = Surv(month, evntd) ~ trt,
  weight = fh(0, 0.5)
)

rmst() has various arguments that begin with var_, but wlr() doesn't have these, so it isn't immediately clear to me what to do with the variable names after I parse the input formula.
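One possible answer, sketched under the assumption that the parsed names arrive as plain strings: rename the referenced columns to the names wlr.tte_data() expects and pass the result on. rename_for_wlr() is hypothetical, and it deliberately leaves the coding of values (e.g. of the treatment column) untouched.

```r
# Hypothetical helper, not part of simtrial: given the column names a user
# supplied (e.g. parsed from a formula), return a data set whose columns carry
# the names wlr.tte_data() expects. The stratum column is optional here.
rename_for_wlr <- function(data, time, event, treatment, stratum = NULL) {
  # names() are the target column names; values are the user's column names
  cols <- c(tte = time, event = event, treatment = treatment)
  if (!is.null(stratum)) cols <- c(cols, stratum = stratum)
  not_found <- setdiff(cols, names(data))
  if (length(not_found) > 0) {
    stop("Columns not found in data: ", paste(not_found, collapse = ", "))
  }
  out <- data[, cols, drop = FALSE]  # keep only the referenced columns
  names(out) <- names(cols)          # rename them to the expected names
  out
}
```

With this approach, a formula call would reduce to rename_for_wlr(ex1_delayed_effect, time = "month", event = "evntd", treatment = "trt") followed by the existing method, so no new var_ arguments would be needed.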

@LittleBeannie
Collaborator Author

Hi @jdblischak! The following code is equivalent to your example.

ex1_delayed_effect |>
  mutate(treatment = trt, tte = month) |>
  wlr(weight = fh(0, 0.5))

Please note that wlr.tte_data() requires the data to have 4 columns: tte, event, stratum, and treatment.
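A quick way to see which of these required columns a data set still lacks is a set difference on the column names. missing_tte_columns() below is a hypothetical helper, not simtrial's own input validation.

```r
# Hypothetical check, not simtrial's actual validation: report which of the
# four columns required by wlr.tte_data() are missing from a data set.
required_tte_columns <- c("tte", "event", "stratum", "treatment")

missing_tte_columns <- function(data) {
  setdiff(required_tte_columns, names(data))
}
```

Applied to ex1_delayed_effect after mutate(treatment = trt, tte = month), which has the columns id, month, evntd, trt, treatment, and tte, it would return c("event", "stratum").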

@jdblischak
Collaborator

To make sure I understand: the point of the argument formula will be to pass user-created data sets to wlr()? Because if the object is the result of cut_data_by_date(), cut_data_by_event(), or counting_process(), it already has the correct column names.

Also, thinking from a documentation and code perspective, is this formula argument worth the effort, for us and for end users? It is easy to document the requirement "The data input to wlr() must have the columns tte, event, stratum and treatment.", and then let end users decide how they want to rename the columns. Using the formula argument, we'll need to document exactly how they need to structure the formula, and the end users will have to take the time to understand the requirement and create the formula (and will that take less time than simply renaming the columns?).

@LittleBeannie
Collaborator Author

To make sure I understand: the point of the argument formula will be to pass user-created data sets to wlr()?

Yes. User-created data sets, such as ex1_delayed_effect, can have column names that differ from those in the output of cut_data_by_date().

Also, thinking from a documentation and code perspective, is this formula argument worth the effort, for us and for end users?

It is crucial because, while statisticians are acquainted with survival formulas, their familiarity with the mandatory column names (tte, event, stratum, and treatment) may vary.

Using the formula argument, we'll need to document exactly how they need to structure the formula, and the end users will have to take the time to understand the requirement and create the formula (and will that take less time than simply renaming the columns?).

The wlr() formula is akin to the one used in the survival package. We should provide examples of formula usage, and I can help with that. That said, statisticians are likely to be familiar with the formula syntax, since they use it frequently with the survival package.

@jdblischak
Collaborator

It is crucial because while statisticians are acquainted with survival formulas, however, their familiarity with the mandatory column names (tte, event, stratum, and treatment) may vary.

Ok. That makes sense. If they are already familiar with the formula syntax from the {survival} package, then it shouldn't take them long to figure out how to use it for wlr().

Please kindly note that, for wlr.tte_data(), it requires the data to have 4 columns: tte, event, stratum and treatment.

I'm still a bit confused. My plan was to use the formula argument to rename the columns, and then pass the result to wlr.tte_data. However, your example doesn't have all the required columns.

ex1_delayed_effect |>
    mutate(treatment = trt, tte = month) |>
    head(2)
## # A tibble: 2 × 6
##      id month evntd   trt treatment   tte
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl>
## 1     1 0.321     1     1         1 0.321
## 2     2 0.321     1     1         1 0.321

I assume that evntd is event, so I can update it as below. But is there a stratum column? If not, does that mean it is optional?

ex1_delayed_effect |>
  mutate(treatment = trt, tte = month, event = evntd) |>
  head(2)
## # A tibble: 2 × 7
##      id month evntd   trt treatment   tte event
##   <dbl> <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl>
## 1     1 0.321     1     1         1 0.321     1
## 2     2 0.321     1     1         1 0.321     1

@LittleBeannie
Collaborator Author

If the user creates a data set like ex1_delayed_effect, the following two pieces of code are equivalent.

ex1_delayed_effect |>
  mutate(treatment = trt, tte = month) |>
  select(treatment, stratum, tte, event) |>
  # this wlr() is the S3 method for the `tte_data` class
  wlr(weight = fh(0, 0.5))
# this wlr() call uses the proposed formula interface:
# users do not need to rename any columns in their own code,
# although wlr() may still rename them internally to the required names
wlr(
  data = ex1_delayed_effect,
  formula = Surv(month, evntd) ~ trt,
  weight = fh(0, 0.5)
)

@LittleBeannie
Collaborator Author

For an unstratified design, i.e., length(unique(stratum)) == 1, the formula Surv(month, evntd) ~ trt is sufficient.

For a stratified design, i.e., length(unique(stratum)) > 1, the formula is Surv(month, evntd) ~ trt + strata(stratum).
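The rule above can be sketched as a formula builder that adds the strata() term only when the stratum column has more than one distinct value. wlr_formula() and its arguments are hypothetical; it only constructs the formula and does not call wlr().

```r
# Hypothetical helper, not part of simtrial: build the formula described
# above, adding a strata() term only when the stratum column has more than
# one distinct value (a stratified design).
wlr_formula <- function(data, time, event, treatment, stratum = NULL) {
  rhs <- treatment
  if (!is.null(stratum) && length(unique(data[[stratum]])) > 1) {
    rhs <- paste0(rhs, " + strata(", stratum, ")")
  }
  as.formula(paste0("Surv(", time, ", ", event, ") ~ ", rhs))
}
```

A stratum column whose only value is "All" therefore yields Surv(month, evntd) ~ trt, the same as passing no stratum at all.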

@jdblischak
Collaborator

There is no column named "stratum"

ex1_delayed_effect |>
  mutate(treatment = trt, tte = month) |>
  select(treatment, stratum, tte, event)
## Error in `select()`:
## ! Can't select columns that don't exist.
## ✖ Column `stratum` doesn't exist.
## Run `rlang::last_trace()` to see where the error occurred.

@LittleBeannie
Collaborator Author

There is no column named "stratum"

If there is no "stratum" column, it is an unstratified design, which is equivalent to a data set in which every row has stratum = "All".
We do not have an example data set for a stratified design (see all example data sets at https://merck.github.io/simtrial/reference/index.html#example-datasets). If you need one, please use the following mock-up example.

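library(dplyr)
library(simtrial)

set.seed(2024) # make the random stratum assignment reproducible
data <- ex1_delayed_effect |>
  mutate(
    stratum = sample(c("biomarker positive", "biomarker negative"), n(), replace = TRUE, prob = c(0.6, 0.4)),
    # recode the 0/1 treatment indicator to arm labels (assumes 1 = experimental)
    trt = case_when(trt == 1 ~ "experimental", trt == 0 ~ "control")
  ) |>
  rename(treatment = trt, tte = month, event = evntd) |>
  select(treatment, stratum, tte, event)
# add the tte_data class so that the wlr.tte_data() method is dispatched
class(data) <- c("tte_data", class(data))
data |> wlr(weight = fh(0, 0.5))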
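The convention that a missing stratum column means an unstratified design (a single stratum "All") can be sketched as a small default step applied before calling wlr(). add_default_stratum() is a hypothetical helper, not part of simtrial.

```r
# Hypothetical helper, not part of simtrial: treat a data set without a
# stratum column as a single stratum "All", i.e. an unstratified design.
add_default_stratum <- function(data) {
  if (!"stratum" %in% names(data)) {
    # rep() also handles zero-row data sets
    data$stratum <- rep("All", nrow(data))
  }
  data  # an existing stratum column is left unchanged
}
```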


2 participants