I have a data.frame containing prescriptions of a single drug, including the date of each prescription and the number of daily doses. There is one row per prescription. Each patient has a unique idnr, and each idnr can have several prescription fills (i.e. several rows). The data are sorted by idnr, then by filldate.
The goal is to create a variable indicating discontinuation of treatment, defined as fewer than -30 doses left at the time of the subsequent refill. The date of discontinuation is then filldate + daily_doses + lag(doses_left_at_time_of_subsequent_refill), calculated from the row that had fewer than -30 doses left at the time of the subsequent refill. A patient can have multiple discontinuation dates.
Example data:
idnr daily_doses filldate
1. 10 2000-01-01
1. 10 2000-01-12
1. 10 2000-01-15
1. 10 2000-01-20
1. 10 2002-03-21
1. 10 2002-03-30
1. 10 2002-04-20
2. 10 2004-05-01
2. 20 2004-05-11
2. 10 2004-05-24
2. 10 2004-06-01
3. 20 2000-01-01
3. 10 2010-03-04
3. 10 2010-03-04
3. 10 2010-03-04
3. 10 2010-03-08
3. 10 2012-07-08
3. 10 2012-07-18
3. 10 2012-07-30
3. 10 2012-08-15
3. 10 2012-09-25
As data.frame:
data.frame(
  idnr = c(1L,1L,1L,1L,1L,1L,1L,2L,2L,2L,
           2L,3L,3L,3L,3L,3L,3L,3L,3L,3L,3L),
  daily_doses = c(10L,10L,10L,10L,10L,10L,10L,10L,
                  20L,10L,10L,20L,10L,10L,10L,10L,10L,10L,10L,10L,10L),
  filldate = as.Date(c("2000-01-01","2000-01-12","2000-01-15",
                       "2000-01-20","2002-03-21","2002-03-30","2002-04-20",
                       "2004-05-01","2004-05-11","2004-05-24","2004-06-01",
                       "2000-01-01","2010-03-04","2010-03-04","2010-03-04","2010-03-08",
                       "2012-07-08","2012-07-18","2012-07-30","2012-08-15",
                       "2012-09-25"), format = "%Y-%m-%d")
)
I want to identify those who have not refilled their prescription within 30 days with the following conditions:
If a patient has filled a prescription earlier than needed, judging by the number of days covered by the daily doses, the remaining doses should be assumed to be used up before the next fill is started. In other words, there should be a variable indicating how many doses are left at the next fill date, so that this information is not lost when defining the date of discontinuation. If there is no subsequent fill date, the variable should be -Inf doses left.
If, however, a patient has filled a prescription later than expected given the number of days covered by the daily doses, the variable indicating how many doses are left for the next prescription should not be adjusted.
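One way to read the two conditions above is as a single recurrence. This is only a sketch of my interpretation: the symbols $d_i$ (daily_doses on row $i$), $t_i$ (filldate on row $i$) and $L_i$ (doses left at the time of the subsequent refill) are notation introduced here, and "not adjusted" is assumed to mean that a negative remainder is not carried forward.

```latex
% For one patient with n fills sorted by date, and L_0 = 0:
\[
L_i = d_i + \max(L_{i-1},\, 0) - (t_{i+1} - t_i), \qquad i = 1, \dots, n-1,
\qquad L_n = -\infty .
\]
% Worked example, first three fills of idnr 1 (gaps of 11, 3 and 5 days):
\[
L_1 = 10 + 0 - 11 = -1, \qquad
L_2 = 10 + \max(-1, 0) - 3 = 7, \qquad
L_3 = 10 + 7 - 5 = 12 .
\]
```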
Desired output:
idnr daily_doses filldate doses_left_at_time_of_subsequent_refill
1. 10 2000-01-01 -1
1. 10 2000-01-12 7
1. 10 2000-01-15 12
1. 10 2000-01-20 -769
1. 10 2002-03-21 1
1. 10 2002-03-30 10
1. 10 2002-04-20 -Inf
2. 10 2004-05-01 0
2. 20 2004-05-11 7
2. 10 2004-05-24 11
2. 10 2004-06-01 -Inf
3. 20 2000-01-01 -3715
3. 10 2010-03-04 10
3. 10 2010-03-04 20
3. 10 2010-03-04 26
3. 10 2010-03-08 -817
3. 10 2012-07-08 0
3. 10 2012-07-18 -2
3. 10 2012-07-30 -6
3. 10 2012-08-15 -31
3. 10 2012-09-25 -Inf
I have tried shifting the fill date forward, but that only accounts for one previous row within each idnr group. I did this with the lag() function, but since patients have different numbers of prescriptions, and some have hundreds, using lag() is not efficient. I also tried the cumsum() function, but then doses hoard up from previous treatment episodes (which should already count as discontinued under the -30 doses left at time of subsequent refill definition and thus should not carry into newer treatment episodes).
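To make the carry-over rule concrete, here is a minimal base R sketch of how I understand it, using Reduce(accumulate = TRUE) so that only a positive remainder is carried from one fill to the next. The names rx, doses_left() and carried_in are hypothetical, and two points are assumptions rather than part of the definition above: a negative remainder is reset to 0 before the next fill, and the last fill of each patient (-Inf) is not flagged as a discontinuation.

```r
# Example data from the question (already sorted by idnr, then filldate).
rx <- data.frame(
  idnr = c(rep(1L, 7), rep(2L, 4), rep(3L, 10)),
  daily_doses = c(rep(10L, 8), 20L, 10L, 10L, 20L, rep(10L, 9)),
  filldate = as.Date(c(
    "2000-01-01", "2000-01-12", "2000-01-15", "2000-01-20",
    "2002-03-21", "2002-03-30", "2002-04-20",
    "2004-05-01", "2004-05-11", "2004-05-24", "2004-06-01",
    "2000-01-01", "2010-03-04", "2010-03-04", "2010-03-04", "2010-03-08",
    "2012-07-08", "2012-07-18", "2012-07-30", "2012-08-15", "2012-09-25"
  ))
)

# Hypothetical helper: doses left at the next fill for ONE patient.
doses_left <- function(doses, filldate) {
  # Days until the next fill; Inf for the last fill, which yields -Inf.
  gap <- c(as.numeric(diff(filldate)), Inf)
  # Assumption: only a positive remainder is carried into the next fill.
  Reduce(function(carry, net) max(carry, 0) + net,
         doses - gap, accumulate = TRUE)
}

# Apply the helper within each patient.
rx$doses_left <- NA_real_
for (rows in split(seq_len(nrow(rx)), rx$idnr)) {
  rx$doses_left[rows] <- doses_left(rx$daily_doses[rows], rx$filldate[rows])
}

# Supply carried into each fill: positive part of the previous row's value.
carried_in <- ave(rx$doses_left, rx$idnr,
                  FUN = function(x) c(0, pmax(head(x, -1), 0)))

# Assumption: rows without a subsequent refill (-Inf) are not flagged.
rx$discontinued <- is.finite(rx$doses_left) & rx$doses_left < -30
rx$discontinue_date <- as.Date(NA)
rx$discontinue_date[rx$discontinued] <-
  (rx$filldate + rx$daily_doses + carried_in)[rx$discontinued]

print(rx)
```

Under these assumptions the sketch reproduces the desired output except for three rows: it gives -10 rather than 10 for idnr 1 on 2002-03-30, 9 rather than 11 for idnr 2 on 2004-05-24, and -3695 rather than -3715 for idnr 3 on 2000-01-01. If those three values are intended, the rule differs from my reading in some way I have not identified.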
Hi, it seems like an interesting question, but I do not know how to read your data into R. It has no newlines, so I don’t know how to read it as CSV… The easiest way to make your question reproducible is to give us the output of dput(your_example_data)! Please edit that into your question.
@ccalle I’m wondering about your desired outcome. A patient gets a first refill on Jan 1 and their second on Jan 12. So wouldn’t the expected value of “left over doses” be 10 on Jan 1 (the first refill, they just got 10 doses), and -1 on Jan 12 (they took their last of the 10 doses on Jan 11)?
The first task is to convert the character “filldate” to real R Date-classed values.
I’m curious about the expected output as well. I can sort-of find logic in the c(-1, 7, 12) (though imperfectly), but the -39 doesn’t make sense to me, not to mention the 2-year gap is a bit confounding. A start of a solution might start with cumsum(doses) - cumsum(c(diff(filldate), NA)), though going big-negative might drive us into using Reduce (or purrr::accumulate). Not sure, the expected output doesn’t really make sense to me.
I don’t understand how you got the -39 either when there was a gap of over a year.