I have the following vectors with 0s and 1s:
test1 <- c(rep(0,20),rep(1,5),rep(0,10),rep(1,15))
test1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
                                                                          ^
test2 <- c(rep(0,8),rep(1,4),rep(0,5),rep(1,5),rep(0,6),rep(1,10),rep(0,2))
test2
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
                                                            ^
I am trying to find the index of the first 1 in the last sequence of 1s (indicated by ^ in the above code).
For test1 and test2, the expected output is 36 and 29, respectively.
Here is a solution:
temp1 <- cumsum(test1)
which(temp1==max(temp1[duplicated(temp1)&temp1!=max(temp1)]+1))[1]
[1] 36
temp2 <- cumsum(test2)
which(temp2==max(temp2[duplicated(temp2)&temp2!=max(temp2)]+1))[1]
[1] 29
Note: The length of actual vectors is ~10k.
Update:
For completeness, here is the benchmark with a vector of size 30001:
microbenchmark::microbenchmark(zx8754(x),
Grothendieck(x),
Maël(x),times=1000)
Unit: microseconds
expr min lq mean median uq max neval cld
zx8754(x) 286.001 313.5005 960.1367 685.2515 847.0010 20326.8 1000 ab
Grothendieck(x) 297.901 328.0010 1054.2712 683.1505 848.9515 142477.1 1000 b
Maël(x) 243.500 263.3505 660.1222 494.5005 573.9515 13435.6 1000 a
9 Answers
Using rle:
r <- rle(test1)
ix <- max(which(r$values == 1))
# seq_len() also handles ix == 1, where 1:(ix - 1) would count backwards
sum(r$lengths[ seq_len(ix - 1) ]) + 1
# [1] 36
r <- rle(test2)
ix <- max(which(r$values == 1))
sum(r$lengths[ seq_len(ix - 1) ]) + 1
# [1] 29
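To make the intermediate values visible, here is a minimal sketch of the same rle idea wrapped in a function. The name last_run_start is hypothetical, and returning NA when the vector contains no 1s is an assumption of this sketch, not something the answer specifies.

```r
# Hypothetical helper: start index of the last run of 1s, via rle().
last_run_start <- function(x) {
  r <- rle(x)
  ones <- which(r$values == 1)
  if (length(ones) == 0) return(NA_integer_)  # assumption: NA when no 1s exist
  ix <- ones[length(ones)]                    # which run is the last run of 1s
  # Total length of all runs before it, plus one, is where that run starts.
  sum(r$lengths[seq_len(ix - 1)]) + 1
}

test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
r <- rle(test1)
r$lengths              # 20 5 10 15
r$values               # 0 1 0 1
last_run_start(test1)  # 36
```

For test1 the last run of 1s is the fourth run, so the three preceding lengths (20 + 5 + 10) are summed and 1 is added.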
I'm rle's biggest fan 🙂. So much so that I wrote seqle, a function which finds sequence lengths where you can specify the increment. – Carl Witthoft, yesterday
Another way with which + diff.
idx <- which(test1 == 1)
idx[tail(which(diff(idx) != 1), 1) + 1]
#[1] 36
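The one-liner above can be read in three steps, sketched here on a small made-up vector. The wrapper last_run_start_diff is hypothetical: it adds a fallback for the case of a single run of 1s, where no gap exists and the original expression returns a zero-length result.

```r
x <- c(0, 1, 1, 0, 0, 1, 1, 1, 0)

idx <- which(x == 1)           # positions of all 1s: 2 3 6 7 8
gaps <- which(diff(idx) != 1)  # where consecutive 1s are not adjacent: 2
idx[tail(gaps, 1) + 1]         # first 1 after the last gap: 6

# Hypothetical wrapper (not from the answer): fall back to the first 1
# when there is only a single run, i.e. when no gap exists.
last_run_start_diff <- function(x) {
  idx <- which(x == 1)
  gaps <- which(diff(idx) != 1)
  if (length(gaps) == 0) return(idx[1])  # NA if there are no 1s at all
  idx[gaps[length(gaps)] + 1]
}
last_run_start_diff(c(0, 0, 1, 1))  # 3
```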
Nice one, I started with diff, then gave up and went with rle. – zx8754, yesterday
Run rle, then use cumsum to compute the end position of each run; subtract the lengths and add 1 to get the start positions. Reduce that to the runs of 1's only and take the last element. This gives the start position of the last run of 1's, but if you wanted:
- the end position, just omit the -lengths+1
- the last run of 0's, replace the ==1 with ==0
- the first run of 1's, replace tail with head
If there are no 1's, it returns a zero-length numeric vector.
with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
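The variations listed above can be sketched side by side on test1. The variable names are only illustrative.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))

# Start of the last run of 1s (the expression from the answer).
start_last_1 <- with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
# End of the last run of 1s: drop "- lengths + 1".
end_last_1 <- with(rle(test1), tail(cumsum(lengths)[values == 1], 1))
# Start of the last run of 0s: test for 0 instead of 1.
start_last_0 <- with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 0], 1))
# Start of the first run of 1s: head() instead of tail().
start_first_1 <- with(rle(test1), head((cumsum(lengths) - lengths + 1)[values == 1], 1))
# A vector without any 1s yields a zero-length result.
no_ones <- with(rle(rep(0, 5)), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))

c(start_last_1, end_last_1, start_last_0, start_first_1)  # 36 50 26 21
length(no_ones)                                           # 0
```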
A late response, but a simple while loop can be a (potentially very) fast approach when the sought index is towards the end of the vector.
f <- function(x) {
  l <- length(x)
  # walk backwards until a 0 -> 1 step is found; the l > 1 guard avoids indexing x[0]
  while (l > 1 && x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  l
}
f(test1)
[1] 36
f(test2)
[1] 29
We could also use Position() or the purrr equivalent detect_index():
Position(isTRUE, diff(test1) == 1, right = TRUE) + 1
[1] 36
purrr::detect_index(diff(test1) == 1, isTRUE, .dir = "backward") + 1
[1] 36
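As a rough illustration of what Position() is scanning, here is the same call on a small made-up vector, including the case where no 0-to-1 step exists.

```r
x <- c(0, 1, 1, 0, 0, 1, 1, 1, 0)

steps <- diff(x) == 1  # TRUE where a 0 is followed by a 1
which(steps)           # 1 5

# Scan from the right for the first TRUE, then shift by one because
# the diff() result is one element shorter than x.
last_start <- Position(isTRUE, steps, right = TRUE) + 1
last_start             # 6

# With no 0 -> 1 step, Position() finds no match and returns NA.
no_step <- Position(isTRUE, diff(c(1, 1, 0)) == 1, right = TRUE) + 1
no_step                # NA
```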
I feel like this would be an obvious approach in most other languages, but in R it's really thinking outside the box. Love it. – Gregor Thomas, yesterday
The data.table library has a non-exported function called data.table:::uniqlist(list(x)). Remember to use three colons ::: to access non-exported functions.
data.table:::uniqlist(list(test1))
# [1] 1 21 26 36
@Arun talks about uniqlist here: https://stackoverflow.com/a/21267854/10276092
I use the quick y[length(y)] idiom to get the last item of a vector, and base ifelse() to check whether that last index points at a 1; if it does not, the second-to-last index must.
fx <- function(x) {
y <- data.table:::uniqlist(list(x))
ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}
fx(test1)
# [1] 36
fx(test2)
# [1] 29
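Judging from the output shown above (1 21 26 36 for test1), uniqlist appears to return the index at which each run starts. Assuming that reading is right, the same selection logic can be sketched in base R without relying on a non-exported function. Both run_starts and last_one_run are hypothetical names.

```r
# Hypothetical helper: index at which each run starts, in base R.
run_starts <- function(x) which(c(TRUE, diff(x) != 0))

# Same selection as fx(): take the last run if it is a run of 1s,
# otherwise the run just before it.
last_one_run <- function(x) {
  y <- run_starts(x)
  n <- length(y)
  if (x[y[n]] == 1) y[n] else y[n - 1]
}

test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

run_starts(test1)    # 1 21 26 36
last_one_run(test1)  # 36
last_one_run(test2)  # 29
```

A plain if/else is used here because the condition is a single value; the sketch assumes a non-empty vector that contains at least one 1.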
You can try regex, like sub + nchar
f1 <- function(v) nchar(sub("(.*01).*", "\\1", paste0(v, collapse = "")))
or regexpr
f2 <- function(v) attr(regexpr(".*(?<=0)1", paste0(v,collapse = ""), perl = TRUE), "match.length")
or some other diff options
f3 <- function(v) tail(which(diff(v) == 1) + 1, 1)
f4 <- function(v) max((2:length(v))[diff(v) == 1])
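To see why the sub + nchar variant lands on the right index, here is a step-by-step sketch on a small made-up vector. Note that the backreference has to be written as "\\1" inside an R string.

```r
v <- c(0, 1, 1, 0, 0, 1, 1, 1, 0)

s <- paste0(v, collapse = "")  # "011001110"

# The greedy ".*" pushes the "01" match as far right as possible, so the
# captured prefix ends exactly at the first 1 of the last run.
prefix <- sub("(.*01).*", "\\1", s)
prefix         # "011001"
nchar(prefix)  # 6
```

If the string contains no "01" at all, sub() returns it unchanged, so the result is then simply the length of the vector.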
For completeness, here is the benchmark with a vector of size 30001. Feel free to update this if needed.
x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))
microbenchmark::microbenchmark(rle_zx8754(x),
rle_Grothendieck(x),
which_diff_Maël(x),
uniqlist_Viking(x),
while_Ritchie(x),
#Position_Ritchie(x),
#detect_index_Ritchie(x),
diff_Thomas(x),
#regex_Thomas(x),
#regexpr_Thomas(x),
times = 1000, check='equal')
Unit: microseconds
                expr   min     lq      mean median     uq      max neval cld
       rle_zx8754(x) 339.5 350.45  783.9827 357.45 375.15 156630.3  1000 cd
 rle_Grothendieck(x) 352.7 364.75  616.2324 372.60 391.75  11196.5  1000 bc
  which_diff_Maël(x) 264.2 274.60  404.5521 279.50 292.00   7263.2  1000 b
  uniqlist_Viking(x)  16.7  22.30   32.1502  25.40  30.65   3524.9  1000 a
    while_Ritchie(x) 777.6 785.60 1021.0738 801.95 847.15   6739.7  1000 d
      diff_Thomas(x) 279.4 286.90  500.6373 291.20 306.35   9435.5  1000 b
functions:
x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))
rle_zx8754 <- function(x){
r <- rle(x)
ix <- max(which(r$values == 1))
sum(r$lengths[ 1:(ix - 1) ]) + 1
}
which_diff_Maël <- function(x){
idx <- which(x == 1)
idx[tail(which(diff(idx) != 1), 1) + 1]
}
rle_Grothendieck <- function(x){
with(rle(x), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
}
uniqlist_Viking <- function(x){
y <- data.table:::uniqlist(list(x))
ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}
while_Ritchie <- function(x){
l <- length(x)
while (x[l] - x[l - 1] != 1) {
l <- l - 1
}
l
}
Position_Ritchie <- function(x){
Position(isTRUE, diff(x) == 1, right = TRUE) + 1
}
detect_index_Ritchie <- function(x){
purrr::detect_index(diff(x) == 1, isTRUE, .dir = "backward") + 1
}
diff_Thomas <- function(x){
max((2:length(x))[diff(x) == 1])
}
regex_Thomas <- function(x){
nchar(sub("(.*01).*", "\\1", paste0(x, collapse = "")))
}
regexpr_Thomas <- function(x){
attr(regexpr(".*(?<=0)1", paste0(x,collapse = ""), perl = TRUE), "match.length")
}
May not be the best, but here is an alternative that is easy to follow. Note that it relies on the vector ending in a run of 1s, as test1 does.
library(dplyr)
data.frame(var1=c(rep(0,20),rep(1,5),rep(0,10),rep(1,15))) %>%
  mutate(new=data.table::rleid(var1), row=row_number()) %>%
  filter(var1==1 & max(new)==new) %>%
  slice_head(n=1) %>%
  select(row)
# output
row
1 36
We can also use rleid from data.table:
library(data.table)
i1 <- rleid(test1)
min(which(i1 == max(i1[test1 == 1])))
# [1] 36
i1 <- rleid(test2)
min(which(i1 == max(i1[test2 == 1])))
# [1] 29
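For readers without data.table, the run ids can be approximated in base R. This is only a sketch: run_id is a hypothetical helper that assumes a plain numeric vector without NAs, and it is not claimed to match rleid in every case.

```r
# Hypothetical base R stand-in for rleid(): the id increases by one
# every time the value changes.
run_id <- function(x) cumsum(c(TRUE, diff(x) != 0))

test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

i1 <- run_id(test2)
max(i1[test2 == 1])                              # 6, the id of the last run of 1s
res <- min(which(i1 == max(i1[test2 == 1])))     # first position carrying that id
res                                              # 29
```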
@zx8754 answer is how I would have solved it. If you need more performance, there is a fast data.table function called data.table:::uniqlist(list(test1)) that might help, or a C++ solution that can be built on code posted at stackoverflow.com/q/20896242/10276092 – yesterday
@M.Viking nice function, new to me. Can you post it as an answer? It returns 1 21 26 36; what would be your next steps to get 36? – yesterday
@zx8754 Added my answer, thanks for the push – yesterday
@zx8754 could you add the new answers to the benchmark? Or add the code for the vector creation – yesterday
The tested vector and benchmark have been moved to here. Feel free to update the answer. – 20 hours ago