Find the first of the last 1’s sequence

13

I have the following vectors with 0s and 1s:

test1 <- c(rep(0,20),rep(1,5),rep(0,10),rep(1,15)) 

test1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
                                                                          ^
test2 <- c(rep(0,8),rep(1,4),rep(0,5),rep(1,5),rep(0,6),rep(1,10),rep(0,2)) 

test2
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
                                                            ^

I am trying to find the index of the first 1 in the last sequence of 1s (indicated by ^ in the above code).

For test1 and test2, the expected output is 36 and 29, respectively.

Here is a solution:

temp1 <- cumsum(test1)
which(temp1==max(temp1[duplicated(temp1)&temp1!=max(temp1)]+1))[1]
[1] 36

temp2 <- cumsum(test2)
which(temp2==max(temp2[duplicated(temp2)&temp2!=max(temp2)]+1))[1]
[1] 29
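
The expression above can be unpacked step by step. The following is only a sketch of the same logic; the intermediate names (cs, plateau, level, first_idx) are introduced here for illustration and are not part of the original code. The idea is that cumsum stays flat across runs of 0s, so duplicated values mark those runs, and the highest flat level below the overall maximum sits just before the last run of 1s.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))

cs <- cumsum(test1)                        # running count of 1s
plateau <- duplicated(cs) & cs != max(cs)  # flat stretches, ignoring positions at the overall maximum
level <- max(cs[plateau])                  # number of 1s seen before the last run starts
first_idx <- which(cs == level + 1)[1]     # first position where the count rises above that level
first_idx
# [1] 36
```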

Note: The length of actual vectors is ~10k.

Update:

For completeness, here is the benchmark with a vector of size 30001:

microbenchmark::microbenchmark(zx8754(x),
                               Grothendieck(x),
                               Maël(x),times=1000)


Unit: microseconds
            expr     min       lq      mean   median       uq      max neval cld
       zx8754(x) 286.001 313.5005  960.1367 685.2515 847.0010  20326.8  1000  ab
 Grothendieck(x) 297.901 328.0010 1054.2712 683.1505 848.9515 142477.1  1000   b
         Maël(x) 243.500 263.3505  660.1222 494.5005 573.9515  13435.6  1000  a 

Recognized by R Language Collective

5 Comments

  • @zx8754's answer is how I would have solved it. If you need more performance, there is a fast data.table function called data.table:::uniqlist(list(test1)) that might help, or a C++ solution could be built on the code posted at stackoverflow.com/q/20896242/10276092. – M.Viking, yesterday

  • @M.Viking Nice function, new to me. Can you post it as an answer? It returns 1 21 26 36; what would be your next steps to get 36? – zx8754, yesterday

  • @zx8754 Added my answer, thanks for the push. – M.Viking, yesterday

  • @zx8754 Could you add the new answers to the benchmark? Or add the code for the vector creation. – Maël, yesterday

  • The tested vector and benchmark have been moved to here. Feel free to update the answer. – peter861222, 20 hours ago

9 Answers

14

Using rle:

r <- rle(test1)
ix <- max(which(r$values == 1))        # index of the last run of 1s
sum(r$lengths[seq_len(ix - 1)]) + 1    # seq_len() also copes with ix == 1
# [1] 36

r <- rle(test2)
ix <- max(which(r$values == 1))
sum(r$lengths[seq_len(ix - 1)]) + 1
# [1] 29
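
The same rle steps can be wrapped in a function. The sketch below is a variant under stated assumptions, not the answer's original code: the name first_of_last_run is hypothetical, returning NA when there are no 1s is a choice made here, and seq_len() is used so that a vector whose only run of 1s starts at position 1 is handled.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# Hypothetical helper: start index of the last run of 1s, or NA if there is none
first_of_last_run <- function(x) {
  r <- rle(x)
  ones <- which(r$values == 1)
  if (length(ones) == 0) return(NA_integer_)  # assumption: NA signals "no 1s"
  ix <- max(ones)                             # position of the last run of 1s
  sum(r$lengths[seq_len(ix - 1)]) + 1         # total length of earlier runs, plus 1
}

first_of_last_run(test1)
# [1] 36
first_of_last_run(test2)
# [1] 29
first_of_last_run(c(1, 1, 0))
# [1] 1
```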

1 Comment

  • I'm rle's biggest fan 🙂. So much so that I wrote seqle, a function which finds sequence lengths where you can specify the increment. – Carl Witthoft, yesterday

13

Another way with which + diff.

idx <- which(test1 == 1)
idx[tail(which(diff(idx) != 1), 1) + 1]
#[1] 36
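
To see what each step produces, here is the same idea traced on test2 from the question. The variable breaks is a name introduced here for illustration only.

```r
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

idx <- which(test2 == 1)          # positions of all 1s: 9:12, 18:22, 29:38
breaks <- which(diff(idx) != 1)   # places in idx where consecutive 1s are not adjacent
breaks
# [1] 4 9
idx[tail(breaks, 1) + 1]          # the 1 that follows the last gap
# [1] 29
```

If the vector contains only a single run of 1s, breaks is empty and the expression returns a zero-length vector, so that case may need separate handling.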

1 Comment

  • Nice one, I started with diff, then gave up and went with rle. – zx8754, yesterday

12

Run rle, then use cumsum on the run lengths to get the end position of each run. Subtracting the lengths and adding 1 turns those into start positions. Keep only the runs of 1’s and take the last element. This gives the start position of the last run of 1’s. If you instead wanted:

  • the end position just omit the -lengths+1
  • the last run of 0’s replace the ==1 with ==0
  • the first run of 1’s replace tail with head

If there are no 1’s it returns a zero length numeric vector.

with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
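
The variations listed above can be written out as follows. This is a sketch applied to test1 from the question; the result names (start_last1 and so on) are introduced here only for illustration.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))

# start of the last run of 1's (the expression as given)
start_last1 <- with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
# end of the last run of 1's: omit the -lengths+1
end_last1 <- with(rle(test1), tail(cumsum(lengths)[values == 1], 1))
# start of the last run of 0's: ==1 becomes ==0
start_last0 <- with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 0], 1))
# start of the first run of 1's: tail becomes head
start_first1 <- with(rle(test1), head((cumsum(lengths) - lengths + 1)[values == 1], 1))

c(start_last1, end_last1, start_last0, start_first1)
# [1] 36 50 26 21

# with no 1's at all the result has length zero
length(with(rle(rep(0, 5)), tail((cumsum(lengths) - lengths + 1)[values == 1], 1)))
# [1] 0
```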

6

A late response, but a simple while loop will be a (potentially very) fast approach where the sought index is towards the end of the vector.

f <- function(x) {
  l <- length(x)
  while (x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  l
}

f(test1)
[1] 36
f(test2)
[1] 29
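
The loop above assumes that a 0 followed by a 1 exists somewhere in the vector; otherwise l reaches 1 and x[l - 1] is empty, which makes the while condition fail with an error. A guarded variant might look like the sketch below. The name f_safe and the choice to return NA when there are no 1s are assumptions made here, not part of the original answer.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# Hypothetical guarded version of the backward scan
f_safe <- function(x) {
  l <- length(x)
  if (l == 0) return(NA_integer_)
  # step backwards until position l holds a 1 that is preceded by a 0
  while (l > 1 && !(x[l] == 1 && x[l - 1] == 0)) {
    l <- l - 1
  }
  # reaching position 1 is only a hit if the vector starts with a 1
  if (l == 1 && x[1] != 1) return(NA_integer_)
  l
}

f_safe(test1)
# [1] 36
f_safe(test2)
# [1] 29
f_safe(c(1, 1, 0))
# [1] 1
```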

We could also use Position() or the purrr equivalent detect_index():

Position(isTRUE, diff(test1) == 1, right = TRUE) + 1
[1] 36
purrr::detect_index(diff(test1) == 1, isTRUE, .dir = "backward") + 1
[1] 36

1 Comment

  • I feel like this would be an obvious approach in most other languages, but in R it's really thinking outside the box. Love it. – Gregor Thomas, yesterday

5

The data.table library has a non-exported function called data.table:::uniqlist(list(x)). Remember to use three colons ::: to access non-exported functions.

data.table:::uniqlist(list(test1))
# [1]  1 21 26 36

@Arun talks about uniqlist here:

https://stackoverflow.com/a/21267854/10276092

I use y[length(y)] as a quick way to get the last element of a vector, and base ifelse() to check whether the last run start holds a 1; if it does not, the second-to-last run start must.

fx <- function(x) {
    y <- data.table:::uniqlist(list(x))
    ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}

fx(test1)
# [1] 36
fx(test2)
# [1] 29
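
For readers who prefer not to rely on a non-exported function, the run-start indices shown above for test1 can be reproduced in base R. This is a sketch of an alternative, not a description of how uniqlist is implemented; run_starts and fx_base are hypothetical names.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# Hypothetical helper: index at which each run of equal values begins
run_starts <- function(x) which(c(TRUE, diff(x) != 0))

run_starts(test1)
# [1]  1 21 26 36

# Hypothetical helper: keep the run starts that hold a 1 and take the last one
fx_base <- function(x) {
  s <- run_starts(x)
  tail(s[x[s] == 1], 1)
}

fx_base(test1)
# [1] 36
fx_base(test2)
# [1] 29
```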

4

You can try regex, like sub + nchar

f1 <- function(v) nchar(sub("(.*01).*", "\\1", paste0(v, collapse = "")))

or regexpr

f2 <- function(v) attr(regexpr(".*(?<=0)1", paste0(v,collapse = ""), perl = TRUE), "match.length")

or some other diff options

f3 <- function(v) tail(which(diff(v) == 1) + 1, 1)

f4 <- function(v) max((2:length(v))[diff(v) == 1])
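
The diff-based options rely on diff(v) == 1 marking each 0-to-1 transition. A small trace on a short vector (v and w are made up here for illustration) shows the intermediate values, and also an edge case: a run of 1s that starts at position 1 has no preceding 0, so it is only found if a 0 is prepended first.

```r
v <- c(0, 1, 1, 0, 0, 1, 1, 0)

diff(v)
# [1]  1  0 -1  0  1  0 -1
which(diff(v) == 1) + 1           # start of every run of 1s
# [1] 2 6
tail(which(diff(v) == 1) + 1, 1)  # start of the last run of 1s
# [1] 6

# Edge case: the only run of 1s starts at position 1
w <- c(1, 1, 0)
length(tail(which(diff(w) == 1) + 1, 1))
# [1] 0
# Prepending a 0 shifts every index by one, which cancels the "+ 1"
tail(which(diff(c(0, w)) == 1), 1)
# [1] 1
```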


3

For completeness, here is the benchmark with a vector of size 30001. Feel free to update this if needed.

x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))


microbenchmark::microbenchmark(rle_zx8754(x),
                               rle_Grothendieck(x),
                               which_diff_Maël(x),
                               uniqlist_Viking(x),
                               while_Ritchie(x),
                               #Position_Ritchie(x),
                               #detect_index_Ritchie(x),
                               diff_Thomas(x),
                               #regex_Thomas(x),
                               #regexpr_Thomas(x),
                               times = 1000, check='equal')



Unit: microseconds
                expr   min     lq      mean median     uq      max neval  cld
       rle_zx8754(x) 339.5 350.45  783.9827 357.45 375.15 156630.3  1000   cd
 rle_Grothendieck(x) 352.7 364.75  616.2324 372.60 391.75  11196.5  1000  bc 
  which_diff_Maël(x) 264.2 274.60  404.5521 279.50 292.00   7263.2  1000  b  
  uniqlist_Viking(x)  16.7  22.30   32.1502  25.40  30.65   3524.9  1000 a   
    while_Ritchie(x) 777.6 785.60 1021.0738 801.95 847.15   6739.7  1000    d
      diff_Thomas(x) 279.4 286.90  500.6373 291.20 306.35   9435.5  1000  b  

Functions used in the benchmark:

x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))


rle_zx8754 <- function(x){
  r <- rle(x)
  ix <- max(which(r$values == 1))
  sum(r$lengths[seq_len(ix - 1)]) + 1
}

which_diff_Maël <- function(x){
  idx <- which(x == 1)
  idx[tail(which(diff(idx) != 1), 1) + 1]
}

rle_Grothendieck <- function(x){
  with(rle(x), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
}

uniqlist_Viking <- function(x){
  y <- data.table:::uniqlist(list(x))
  ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}

while_Ritchie <- function(x){
  l <- length(x)
  while (x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  l
}

Position_Ritchie <- function(x){
  Position(isTRUE, diff(x) == 1, right = TRUE) + 1
}

detect_index_Ritchie <- function(x){
  purrr::detect_index(diff(x) == 1, isTRUE, .dir = "backward") + 1
}

diff_Thomas <- function(x){
  max((2:length(x))[diff(x) == 1])
}

regex_Thomas <- function(x){
  nchar(sub("(.*01).*", "\\1", paste0(x, collapse = "")))
}

regexpr_Thomas <- function(x){
  attr(regexpr(".*(?<=0)1", paste0(x,collapse = ""), perl = TRUE), "match.length")
}

2

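This may not be the fastest approach, but it is an alternative that is easy to follow:

library(dplyr)
library(data.table)  # for rleid()

data.frame(var1=c(rep(0,20),rep(1,5),rep(0,10),rep(1,15))) %>% 
  mutate(new=rleid(var1), row=row_number()) %>% 
  filter(var1==1) %>%          # keep the 1s first, so a trailing run of 0s is ignored
  filter(new==max(new)) %>%    # then keep only the last run of 1s
  slice_head(n=1) %>% 
  select(row)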

# output

  row
1  36

2

We can also use rleid from data.table:

library(data.table)

i1 <- rleid(test1)
min(which(i1 == max(i1[test1 == 1])))
# [1] 36
i1 <- rleid(test2)
min(which(i1 == max(i1[test2 == 1])))
# [1] 29
