Find the first of the last 1’s sequence

Question

I have the following vectors with 0s and 1s:

test1 <- c(rep(0,20),rep(1,5),rep(0,10),rep(1,15)) 

test1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
                                                                          ^
test2 <- c(rep(0,8),rep(1,4),rep(0,5),rep(1,5),rep(0,6),rep(1,10),rep(0,2)) 

test2
[1] 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
                                                            ^

I am trying to find the index of the first 1 in the last sequence of 1s (indicated by ^ in the above code).

For test1 and test2, the expected output is 36 and 29, respectively.

Here is a solution:

temp1 <- cumsum(test1)
which(temp1==max(temp1[duplicated(temp1)&temp1!=max(temp1)]+1))[1]
[1] 36

temp2 <- cumsum(test2)
which(temp2==max(temp2[duplicated(temp2)&temp2!=max(temp2)]+1))[1]
[1] 29

Note: The length of actual vectors is ~10k.

Update:

For completeness, here is the benchmark with a vector of size 30001:

microbenchmark::microbenchmark(zx8754(x),
                               Grothendieck(x),
                               Maël(x),times=1000)


Unit: microseconds
            expr     min       lq      mean   median       uq      max neval cld
       zx8754(x) 286.001 313.5005  960.1367 685.2515 847.0010  20326.8  1000  ab
 Grothendieck(x) 297.901 328.0010 1054.2712 683.1505 848.9515 142477.1  1000   b
         Maël(x) 243.500 263.3505  660.1222 494.5005 573.9515  13435.6  1000  a

@zx8754's answer is how I would have solved it. If you need more performance, there is a fast data.table function called data.table:::uniqlist(list(test1)) that might help, or a C++ solution that can be built on code posted at stackoverflow.com/q/20896242/10276092 — yesterday
@M.Viking nice function, new to me. Can you post it as an answer, it returns 1 21 26 36, what would be your next steps to get 36? — yesterday
@zx8754 could you add the new answers to the benchmark? Or, add the code for the vector creation — yesterday
The tested vector and benchmark have been moved to here. Feel free to update the answer. — 20 hours ago

zx8754 · Accepted Answer · 2023-06-29 15:00:05Z

14

Using rle:

r <- rle(test1)
ix <- max(which(r$values == 1))
sum(r$lengths[ 1:(ix - 1) ]) + 1
# [1] 36

r <- rle(test2)
ix <- max(which(r$values == 1))
sum(r$lengths[ 1:(ix - 1) ]) + 1
# [1] 29
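One edge case may be worth guarding against: when the last run of 1s is also the first run of the vector, `ix` is 1 and `1:(ix - 1)` evaluates to `c(1, 0)`, so the length of the first run is summed instead of nothing. The sketch below is a variant under that assumption, using `seq_len()`; the name `last_run_start_rle` is hypothetical and not part of the answer above.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# last_run_start_rle is a hypothetical name, not from the answer above.
last_run_start_rle <- function(x) {
  r <- rle(x)
  ones <- which(r$values == 1)                 # which runs consist of 1s
  if (length(ones) == 0L) return(integer(0))   # no 1s at all
  ix <- ones[length(ones)]                     # the last run of 1s
  # seq_len(0) is empty, so nothing is summed when that run is also the first run
  sum(r$lengths[seq_len(ix - 1)]) + 1
}

last_run_start_rle(test1)          # 36
last_run_start_rle(test2)          # 29
last_run_start_rle(c(1, 1, 0, 0))  # 1
```

For the two test vectors this gives the same results as the original three lines.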

I'm rle's biggest fan 🙂. So much so that I wrote seqle, a function which finds sequence lengths where you can specify the increment.

– Carl Witthoft

yesterday


Maël · Accepted Answer · 2023-06-29 15:27:49Z

13

Another way with which + diff.

idx <- which(test1 == 1)
idx[tail(which(diff(idx) != 1), 1) + 1]
#[1] 36
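To make the two steps easier to follow, here is the same idea with the intermediate values spelled out. The wrapper `last_run_start_diff` and its fallback for a vector with a single run of 1s are assumptions added for illustration; they are not part of the answer above.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

idx  <- which(test1 == 1)       # positions of all 1s: 21..25 and 36..50
gaps <- which(diff(idx) != 1)   # where two consecutive 1-positions are not adjacent
brk  <- tail(gaps, 1)           # 5: the gap between idx[5] = 25 and idx[6] = 36
idx[brk + 1]                    # 36

# Hypothetical wrapper: with only one run of 1s there is no gap,
# so fall back to the first position of a 1 (empty if there are no 1s).
last_run_start_diff <- function(x) {
  idx <- which(x == 1)
  brk <- tail(which(diff(idx) != 1), 1)
  if (length(brk) == 0L) head(idx, 1) else idx[brk + 1]
}

last_run_start_diff(test2)          # 29
last_run_start_diff(c(0, 1, 1, 0))  # 2
```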

Nice one, I started with diff, then gave up and went with rle

– zx8754

yesterday


score 12 · Accepted Answer · 2023-06-29 18:06:53Z

Run rle, then use cumsum to get the end position of each run. Subtracting the run lengths and adding 1 turns those into start positions. Keep only the runs of 1’s and take the last element. This gives the start position of the last run of 1’s, but if you wanted:

the end position, just omit the - lengths + 1
the last run of 0’s, replace the == 1 with == 0
the first run of 1’s, replace tail with head

If there are no 1’s, it returns a zero-length numeric vector.

with(rle(test1), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
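As a rough illustration of the steps and the three variations described above, the one-liner can be unpacked as follows. The intermediate names `ends` and `starts` are introduced here only for readability.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))

r <- rle(test1)
r$lengths                         # 20  5 10 15
r$values                          # 0 1 0 1

ends   <- cumsum(r$lengths)       # 20 25 35 50
starts <- ends - r$lengths + 1    # 1 21 26 36

tail(starts[r$values == 1], 1)    # 36: start of the last run of 1s
tail(ends[r$values == 1], 1)      # 50: end of the last run of 1s
tail(starts[r$values == 0], 1)    # 26: start of the last run of 0s
head(starts[r$values == 1], 1)    # 21: start of the first run of 1s
```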

score 6 · Accepted Answer · 2023-06-30 02:30:26Z

A late response, but a simple while loop will be a (potentially very) fast approach where the sought index is towards the end of the vector.

f <- function(x) {
  l <- length(x)
  while (x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  l
}

f(test1)
[1] 36
f(test2)
[1] 29
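The loop above assumes a 0 followed by a 1 exists somewhere in the vector; if it does not (for example, the only run of 1s starts at position 1), `l` reaches 1 and `x[l - 1]` is empty, so the condition fails with an error. A guarded variant might look like the sketch below. The name `f_guarded` and the choice of return values for the edge cases are assumptions.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# f_guarded is a hypothetical name; the edge-case return values are a design choice.
f_guarded <- function(x) {
  l <- length(x)
  # walk backwards until position l is a 1 directly preceded by a 0
  while (l > 1 && x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  if (l > 1) return(l)                        # found a 0 -> 1 step at position l
  if (length(x) > 0 && x[1] == 1) return(1L)  # the only run of 1s starts the vector
  integer(0)                                  # no 1s at all
}

f_guarded(test1)        # 36
f_guarded(test2)        # 29
f_guarded(c(1, 1, 0))   # 1
```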

We could also use Position() or the purrr equivalent detect_index():

Position(isTRUE, diff(test1) == 1, right = TRUE) + 1
[1] 36
purrr::detect_index(diff(test1) == 1, isTRUE, .dir = "backward") + 1
[1] 36

I feel like this would be an obvious approach in most other languages, but in R it's really thinking outside the box. Love it. — yesterday

score 5 · Accepted Answer · 2023-06-30 02:30:30Z

The data.table library has a non-exported function called data.table:::uniqlist(list(x)). Remember to use three colons ::: to access non-exported functions.

data.table:::uniqlist(list(test1))
# [1]  1 21 26 36

@Arun talks about uniqlist here:

https://stackoverflow.com/a/21267854/10276092

I use the quick y[length(y)] idiom to get the last element of a vector, and base ifelse() to check whether the last index returned by uniqlist points at a 1; if it does not, the second-to-last index must point at a 1.

fx <- function(x) {
    y <- data.table:::uniqlist(list(x))
    ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}

fx(test1)
# [1] 36
fx(test2)
# [1] 29
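The indices shown above (1 21 26 36) coincide with the start of each run in test1. For readers who would rather not rely on a non-exported function, the same indices can be sketched in base R. The helpers `run_starts` and `fx_base` are hypothetical names, and this is not a claim about how uniqlist is implemented.

```r
test1 <- c(rep(0, 20), rep(1, 5), rep(0, 10), rep(1, 15))
test2 <- c(rep(0, 8), rep(1, 4), rep(0, 5), rep(1, 5), rep(0, 6), rep(1, 10), rep(0, 2))

# Hypothetical helper: a run starts at position 1 and wherever the value changes.
run_starts <- function(x) which(c(TRUE, diff(x) != 0))

run_starts(test1)   # 1 21 26 36

# Hypothetical helper: keep the run starts whose value is 1 and take the last one.
fx_base <- function(x) {
  y <- run_starts(x)
  s <- y[x[y] == 1]
  s[length(s)]
}

fx_base(test1)      # 36
fx_base(test2)      # 29
```

Filtering on `x[y] == 1` avoids having to reason about whether the vector ends in a 0 or a 1.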

score 4 · Accepted Answer · 2023-06-29 19:12:38Z

You can try regex, like sub + nchar

f1 <- function(v) nchar(sub("(.*01).*", "\\1", paste0(v, collapse = "")))

or regexpr

f2 <- function(v) attr(regexpr(".*(?<=0)1", paste0(v,collapse = ""), perl = TRUE), "match.length")

or some other diff options

f3 <- function(v) tail(which(diff(v) == 1) + 1, 1)

f4 <- function(v) max((2:length(v))[diff(v) == 1])
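The regex variant relies on the greedy `.*` extending the captured group to the last "01" in the string, so the length of the captured text is the position of the 1 that follows the last 0. A small sketch on a short made-up string `s`; note that the backreference has to be written as "\\1" inside an R string.

```r
s <- "0011001110"                  # made-up example; the last run of 1s starts at position 7

sub("(.*01).*", "\\1", s)          # "0011001": everything up to the last "01"
nchar(sub("(.*01).*", "\\1", s))   # 7

# Caveat: without any "01" in the string, sub() leaves it unchanged,
# so nchar() returns the full length rather than a run start.
nchar(sub("(.*01).*", "\\1", "1100"))   # 4
```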

M.Viking · Accepted Answer · 2023-06-30 23:21:52Z

For completeness, here is the benchmark with a vector of size 30001. Feel free to update this if needed.

x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))


microbenchmark::microbenchmark(rle_zx8754(x),
                               rle_Grothendieck(x),
                               which_diff_Maël(x),
                               uniqlist_Viking(x),
                               while_Ritchie(x),
                               #Position_Ritchie(x),
                               #detect_index_Ritchie(x),
                               diff_Thomas(x),
                               #regex_Thomas(x),
                               #regexpr_Thomas(x),
                               times = 1000, check='equal')



Unit: microseconds
                expr   min     lq      mean median     uq      max neval  cld
       rle_zx8754(x) 339.5 350.45  783.9827 357.45 375.15 156630.3  1000   cd
 rle_Grothendieck(x) 352.7 364.75  616.2324 372.60 391.75  11196.5  1000  bc
  which_diff_Maël(x) 264.2 274.60  404.5521 279.50 292.00   7263.2  1000  b
  uniqlist_Viking(x)  16.7  22.30   32.1502  25.40  30.65   3524.9  1000 a
    while_Ritchie(x) 777.6 785.60 1021.0738 801.95 847.15   6739.7  1000    d
      diff_Thomas(x) 279.4 286.90  500.6373 291.20 306.35   9435.5  1000  b

functions:

x <- c(rep(0,14736),rep(1,413),rep(0,830),rep(1,961),rep(0,274),rep(1,12787))


rle_zx8754 <- function(x){
  r <- rle(x)
  ix <- max(which(r$values == 1))
  sum(r$lengths[ 1:(ix - 1) ]) + 1
}

which_diff_Maël <- function(x){
  idx <- which(x == 1)
  idx[tail(which(diff(idx) != 1), 1) + 1]
}

rle_Grothendieck <- function(x){
  with(rle(x), tail((cumsum(lengths) - lengths + 1)[values == 1], 1))
}

uniqlist_Viking <- function(x){
  y <- data.table:::uniqlist(list(x))
  ifelse(x[y[length(y)]] == 1, y[length(y)], y[length(y) - 1])
}

while_Ritchie <- function(x){
  l <- length(x)
  while (x[l] - x[l - 1] != 1) {
    l <- l - 1
  }
  l
}

Position_Ritchie <- function(x){
  Position(isTRUE, diff(x) == 1, right = TRUE) + 1
}

detect_index_Ritchie <- function(x){
  purrr::detect_index(diff(x) == 1, isTRUE, .dir = "backward") + 1
}

diff_Thomas <- function(x){
  max((2:length(x))[diff(x) == 1])
}

regex_Thomas <- function(x){
  nchar(sub("(.*01).*", "\\1", paste0(x, collapse = "")))
}

regexpr_Thomas <- function(x){
  attr(regexpr(".*(?<=0)1", paste0(x,collapse = ""), perl = TRUE), "match.length")
}

jkatam (edited by Gregor Thomas) · Accepted Answer · 2023-06-30 02:51:46Z

2

May not be the best but just an alternate for easy understanding

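library(dplyr)       # %>%, mutate, filter, slice_head, select
library(data.table)  # rleid

data.frame(var1=c(rep(0,20),rep(1,5),rep(0,10),rep(1,15))) %>% 
  mutate(new=rleid(var1), row=row_number()) %>% 
  # keep the 1s first, so max(new) is the last run of 1s even if the vector ends in 0s
  filter(var1==1) %>% 
  filter(max(new)==new) %>% 
  slice_head(n=1) %>% 
  select(row)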

# output

  row
1  36

answered yesterday by jkatam · edited yesterday by Gregor Thomas

Sotos · Accepted Answer · 2023-06-30 07:42:39Z

2

We can also use rleid from data.table:

library(data.table)

i1 <- rleid(test1)
min(which(i1 == max(i1[test1 == 1])))
# [1] 36
i1 <- rleid(test2)
min(which(i1 == max(i1[test2 == 1])))
# [1] 29

