How do I split a string by a character without ignoring trailing split-characters?



7

I have a string similar to the following

my_string <- "apple,banana,orange,"

And I want to split by , to produce the output:

list(c('apple', 'banana', 'orange', ""))

I thought strsplit would accomplish this, but it treats the trailing ‘,’ as if it doesn’t exist:

my_string <- "apple,banana,orange,"

strsplit(my_string, split = ',')
#> [[1]]
#> [1] "apple"  "banana" "orange"

Created on 2023-11-15 by the reprex package (v2.0.1)

What is the simplest approach to achieve the desired output?

Some more test cases with example strings and desired outputs

string1 = "apple,banana,orange,"
output1 = list(c('apple', 'banana', 'orange', ''))

string2 =  "apple,banana,orange,pear"
output2 = list(c('apple', 'banana', 'orange', 'pear'))

string3 =  ",apple,banana,orange"
output3 = list(c('', 'apple', 'banana', 'orange'))

## Examples of non-comma separated strings
# '|' separator
string4 =  "|apple|banana|orange|"
output4 = list(c('', 'apple', 'banana', 'orange', ''))

# 'x' separator
string5 =  "xapplexbananaxorangex"
output5 = list(c('', 'apple', 'banana', 'orange', ''))

EDIT:

Ideally solution should generalize to any splitting character

Would also prefer a base-R solution (although do still link any packages which supply this functionality since their source code might be useful to look through!)

4

  • 1

    I can't find anything, but I'm sure this is a dupe. Check out stringi::stri_split_fixed

    – Joseph Wood

    21 hours ago

  • 5

    It's a diversion from strsplit, but scan(text=my_string, sep=",", what="") works as intended.

    – thelatemail

    20 hours ago

  • @thelatemail I quite like your scan solution; it's great as a quick base-R workaround. It needs to sit inside an apply function to work sensibly on vectors, but otherwise it's a very neat trick.

    – Selk

    20 hours ago

  • 2

    scan is a good solution, just wrap it into a function, then use lapply as you wish.

    – zx8754

    20 hours ago

4 Answers
4


6

Why Doesn’t strsplit Give the Desired Output?

The documentation (?strsplit) contains the following statement:

Note that this means that if there is a match at the beginning of a
(non-empty) string, the first element of the output is "", but if
there is a match at the end of the string, the output is the same as
with the match removed.

That is the reason you don’t see the trailing "" when you use strsplit.

Below are some demonstrations

> strsplit("apple,banana,orange,", ",")
[[1]]
[1] "apple"  "banana" "orange"


> strsplit(",apple,banana,orange,", ",")
[[1]]
[1] ""       "apple"  "banana" "orange"


> strsplit(",apple,banana,orange", ",")
[[1]]
[1] ""       "apple"  "banana" "orange"


> strsplit("apple,banana,orange", ",")
[[1]]
[1] "apple"  "banana" "orange"

A Base R Workaround

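As a coding exercise, one base R option is to define a custom recursive function like the one below. Note that sep is used as a regular expression here, so regex metacharacters such as | must be escaped.

f <- function(x, sep = ",") {
  # Capture everything before the first separator (lazy match)
  pat <- sprintf("^(.*?)%s.*", sep)
  s1 <- sub(pat, "\\1", x)
  # Drop the first field together with its separator
  s2 <- sub(paste0("^.*?", sep), "", x)
  # No separator left: x is the final field (possibly "")
  if (s2 == x) {
    return(x)
  }
  c(s1, Recall(s2, sep))
}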

such that

> f("apple,banana,orange,")
[1] "apple"  "banana" "orange" ""

> f(",apple,banana,orange,")
[1] ""       "apple"  "banana" "orange" ""      

> f(",apple,banana,orange")
[1] ""       "apple"  "banana" "orange"

> f("apple,banana,orange")
[1] "apple"  "banana" "orange"

3

  • That is likely demonstrated above, but it might be more usefully reinforced through a couple of code examples contrasting a string starting with the separator vs. one ending with it. I'm a big fan of strsplit but hadn't given this much thought.

    – Chris

    19 hours ago


  • @Chris see my update with demonstration

    – ThomasIsCoding

    19 hours ago

  • Much appreciated addition and answers worth saving.

    – Chris

    15 hours ago



6

Use stringr

library(stringr)

str_split(my_string, ",")

[[1]]
[1] "apple"  "banana" "orange" ""  

6

  • 1

    This works (+1), but interestingly enough, still doesn't work for strsplit, as opposed to stringr::str_split.

    – thelatemail

    20 hours ago

  • 1

    which is expected ## Note that final empty strings are not produced

    – rawr

    20 hours ago

  • 3

    I think this answer can be simplified to just using stringr::str_split() since it handles leading and trailing strings, stringr::str_split(",apple,banana,orange,", pattern = ",")

    – Selk

    20 hours ago


  • 2

    This is a great solution and is likely to be useful for future viewers. The only reason it is not marked as the answer is the preference for a base-R solution.

    – Selk

    20 hours ago

  • 1

    If simplicity is desired, stringr::str_split_1(my_string, ",") will return a character vector instead of a list: [1] "apple" "banana" "orange" "".

    – Adriano Mello

    20 hours ago


5

Pasting another separator at the end should allow strsplit to function as intended.
Otherwise, you could fall back to using the scan function, which underpins the read.csv/table functions:

strsplit(paste0(string1, ","), ",")
##[[1]]
##[1] "apple"  "banana" "orange" ""

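Generalising, with the regex escape stripped from the separator before it is pasted on:

L <- list(string1, string2, string3, string4, string5)
mapply(
    # gsub("\\\\", "", s) removes the backslashes, so "\\|" is pasted as "|"
    function(x,s) strsplit(paste0(x, gsub("\\\\", "", s)), split=s),
    L,
    c(",", ",", ",", "\\|", "x")
)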

##[[1]]
##[1] "apple"  "banana" "orange" ""      
##
##[[2]]
##[1] "apple"  "banana" "orange" "pear"  
##
##[[3]]
##[1] ""       "apple"  "banana" "orange"
##
##[[4]]
##[1] ""       "apple"  "banana" "orange" ""      
##
##[[5]]
##[1] ""       "apple"  "banana" "orange" "" 
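
If the separator is always a literal string rather than a regex, a small wrapper can sidestep the escaping step by passing fixed = TRUE to strsplit, as suggested in the comments on another answer. A minimal sketch, where strsplit_keep_trailing is a hypothetical helper name and not a base R function:

```r
# Hypothetical helper: split on a literal separator and keep a trailing "".
# Assumes sep is a non-empty literal string (not a regular expression).
strsplit_keep_trailing <- function(x, sep = ",") {
  # Appending one more separator lets a trailing empty field survive,
  # because strsplit() only drops what follows the very last match.
  # fixed = TRUE treats sep literally, so "|" needs no escaping.
  strsplit(paste0(x, sep), split = sep, fixed = TRUE)
}

# Works on character vectors, returning one list element per input string
strsplit_keep_trailing(c("apple,banana,orange,", ",apple,banana,orange"))
strsplit_keep_trailing("|apple|banana|orange|", sep = "|")
```

Because strsplit is already vectorised over x, no lapply or mapply is needed when every string shares the same separator.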

scan option:

scan(text=string1, sep=",", what="")
##Read 4 items
##[1] "apple"  "banana" "orange" ""

Generalising:

mapply(
    function(x,s) scan(text=x, sep=s, what=""),
    L,
    c(",", ",", ",", "|", "x")
)
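
Following the suggestion in the comments to wrap scan in a function and then use lapply, one possible sketch is shown here. The name split_scan is hypothetical, and quote = "" and quiet = TRUE are additions that are not in the calls above: they turn off quote handling and suppress the "Read N items" message. The sketch assumes sep is a single character, which is what scan accepts.

```r
# Hypothetical wrapper around scan(): returns a list with one character
# vector per input string, keeping leading and trailing empty fields.
split_scan <- function(x, sep = ",") {
  lapply(x, function(s) {
    scan(text = s, sep = sep, what = "", quote = "", quiet = TRUE)
  })
}

split_scan(c("apple,banana,orange,", ",apple,banana,orange"))
split_scan("|apple|banana|orange|", sep = "|")
```

Since scan treats sep as a literal character, the "|" separator needs no escaping here.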

2

  • I think scan is the cheapest base R workaround for this question, cheers!

    – ThomasIsCoding

    19 hours ago

  • Marking as the answer as it meets all criteria (base R implementation, outputs exactly as described in the question). For future reference, the answer by ThomasIsCoding describes an alternative base-R solution that's also really nice. Anyone not requiring a base-R implementation should see GuedesBF's answer for a simple solution using stringr.

    – Selk

    4 hours ago


1

I used this:

my_string <- "apple,banana,orange,"

# Append an extra character (here 'X'), then split
result <- strsplit(paste0(my_string, "X"), ",X")

result

Then, for the use case in the question:

split_string <- function(s) {
  # Add a special character at the beginning and end if the string starts or ends with a comma
  if (startsWith(s, ",")) {
    s <- paste0("SPECIALCHAR", s)
  }
  if (endsWith(s, ",")) {
    s <- paste0(s, "SPECIALCHAR")
  }

  # Split the string by comma
  parts <- strsplit(s, ",", fixed = TRUE)[[1]]

  # Replace the special character with an empty string
  parts <- gsub("SPECIALCHAR", "", parts)

  return(parts)
}

# Test cases
string1 <- "apple,banana,orange,"
string2 <- "apple,banana,orange,pear"
string3 <- ",apple,banana,orange"

output1 <- split_string(string1)
output2 <- split_string(string2)
output3 <- split_string(string3)

output1 # Expected: "apple", "banana", "orange", ""
output2 # Expected: "apple", "banana", "orange", "pear"
output3 # Expected: "", "apple", "banana", "orange"

5

  • 2

    This doesn't work – it doesn't add a blank string at the end and it also doesn't split the original string.

    – thelatemail

    21 hours ago

  • 1

    Your original idea was on the right path though I think – just add another separator and then strsplit: strsplit(paste0(my_string, ","), ",") should work.

    – thelatemail

    20 hours ago

  • @thelatemail strsplit(paste0(my_string, ","), ",") is another neat solution, but it's worth noting this won't generalise to regex / escaped values for split. It can solve all of my test cases, but only if for the '|' separator you use fixed=TRUE instead of trying to escape it with '\\|'.

    – Selk

    20 hours ago


  • @Selk – works for me I think – strsplit(paste0(string4, "|"), split="\\|")

    – thelatemail

    20 hours ago

  • @thelatemail yep, I meant if you were to generalise this into a function that would work on any separator i.e. (strsplit2(x, sep)), you'd have to add some logic to strip out double backslashes from your sep string before pasting it, if that makes sense. I agree that this, and your scan solution look like the most promising base-R solutions. If you're interested in putting together an answer that describes both approaches I think that would make a great official answer!

    – Selk

    20 hours ago



