April 19, 2024

This seems like a question a programmer might ask after one medicinal cigarette too many. The computer science equivalent of “what is the sound of one hand clapping?”. But it is a question I have to decide the answer to.

I’m adding indexOf() and lastIndexOf() operations to the Calculate transform of my data wrangling (ETL) software (Easy Data Transform). This will allow users to find the offset of one string inside another, counting from the start or the end of the string. Easy Data Transform is written in C++ and uses the Qt QString class for strings. QString has indexOf() and lastIndexOf() methods, so I thought wrapping that functionality would be an easy job. Maybe 15 minutes to program it, write a test case and document it.

Obviously it wasn’t that simple, otherwise I wouldn’t be writing this blog post.

First of all, what is the index of “a” in “abc”? 0, obviously. QString( “abc” ).indexOf( “a” ) returns 0. Duh. Well, only if you are a (non-Fortran) programmer. Ask a non-programmer (such as my wife) and they will say: 1, obviously. It’s the first character. Duh. Excel FIND( “a”, “abc” ) returns 1.

OK, most of my customers aren’t programmers. I can use 1-based indexing.
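A minimal C++ sketch of the two conventions, assuming std::string rather than QString so that it is self-contained. Like QString::indexOf(), std::string::find() reports a 0-based offset; adding 1 gives the spreadsheet-style answer. The function names, and the choice of 0 as the 1-based "not found" value, are illustrative assumptions, not Easy Data Transform's code.

```cpp
#include <cstddef>
#include <string>

// Programmer's answer: 0-based offset of needle in haystack,
// or std::string::npos if it is not there.
std::size_t ZeroBasedIndexOf(const std::string& needle, const std::string& haystack) {
    return haystack.find(needle);
}

// Non-programmer's answer, as Excel FIND() gives it: 1-based position.
// Returns 0 when needle is not found, since 0 is never a valid 1-based index.
std::size_t OneBasedIndexOf(const std::string& needle, const std::string& haystack) {
    const std::size_t pos = haystack.find(needle);
    return pos == std::string::npos ? 0 : pos + 1;
}
```

For “a” in “abc”, ZeroBasedIndexOf returns 0 and OneBasedIndexOf returns 1.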

But then things get more tricky.

What is the index of an empty string in “abc”? 1 maybe, using 1-based indexing. Or maybe empty is not a valid value to pass.

What is the index of an empty string in an empty string? Hmm. I guess the empty string does contain an empty string, but at what index? 1 maybe, using 1-based indexing, except that there is no first position in the string. Again, maybe empty is not a valid value to pass.

I looked at the Qt C++ QString, Javascript string and Excel FIND() functions for answers. But they each give different answers, and some of them aren’t even internally consistent. This is a simple comparison of the first index or last index of text v1 in text v2 in each (Excel doesn’t have an equivalent of lastIndexOf() that I am aware of):

Changing these so that all the valid results are 1-based and all the invalid results are -1, for easy comparison:

So:

  • Javascript disagrees with C++ QString and Excel on whether the first index of an empty string in an empty string is valid.
  • Javascript disagrees with C++ QString on whether the last index of an empty string in a non-empty string is the index of the last character or 1 after the last character.
  • C++ QString thinks the first index of an empty string in an empty string is the first character, but the last index of an empty string in an empty string is invalid.
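For one more data point, which the comparison in this post does not cover, the C++ standard library’s std::string gives its own answers. By the standard’s definition an empty needle matches at every position from 0 up to and including size(), so find() returns the lowest such position and rfind() the highest. A small sketch that collects those four edge cases (the struct and function names are hypothetical):

```cpp
#include <cstddef>
#include <string>

// 0-based std::string results for the empty-needle edge cases.
// An empty needle matches at every position 0..size(), so find()
// returns the lowest (0) and rfind() the highest (size()).
struct EmptyNeedleResults {
    std::size_t firstInNonEmpty;  // std::string("abc").find("")
    std::size_t lastInNonEmpty;   // std::string("abc").rfind("")
    std::size_t firstInEmpty;     // std::string("").find("")
    std::size_t lastInEmpty;      // std::string("").rfind("")
};

EmptyNeedleResults StdStringEmptyNeedle() {
    const std::string abc = "abc";
    const std::string empty;
    return {abc.find(""), abc.rfind(""), empty.find(""), empty.rfind("")};
}
```

The results are 0, 3, 0 and 0. In 1-based terms that is first 1 and last 4 for “abc”, and first 1 and last 1 for the empty string, so std::string treats both empty-in-empty cases as valid and puts the last empty match one after the last character.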

It seems surprisingly difficult to come up with something intuitive and consistent! I think I am probably going to return an error message if either or both values are empty. To me this seems the only unambiguous and consistent approach.

I could return 0 for a non-match and when one or both values are empty, but I think it is important to return different results in these 2 different cases. Also, to me “not found” and “invalid” feel qualitatively different from a calculated index, so they shouldn’t be just another number. What do you think?
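One way to keep “not found” and “invalid input” distinct from a calculated index is to return a small result type rather than a bare number. This is a hypothetical sketch of the “empty values are an error” approach in C++ with std::string; the type and function names are illustrative, not Easy Data Transform’s.

```cpp
#include <cstddef>
#include <string>

// The three qualitatively different outcomes of an index search.
enum class IndexStatus { Found, NotFound, Invalid };

struct IndexResult {
    IndexStatus status;
    std::size_t index;  // 1-based; only meaningful when status == Found
};

// Hypothetical IndexOf that treats an empty value as an error
// instead of picking an index for it.
IndexResult StrictIndexOf(const std::string& v1, const std::string& v2) {
    if (v1.empty() || v2.empty()) {
        return {IndexStatus::Invalid, 0};
    }
    const std::size_t pos = v2.find(v1);  // 0-based, or npos
    if (pos == std::string::npos) {
        return {IndexStatus::NotFound, 0};
    }
    return {IndexStatus::Found, pos + 1};  // convert to 1-based
}
```

A caller has to check status before using index, so neither failure case can be mistaken for a position.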

*** Update 14-Dec-2023 ***

I’ve been round the houses a bit more following feedback on this blog, the Easy Data Transform forum and Hacker News, and this is what I have decided:

IndexOf() v1 in v2:

v1        v2              IndexOf(v1,v2)
“”        “”              1
“aba”     “”              (blank)
“”        “aba”           1
“a”       “a”             1
“a”       “aba”           1
“x”       “y”             (blank)
“world”   “hello world”   7

This is the same as Excel FIND() and differs from Javascript indexOf() (ignoring the difference between 0-based and 1-based indexing) only for “”.indexOf(“”), which returns -1 in Javascript.
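The IndexOf() rules in the table can be sketched in C++ with std::string::find(), which gives the same answers as every row once shifted to 1-based. This assumes std::string, which counts bytes rather than characters for non-ASCII text; the real implementation uses QString and may differ. The result is returned as a string, blank when there is no valid result.

```cpp
#include <cstddef>
#include <string>

// IndexOf(v1, v2): 1-based position of the first occurrence of v1 in v2,
// returned as text; blank when v1 does not occur in v2.
// An empty v1 occurs at position 1 of any v2, including an empty one.
std::string IndexOf(const std::string& v1, const std::string& v2) {
    const std::size_t pos = v2.find(v1);  // 0-based, or npos
    if (pos == std::string::npos) {
        return "";
    }
    return std::to_string(pos + 1);
}
```

For example, IndexOf("world", "hello world") returns "7" and IndexOf("x", "y") returns an empty string.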

LastIndexOf() v1 in v2:

v1        v2              LastIndexOf(v1,v2)
“”        “”              1
“aba”     “”              (blank)
“”        “aba”           4
“a”       “a”             1
“a”       “aba”           3
“x”       “y”             (blank)
“world”   “hello world”   7

This differs from Javascript lastIndexOf() (ignoring the difference between 0-based and 1-based indexing) only for “”.lastIndexOf(“”), which returns -1 in Javascript.
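The LastIndexOf() rules can be sketched the same way with std::string::rfind(), which searches backwards and, for an empty v1, reports the position one after the last character. As before, this is a byte-based std::string sketch, not the QString-based implementation.

```cpp
#include <cstddef>
#include <string>

// LastIndexOf(v1, v2): 1-based position of the last occurrence of v1 in v2,
// returned as text; blank when v1 does not occur in v2.
// An empty v1 is last found one position after the end of v2.
std::string LastIndexOf(const std::string& v1, const std::string& v2) {
    const std::size_t pos = v2.rfind(v1);  // 0-based, or npos
    if (pos == std::string::npos) {
        return "";
    }
    return std::to_string(pos + 1);
}
```

For example, LastIndexOf("a", "aba") returns "3" and LastIndexOf("", "aba") returns "4".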

Conceptually, the index is the 1-based index of the first (IndexOf) or last (LastIndexOf) position where, if v1 is removed from the found position, it would have to be re-inserted in order to get back to v2. Thanks to layer8 on Hacker News for clarifying this.
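The removal and re-insertion definition can also be written down directly as a brute-force check. Collect every 0-based position p where v1 occurs in v2. For an empty v1 that is every p from 0 to v2.size(), because removing nothing and re-inserting it anywhere gives back v2. IndexOf is then the smallest such position plus one, and LastIndexOf the largest plus one. This is an illustrative reference definition with hypothetical names, much slower than find()/rfind(), and it uses 0 to mean “no valid result”.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Every 0-based position p at which v1 could be removed from v2 and
// re-inserted to give back v2, i.e. every p where v2 contains v1.
std::vector<std::size_t> MatchPositions(const std::string& v1,
                                        const std::string& v2) {
    std::vector<std::size_t> positions;
    for (std::size_t p = 0; p + v1.size() <= v2.size(); ++p) {
        if (v2.compare(p, v1.size(), v1) == 0) {
            positions.push_back(p);
        }
    }
    return positions;
}

// 1-based first match position, or 0 when there is none.
std::size_t FirstIndexByDefinition(const std::string& v1, const std::string& v2) {
    const std::vector<std::size_t> positions = MatchPositions(v1, v2);
    return positions.empty() ? 0 : positions.front() + 1;
}

// 1-based last match position, or 0 when there is none.
std::size_t LastIndexByDefinition(const std::string& v1, const std::string& v2) {
    const std::vector<std::size_t> positions = MatchPositions(v1, v2);
    return positions.empty() ? 0 : positions.back() + 1;
}
```

For “” in “aba” the matching positions are 0, 1, 2 and 3, which is why the first index is 1 and the last index is 4.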

Javascript and C++ QString return an integer and both use -1 as a placeholder value. But Easy Data Transform returns a string (which can be interpreted as a number, depending on the transform), so we aren’t bound to using a numeric value. So I have left the result blank where there is no valid result.
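The difference between the two conventions is only in how the “no result” case is encoded. A sketch of both side by side, assuming a 0-based std::string position as input, with std::string::npos meaning no match; both helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Integer convention (as in Javascript and QString): 0-based index, or -1.
long long ToIntegerResult(std::size_t pos) {
    return pos == std::string::npos ? -1 : static_cast<long long>(pos);
}

// String convention (as described for Easy Data Transform):
// 1-based index as text, or blank when there is no valid result.
std::string ToStringResult(std::size_t pos) {
    return pos == std::string::npos ? std::string() : std::to_string(pos + 1);
}
```

With the string convention a missing result cannot be confused with a position, whereas -1 has to be special-cased before any arithmetic is done on it.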

Now I’ve spent enough time down this rabbit hole and need to get on with something else! If you don’t like it, you can always add an If with Calculate, or use a Javascript transform, to get the result you prefer.

*** Update 15-Dec-2023 ***

Quite a bit of debate on this topic on Hacker News.