Palindrome Finder

Tutorial

This finder allows the user to input either plain text, or a .txt file. When the submit button is then clicked, it finds and shows the palindromes in the text, using advanced detection algorithms. Users can customize their search by selecting the palindrome type, length, and other parameters. These parameters, as well as the different algorithms, are explained below.

Palindromes

A palindrome is a sequence of characters (a word, a sentence, a DNA string, ...) which is read the same way from start to end, as it is read from end to start. A simple example of this would be something like racecar, or Madam, I'm Adam. There exist multiple types of palindromes, and this website currently supports five of them. Further explanation of these is given below.

Plain:

A plain palindrome is the most basic, intuitive version of a palindrome, where every character is considered. There even is a distinction between capital letters and their lowercase equivalent, so it is case-sensitive.
Examples:

racecar
kayak
^_^
palslap

Text:

Text palindromes are palindromes, where only letters are considered, and other characters like punctuation, spaces and newline characters are discarded. Furthermore, these palindromes are not case-sensitive. This makes it possible to look for interesting palindromes in bodies of text, like books, manuscripts and the Bible.
Examples:

Madam, I'm Adam
Eva, can I see bees in a cave?
A man, a plan, a canal, Panama
Never odd or even

Word:

Word palindromes are interesting; here, words are handled as a separate unit, instead of characters. These words are separated by spaces, or enters. This type of palindrome is not case-sensitive, and it ignores punctuation, just like text palindromes.
Examples:

Is it crazy? It is.
Fall leaves when leaves fall
Dogs run fast, fast run dogs
Man sees sun, sees man

DNA:

DNA palindromes is a palindrome type, which can often be hard to understand. First of all, only a specific selection of characters are allowed. Among these are A, T, C and G, as these are the standard DNA nucleotides. Furthermore, the characters URYKMSWBDHVN are allowed as well, these represent a nucleotide which couldnt have been read accurately, and it could be more than one. For example, an R in a DNA string, means that the nucleotide in that spot could have been either an G or an A. These characters 'match' with nothing though, so they do not occur in palindromes. Furthermore, this type is categorized as "antireflexive", which means no character matches with itself. More specifically, 'A' matches with 'T', and vice versa, and 'C' matches with 'G'.
Examples:

ATCGAT
ATAT
CGCGCG
ATATATCGATATAT

Punctuation:

Punctuation palindromes are very similar to text palindromes. The only difference is that the palindrome must consist of whole words. So in a text like "..wit broken on me, because I have..", the fragment "e, be" would be a valid text palindrome, but not a punctuation palindrome, because it 'stops' in the middle of a word.
Examples:

Step on no pets.
Was it a car or a cat I saw?
No lemon, no melon.
Madam in Eden, I'm Adam.

Maximal Palindromes

If we consider the palindrome "racecar", we see that this palindrome actually contains smaller palindromes like "e", "cec" and "aceca". While these intra-palindromes are technically valid, showing every single one would clutter the results and overwhelm the useful information. So these are not visualized in the output. This phenomenon, where a palindrome can not be extended any more, or in other words is not part of a bigger palindrome having the same center, is called a maximal palindrome. In the text Verstappen got a new racecar, which costs 12 million euro's, "racecar" is a maximal palindrome, while "aceca" is not.

Minimum length

In the finder, you are able to specify a minimum length. For text and punctuation palindromes, this does not include the punctuation, only the letters. Furthermore, for word palindromes, these lengths correspond to the amount of words in the palindrome, and not the amount of characters. For DNA and plain palindromes, it corresponds to the amount of characters, just as expected. Furthermore, the minimum length cannot be 0 or 1, as empty palindromes and palindromes consisting of one character are trivial and will flood the output. (an empty sequence or a sequence consisting of one character is a palindrome, because if it is flipped, the sequence doesn't change). If gaps and/or errors are specified, the minimum length is updated accordingly, because a palindrome consisting of only gaps and errors, is also considered trivial. For a maximal error amount of n, and a maximal gap size of m, the minimum length will be 2n + m + 1.

Gaps

A gap in a palindrome means that in the middle of a palindrome, a sequence of a set length is ignored. For example, examplegapelpmaxe is a plain palindrome with a gap of length 3. Also Home sweet dog fireplace sweet home is a word palindrome with a gap of length 2.

Errors

There are multiple types of errors.
Substitution errors in palindromes are mismatches of characters, relative to the center position. For example, revivor is a palindrome with one substitution error, because if it weren't for the 'o' not being an 'e' (or the first 'e' not being an 'o'), it would be a perfect palindrome. The approximate and quadratic algorithm only support these substitution errors.
Insertion errors are errors, where an insertion of a specific character somewhere in the palindrome, makes the palindrome perfect. For example, racecr is a palindrome with one insertion error, because if an 'a' were to be inserted between the final two characters, it would be a perfect palindrome. These errors are only supported by the approximate algorithm.
Deletion errors work almost the same as insertion errors, but in this case, a deletion of a certain character in the palindrome makes it perfect. For example, racecaar is a palindrome with one deletion error, because if the last 'a' were to be removed, it would be a perfect palindrome. Just like insertion errors, deletion errors are only supported by the approximate algorithm.

Algorithm types

On this website, three kinds of algorithms are used, with different properties. When it comes to the output, the linear and quadratic algorithms are identical, so unless you are interested in performance, you can randomly choose one of those two. The functionality of gaps and errors are also only supported by the quadratic and approximate algorithm. The difference between the algorithms is explained below.

Linear:

The linear algorithm goes over the input text once. This means that if your input has n characters, the algorithm will do n calculation steps. This algorithm should be intuitively faster than the quadratic algorithm, as the quadratic algorithm needs to do at most n² steps. However, in practice this is heavily dependent on factors such as the kind of input, and there are many cases where the quadratic algorithm is actually faster.

Quadratic:

The quadratic algorithm goes over the input text in quadratic time. This means that if your input has n characters, the algorithm will have to do n² calculation steps. This algorithm should be intuitively slower than the linear algorithm, as the linear algorithm needs to do only n steps. However, in practice this is heavily dependent on factors such as the kind of input, and there are many cases where the linear algorithm is actually slower.

Approximate:

The approximate algorithm is highly similar to the quadratic algorithm, but it supports insertion and deletion errors.

Difference:

The performance of the three algorithms relative to each other can strongly vary based on the input. For example, in normal text, like a fragment from a book, palindromes tend to be sparse and short. Here, the quadratic algorithm performs well. However, in highly repetitive input, like a text file with a bunch of a's, there is a palindrome on every center, and in the middle of the input, the palindromes consist of nearly the whole input. Here, the quadratic algorithm ends up doing more work internally to compute and extend them all. But again, this is only performance-related, and doesn't affect the output, so if you are not interested in the functionality under the hood, it is totally fine to just choose one of these two randomly. The approximate algorithm supports insertion and deletion errors, and has a more generalized definition of maximal palindromes, and therefore does not always output the same answer as the linear and quadratic algorithm.