Tutorial


Palindrome finder takes either plain text or a .txt file as input. It finds and shows palindromes in the text, using advanced algorithms for finding palindromes. You can customize the search by selecting a type of palindrome to be found, its minimum length, and other parameters. This tutorial describes the different algorithms and parameters.

Palindromes

A palindrome is a sequence of characters (a word, a sentence, a DNA string, ...) that reads the same forwards and backwards. Examples of palindromes are racecar and Madam, I'm Adam.. The former is a palindrome that is exactly the same when read forwards and backwards, the latter is a text palindrome in which we ignore symbols like commas and spaces. We currently support five different types of palindromes.

Plain:

A plain palindrome is the basic variant that takes every character is taken into account: not just letters, but also characters like spaces and commas. Furthermore capital letters are considered different from their corresponding lowercase letters.
Examples:

  • racecar
  • kayak
  • ^_^
  • palslap

Text:

A text palindrome is a palindrome in which we only look at letters, ignoring characters like punctuation, spaces and newlines. Furthermore, a text palindrome is not case-sensitive.
Examples:

  • Madam, I'm Adam.
  • Eva, can I see bees in a cave?
  • A man, a plan, a canal, Panama
  • Never odd or even

Punctuation:

A punctuation palindrome is a text palindrome that is surrounded by punctuation symbols. So a punctuation palindrome consists of complete words. For example, in a text like "..wit broken on me, because I have..", the fragment "e, be" would be a text palindrome, but not a punctuation palindrome, because it ends in the middle of a word.
Examples:

  • Step on no pets.
  • Was it a car or a cat I saw?
  • No lemon, no melon.
  • Madam in Eden, I'm Adam.

Word:

A word palindrome takes words instead of characters as the basic unit. These words are separated by spaces, commas, newlines, or other punctuation symbols. This type of palindrome is not case-sensitive, and ignores punctuation, just like a text palindrome.
Examples:

  • Is it crazy? It is.
  • Fall leaves when leaves fall
  • Dogs run fast, fast run dogs
  • Man sees sun, sees man

DNA:

A DNA palindrome is a particular type of palindromes found in DNA. A DNA palindrome consists of the characters used for the standard DNA nucleotides A, T, C and G. The characters URYKMSWBDHVN may also occur in a DNA string. These characters represent nucleotides that cannot be exactly determined and represent one of a subset of these nucleotides. For example, an R in a DNA string represents either a G or an A nucleotide in that position. Matching is special in DNA: 'A' matches with 'T', and vice versa, and 'C' matches with 'G', and vice versa. The other characters do not match with other characters or themselves, so they do not occur in palindromes.
Examples:

  • ATCGAT
  • ATAT
  • CGCGCG
  • ATATATCGATATAT


Maximal Palindromes

The palindrome "racecar" contains quite a few shorter palindromes such as "e", "cec" and "aceca". Since these subpalindromes can be reconstructed from the palindrome in which they appear we do not show them in the output. A palindrome that cannot be extended on both sides is called a maximal palindrome. In the text Verstappen got a new racecar, which costs 12 million euro's, "racecar" is a maximal palindrome, while "aceca" is not.


Minimum length

The minimum length setting specifies the minimum length of the maximal palindromes we return. For text and punctuation palindromes, punctuation symbols are not counted. For word palindromes the length is the number of words in the palindrome. For DNA and plain palindromes the minimum length is the number of characters. The minimum length has to be at least 2 because empty palindromes and palindromes consisting of a single character are trivial and will flood the output. The minimum length takes gaps and/or errors into account, so for a maximal number of errors n, and a maximal gap size m, the minimum length is at least 2n + m + 1.


Gaps

In a palindrome with a gap of size m we ignore the middle m characters of a palindrome. For example, examplegapelpmaxe is a plain palindrome with a gap of length 3, and Home sweet dog fireplace sweet home is a word palindrome with a gap of length 2.

Errors

Sometimes we want to find palindromes that contain errors. Errors come in multiple variants.
A substitution error denotes a pair of characters that don't match around the center of a palindrome. For example, revivor is a palindrome with one substitution error: substituting 'e' for 'o' (or the other way around) would lead to a palindrome. Substitution errors are supported by the approximate and quadratic algorithms.
An insertion error is resolved if inserting a particular character in a string leads to a palindrome. For example, racecr is a palindrome with one insertion error: insert an 'a' between the two final characters. Insertion errors are supported by the approximate algorithm.
A deletion error is the complement of an insertion error, and is resolved by deleting a particular character in a string, leading to a palindrome. For example, racecaar is a palindrome with one deletion error: delete the last 'a'. Deletion errors are supported by the approximate algorithm.


Algorithm types

We use three kinds of algorithms to find palindromes, with different properties. The linear and quadratic algorithms produce the same output; they only differ in their asymptotic complexity. Gaps and errors are only supported by the quadratic and approximate algorithms.

Linear:

The linear algorithm goes over the input text once. This means that if your input has n characters, the algorithm performs a number of steps linear in n.

Quadratic:

The quadratic algorithm might take a quadratic number of steps when analysing the input. This means that if the input has n characters, the algorithm might take n² steps. The name of the algorithm seems to say that this algorithm is slower than the linear algorithm. However, in practice performance heavily depends on factors such as the number and length of palindromes in the input, and there are many cases in which the linear algorithm is slower than the quadratic algorithm.

Approximate:

The approximate algorithm is a version of the quadratic algorithm that supports insertion and deletion errors.

Analysis:

The performance of the three algorithms strongly depends on the form of the input. For example, in most books palindromes are sparse and short. The quadratic algorithm performs well for such cases. But in a text file with a lot of repetition such as a text file with many a's, there is a palindrome around every center, and in the middle of the input the palindromes cover nearly the entire input. In this case, the quadratic algorithm has to extend all palindromes at each step, which becomes expensive when the input is long. The approximate algorithm supports insertion and deletion errors, and uses a generalized definition of maximal palindromes, and does not always return the same output as the linear and quadratic algorithm.