Text Conversion


Links to Hackage

LTCP Package

Diosiris Corpus Package

Text Conversion Packages

We offer two Haskell packages for converting Latin and Greek XML files into clean .txt files. The packages strip away metadata and markup so that only the text content remains. The resulting text can then be passed to the palindrome package to find palindromes in it.
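To illustrate what "stripping markup" means here, the following is a minimal sketch in plain Haskell (base only). It is not the implementation used by either package: the names `stripTags` and `normalizeSpaces` are hypothetical, and a real converter would use a proper XML parser to handle entities, comments, and metadata sections.

```haskell
-- Hypothetical helpers for illustration; not part of the LTCP or
-- Diosiris Corpus packages.

-- | Drop everything between '<' and '>' (inclusive), keeping only text.
-- An unclosed '<' discards the remainder of the input.
stripTags :: String -> String
stripTags [] = []
stripTags ('<' : rest) = stripTags (drop 1 (dropWhile (/= '>') rest))
stripTags (c : rest) = c : stripTags rest

-- | Collapse the runs of whitespace that removed markup leaves behind.
normalizeSpaces :: String -> String
normalizeSpaces = unwords . words
```

Loaded into GHCi, `normalizeSpaces (stripTags xml)` turns a marked-up line into a single line of plain text that a palindrome search can consume.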

LTCP Package

The LTCP package converts the Latin Perseus corpus (in XML format) to plain text containing only its textual content. The corpus does not use a consistent format throughout, so some files may not parse properly.
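Because some files may fail to parse, a caller that processes the whole corpus will probably want to keep the successes and record the failures instead of stopping at the first error. The sketch below shows one way to do that with `Either` and `partitionEithers`. Both `convertDocument` and `convertAll` are hypothetical names, and the failure condition (an unclosed tag) is an assumption chosen only to keep the example small; the LTCP package may report errors differently.

```haskell
import Data.Either (partitionEithers)

-- | Hypothetical converter: strips tags and fails on a '<' that is
-- never closed. Stands in for whatever real parser is used.
convertDocument :: String -> Either String String
convertDocument [] = Right []
convertDocument ('<' : rest) =
  case dropWhile (/= '>') rest of
    []         -> Left "unclosed tag"
    (_ : more) -> convertDocument more
convertDocument (c : rest) = (c :) <$> convertDocument rest

-- | Convert every named document and return (failures, successes).
-- Failures carry the error message, successes carry the plain text.
convertAll :: [(FilePath, String)] -> ([(FilePath, String)], [(FilePath, String)])
convertAll docs = partitionEithers (map convertOne docs)
  where
    convertOne (name, xml) =
      case convertDocument xml of
        Left err   -> Left (name, err)
        Right text -> Right (name, text)
```

Keeping the file name next to each error makes it straightforward to list which parts of the corpus were skipped.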

Diosiris Corpus Package

The Diosiris Corpus package handles the Greek Diosiris Corpus (in XML format) and either extracts the Beta Code or converts this content to an English text format. It can also convert the start and end indices of a palindrome to the corresponding indices in the Beta Code, which makes the palindromes easier to locate there.
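The index conversion can be pictured as follows. This is a sketch under stated assumptions, not the package's actual API: it assumes the converted text keeps only the letters of the Beta Code (lower-cased, with diacritic markers, spaces, and punctuation dropped), that indices are zero-based, and that a palindrome is given as an inclusive (start, end) pair. The names `convertWithOffsets`, `convertedText`, and `toBetaRange` are hypothetical.

```haskell
import Data.Char (isAlpha, toLower)

-- | Pair each kept character with its position in the Beta Code input.
-- Assumption: only letters survive the conversion.
convertWithOffsets :: String -> [(Char, Int)]
convertWithOffsets beta =
  [ (toLower c, i) | (i, c) <- zip [0 ..] beta, isAlpha c ]

-- | The converted text on its own, as a palindrome search would see it.
convertedText :: String -> String
convertedText = map fst . convertWithOffsets

-- | Map an inclusive (start, end) range in the converted text back to
-- the matching indices in the Beta Code. Returns Nothing when the
-- range does not fit inside the converted text.
toBetaRange :: String -> (Int, Int) -> Maybe (Int, Int)
toBetaRange beta (start, end)
  | start < 0 || end < start || end >= length offsets = Nothing
  | otherwise = Just (offsets !! start, offsets !! end)
  where
    offsets = map snd (convertWithOffsets beta)
```

For the Beta Code string `mh=nin a)/eide`, the converted text is `mhninaeide`. The palindrome `nin` sits at indices (2, 4) there, and `toBetaRange` maps that range to (3, 5) in the Beta Code, because the `=` marker at index 2 was dropped during conversion.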