This option determines how TEXTfromPDF stores the output text that it extracts from PDF files.
Warning! The topic of encodings can get very complex.
Read on only if you are very brave of heart... I'll try to make this as painless as possible.
This may come as a shock to some of you but in the mind of your computer, there is really no such thing as "text". The only thing your computer knows about is numbers. The reason you are lulled into this belief that your computer knows about text is that sometimes your computer manipulates numbers in such a way that they behave like text. In order for your computer to do this, there must be a set of rules for translating between numbers and text. For example, we could decide that 1 means "a", that 2 means "b", and so forth; or we could decide that 100 means "a" and 200 means "b". It really doesn't matter what rules we decide upon, as long as we are clear on what rules we're using. It's all entirely arbitrary, rather like one of those secret codes you probably made up when you were a kid.
An encoding is simply a set of rules for performing this kind of translation. An encoding mediates between a sequence of numbers, on the one hand, and the visually representable textual characters on the other. So you can see right away that encodings are not something to be afraid of; they're something to be grateful for. Without encodings, your computer wouldn't be able to do text at all.
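To make the idea concrete, here is a minimal sketch of the two made-up rule sets described above (1 means "a", or 100 means "a"). The names RULES_A, RULES_B, encode, and decode are hypothetical and exist only for this illustration; they are not part of TEXTfromPDF.

```python
import string

# Two arbitrary, made-up rule sets for the lowercase Roman alphabet.
RULES_A = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}        # a=1, b=2, ...
RULES_B = {ch: i * 100 for i, ch in enumerate(string.ascii_lowercase, start=1)}  # a=100, b=200, ...


def encode(text, rules):
    """Translate text into a list of numbers using the given rules."""
    return [rules[ch] for ch in text]


def decode(numbers, rules):
    """Translate numbers back into text using the same rules."""
    reverse = {number: ch for ch, number in rules.items()}
    return "".join(reverse[n] for n in numbers)


print(encode("cab", RULES_A))                    # [3, 1, 2]
print(encode("cab", RULES_B))                    # [300, 100, 200]
print(decode(encode("cab", RULES_B), RULES_B))   # cab
```

Either rule set works, as long as the same one is used in both directions.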
Still with me? Great, then let's move on. What does this have to do with the text files that TEXTfromPDF creates?
A file isn't a string of text; it is just a sequence of numbers, and it doesn't contain any information about the meaning of those numbers, except by external convention. For instance, let's pretend there's a "Harvey" encoding. Well then, if you save data to a file as a sequence of numbers representing text in the Harvey encoding, then when you read the file again later you had better know in some other way that this sequence of numbers represents text in the Harvey encoding, and any other programs that intend to read this file had better know it too, or it won't be possible to display its data properly as text.
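The following sketch shows what happens when a reader guesses the wrong encoding. Since "Harvey" is imaginary, it uses UTF-8 and Latin-1 instead; the file name sample.txt is just an assumption for the example.

```python
import os
import tempfile

text = "café"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Save the text as numbers, following the UTF-8 rules.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# The file itself holds only numbers; nothing in it names the encoding.
with open(path, "rb") as f:
    raw = f.read()
print(list(raw))   # [99, 97, 102, 195, 169]

# Reading with the rules used for writing gives the text back.
with open(path, encoding="utf-8") as f:
    right = f.read()

# Reading the same numbers with different rules gives garbage.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

print(right)   # café
print(wrong)   # cafÃ©
```

Both reads succeed without any error, which is why a wrong guess usually shows up only as strange characters on screen.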
TEXTfromPDF can save the text data from a PDF file using one of three encodings: ASCII7, Latin1, and UTF-8. (There are dozens of encodings that are in common use. TEXTfromPDF provides three of the most common encodings.)
ASCII7
In the beginning there was ASCII, a convention for expressing characters as 7-bit integers (0-127). This limits the representable "alphabet" to 128 characters (fewer, actually, since some of the numbers are reserved for invisible "control characters"). ASCII encodes the Roman alphabet, numerals, and some basic punctuation and arithmetic symbols.
Almost all systems understand ASCII, and many later character sets are based on it. However, ASCII does not provide any non-English letters, so it's impossible to encode even French or German text in ASCII, much less Japanese or Arabic.
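A quick way to see this limit is to ask Python to encode some text as ASCII. This is only an illustration of the encoding itself; how TEXTfromPDF handles characters that do not fit in ASCII7 is not described here, and the helper name can_encode is hypothetical.

```python
def can_encode(text, encoding):
    """Return True if every character of text has a code in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode("hello", "ascii"))   # True
print(can_encode("café", "ascii"))    # False

# One possible lossy fallback: substitute "?" for anything ASCII lacks.
print("café".encode("ascii", errors="replace"))   # b'caf?'
```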
Latin1
Also known as ISO-8859-1, Latin1 extends ASCII with accented letters (é), special letters (æ), and a few symbols (÷). It provides a near-complete alphabet for French, Spanish, German, and most other Western European languages. Many systems understand Latin1 and can display all of its characters.
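As a rough sketch, the snippet below shows that Latin-1 gives each character a single number from 0 to 255, that it agrees with ASCII for plain English text, and that some characters still fall outside it. The helper name latin1_codes is hypothetical.

```python
def latin1_codes(text):
    """Return the Latin-1 code of each character in text."""
    return list(text.encode("latin-1"))


print(latin1_codes("é æ ÷"))   # [233, 32, 230, 32, 247]

# Plain ASCII text gets exactly the same numbers in both encodings.
print("PDF".encode("latin-1") == "PDF".encode("ascii"))   # True

# Characters outside Latin-1, including the French ligature œ and the euro sign.
for ch in ["œ", "€", "日"]:
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        print(f"U+{ord(ch):04X} has no Latin-1 code")
```

The missing œ is one reason the alphabet is "near-complete" rather than complete for French.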
UTF-8
Unicode is an effort to standardize a single character set that permits expression of every character in every language. UTF-8 is the most common encoding of Unicode. Unlike ASCII or Latin1, it was designed to work with Unicode data, and as a result UTF-8 can encode any and all Unicode characters.
UTF-8 is the default encoding used in TEXTfromPDF to allow it to work with PDFs containing any language.
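UTF-8 manages this by using between one and four numbers (bytes) per character. The sketch below prints the bytes for a few sample characters; the helper name utf8_bytes is hypothetical, and bytes.hex with a separator assumes Python 3.8 or later.

```python
def utf8_bytes(ch):
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


# An ASCII letter, a Latin-1 letter, a Japanese character, and an emoji.
for ch in ["a", "é", "日", "\U0001F600"]:
    print(f"U+{ord(ch):04X}  {utf8_bytes(ch)}")

# U+0061  61
# U+00E9  c3 a9
# U+65E5  e6 97 a5
# U+1F600  f0 9f 98 80
```

Note that the ASCII letter keeps its one-byte ASCII value, so a file containing only ASCII characters is identical whether it is saved as ASCII or as UTF-8.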
Additional Information
ASCII7
ASCII is a 7-bit code. The first 32 characters (codes 0-31) are non-printing "control characters". The remaining 96 characters (codes 32-127, in US-ASCII) begin with a space and end with another control character (DEL, code 127).
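The 96 characters can be listed with a few lines of Python, 16 per row, each row prefixed with the code of its first character. This is only a sketch; the function name ascii_rows and the "<DEL>" placeholder are choices made for this example.

```python
def ascii_rows():
    """Return ASCII codes 32-127 as six rows of 16 characters each."""
    rows = []
    for start in range(32, 128, 16):
        chars = "".join(
            "<DEL>" if code == 127 else chr(code)   # 127 is a control character
            for code in range(start, start + 16)
        )
        rows.append(f"{start:3d}  {chars}")
    return rows


for row in ascii_rows():
    print(row)
```

The first row starts with the space character (code 32), so it appears to begin with a blank.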
Latin1
The low 128 code positions are the same as ASCII7, the next 32 (128-159) are reserved for control characters, and the last 96 (160-255) hold the additional letters, punctuation, and symbols.
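The upper 96 characters can be listed the same way, by decoding the numbers 160-255 with Python's built-in "latin-1" codec. Again this is a sketch, and the function name latin1_rows is hypothetical.

```python
def latin1_rows():
    """Return Latin-1 codes 160-255 as six rows of 16 characters each."""
    rows = []
    for start in range(160, 256, 16):
        raw = bytes(range(start, start + 16))
        rows.append(f"{start:3d}  {raw.decode('latin-1')}")
    return rows


for row in latin1_rows():
    print(row)
```

Two of these characters are normally invisible: code 160 is a no-break space and code 173 is a soft hyphen.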
This discussion of encodings has borrowed liberally from various online sources.