I am looking for software or a way to list, extract, count, in short, analyze the words of a book which is in epub or pdf format

As someone who is interested in learning languages, I want to find a way to list each word, and how many times it is repeated, in a book that I am going to read.

1 Comments

@Martha

Here is one way to count how often each word appears in an epub file, sorted by frequency, with the most-used words at the top of the list.

I did this on a Mac laptop; it should also work on Unix hosts.

The overview of the process:

1. Install Calibre.
2. Use the ./ebook-convert command in Calibre to convert the epub file to text.
3. Transform the entire text file to lowercase (so "Word" and "word" match).
4. Convert punctuation to whitespace (so "period." and "period " match).
5. Convert all whitespace to a new line. This puts each word on its own line.
6. Exclude any blank lines from the list.
7. Sort the list of words alphabetically.
8. Pipe (send) that list of words through uniq -c. You now have a count of how often each word appears.
9. Sort the result in numerical order. If you use the sort command with the -r argument, the most frequent words are at the top.

Here's an example of steps (2) through (9). The head command lists the top ten words in the final output.

$ ./ebook-convert ./book.epub ./book.txt
$ cat ./book.txt | tr '[:upper:]' '[:lower:]' | tr "“" " " | tr "”" " " | tr "," " " | tr "." " " | tr " " "\n" | grep -v '^$' | sort | uniq -c | sort -gr | head
5303 the
1960 and
1934 of
1910 to
1874 a
1168 i
1067 you
844 in
812 that
703 it
$

The result is pretty boring. The word 'the' appears 5303 times, while the word 'it' appears 703 times.

I suspect that in most books the most common words are the tiny conjunctions, articles, prepositions and pronouns. Perhaps the result would be more interesting for something that is not a novel.
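
If the tiny words get in the way, one possible tweak is to filter them out before counting. The sketch below is only an illustration: sample.txt stands in for the book.txt produced by ebook-convert, and stopwords.txt and top_words.txt are made-up file names. It also swaps the chain of single-character tr calls for one tr -cs '[:alpha:]' '\n', which turns every run of non-letters into a newline. That shortcut assumes plain English text; accented letters may be split apart by some tr implementations.

```shell
# Hypothetical sample input standing in for book.txt.
printf '%s\n' 'The dog saw the cat.' 'The dog saw a bird.' 'The dog ran.' > sample.txt

# Hypothetical stop-word list, one word per line; extend it to taste.
printf '%s\n' the and of to a i you in that it > stopwords.txt

# Lowercase, put each word on its own line, drop blank lines,
# drop the stop words, then count and sort as in the pipeline above.
# grep flags: -v invert the match, -x match whole lines only,
# -F treat patterns as fixed strings, -f read the patterns from a file.
tr '[:upper:]' '[:lower:]' < sample.txt \
  | tr -cs '[:alpha:]' '\n' \
  | grep -v '^$' \
  | grep -vxFf stopwords.txt \
  | sort | uniq -c | sort -nr | head > top_words.txt

cat top_words.txt
```

With this sample, 'dog' should come out on top and 'the' should not appear at all. For a real book, replace sample.txt with ./book.txt and put whatever words you already know into stopwords.txt.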

Good luck!
