
Re: Are there ways to automatically detect what book an e-text is? I know that these days, there are services (I think Google provides one) that can take a "digital fingerprint" of a song, and
I'm not aware of such a tool/service/API, and IMHO publishers generally don't offer APIs, mainly because copyright-infringing sites or competing businesses might use them.
So you need to take a custom approach: build the query URLs yourself, since most sites use the GET method for their searches, and do some data mining with scripts (wget, Selenium, etc.).
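Because a GET search puts its parameters in the URL, a script can build the URL and download the page itself. Below is a minimal Python sketch using only the standard library; the function names are hypothetical, and the browser-like User-Agent header is an assumption (some sites reject requests without one). Pages that are rendered by JavaScript, or sites that answer scripted requests with a CAPTCHA (Google may do this), would still need a real browser driven by Selenium.

```python
import urllib.parse
import urllib.request


def build_get_url(base_url: str, params: dict) -> str:
    """Encode search parameters into a GET URL, as a search form would submit them."""
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page's HTML, roughly what `wget -qO- URL` does."""
    # Assumption: a browser-like User-Agent avoids some sites' bot filters.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, `build_get_url("https://www.google.com/search", {"q": '"Persons are selected for finite play."'})` yields `https://www.google.com/search?q=%22Persons+are+selected+for+finite+play.%22`, with the quotes kept so Google treats it as an exact-phrase search; passing that to `fetch` returns the raw HTML to mine.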
You could do it like this:
1. Search Google for an exact passage from the text, in double quotes. Example search:
   "Numerical boundaries take many forms but are always applied in finite games. Persons are selected for finite play."
2. In the result pages, look for an ISBN, or for the title and author, using regular expressions, CSS selectors, XPath, etc.
3. Run an advanced search on Amazon or another site, e.g.: www.amazon.com/gp/search/ref=sr_adv_b/?search-alias=stripbooks&unfiltered=1&field-isbn=1476731713
   Notice the &field-isbn=1476731713 parameter; &field-author= or &field-title= can be used the same way.
4. Use regular expressions to extract all the book data.
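The ISBN-extraction and Amazon-query steps above could look roughly like this in Python, working on HTML or text you have already downloaded. This is a sketch under assumptions: the regex only catches ISBNs labelled "ISBN", "ISBN-10" or "ISBN-13" and separated by hyphens (not spaces), and candidates are kept only if their ISBN-10/ISBN-13 check digit is valid, which filters out most random numbers. The Amazon parameters are copied from the example URL above; whether Amazon still honours them is not verified here. Function names are hypothetical.

```python
import re
import urllib.parse

# Assumption: ISBNs are labelled "ISBN", "ISBN-10" or "ISBN-13" and use hyphens as separators.
ISBN_RE = re.compile(r"ISBN(?:-1[03])?\s*:?\s*([0-9][0-9-]{8,15}[0-9X])", re.IGNORECASE)


def is_valid_isbn10(s: str) -> bool:
    """ISBN-10 checksum: weights 10..1, total divisible by 11 ('X' counts as 10)."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    """ISBN-13 checksum: alternating weights 1 and 3, total divisible by 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def extract_isbns(text: str) -> list:
    """Return distinct, checksum-valid ISBNs in order of appearance, hyphens removed."""
    found = []
    for match in ISBN_RE.finditer(text):
        isbn = match.group(1).replace("-", "").upper()
        if (is_valid_isbn10(isbn) or is_valid_isbn13(isbn)) and isbn not in found:
            found.append(isbn)
    return found


def amazon_search_url(isbn=None, author=None, title=None) -> str:
    """Build an Amazon advanced book-search URL from the field-* parameters."""
    params = {"search-alias": "stripbooks", "unfiltered": "1"}
    if isbn:
        params["field-isbn"] = isbn
    if author:
        params["field-author"] = author
    if title:
        params["field-title"] = title
    return "https://www.amazon.com/gp/search/ref=sr_adv_b/?" + urllib.parse.urlencode(params)
```

For instance, `amazon_search_url(isbn="1476731713")` reproduces the example URL above (with `https://` prepended). Title and author extraction would need site-specific CSS selectors or XPath expressions, which depend on each result page's markup and are not shown.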
This would be my approach.