4/11/05

Being Concordanced Tzvee

Last week I was mesmerized by the statistical features on Amazon for my book on Yerushalmi Berakhot. Well, this week the plot thickens.

Amazon now gives us this neat concordance [an alphabetized list of the most frequently occurring words in a book, excluding common words such as "of" and "it."] of the 100 most frequently used words in my book. The size mapping makes it cool:
The font size of a word is proportional to the number of times it occurs in the book. Hover your mouse over a word to see how many times it occurs, or click on a word to see a list of book excerpts containing that word.
More frequent words appear larger; less frequent appear smaller. The clickable concordance looks like this.
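For the curious, here is a rough sketch of how a concordance like this can be built. It is only an illustration under my own assumptions, not Amazon's actual method: it splits the text into lowercase words with a simple regular expression, drops a tiny hypothetical stop-word list standing in for "common words such as 'of' and 'it'", keeps the most frequent words, lists them alphabetically, and gives each one a font size proportional to its count. The function name `concordance` and the `max_font_pt` parameter are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (Amazon's real list is not known here).
STOP_WORDS = {"a", "an", "and", "the", "of", "it", "in", "to", "is", "that"}


def concordance(text, top_n=100, max_font_pt=40):
    """Return (word, count, font_pt) tuples for the top_n most frequent
    non-stop words, listed alphabetically, with font size proportional to count."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest = top[0][1]
    # Linear scaling: the most frequent word gets max_font_pt, the rest proportionally less.
    return sorted(
        (word, count, round(max_font_pt * count / highest)) for word, count in top
    )


sample = "The blessing of bread. Recite the blessing over bread and wine. Blessing!"
print(concordance(sample, top_n=2, max_font_pt=30))
```

On that toy sentence, "blessing" (3 occurrences) gets 30 points and "bread" (2 occurrences) gets 20. Because the result is sorted alphabetically rather than by count, the big words end up scattered through the list, just like in Amazon's display.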

abba, accord, another, art, bar, basis, berakhot, between, blessed, blessing, bread, came, case, chapter, come, commandments, concerning, day, does, eat, eating, eighteen, even, first, follows, food, go, god, good, hands, himself, hiyya, house, israel, joshua, judah, king, law, let, light, lord, may, meal, mishnah, must, name, need, new, night, obligation, people, person, place, practice, pray, prayer, ps, rab, rabbis, recitation, recite, recited, reciting, refers, regarding, rule, sabbath, said, samuel, says, scripture, second, see, shall, shema, should, simeon, study, take, taught, teaching, tefillin, thou, three, time, torah, two, unit, universe, upon, verse, view, water, wine, words, written, yerushalmi, yohanan, yose, zeira
Wow. And wait, there's more. The TEXT STATS tell us important information about this book, such as how many words per dollar you get if you buy it, what its readability and complexity scores are, and other fun statistics. It turns out that the book is readable and not that complex, considering it's a translation of a tractate of the Talmud. What a relief.

And now we can reveal the solution to the mystery of why Amazon produced those SIPs, the statistically improbable phrases that I wrote about last week. They are using the SIPs to automatically generate a list of related books -- with the same SIPs -- that you may want to buy.

This feature helps you find books on similar topics to the book that you are currently viewing. We determine whether two books discuss similar topics by looking at the Statistically Improbable Phrases, or "SIPs," that occur in both books. The more SIPs the two books share, the more closely related they are.
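The rule Amazon describes ("the more SIPs the two books share, the more closely related they are") is simple enough to sketch. This is my own minimal illustration, assuming each book's SIPs are already available as a set of phrases. The function name `related_books`, the `limit` parameter, and the book titles and phrases in the usage line are all hypothetical placeholders. Amazon's real system may weight or normalize phrases in ways not described here.

```python
def related_books(target, sips_by_book, limit=5):
    """Rank other books by how many SIPs they share with `target`.
    Books sharing no SIPs are left out; ties are broken alphabetically by title."""
    target_sips = sips_by_book[target]
    scored = []
    for title, sips in sips_by_book.items():
        if title == target:
            continue
        shared = len(target_sips & sips)  # size of the set intersection
        if shared:
            scored.append((title, shared))
    # More shared SIPs means more closely related, so sort by count, descending.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Hypothetical placeholder data: titles and phrases are made up for illustration.
books = {
    "my book": {"phrase 1", "phrase 2", "phrase 3"},
    "book B": {"phrase 1", "phrase 2"},
    "book C": {"phrase 3", "phrase 9"},
    "book D": {"phrase 7"},
}
print(related_books("my book", books))
```

Here "book B" shares two phrases and ranks first, "book C" shares one, and "book D" shares none and is dropped. No one has to read any of the books for this to work, which is exactly the point.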

No human being at Amazon read my book. But their computer came up with a pretty good list of related books just by using the SIPs.