The mission of Shepherd is to help readers discover books in new and unique ways.
Our topic system is unique and a huge part of how we drive book discovery. We use it to pull readers toward your book list, you, and your book from around the website.
Is my book used to determine the topics of my book list?
No, your book has nothing to do with the topics we assign to your book list. We only use the 5 books you pick.
Why is your book not included? Because the book list is about the 5 books you picked, and we use it as an opportunity to introduce you and your book. Your profile and your book are an advertisement, although a very targeted and persuasive one (how this approach helps you sell books).
How does the topic system work for my book list?
I will use a real-world example so you can see this in action: Steven Pinker's list of the best books on rationality and why it matters.
Step 1 - Find the topics for each of the 5 recommended books.
We start by analyzing each of the 5 recommended books on the list and assigning each one 5 Wikipedia topics. To do this, we take a variety of data about each book and analyze it using artificial intelligence and something called Natural Language Processing (how NLP works). We augment this with data from the Library of Congress when it is available.
In Steven Pinker's list, the first book is Rational Choice in an Uncertain World, and our system detected these as the 5 strongest Wikipedia topics:
- Cognitive bias
- Probability theory
For his second book pick, The Constitution of Knowledge, our system detected these as the 5 strongest Wikipedia topics:
- Fake news
- Cancel culture
Once we have those for each of the five books, we move to step 2.
Step 2 - Tabulate the topics for the 5 books and find trends for the overall book recommendation list.
We combine the results for all 5 books to find the trends of the overall book list.
For Steven Pinker's list, the strongest topics are:
- Fake news
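The tabulation in Step 2 can be sketched in a few lines of Python. This is a minimal sketch, not Shepherd's actual implementation: it assumes the overall ranking is a simple count of how many books each topic was detected for, with ties broken alphabetically. The function name `tabulate_topics`, the placeholder book titles, and the per-book topic assignments are all made up for illustration, and the real system may weight topics by strength instead.

```python
from collections import Counter


def tabulate_topics(book_topics, top_n=5):
    """Rank topics by how many books on the list they were detected for."""
    counts = Counter()
    for topics in book_topics.values():
        # A set makes sure a topic counts only once per book.
        counts.update(set(topics))
    # Highest count first; alphabetical order breaks ties (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical data: placeholder books with invented topic assignments.
example_list = {
    "Book A": ["Fake news", "Rationality", "Cognitive bias"],
    "Book B": ["Fake news", "Cancel culture"],
    "Book C": ["Rationality", "Fake news"],
}

strongest = tabulate_topics(example_list, top_n=3)
print(strongest)
```

In this toy data, "Fake news" ranks first because it was detected for all three placeholder books.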
The top topic we detect that also exists as a bookshelf is used as the parent page of the book list.
For example, Steven's list looks like this:
If the top topic doesn't have a bookshelf yet, we go down the list until we find one that does. When we later add a bookshelf for a more accurate topic, we move the book list to that newly created bookshelf.
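The fallback rule for choosing the parent page can be sketched as a short loop. This is only an illustration under assumptions: the function name `pick_parent_bookshelf` is hypothetical, the ranked topics are assumed to arrive strongest first, and the set of existing bookshelves below is invented for the example.

```python
def pick_parent_bookshelf(ranked_topics, bookshelves):
    """Return the strongest topic that already has a bookshelf, or None."""
    for topic in ranked_topics:
        if topic in bookshelves:
            return topic
    return None


# Hypothetical inputs: topics ordered strongest first.
ranked_topics = ["Fake news", "Rationality"]
bookshelves = {"Rationality"}

# The top topic has no bookshelf here, so the next one is used.
parent_before = pick_parent_bookshelf(ranked_topics, bookshelves)

# Once a bookshelf for the stronger topic is created, re-running
# the same rule moves the list to that new bookshelf.
bookshelves.add("Fake news")
parent_after = pick_parent_bookshelf(ranked_topics, bookshelves)

print(parent_before, "->", parent_after)
```

Re-running the same function after a bookshelf is added is one simple way to get the "update to the newly created one" behavior without any extra bookkeeping.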
Step 3 - We use these topics to pull readers toward your book list in various ways.
We use each book you recommend as a hook to pull people toward your book list so that they can meet you and your book (and, hopefully, buy it).
For example, in Steven Pinker's list, the first book is Rational Choice in an Uncertain World, and our system detected the 5 strongest Wikipedia topics as:
- Cognitive bias
- Probability theory
We deploy Steven's recommendation for that book to every one of those 5 topics that has a bookshelf and pull people back toward his book list. As of August 2022, we have 2,700+ bookshelf pages, and I add more weekly.
Out of that list of 5 topics, the only one that is a bookshelf is rationality. When a reader is browsing that bookshelf, we show that book along with Steven Pinker's list to entice them to visit it.
Here is what that looks like on the bookshelf page:
We do that for every one of the 5 books you recommend.
As we grow and add more bookshelf pages, each of the 5 books you recommend could show up on as many as 5 bookshelves. And each one pulls readers toward your book list so they can meet you and your book.
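The per-book placement in Step 3 can be sketched as follows. This is a rough illustration, not the production code: the function name `bookshelf_placements` is hypothetical, and the topic list includes only the topics this article names for the book rather than all 5.

```python
def bookshelf_placements(book_topics, bookshelves):
    """Map each existing bookshelf to the recommended books shown on it."""
    placements = {}
    for book, topics in book_topics.items():
        for topic in topics:
            # A book only appears on topics that already have a bookshelf.
            if topic in bookshelves:
                placements.setdefault(topic, []).append(book)
    return placements


# Only the topics named in this article; the full list of 5 is not shown.
book_topics = {
    "Rational Choice in an Uncertain World": [
        "Cognitive bias",
        "Probability theory",
        "Rationality",
    ],
}
bookshelves = {"Rationality"}

placements = bookshelf_placements(book_topics, bookshelves)
print(placements)

# Adding a bookshelf later gives the same book a second placement.
bookshelves.add("Cognitive bias")
more_placements = bookshelf_placements(book_topics, bookshelves)
```

Because placements are recomputed from the topics, every new bookshelf automatically picks up any recommended book that already carries that topic.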
And we do the same thing for your book list.
For Steven Pinker's list, the 5 topics we use for the book list are:
- Fake news
This topic system also powers our search feature.
Our search feature is different in that it is about finding a place to start book browsing. So if a user puts in a Wikipedia topic, favorite book, or favorite author, it connects them to bookshelves and book lists they will be interested in.
Here is a search for rationality and what it returns (Steven Pinker's list is shown on the right and related bookshelves on the left):
Here is a search for a favorite author and what it returns:
Here is a search for a favorite book and what it returns (you can see that Steven's list is shown on the top of that section along with related bookshelves):
This topic system also powers the recommendation section at the bottom of book lists.
What do you think?
I know this is complex, and I would love to get questions to further improve this explanation. My email is firstname.lastname@example.org, and please let me know what I need to explain better or any questions it evokes.
Here is a FAQ for the curious :)
Do the topics change over time?
Yes, we are constantly pulling in new data about books and using that to improve accuracy. Over the long term, we plan to build our own NLP model to further improve the system (right now, we piggyback off of Wikifier).
We update everything nightly with anything we learned in the last 24 hours. And, once a week, I deeply review upcoming topics, bookshelves, and related pieces.
Does this system work better for fiction or nonfiction?
Currently, nonfiction works better because its topics are easier to detect. Fiction is harder, but it still works well. Because we deploy the books and book lists over 5 topics, you always get great coverage.
Why is this system automated?
As of September 2022, we have 31,000+ books in our system, and that is after just 16 months of activity. There is no way a human can do something like this, especially with this level of accuracy.
Do you have human curation?
Yes, we have a system on the backend that allows me to do human curation. I am thinking about opening this up to crowdsourcing one day, but that is a huge and expensive project to build.
Why do you use Wikipedia topics?
I didn't want to have to recreate and manage our own list of topics. So I decided it would be smart to piggyback off of Wikipedia, given how amazing they are. I plan to use the machine learning component of Wikipedia to do some really cool stuff (I can't wait to show you!).
How do genre and age data fit into this?
In spring 2022, we paid a good chunk of money to license book metadata (including genre). We will integrate genre into our system toward Q1 2023 (roadmap here). Genres and age focus will work a lot like topics, and we will bring that into the recommendation system later this year. I want to use both to make really great book recommendations for readers and get more qualified readers to your book list.
What future features will this power?
I am working on launching a customized book recommendation newsletter in early 2023, and it will use this system along with the genre and age data. My goal is that readers can say I love "science fiction books" and "World War 2" and "dragons," and our topic/genre system will help deliver them great book recommendations every few weeks. This is just one of many features that will continue to help promote authors and delight readers.
How can I speed up Shepherd's development?
I launched a membership program for founding members at the urging of authors and readers. 100% of the money goes toward developing new features and my goal of hiring a full-time developer in 2023 (currently, we only have one part-time developer).