Knowledge is a web application that automatically transforms your digital footprint into a personal search engine. It fetches the content you interact with on GitHub, HackerNews, and Zotero, and organizes it into a navigable knowledge graph.
- 🤖 Automatic Aggregation: Daily, automated extraction of GitHub stars, HackerNews upvotes, and Zotero library.
- 🔍 Powerful Search: A built-in search engine to instantly find any item you've saved or interacted with.
- 🕸️ Knowledge Graph: Navigate bookmarks through a graph of automatically extracted topics and their connections.
My Personal Knowledge Base is available at raphaelsty.github.io/knowledge.
A GitHub Actions workflow runs twice a day to perform the following tasks:
- Extracts Content from specified accounts:
  - GitHub Stars
  - HackerNews Upvotes
  - Zotero Records
- Processes and Stores Data in the database/ directory:
  - database.json: Contains all the raw records.
  - triples.json: Stores the knowledge graph data (topics and relationships).
  - retriever.pkl: The serialized search engine model.
- Deploys Updates:
  - The backend API is automatically updated and pushed to the Fly.io instance.
  - The frontend on GitHub Pages is refreshed with the latest data.
The backend is built with FastAPI and deployed on Fly.io, which offers a free tier suitable for this project. The frontend is a static site hosted on GitHub Pages. The search engine is powered by multiple cherche lexical models and features a final pylate-rs model, which is compiled from Rust to WebAssembly (WASM) to run directly in the client's browser.
Follow these steps to deploy your own instance of Knowledge.
First, fork this repository to your own GitHub account and then clone it to your local machine.
The application requires API keys and credentials to function. These must be set as Repository secrets in your forked repository's settings (Settings > Secrets and variables > Actions).
Secret | Service | Required | Description |
---|---|---|---|
FLY_API_TOKEN | Fly.io | Yes | Allows the GitHub Action to deploy your application. See the Fly.io section for instructions. |
ZOTERO_API_KEY | Zotero Settings | Optional | An API key to access your Zotero library. |
ZOTERO_LIBRARY_ID | Zotero | Optional | The ID of the Zotero group library you want to index. |
HACKERNEWS_USERNAME | Hacker News | Optional | HackerNews username to fetch upvoted posts. |
HACKERNEWS_PASSWORD | Hacker News | Optional | HackerNews password. |
Next, edit the sources.yml file at the root of the repository to specify which GitHub users' starred repositories you want to track.
github:
- "raphaelsty"
- "gbolmier"
- "MaxHalford"
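For orientation, tracking a user's stars amounts to paging through that user's starred repositories. The sketch below is not the project's extraction code: it assumes the public GitHub REST endpoint `GET /users/{username}/starred`, and the helper names `starred_url` and `fetch_starred` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def starred_url(username: str, page: int = 1, per_page: int = 100) -> str:
    """Build the URL for one page of a user's starred repositories."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{GITHUB_API}/users/{urllib.parse.quote(username)}/starred?{query}"


def fetch_starred(username: str, max_pages: int = 3) -> list[dict]:
    """Fetch up to max_pages pages of starred repositories (unauthenticated)."""
    repos: list[dict] = []
    for page in range(1, max_pages + 1):
        request = urllib.request.Request(
            starred_url(username, page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            batch = json.load(response)
        if not batch:  # an empty page means there is nothing left to read
            break
        repos.extend(batch)
    return repos
```

Calling `fetch_starred("raphaelsty", max_pages=1)` should return a list of repository records for the first user listed above. Unauthenticated requests are rate-limited by GitHub, so a real run over many users would likely need a token.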
- Install flyctl, the Fly.io command-line tool. Installation instructions are in the Fly.io documentation.
- Sign up and log in to Fly.io via the command line:
    flyctl auth signup
    flyctl auth login
- Get your API token and add it to your GitHub repository secrets as FLY_API_TOKEN:
    flyctl auth token
- Launch the app. Follow the Fly.io launch documentation. This will generate a fly.toml file. You won't need a database.
⚠️ Update API URLs: After deploying, you must replace all instances of https://knowledge.fly.dev in the docs/index.html file with your own Fly.io app URL (e.g., https://app_name.fly.dev).
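If you would rather not edit the file by hand, the replacement can be scripted. This is a minimal sketch, not part of the repository: the function names `replace_api_url` and `update_index` are hypothetical, and it assumes the default URL appears in the file as a literal string.

```python
from pathlib import Path

DEFAULT_URL = "https://knowledge.fly.dev"


def replace_api_url(html: str, new_url: str, old_url: str = DEFAULT_URL) -> tuple[str, int]:
    """Return the updated HTML and the number of occurrences replaced."""
    count = html.count(old_url)
    # Drop any trailing slash so paths such as /search are not doubled up.
    return html.replace(old_url, new_url.rstrip("/")), count


def update_index(path: str, new_url: str) -> int:
    """Rewrite the file in place and return how many URLs were changed."""
    index = Path(path)
    updated, count = replace_api_url(index.read_text(encoding="utf-8"), new_url)
    index.write_text(updated, encoding="utf-8")
    return count
```

For example, `update_index("docs/index.html", "https://app_name.fly.dev")` run from the root of the repository returns the number of URLs it rewrote. A result of 0 suggests the file was already updated.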
- Go to your forked repository's settings (Settings > Pages).
- Under Build and deployment, select the Source as Deploy from a branch and choose the main branch with the /docs folder.
⚠️ Update CORS Origins: After your GitHub Pages site is live, you must add its URL to the origins list in the api/api.py file to allow cross-origin requests.
origins = [
"https://your-github-username.github.io", # Add your GitHub Pages URL here
]
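One detail worth checking: browsers send an origin made of the scheme and host only, without the repository path. Assuming the API compares origins as exact strings, an entry such as `https://your-github-username.github.io/knowledge` would not match. The helper below is a hypothetical sketch (`cors_origin` is not part of the project) that reduces a Pages URL to the value to put in the list.

```python
from urllib.parse import urlsplit


def cors_origin(url: str) -> str:
    """Reduce a URL to its origin: scheme and host, plus the port if one is given."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    # urlsplit lowercases the scheme and hostname, which matches what browsers send.
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        origin += f":{parts.port}"
    return origin
```

For instance, `cors_origin("https://raphaelsty.github.io/knowledge")` gives `https://raphaelsty.github.io`.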
This project is designed to be affordable, but you are responsible for the costs incurred on Fly.io. Here is how to keep them in check:
⚠️ Bound Fly.io Concurrency: To prevent costs from scaling unexpectedly, define connection limits in the fly.toml file.
[services.concurrency]
hard_limit = 6
soft_limit = 3
type = "connections"
⚠️ Select a modest Fly.io VM: A small virtual machine is sufficient. A shared-cpu-1x@1024MB is a good starting point.
To run the API on your local machine for development, run the following command from the root of the repository:
make launch
The Zotero integration allows you to save academic papers, articles, and other documents, which will then be automatically indexed by your search engine.
- Browser Extension: Use the Zotero Connector extension for your browser to easily save documents from the web.
- Mobile App: The Zotero mobile app lets you add documents on the go. Any uploads will be indexed within a few hours.
My personal Knowledge Base is inspired by and extracts resources from the Knowledge Base of François-Paul Servant, namely Semanlink.
This project is licensed under the GNU General Public License v3.0.
Knowledge Copyright (C) 2023 Raphaël Sourty