Reach out to us via Discord
This repo contains mathematics and theoretical computer science datasets for training foundation models, along with code to download these datasets and convert them to PyTorch Dataset or TensorFlow Dataset format.
Mathematical acumen is crucial for humans. The history of automated math problem solving is as long as that of AI itself. One example is automated theorem proving, which has been studied for several decades.
Math and reasoning will become, and remain, a crucial part of generative AI. Their role in achieving AGI (if it is ever achievable 😄) cannot be overemphasized.
The datasets we curate will cover, but are not limited to, the following topics from mainstream college mathematics and theoretical computer science (discrete math, etc.):
- Calculus
- Linear Algebra
- Probability and Statistics
- Ordinary Differential Equations
- Partial Differential Equations
- Optimization (linear, nonlinear, convex)
- Numerical analysis
- .........
- Recursion
- Boolean Algebra
- Trees, Graphs
- Searching/sorting algorithms, BFS, DFS, etc.
- Time and space complexity
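
To give a rough idea of the PyTorch Dataset conversion mentioned at the top, here is a minimal sketch of a map-style dataset, that is, an object implementing `__len__` and `__getitem__`. It is not this repo's actual code: the class name `MathProblemDataset`, the JSON Lines input, and the record fields `topic`, `problem`, and `solution` are all assumptions made for illustration. The sketch uses only the standard library so it runs without PyTorch installed.

```python
import json
from typing import Dict, List


class MathProblemDataset:
    """Map-style dataset: supports len(ds) and ds[i].

    Hypothetical schema (an assumption, not the repo's format):
    each record is a dict with "topic", "problem", and "solution".
    """

    def __init__(self, records: List[Dict[str, str]]):
        self.records = list(records)

    @classmethod
    def from_jsonl(cls, text: str) -> "MathProblemDataset":
        # Parse one JSON object per non-empty line.
        return cls([json.loads(line) for line in text.splitlines() if line.strip()])

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Dict[str, str]:
        return self.records[idx]


# Two made-up records standing in for a downloaded file.
sample = "\n".join([
    json.dumps({"topic": "Calculus",
                "problem": "Differentiate x^2.",
                "solution": "2x"}),
    json.dumps({"topic": "Recursion",
                "problem": "Compute fib(5) with fib(0)=0, fib(1)=1.",
                "solution": "5"}),
])

dataset = MathProblemDataset.from_jsonl(sample)
print(len(dataset), dataset[0]["topic"])
```

In a PyTorch environment, a class like this would typically subclass `torch.utils.data.Dataset` and be passed to `torch.utils.data.DataLoader` for batching.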