>>16607by 3D i meant it's like
t1,[,,,,,,],[,,,,,,];t2,[,,,,,,],[,,,,,,];t3,[,,,,,,],[,,,,,,]
rather than t1,[,,,,,,],[,,,,,,]
each table is an inverted pointer index to a string encoded dictionary, but they're weighted to be relational to each value to predict what comes next in a sequence. it's just hyper-compressed SQL with weights basically, apache Lucene.
you could run OCR on 100TB of libgen books, 30TB of patents and 30TB of research papers, scrape *.html on waybackmachine and you'd have something much more powerful than the common crawl scrape / gutenberg collection every single one of the american corporations use for their models.