xLLM notes

[Header image prompt, Meta AI Imagine: translucent black archaea enzymes style globe, intricate hydrogenases, uplight]

These are notes I can refer to as I work my way through another AI certification for my own AI projects and interests. xLLM is gaining interest from Fortune 100 companies as a faster solution, better suited for the specific use cases I frequently work with as a solopreneur, entrepreneur, mentor, and occasional consultant.

Auto-tuning (also known as self-tuning) collects the hyperparameters chosen by users and builds a default hyperparameter set from those choices. It yields customized hyperparameters, so different users can get different results for the same prompt. The process is easier when each result is returned with a relevancy score.
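
A minimal sketch of the auto-tuning idea in Python (the data and function names are illustrative, not from the xLLM source):

    from collections import defaultdict

    def build_default_hyperparameters(user_choices):
        """Average the hyperparameter values favored by past users
        into a new default set (a simple form of auto-tuning)."""
        totals = defaultdict(float)
        counts = defaultdict(int)
        for choices in user_choices:            # one dict per user session
            for name, value in choices.items():
                totals[name] += value
                counts[name] += 1
        return {name: totals[name] / counts[name] for name in totals}

    # Example: two users tuned the same two hyperparameters differently.
    users = [{"min_count": 2, "weight_decay": 0.5},
             {"min_count": 4, "weight_decay": 1.0}]
    print(build_default_hyperparameters(users))
    # {'min_count': 3.0, 'weight_decay': 0.75}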

Contextual Tokens capture the case where two words (e.g., data, science) appear in the same paragraph but are not next to each other.
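
A minimal sketch of detecting contextual token pairs (illustrative code, not the xLLM implementation):

    def contextual_pairs(paragraph):
        """Return word pairs that co-occur in the same paragraph
        but are not next to each other."""
        words = paragraph.lower().split()
        pairs = set()
        for i, first in enumerate(words):
            for second in words[i + 2:]:    # skip the adjacent word
                pairs.add((first, second))
        return pairs

    # 'data' and 'science' share the paragraph without being adjacent.
    print(("data", "science") in contextual_pairs(
        "data is central to modern science"))   # True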

Cosine Distance … has drawn criticism when used for comparing embeddings.

Dot Product … has drawn criticism when used for comparing embeddings.
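
For reference, the two measures differ only by normalization, which is the root of the usual criticism: cosine discards vector magnitude entirely, while the raw dot product is dominated by it. A minimal illustration:

    import math

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cosine_similarity(u, v):
        return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

    u, v = [1.0, 2.0], [2.0, 4.0]     # same direction, different magnitude
    print(dot(u, v))                  # 10.0 -- driven by magnitude
    print(cosine_similarity(u, v))    # 1.0  -- magnitude discarded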

Evaluation:

  1. Reconstruct the taxonomy attached to the corpus:
    • For each web page, assign a category and compare it to the real category embedded in the corpus.
    • Wolfram, Wikipedia, and corporate corpora have very similar structures, with a taxonomy and related items that can be retrieved while crawling.
  2. Use the evaluation metric as the loss function in the underlying gradient descent algorithm (a deep neural network). Even though current loss functions are poor proxies for model quality, use them instead of the evaluation metric because the atomic changes (a weight update or a neuron activation) must be applied billions of times during training. The workaround is to start with an approximate evaluation metric and refine it over time until it converges to the desired metric; the result is an adaptive loss function that helps avoid getting stuck in a local minimum (see the sketch after this list).
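
A minimal sketch of that workaround: start training with a cheap proxy loss and blend in the evaluation metric over time (the proxy and metric below are toy stand-ins):

    def adaptive_loss(output, target, step, total_steps, proxy, metric):
        """Blend a cheap proxy loss with the true evaluation metric,
        shifting weight toward the metric as training progresses."""
        alpha = step / total_steps          # 0 at the start, 1 at the end
        return (1 - alpha) * proxy(output, target) \
               + alpha * (1 - metric(output, target))

    proxy  = lambda o, t: (o - t) ** 2                   # squared error
    metric = lambda o, t: 1.0 if round(o) == t else 0.0  # exact match
    print(adaptive_loss(0.7, 1, step=10, total_steps=100,
                        proxy=proxy, metric=metric))     # ~0.081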

Hyperparameters can be local (tuned per sub-LLM) or global (shared across all sub-LLMs). Hyperparameters in xLLM are based on explainable AI.

Knowledge Graphs are the bottom layer in xLLM and are retrieved from the corpus while crawling. If nothing is found or the quality is poor, one is imported from an external source (an augmented knowledge graph). They can also be created from scratch using synonyms, glossaries, books, and indexes, as in the sketch below. This brings long-range context, which is missing from standard LLM implementations.
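
A minimal sketch of building a knowledge graph from scratch out of a glossary and a synonym list (the entries here are illustrative):

    # Each glossary entry links a term to related terms (long-range context).
    glossary = {
        "LLM": ["large language model", "sub-LLM", "LLM router"],
        "embedding": ["contextual token", "multi-token"],
    }
    synonyms = {"large language model": "LLM"}   # map variants to one node

    knowledge_graph = {}
    for term, related in glossary.items():
        # Resolve synonyms to a canonical term before adding edges.
        knowledge_graph[term] = {synonyms.get(r, r) for r in related}

    print(knowledge_graph["LLM"])   # {'LLM', 'sub-LLM', 'LLM router'}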

Fine-tuning hyperparameters on a sub-LLM locally is fast. Fine-tuning hyperparameters across all sub-LLMs is slow.

Local, secure, enterprise implementations provide the best value as of 2024. Easy fine-tuning, low latency, and explainable parameters are very important features, along with avoiding hallucinations. Open-source xLLM addresses the costly and risky problems of earlier LLM solutions.

LoRA refers to Low-Rank Adaptation, a parameter-efficient fine-tuning technique used with standard LLMs.

LLM refers to Large Language Models.

LLM Router refers to the manager in a system of many sub-LLMs called a 'Mixture of Experts'. The LLM router is the top layer above the sub-LLMs and routes the user to the sub-LLM relevant to the prompt. Routing can be Explicit (ask the user to choose a sub-LLM), Transparent (fully automated), or Semi-Transparent (the user asks for an irrelevant sub-LLM, so the router chooses a relevant one instead).
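
A minimal sketch of the three routing modes, with a toy relevance scorer standing in for whatever the real router uses:

    def route(prompt, sub_llms, scorer, requested=None):
        """Pick a sub-LLM. Transparent: automated when nothing is requested.
        Explicit: a relevant user choice is honored.
        Semi-Transparent: an irrelevant user choice is overridden."""
        scores = {name: scorer(prompt, name) for name in sub_llms}
        best = max(scores, key=scores.get)
        if requested is None:
            return best                          # Transparent
        if scores.get(requested, 0.0) > 0.0:
            return requested                     # Explicit
        return best                              # Semi-Transparent

    # Toy scorer: how often the sub-LLM's category appears in the prompt.
    scorer = lambda prompt, name: prompt.lower().split().count(name)

    print(route("cluster these statistics papers", ["statistics", "biology"],
                scorer, requested="biology"))    # 'statistics' (overridden)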

Mixture of Experts refers to an entire system that includes many sub-LLMs that are managed by an LLM Router.

Multi-Tokens refer to longer tokens built from multiple adjacent single tokens (e.g., data-science). Meta also uses multi-tokens.
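
A minimal sketch of building multi-tokens by joining adjacent words (illustrative only):

    def multi_tokens(text, n=2):
        """Join each run of n adjacent single tokens into one multi-token,
        e.g. 'data science' -> 'data-science'."""
        words = text.lower().split()
        return ["-".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    print(multi_tokens("data science with python"))
    # ['data-science', 'science-with', 'with-python']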

RAG refers to Retrieval-Augmented Generation, a natural language processing (NLP) technique that combines the strengths of retrieval-based and generation-based artificial intelligence (AI) models.

Search, Clustering & Predictions are what most corporate clients now ask for, and xLLMs do a better job than earlier-era search engine solutions. xLLMs excel at search features, code generation, clustering, and text-based predictive analytics.

Simple LLMs refer to Large Language Models that deal with specialized content and applications, such as a corporate corpus. Easier fine-tuning, a reduced risk of hallucinations, and faster training are the benefits of simple LLMs.

Smart Crawling is used to retrieve the structure embedded in the corpus (taxonomy, related items) while crawling.

sub-LLM refers to an LLM focused on one top category.

Variable-length Embeddings speed up retrieval: the most frequent embeddings are stored in a cache, in backend tables keyed by contextual tokens. Nested hash tables (a key-value database) are used with xLLM, while vector and graph databases are the most popular choices for storing embeddings with standard LLMs. Nested hash tables store the value as a hash itself and handle sparsity efficiently.

X-Embeddings are stored as sparse nested hashes.
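
A minimal sketch of a sparse nested hash for x-embeddings, using plain Python dictionaries as stand-ins for the backend key-value tables:

    # Outer key: token; inner key: contextual token; value: association weight.
    # Only observed pairs are stored, so sparsity costs nothing.
    x_embeddings = {
        "data":    {"science": 12.0, "mining": 7.0},
        "science": {"data": 12.0},
    }

    def lookup(token, context):
        """Return the weight for (token, context), or 0.0 if never observed."""
        return x_embeddings.get(token, {}).get(context, 0.0)

    print(lookup("data", "science"))   # 12.0
    print(lookup("data", "biology"))   # 0.0 (absent pairs are not stored)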

xLLM refers to Extreme LLM: an open-source architecture of small, specialized sub-LLMs, each focused on one top category. If 2,000 of them are bundled together, the whole of human knowledge is covered. The entire system (a mixture of experts) is managed by an LLM router.

REFERENCES

Granville, V. (2024, June 3). New trends in LLM: Overview with focus on xLLM. Machine Learning Techniques. https://mltechniques.com/2024/06/03/new-trends-in-llm-overview-with-focus-on-xllm/

Granville, V. (n.d.). Large-Language-Models [Code repository]. GitHub. https://github.com/VincentGranville/Large-Language-Models

XLLM.pptx [Slide deck]. (n.d.). Google Docs. https://docs.google.com/presentation/d/15jlAz0pOmybTxAzywzXklBcL1DLvQy50/edit#slide=id.p1