Deep Dive into LLMs like ChatGPT
This video provides a comprehensive overview of Large Language Model (LLM) AI technology, focusing on the systems that power ChatGPT and similar products. It covers the entire training stack, from data preparation to model development, and offers insights into understanding the 'psychology' of these models and how to use them effectively. The presenter, Andrej Karpathy, a founding member of OpenAI and former Sr. Director of AI at Tesla, aims to increase understanding of state-of-the-art AI and empower users to leverage these tools. The video details pretraining data, tokenization, neural network architecture and internals, inference processes, and the evolution from base models to fine-tuned versions using techniques like supervised fine-tuning and reinforcement learning. It also touches upon challenges such as hallucinations, the role of tokens in model thinking, and the concept of 'jagged intelligence'. The content is made freely available for educational and non-commercial training purposes.
[1hr Talk] Intro to Large Language Models
This video provides a comprehensive, 1-hour introduction to Large Language Models (LLMs), the technology behind systems like ChatGPT. It covers:

* **What LLMs are:** their core concepts, inference, and training processes.
* **How they work:** including fine-tuning into assistants and the concept of "LLM dreams" (hallucinations).
* **Future directions:** scaling laws, tool use (browsers, calculators, etc.), multimodality, and the potential for LLMs to act as operating systems.
* **Security challenges:** jailbreaks, prompt injection, and data poisoning.

The presenter notes that LLMs should not be fully trusted, especially when relying solely on memory, but can be more reliable when using tools like browsing or retrieval. The content is based on a talk given at the AI Security Summit and is available for educational and non-commercial use.
Let's Build the GPT Tokenizer
- The video explains the concept and implementation of a GPT tokenizer, a crucial component in Large Language Models (LLMs) that converts text into tokens and vice versa.
- It covers the Byte Pair Encoding (BPE) algorithm, the training process, and the functions of encoding (string to tokens) and decoding (tokens to string); a toy version of the training loop is sketched below.
- The content delves into string manipulation in Python, Unicode, and various byte encodings (ASCII, UTF-8, etc.).
- It discusses issues and quirks in LLMs that can be traced back to tokenization and explores potential future directions, including the idea of eliminating tokenization.
- Practical implementation is demonstrated using Python, with references to libraries like `tiktoken` and `sentencepiece`.
- The video includes exercises and links to code repositories (minbpe) and supplementary resources for hands-on learning.
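As a rough illustration of the BPE idea above, here is a toy training loop in plain Python. It is a sketch only, not the `tiktoken`, `sentencepiece`, or minbpe implementations; the sample text and number of merges are arbitrary.

```python
# Toy sketch of byte pair encoding (BPE) training: repeatedly find the most
# frequent adjacent pair of tokens and replace it with a new token id.

def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"                 # arbitrary example string
ids = list(text.encode("utf-8"))     # start from raw UTF-8 bytes (ids 0..255)
num_merges = 3                       # vocabulary grows by one token per merge
merges = {}
for k in range(num_merges):
    counts = get_pair_counts(ids)
    pair = max(counts, key=counts.get)   # most frequent adjacent pair
    new_id = 256 + k                     # new token id beyond the byte range
    merges[pair] = new_id
    ids = merge(ids, pair, new_id)

print(ids, merges)
```

Decoding would walk the `merges` table in reverse; the real tokenizers add regex-based pre-splitting and special tokens on top of this core loop.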
Let's build GPT: from scratch, in code, spelled out.
This video provides a comprehensive, code-driven explanation of how to build a Generative Pretrained Transformer (GPT), inspired by the "Attention is All You Need" paper and OpenAI's GPT models. It covers the entire process from data loading and tokenization to implementing self-attention mechanisms, multi-headed attention, feedforward layers, residual connections, and layer normalization. The tutorial uses PyTorch and demonstrates building a nanoGPT model, with links to a Google Colab notebook and GitHub repository for the code. It also touches upon the relationship to ChatGPT, pretraining vs. finetuning, and Reinforcement Learning from Human Feedback (RLHF). Suggested exercises are provided for viewers to deepen their understanding and experiment with the concepts.
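A minimal sketch of a single self-attention head of the kind such a model stacks, written in PyTorch. The batch size, context length, and channel sizes here are illustrative stand-ins, not values from the video.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C = 4, 8, 32          # batch, time (context length), channels
head_size = 16

x = torch.randn(B, T, C)    # stand-in token embeddings

# One self-attention head: every position emits a query and a key;
# affinities are scaled dot products, masked so tokens only look backwards.
key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k, q, v = key(x), query(x), value(x)                 # (B, T, head_size)
wei = q @ k.transpose(-2, -1) * head_size**-0.5      # (B, T, T) scaled affinities
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))      # causal mask: no peeking ahead
wei = F.softmax(wei, dim=-1)                         # each row sums to 1
out = wei @ v                                        # (B, T, head_size) weighted values

print(out.shape)  # torch.Size([4, 8, 16])
```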
Building makemore Part 5: Building a WaveNet
- This video, part 5 of the 'makemore' series, focuses on building a WaveNet architecture, a convolutional neural network inspired by DeepMind's 2016 paper.
- The process involves deepening a 2-layer MLP with a tree-like structure (a minimal sketch of this grouping follows below) and introduces concepts of `torch.nn` and the typical deep learning development workflow.
- Key steps include fixing a learning rate plot, refactoring code into PyTorch layers, implementing the WaveNet architecture, and debugging issues like a `BatchNorm1d` bug.
- The video also touches upon causal dilated convolutions, a more efficient implementation of the WaveNet architecture, and discusses the general development process for deep neural networks.
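A rough sketch of the tree-like fusion idea, assuming a context of 8 characters: instead of crushing the whole context into one hidden layer at once, consecutive time steps are grouped level by level. In a real model each grouping would be followed by linear, batch-norm, and tanh layers, which are omitted here.

```python
import torch

B, T, C = 32, 8, 10             # batch, context length, embedding size (illustrative)
x = torch.randn(B, T, C)        # stand-in character embeddings

def flatten_consecutive(x, n):
    """Group n consecutive time steps so a linear layer can fuse them."""
    B, T, C = x.shape
    return x.view(B, T // n, C * n)

h = flatten_consecutive(x, 2)   # (32, 4, 20): pairs of characters fused
h = flatten_consecutive(h, 2)   # (32, 2, 40): pairs of pairs
h = flatten_consecutive(h, 2)   # (32, 1, 80): the whole context, fused hierarchically
print(h.shape)
```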
Building makemore Part 4: Becoming a Backprop Ninja
This video, part of the 'makemore' series by Andrej Karpathy, focuses on manually backpropagating through a 2-layer MLP with BatchNorm, without relying on PyTorch's autograd. The goal is to build an intuitive understanding of how gradients flow through a compute graph at the tensor level. The video covers backpropagation through the cross-entropy loss, linear layers, the tanh activation, and BatchNorm. It is structured as an exercise of roughly two hours, encouraging viewers to work through the problems alongside the video. Supplementary materials, including code and related papers, are provided.
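For a flavor of the exercise, here is a small sketch of the manual gradient for the cross-entropy part, checked against autograd. The shapes, seed, and random data are arbitrary stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, vocab = 32, 27
logits = torch.randn(n, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (n,))

# PyTorch reference gradient
loss = F.cross_entropy(logits, targets)
loss.backward()

# Manual gradient of the mean cross-entropy loss w.r.t. the logits:
# softmax the logits, subtract 1 at the correct class, divide by batch size.
with torch.no_grad():
    probs = F.softmax(logits, dim=1)
    dlogits = probs.clone()
    dlogits[range(n), targets] -= 1
    dlogits /= n

print(torch.allclose(dlogits, logits.grad, atol=1e-6))  # True
```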
Building makemore Part 3: Activations & Gradients, BatchNorm
This video delves into the internal workings of multi-layer perceptrons (MLPs), focusing on the statistics of forward pass activations and backward pass gradients. It highlights common pitfalls related to improper scaling and introduces diagnostic tools for assessing the health of deep neural networks. The video explains the fragility of training deep neural nets and introduces Batch Normalization as a key innovation for stabilizing this process. It also touches upon residual connections and the Adam optimizer as topics for future discussion. The content includes links to the 'makemore' GitHub repository, a Jupyter notebook, a Colab notebook, the instructor's website and Twitter, and a Discord channel. Several research papers related to initialization techniques (Kaiming init), Batch Normalization, and MLP language models are also referenced. The video concludes with exercises for viewers to practice concepts like zero initialization and folding batch normalization parameters into linear layers.
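A minimal sketch of what batch normalization does in the forward pass at training time; running statistics and the full `BatchNorm1d` machinery are omitted, and the sizes are illustrative.

```python
import torch

torch.manual_seed(0)

# Poorly scaled pre-activations of a hidden layer (stand-in data)
x = torch.randn(32, 100) @ torch.randn(100, 100) * 0.5

gamma = torch.ones(100)    # learnable scale
beta = torch.zeros(100)    # learnable shift
eps = 1e-5

mean = x.mean(0, keepdim=True)            # per-neuron mean over the batch
var = x.var(0, keepdim=True)              # per-neuron variance over the batch
xhat = (x - mean) / torch.sqrt(var + eps) # roughly unit-Gaussian pre-activations
y = gamma * xhat + beta                   # then scale and shift

# Columns have std around 5 before normalization and around 1 after.
print(x.std(0).mean().item(), y.std(0).mean().item())
```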
Building makemore Part 2: MLP
This video implements a multilayer perceptron (MLP) character-level language model and introduces fundamental machine learning concepts. Key topics covered include model training, learning rate tuning, hyperparameter optimization, evaluation, data splitting (train/dev/test), and understanding underfitting and overfitting. The video also provides links to the project's GitHub repository, a Jupyter notebook, a Colab notebook, and the original MLP language model paper by Bengio et al. (2003). Exercises are included for viewers to practice hyperparameter tuning, initialization strategies, and implementing ideas from the referenced paper.
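A compressed sketch of the Bengio-style MLP forward pass and loss, with illustrative sizes and random stand-in data rather than the video's name dataset.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, block_size, emb_dim, hidden = 27, 3, 10, 200
X = torch.randint(0, vocab_size, (32, block_size))   # stand-in batch of 3-character contexts
Y = torch.randint(0, vocab_size, (32,))              # stand-in next-character targets

C  = torch.randn(vocab_size, emb_dim)                # embedding lookup table
W1 = torch.randn(block_size * emb_dim, hidden) * 0.1
b1 = torch.randn(hidden) * 0.01
W2 = torch.randn(hidden, vocab_size) * 0.01
b2 = torch.zeros(vocab_size)

emb = C[X]                                   # (32, 3, 10): embed each context character
h = torch.tanh(emb.view(32, -1) @ W1 + b1)   # (32, 200): tanh hidden layer
logits = h @ W2 + b2                         # (32, 27): scores for the next character
loss = F.cross_entropy(logits, Y)
print(loss.item())
```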
The spelled-out intro to language modeling: building makemore
This video introduces language modeling by building a "makemore" character-level language model. It covers the fundamentals of `torch.Tensor` and its efficient use in neural networks. The tutorial details the framework of language modeling, including training, sampling, and evaluating loss, specifically the negative log likelihood for classification. The content progresses from a simple bigram model to a neural network approach, explaining concepts like one-hot encodings, softmax, and vectorized loss, with practical exercises provided for viewers to implement.
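A toy sketch of the counting bigram model and its average negative log likelihood, using a few stand-in words rather than the full names dataset from the video.

```python
import torch

words = ["emma", "olivia", "ava"]            # stand-in training words
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0                                # start/end token
itos = {i: s for s, i in stoi.items()}

# Count how often each character follows each other character.
N = torch.zeros(len(stoi), len(stoi), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

P = (N + 1).float()                          # add-one smoothing
P /= P.sum(1, keepdim=True)                  # each row is a next-character distribution

# Average negative log likelihood of the training words under the model
log_likelihood, n = 0.0, 0
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
print((-log_likelihood / n).item())          # lower is better
```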
The spelled-out intro to neural networks and backpropagation: building micrograd
This video provides a detailed, step-by-step explanation of neural networks and backpropagation, assuming only basic Python knowledge and high school calculus. It covers the core concepts of backpropagation, the implementation of a micrograd library, and the training of a neural network from scratch. The content is structured with chapters detailing the process from a simple function derivative to building a multi-layer perceptron and performing gradient descent optimization. It also includes a comparison with PyTorch and a walkthrough of the micrograd code on GitHub.
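A compressed sketch of the kind of scalar autograd object such a library builds: each value remembers how it was produced, and `backward()` walks the graph in reverse applying the chain rule. Only addition and multiplication are shown; the real library covers more operations.

```python
class Value:
    """A scalar that tracks its gradient through + and *."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)   # dc/da = b + 1 = -2.0, dc/db = a = 2.0
```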
But how do AI images and videos actually work? | Guest video by Welch Labs
- This video explains the underlying mechanisms of AI image and video generation, focusing on Diffusion Models and CLIP.
- It breaks down the process into key concepts: CLIP for understanding text prompts, the concept of a shared embedding space, and Diffusion Models (DDPM and DDIM) for generating images from noise; the forward noising process is sketched below.
- The video also touches upon techniques like conditioning, guidance, and negative prompts, which allow for more control over the generated output.
- It references specific AI models and papers, including Dall-E 2, Stable Diffusion, Midjourney, and Veo, providing links to further resources and code implementations.
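A minimal sketch of the DDPM forward (noising) process that such models learn to invert: an image is blended with Gaussian noise according to a variance schedule, and a network is later trained to predict that noise. The schedule values and the random stand-in image are illustrative.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear variance schedule (illustrative)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)     # cumulative signal retention up to step t

x0 = torch.rand(3, 64, 64)                   # stand-in "image"
t = 500
eps = torch.randn_like(x0)                   # Gaussian noise the model would predict
x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps

print(x_t.shape, alpha_bar[t].item())        # mostly noise by t = 500
```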
How might LLMs store facts | Deep Learning Chapter 7
This video explores how Large Language Models (LLMs) might store factual information, focusing on the role of multilayer perceptrons (MLPs) within transformer architectures. It delves into the concept of 'superposition,' where multiple pieces of information can be encoded within the same set of parameters, akin to fitting many perpendicular vectors in high-dimensional spaces. The video suggests this mechanism could be crucial for LLMs' ability to recall and utilize facts. It also touches upon the computational aspects, such as counting parameters, and references external resources for further learning in mechanistic interpretability and AI alignment.
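A small numerical sketch of the geometric point behind superposition: random unit vectors in a high-dimensional space are nearly perpendicular to one another, so far more "directions" than dimensions can coexist with little interference. The dimensions and counts here are arbitrary.

```python
import torch

torch.manual_seed(0)

dim, n_vectors = 1000, 10000                 # many more vectors than dimensions
v = torch.randn(n_vectors, dim)
v = v / v.norm(dim=1, keepdim=True)          # normalize to unit vectors

dots = v[:100] @ v[:100].T                   # pairwise cosines for a subsample
off_diag = dots[~torch.eye(100, dtype=torch.bool)]
print(off_diag.abs().mean().item())          # roughly 0.02-0.03: nearly orthogonal
```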
Attention in transformers, step-by-step | Deep Learning Chapter 6
This video provides a step-by-step explanation of the attention mechanism in transformers and Large Language Models (LLMs), covering self-attention, multiple heads, and cross-attention. It is part of a deep learning series funded by viewer support and links to additional resources for further learning, including building a GPT from scratch, understanding language models, and interpreting transformer circuits. Timestamps allow easy navigation through the content: a recap on embeddings, a motivating example, the attention pattern, masking, context size, values, parameter counting, cross-attention, multiple heads, and the output matrix.
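A compact sketch of how "multiple heads" can be computed: the same masked attention pattern run in parallel on smaller slices of the embedding, then concatenated back together. The dimensions are illustrative, not those of any production model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, C, n_heads = 2, 8, 64, 4
head_dim = C // n_heads
x = torch.randn(B, T, C)                                 # stand-in embeddings

Wq, Wk, Wv = (torch.randn(C, C) * 0.02 for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv                         # (B, T, C)

# Split the channel dimension into heads: (B, n_heads, T, head_dim)
split = lambda z: z.view(B, T, n_heads, head_dim).transpose(1, 2)
q, k, v = split(q), split(k), split(v)

att = (q @ k.transpose(-2, -1)) / head_dim**0.5          # (B, n_heads, T, T) pattern
mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
att = att.masked_fill(~mask, float('-inf'))              # causal masking
att = F.softmax(att, dim=-1)

out = (att @ v).transpose(1, 2).reshape(B, T, C)         # concatenate the heads
print(out.shape)  # torch.Size([2, 8, 64])
```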
Transformers, the tech behind LLMs | Deep Learning Chapter 5
This video explains the technology behind Large Language Models (LLMs), focusing on Transformers. It breaks down how LLMs work and visualizes the data flow within them. The content is funded by viewer support. The video covers topics such as prediction and sampling, the internal workings of a transformer, the premise of deep learning, word embeddings, embeddings beyond words, unembedding, and softmax with temperature. It also provides links to external resources for further learning, including building a GPT from scratch, conceptual understanding of language models, interpreting large networks, and the history of language models. Timestamps are provided for easy navigation within the video.
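A tiny sketch of softmax with temperature: dividing the logits by a temperature before the softmax flattens (T > 1) or sharpens (T < 1) the distribution that the next token is sampled from. The logits here are made up.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1, -1.0])

def softmax_with_temperature(logits, T):
    return torch.softmax(logits / T, dim=-1)

for T in (0.5, 1.0, 2.0):
    # Lower T concentrates probability on the top logit; higher T spreads it out.
    print(T, softmax_with_temperature(logits, T).tolist())
```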
Large Language Models explained briefly
This YouTube video provides a brief explanation of Large Language Models (LLMs), chatbots, pretraining, and transformers. It was created for an exhibit at the Computer History Museum and is funded by viewer support. The video's animations are made using a custom Python library called manim. The creator, Grant Sanderson (3Blue1Brown), also has other related videos on neural networks and transformers.
Backpropagation calculus | Deep Learning Chapter 4
This YouTube video, part of the "Deep Learning" series by 3Blue1Brown, delves into the calculus behind backpropagation. It aims to provide a more formal representation of the intuition presented in previous episodes, bridging the gap between conceptual understanding and practical implementation in code and other texts. The video covers the chain rule in neural networks, the computation of relevant derivatives, the meaning of these derivatives, and their sensitivity to weights and biases. It also addresses scenarios with layers containing additional neurons. The content is presented with mathematical notation and visual explanations to clarify the complex concepts of backpropagation.
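As a sketch of the central chain-rule expression along the lines the chapter presents, for a chain of single-neuron layers where z(L) = w(L) a(L-1) + b(L) is the weighted input, a(L) = sigma(z(L)) is the activation, and C0 = (a(L) - y)^2 is the cost for one training example:

```latex
\frac{\partial C_0}{\partial w^{(L)}}
  = \frac{\partial z^{(L)}}{\partial w^{(L)}} \,
    \frac{\partial a^{(L)}}{\partial z^{(L)}} \,
    \frac{\partial C_0}{\partial a^{(L)}}
  = a^{(L-1)} \, \sigma'\!\left(z^{(L)}\right) \, 2\left(a^{(L)} - y\right)
```

The analogous expressions for the bias and for earlier layers reuse the same pattern, simply extending the chain of factors.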
Backpropagation, Intuitively | Deep Learning Chapter 3
This YouTube video, part of the "Deep Learning" series by 3Blue1Brown, provides an intuitive explanation of backpropagation, a core algorithm in neural network learning. It aims to demystify what happens within a neural network as it learns, focusing on the conceptual understanding rather than just the mathematical formulas. The video includes an introduction, a recap of previous concepts, an intuitive walkthrough with an example, a discussion on stochastic gradient descent, and concluding remarks. It also highlights the connection to partial derivatives and suggests further resources for a deeper dive into the mathematical representation of backpropagation.
Gradient Descent, How Neural Networks Learn | Deep Learning Chapter 2
This video explains the concept of gradient descent and how neural networks learn. It covers cost functions, training data, and the gradient descent procedure, then analyzes what the trained network actually learns. The video also features an interview with Lisha Li. It is part of a larger series on deep learning and is recommended for those interested in animated math explanations and understanding neural networks.
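A minimal sketch of the gradient descent update on a toy one-parameter cost function; the cost, learning rate, and step count are arbitrary.

```python
def cost(w):
    return (w - 3.0) ** 2 + 1.0          # toy cost, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)               # derivative of the cost

w, lr = 0.0, 0.1                         # initial guess and learning rate
for step in range(50):
    w -= lr * grad(w)                    # nudge w a small step downhill

print(w, cost(w))                        # w is close to 3.0, cost close to 1.0
```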
But what is a neural network? | Deep learning chapter 1
This video, the first chapter of a deep learning series, explains the fundamental concepts of neural networks. It covers:

* **Neurons:** the basic building blocks of neural networks (a single neuron's computation is sketched below).
* **Layers:** how neurons are organized into layers (input, hidden, output).
* **The Math:** the underlying mathematical principles, including weights and biases.
* **Why Layers?** the advantages of using layered structures for processing information.
* **Example:** an illustration using edge detection to demonstrate how neural networks work.

The video emphasizes active learning and recommends resources like Michael Nielsen's free book on neural networks and Chris Olah's blog for further study. It also mentions the use of the 'manim' Python library for animations and provides links to related videos and playlists.
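A tiny sketch of the single-neuron computation described above: a weighted sum of the incoming activations plus a bias, squashed by an activation function such as the sigmoid. The numbers are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs  = [0.0, 0.8, 0.2]     # activations from the previous layer (illustrative)
weights = [1.5, -2.0, 0.5]    # one weight per incoming connection
bias = 0.1

z = sum(w * a for w, a in zip(weights, inputs)) + bias
activation = sigmoid(z)
print(activation)             # a number between 0 and 1
```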