This tool was created for CEE 4803 (Art & Generative AI) at the Georgia Institute of Technology.
It pairs with our Libre textbook, AI Fundamentals.
Kenneth (Alex) Jenkins - TA for CEE 4803
Interactive demonstrations of AI and machine learning architectures
The foundational building block of neural networks - a simple linear classifier
Multi-layer neural network with hidden layers for complex decision boundaries
Statistical physics model showing spin interactions and phase transitions
Associative memory network that stores and retrieves patterns
Compress and reconstruct data through a bottleneck latent representation
An energy-based generative network with symmetric connections between visible and hidden layers
Probabilistic encoder-decoder that learns continuous latent distributions
Invertible transformations that map simple to complex distributions
Convolutional encoder-decoder network for image processing tasks
Modern architecture using attention mechanisms to process sequences
Selective state-space architecture offering a linear-time alternative to Transformers for ultra-long sequences
GPU parallel computing concepts: threads, blocks, and memory hierarchy
Transformer-based architecture that generates original melodies using self-attention mechanisms and sequential pattern learning
A perceptron is like a simple decision-maker with a checklist. Imagine a bouncer at a party who decides if you can come in based on a few things: Are you on the guest list? Are you wearing nice shoes? Do you have an invitation? The bouncer gives each rule a different importance (weight), adds up the scores, and makes a final yes/no decision. When the bouncer makes mistakes, they learn by adjusting how important each rule is!
The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
Psychological Review, 65(6), 386-408
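To make the bouncer analogy concrete, here is a minimal sketch of the perceptron learning rule in Python. The guest-list features, labels, and learning rate are made up for illustration and are not the demo's actual data or code.

```python
import numpy as np

# Toy "bouncer" checklist: [on guest list, nice shoes, has invitation] -> let in?
X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])   # 1 = let in, 0 = turn away

w = np.zeros(X.shape[1])   # importance (weight) of each rule
b = 0.0                    # overall strictness (bias)
lr = 0.1                   # how much to adjust after a mistake

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = int(w @ xi + b > 0)   # weighted checklist -> yes/no decision
        error = target - pred        # did the bouncer make a mistake?
        w += lr * error * xi         # adjust how important each rule is
        b += lr * error              # adjust overall strictness

print("learned weights:", w, "bias:", b)
print("decisions:", [int(w @ xi + b > 0) for xi in X])
```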
An RBM is like a box full of light switches that can turn on or off. Some switches are on the front (visible units - things we can see) and some are hidden inside (hidden units - patterns we discover). The cool part? The switches talk to each other! If you flip the visible switches in a certain pattern (like showing it a picture of a cat), the hidden switches learn to recognize "cat-ness". Then you can flip the hidden switches and the visible ones will show you a new cat picture!
Restricted Boltzmann Machines for Collaborative Filtering
Proceedings of the 24th International Conference on Machine Learning (ICML)
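A minimal sketch of one contrastive-divergence (CD-1) update for a tiny RBM, assuming NumPy and a made-up visible pattern; the layer sizes and learning rate are illustrative, not the demo's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 3                      # 6 front switches, 3 hidden switches
W = rng.normal(0, 0.1, (n_visible, n_hidden))   # how strongly the switches talk to each other
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

# One CD-1 step on a single training pattern (e.g., one "cat picture")
v0 = np.array([1, 1, 0, 0, 1, 1], dtype=float)      # flip the visible switches
p_h0 = sigmoid(v0 @ W + b_h); h0 = sample(p_h0)     # hidden switches respond
p_v1 = sigmoid(h0 @ W.T + b_v); v1 = sample(p_v1)   # "dream" a visible pattern back
p_h1 = sigmoid(v1 @ W + b_h)

lr = 0.1
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))   # strengthen real correlations
b_v += lr * (v0 - v1)
b_h += lr * (p_h0 - p_h1)
print("reconstructed pattern:", v1)
```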
An autoencoder is like a really clever note-taker in class. Instead of writing down everything the teacher says word-for-word, they write short notes with just the most important ideas (encoding). Later, when studying for the test, they can expand those short notes back into full explanations (decoding). If the notes are good, you can recreate almost the whole lecture! The autoencoder learns what's "important enough" to write down to recreate the original information.
Reducing the Dimensionality of Data with Neural Networks
Science, 313(5786), 504-507
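The note-taking idea can be sketched with a tiny linear autoencoder that squeezes 4-D points through a 1-D bottleneck; the data, sizes, and training settings below are illustrative assumptions, not the demo's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4-D points that really live along a 1-D line (plus a little noise)
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 4))

# Linear autoencoder: 4 -> 1 -> 4 (the "short notes" are one number per point)
W_enc = rng.normal(0, 0.1, (4, 1))
W_dec = rng.normal(0, 0.1, (1, 4))
lr = 0.01

for step in range(3000):
    Z = X @ W_enc        # encode: take the short notes
    X_hat = Z @ W_dec    # decode: expand the notes back into the full "lecture"
    err = X_hat - X
    # Gradient descent on the mean squared reconstruction error
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print("reconstruction error:", np.mean(err ** 2))   # should be small: one number per point was enough
```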
The Ising Model is like a checkerboard where each square is a tiny magnet that wants to point either up or down. Here's the cool part: each magnet wants to match its neighbors - if your neighbor points up, you want to point up too! Temperature is like how much the magnets "wiggle around". When it's cold, all the magnets line up the same way (like everyone wearing the same team jersey). When it's hot, they point randomly (like a messy crowd). This shows how simple rules create complex group behavior!
Beitrag zur Theorie des Ferromagnetismus (Contribution to the Theory of Ferromagnetism)
Zeitschrift für Physik, 31(1), 253-258
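A short Metropolis Monte Carlo sketch of the Ising model on a 20x20 grid with periodic boundaries; the temperature and step count are illustrative choices, not the demo's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 20     # 20x20 checkerboard of tiny magnets
T = 1.5    # temperature: how much the magnets "wiggle around"
spins = rng.choice([-1, 1], size=(N, N))

def neighbor_sum(s, i, j):
    # Sum of the four neighbors, wrapping around the edges (periodic boundary)
    return s[(i+1) % N, j] + s[(i-1) % N, j] + s[i, (j+1) % N] + s[i, (j-1) % N]

# Metropolis updates: flip a magnet if that lowers the energy,
# or sometimes anyway if the temperature is high enough
for step in range(200_000):
    i, j = rng.integers(N), rng.integers(N)
    dE = 2 * spins[i, j] * neighbor_sum(spins, i, j)   # energy cost of flipping
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        spins[i, j] *= -1

print("magnetization:", spins.mean())   # heads toward +/-1 when cold, stays near 0 when hot
```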
A Hopfield Network is like a magic photo restoration machine. Imagine you have a damaged old photograph with parts missing or blurry. You show it to the network, and it "remembers" complete photos it saw before and fixes your damaged one! It's like when you see half a face and your brain fills in the rest. The network stores memories as patterns, and when you give it a partial or noisy pattern, it rolls downhill into the closest complete memory - like a ball rolling into a valley!
Neural Networks and Physical Systems with Emergent Collective Computational Abilities
Proceedings of the National Academy of Sciences, 79(8), 2554-2558
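A minimal Hopfield sketch, assuming two hand-picked 16-unit patterns: Hebbian storage, then asynchronous updates that pull a damaged pattern back into the nearest stored memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "photos" stored as +/-1 patterns of length 16
patterns = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
])
n = patterns.shape[1]

# Hebbian storage: neurons that fire together, wire together
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

# Damage the first photo by flipping a few pixels
probe = patterns[0].copy()
probe[[0, 5, 10]] *= -1

# Asynchronous updates: each neuron "rolls downhill" toward a stored memory
for _ in range(5):
    for i in rng.permutation(n):
        probe[i] = 1 if W[i] @ probe >= 0 else -1

print("recovered the first photo:", np.array_equal(probe, patterns[0]))
```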
In practice, a Transformer takes the tokens you've typed and outputs a probability distribution over the next token. This demo uses a trigram/bigram language model that looks at the last one or two words to predict what comes next, falling back to shorter contexts when it hasn't seen a phrase before. Real Transformers use self-attention to weigh ALL previous tokens, which makes them vastly more powerful. The model has learned patterns from diverse example sentences about food, animals, machine learning, and everyday conversations. Try typing phrases like "i like pizza that" or "machine learning is" to see contextually relevant predictions!
Attention Is All You Need
Advances in Neural Information Processing Systems 30 (NeurIPS 2017)
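A rough sketch of the trigram-with-bigram-fallback idea described above, using a tiny made-up corpus; the demo's actual sentences and fallback logic may differ.

```python
from collections import defaultdict, Counter

# Tiny corpus standing in for the demo's example sentences (made up for illustration)
corpus = [
    "i like pizza that has extra cheese",
    "i like dogs that run fast",
    "machine learning is fun",
    "machine learning is powerful",
]

trigram = defaultdict(Counter)   # (w1, w2) -> counts of the next word
bigram = defaultdict(Counter)    # (w2,)    -> counts of the next word

for sentence in corpus:
    words = sentence.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        trigram[(a, b)][c] += 1
    for a, b in zip(words, words[1:]):
        bigram[(a,)][b] += 1

def predict(prompt):
    words = tuple(prompt.lower().split())
    # Prefer the trigram (last two words); fall back to the bigram (last word)
    for context in (words[-2:], words[-1:]):
        counts = trigram[context] if len(context) == 2 else bigram[context]
        if counts:
            total = sum(counts.values())
            return {w: c / total for w, c in counts.most_common(3)}
    return {}   # unseen context: a real model would back off even further

print(predict("machine learning is"))   # e.g. {'fun': 0.5, 'powerful': 0.5}
print(predict("i like pizza that"))     # e.g. {'has': 1.0}
```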
A Deep Perceptron (MLP) is like a tower of smart committees, each one making decisions based on what the committee below figured out! The first committee looks at raw information (like pixels in a photo). The second committee looks at patterns the first one found (like "edges"). The third committee spots bigger patterns (like "circles" or "corners"). Each committee learns what's important and passes it up. By the time you reach the top, the network can recognize really complex things like "this is a cat" or "this email is spam!" It's like playing telephone, but each person in line makes the message smarter instead of more confusing.
Learning Representations by Back-Propagating Errors
Nature, 323(6088), 533-536
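A minimal two-layer "committee" trained with backpropagation on XOR, a problem a single perceptron cannot solve; the hidden-layer size, learning rate, and step count are illustrative choices, not the demo's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: output 1 only when exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1.0, (2, 8)); b1 = np.zeros(8)   # first committee (hidden layer)
W2 = rng.normal(0, 1.0, (8, 1)); b2 = np.zeros(1)   # second committee (output)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

lr = 1.0
for step in range(5000):
    # Forward pass: each committee reports its findings to the next one up
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: errors flow back down and each committee adjusts its weights
    d_out = (out - y) * out * (1 - out)
    d_h = d_out @ W2.T * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```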
A Normalizing Flow is like a Play-Doh factory that can make ANY shape you want! You start with a simple ball of Play-Doh (easy to make). Then you push it through a series of special molds - twist here, stretch there, bend this way. Each mold transforms it step-by-step. The cool part? You can write down EXACTLY what each mold does, so you can reverse the whole process perfectly! If you want another copy, just start with a ball and push through the same molds. This is how AI can create realistic faces, voices, or artwork - it learns what "molds" (transformations) turn random noise into the real thing.
NICE: Non-linear Independent Components Estimation
International Conference on Learning Representations (ICLR 2015)
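A sketch of one NICE-style additive coupling "mold", assuming a fixed shift function standing in for the learned network; the point is that the forward and inverse transformations match exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive coupling: split the "Play-Doh" into two halves and shift one half
# by a function of the other. Easy to run forward AND perfectly in reverse.
def coupling_forward(x, shift_fn):
    x1, x2 = x[:, :1], x[:, 1:]
    return np.concatenate([x1, x2 + shift_fn(x1)], axis=1)

def coupling_inverse(y, shift_fn):
    y1, y2 = y[:, :1], y[:, 1:]
    return np.concatenate([y1, y2 - shift_fn(y1)], axis=1)

# A fixed, made-up "mold"; in a real flow this is a learned neural network
shift_fn = lambda x1: np.sin(3 * x1) + 0.5 * x1 ** 2

z = rng.normal(size=(5, 2))                # simple ball of Play-Doh (Gaussian noise)
x = coupling_forward(z, shift_fn)          # push it through the mold
z_back = coupling_inverse(x, shift_fn)     # run the mold in reverse

print("perfectly reversible:", np.allclose(z, z_back))   # True
```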
A VAE is like an artist who doesn't trace drawings exactly - they learn the STYLE! Imagine showing an artist 1000 cat photos. Instead of memorizing each cat, they learn "cats usually have pointy ears, whiskers, and round eyes - but every cat is slightly different." Now when you ask them to draw a new cat, they don't copy an old photo; they create a unique cat using what they learned about "cat-ness." The magic is they also learn WHERE in "cat space" each feature lives (fluffy vs. short-hair, big vs. small), so you can even say "draw me a cat that's halfway between these two!" This is how AI generates new faces, art, or music that look real but never existed before.
Auto-Encoding Variational Bayes
International Conference on Learning Representations (ICLR 2014)
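A sketch of the VAE's reparameterization trick and latent-space interpolation, assuming made-up encoder outputs for two imaginary cat photos; a real VAE learns these means and variances and decodes the latent vector back into an image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for two cat photos: a location and spread in "cat space"
mu_a, log_var_a = np.array([1.0, -0.5]), np.array([-2.0, -2.0])
mu_b, log_var_b = np.array([-1.0, 0.8]), np.array([-2.0, -2.0])

def sample_latent(mu, log_var):
    # Reparameterization trick: z = mu + sigma * noise, so gradients can flow through
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Penalty for drifting too far from a standard Gaussian "cat space"
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

z_a = sample_latent(mu_a, log_var_a)
z_b = sample_latent(mu_b, log_var_b)
z_halfway = 0.5 * (z_a + z_b)   # "draw me a cat that's halfway between these two"

print("z_a:", z_a, " KL term:", round(kl_to_standard_normal(mu_a, log_var_a), 3))
print("interpolated latent:", z_halfway)   # a decoder would turn this into the new cat
```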
Imagine you're looking at a photo through a magnifying glass that first makes everything blurry and simple (like squinting your eyes), then gradually brings back all the details. The CNN Encoder-Decoder is like a smart camera that first simplifies an image by focusing on the most important shapes and patterns, then rebuilds it with all the details restored - like taking a puzzle apart and putting it back together, but now you understand every piece!
U-Net: Convolutional Networks for Biomedical Image Segmentation
Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015)
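A toy NumPy sketch of the squint-then-restore idea: average pooling stands in for the convolutional encoder, nearest-neighbor upsampling for the decoder, and simple additions for U-Net-style skip connections (a real U-Net concatenates feature maps and applies learned convolutions).

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((8, 8))   # a tiny grayscale "photo"

def downsample(x):
    # Encoder step: 2x2 average pooling keeps only the big shapes (the "squint")
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # Decoder step: nearest-neighbor upsampling grows the simplified map back
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

# Encoder: 8x8 -> 4x4 -> 2x2 (increasingly blurry and simple)
e1 = downsample(image)
e2 = downsample(e1)

# Decoder: 2x2 -> 4x4 -> 8x8, with skip connections adding back the fine detail
d1 = upsample(e2) + e1
d2 = upsample(d1) + image

print(image.shape, "->", e2.shape, "->", d2.shape)   # (8, 8) -> (2, 2) -> (8, 8)
```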
Think of Transformers as a student who needs to compare every word in a book with every other word to understand it - this gets impossibly slow with long books! Mamba2 is like a speed-reader with a smart notebook: as it reads, it instantly decides "This is important, write it down" or "This is background info, skip it." The notebook stays small and organized, so even with massive texts, Mamba2 reads at lightning speed. This is why Mamba2 has emerged as a compelling alternative to Transformers in applications that need to understand really long sequences - from analyzing entire research papers to processing hours of conversation history. Selective state spaces let Mamba2 focus on the most relevant information without getting bogged down, making it well suited to the next generation of AI models that need to handle vast amounts of data efficiently.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
arXiv preprint arXiv:2312.00752
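A heavily simplified "smart notebook" sketch: a gated linear recurrence that reads tokens in one left-to-right pass, with an input-dependent gate deciding what to write down. This illustrates the selective, linear-time idea only; it is not the actual Mamba2 state-space update, and all sizes and weights below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

d_model, d_state, seq_len = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))          # the incoming tokens
W_gate = rng.normal(0, 0.5, (d_model,))          # decides "important or background?"
W_in = rng.normal(0, 0.5, (d_model, d_state))    # how a token gets written into the notebook
W_out = rng.normal(0, 0.5, (d_state, d_model))   # how the notebook is read back out

h = np.zeros(d_state)                            # the notebook starts empty
outputs = []
for t in range(seq_len):                         # one pass over the sequence: linear time
    gate = sigmoid(x[t] @ W_gate)                # input-dependent ("selective") gate
    h = (1 - gate) * h + gate * (x[t] @ W_in)    # keep old notes vs. write new ones
    outputs.append(h @ W_out)

print(np.array(outputs).shape)                   # (16, 4): one output per token
```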
Imagine you have a big homework assignment with 1000 math problems. Your regular CPU is like having ONE really smart student who solves each problem one at a time - fast, but it takes a while. A GPU with CUDA is like having a classroom with THOUSANDS of students who each solve one problem at the same time! Even though each student might be a bit slower than the super-smart one, when they all work together, they finish the whole assignment way faster. That's why GPUs are perfect for training AI!
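A plain-Python sketch of the CUDA indexing arithmetic: how 1000 "homework problems" would be split across blocks and threads, with each thread computing its own global index. On a real GPU these loops run in parallel; the launch configuration below is an illustrative assumption.

```python
# On a GPU, a kernel launch like add<<<num_blocks, threads_per_block>>>(...) assigns
# one thread per problem. Here we loop sequentially just to show the index math.

num_problems = 1000
threads_per_block = 256
num_blocks = (num_problems + threads_per_block - 1) // threads_per_block   # ceiling division

solved = [False] * num_problems
for block_idx in range(num_blocks):                      # on a GPU, blocks run concurrently...
    for thread_idx in range(threads_per_block):          # ...and so do the threads inside each block
        i = block_idx * threads_per_block + thread_idx   # global index, as in a CUDA kernel
        if i < num_problems:                             # boundary check, just like a real kernel
            solved[i] = True                             # each "student" solves exactly one problem

print(num_blocks, "blocks x", threads_per_block, "threads cover", sum(solved), "problems")
```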