Regular Expression
(a|b)*abb
a*b+
(a|b)+
ab?c
(0|1)*00
a(bc)*d
(a|b|c)*
Simulate Input
States
| State | Type | Transitions |
|---|---|---|
| No automaton built yet | ||
Theory Reference
REGEX · ε-NFA · NFA · DFA · FORMAL LANGUAGE THEORY
Core Concepts
Regular Expressions (Regex)
A regular expression is a compact notation for describing sets of strings (formal languages). Every regex denotes exactly one regular language.
Core operations:
•
•
•
From these three primitives every regular language is expressible. Extensions like
Core operations:
•
a·b — Concatenation: string "ab"•
a|b — Union: "a" or "b"•
a* — Kleene star: zero or more "a"sFrom these three primitives every regular language is expressible. Extensions like
a+ (= aa*) and a? (= a|ε) are syntactic sugar.
Finite Automata (FA)
A finite automaton is a mathematical model of computation with a finite set of states. It reads an input string and either accepts or rejects it.
Formally:
•
•
•
•
•
Formally:
M = (Q, Σ, δ, q₀, F)•
Q — finite set of states•
Σ — input alphabet•
δ — transition function•
q₀ — start state•
F ⊆ Q — accept states
ε-NFA (Non-deterministic)
An ε-NFA allows:
1. Multiple transitions on the same symbol
2.
The transition function becomes
1. Multiple transitions on the same symbol
2.
ε-transitions (spontaneous moves without reading input)The transition function becomes
δ: Q × (Σ ∪ {ε}) → 2^Q (maps to a set of states). This makes Thompson's Construction elegant — each regex operator maps to a simple NFA fragment with ε-transitions gluing them together.
DFA (Deterministic)
A DFA has exactly one transition per state per symbol — no ε-moves, no ambiguity. The transition function is
DFAs are easier to implement and run in
δ: Q × Σ → Q.DFAs are easier to implement and run in
O(n) time in the input length. Despite being more restricted, DFAs and NFAs accept exactly the same class of languages — the regular languages (Kleene's theorem).
ε-Closure
The ε-closure of a state
Formally:
ε-closure is the core operation used in both NFA simulation and subset construction. It "expands" a set of active states to include all spontaneously reachable states.
q is the set of all states reachable from q by following zero or more ε-transitions — without consuming any input.Formally:
ε-CLOSURE(q) = {q} ∪ {p | q →ε* p}ε-closure is the core operation used in both NFA simulation and subset construction. It "expands" a set of active states to include all spontaneously reachable states.
Kleene's Theorem
A language
The three directions of proof:
1. Regex → NFA: Thompson's Construction
2. NFA → DFA: Subset (Powerset) Construction
3. DFA → Regex: State elimination / Arden's lemma
This tool demonstrates directions 1 and 2.
L is regular if and only if it is accepted by some finite automaton.The three directions of proof:
1. Regex → NFA: Thompson's Construction
2. NFA → DFA: Subset (Powerset) Construction
3. DFA → Regex: State elimination / Arden's lemma
This tool demonstrates directions 1 and 2.
Thompson's Construction Algorithm
INPUT: A regular expression in postfix form
OUTPUT: An ε-NFA with exactly one start state and one accept state
KEY PROPERTY: Each NFA fragment has exactly one accept state with no outgoing transitions
OUTPUT: An ε-NFA with exactly one start state and one accept state
KEY PROPERTY: Each NFA fragment has exactly one accept state with no outgoing transitions
1
Base case — symbol
a: Create two states s, a. Add transition s →ᵃ a. Fragment = (s, a).2
Union —
r|s: New start q, new accept f. Add ε-edges q→r.start, q→s.start, r.accept→f, s.accept→f.3
Concatenation —
r·s: Merge r.accept into s.start (or add ε-edge). Fragment = (r.start, s.accept).4
Kleene Star —
r*: New start q, new accept f. Add ε-edges q→r.start, q→f, r.accept→r.start (loop), r.accept→f.5
Result: The NFA for the full expression is the fragment left on the stack, with guaranteed unique start and accept states.
Subset (Powerset) Construction
Converts ε-NFA → DFA. Each DFA state corresponds to a set of NFA states. A DFA state is accepting if it contains any NFA accepting state. Worst case: 2^n DFA states for n NFA states.
1
Start: DFA start state =
ε-CLOSURE({q₀})2
For each unmarked DFA state S and each symbol a ∈ Σ: Compute
MOVE(S,a) = ε-CLOSURE(⋃_{q∈S} δ(q,a))3
Add transition:
S →ᵃ MOVE(S,a). If MOVE(S,a) is new, add it as an unmarked DFA state.4
Mark S as done. Repeat until no unmarked states remain.
5
Accept states: Any DFA state S where S ∩ F ≠ ∅ (F = NFA accept states).
Regex Operator Precedence & Associativity
| Operator | Symbol | Precedence | Associativity | Description |
|---|---|---|---|---|
| Kleene Star | * | Highest (3) | Left | Zero or more repetitions |
| Plus | + | 3 | Left | One or more (= aa*) |
| Optional | ? | 3 | Left | Zero or one (= a|ε) |
| Concatenation | · (implicit) | 2 | Left | Sequence of two expressions |
| Union | | | Lowest (1) | Left | Either left or right |
| Grouping | ( ) | N/A | N/A | Overrides precedence |
Regular Language Properties
Closure Properties
Regular languages are closed under:
• Union: L₁ ∪ L₂
• Intersection: L₁ ∩ L₂
• Complement: Σ* \ L
• Concatenation: L₁·L₂
• Kleene star: L*
• Reversal: L^R
• Homomorphism: h(L)
• Union: L₁ ∪ L₂
• Intersection: L₁ ∩ L₂
• Complement: Σ* \ L
• Concatenation: L₁·L₂
• Kleene star: L*
• Reversal: L^R
• Homomorphism: h(L)
Pumping Lemma
For any regular language L, there exists a pumping length
•
•
•
Used to prove a language is not regular by contradiction.
p such that any string w ∈ L with |w| ≥ p can be split w = xyz where:•
|xy| ≤ p•
|y| ≥ 1•
∀i ≥ 0: xy^i z ∈ LUsed to prove a language is not regular by contradiction.
Myhill-Nerode Theorem
A language L is regular iff the number of Myhill-Nerode equivalence classes is finite.
Two strings x, y are equivalent (
The minimum DFA for L has exactly one state per equivalence class — this is the theoretical foundation for DFA minimization.
Two strings x, y are equivalent (
x ≡_L y) if for all z: xz ∈ L ↔ yz ∈ L.The minimum DFA for L has exactly one state per equivalence class — this is the theoretical foundation for DFA minimization.
Chomsky Hierarchy
Regular languages sit at the bottom of the Chomsky hierarchy:
Type 3: Regular (FA, regex)
Type 2: Context-Free (PDA, CFG)
Type 1: Context-Sensitive (LBA)
Type 0: Recursively Enumerable (TM)
Each class strictly contains the class below it.
Type 3: Regular (FA, regex)
Type 2: Context-Free (PDA, CFG)
Type 1: Context-Sensitive (LBA)
Type 0: Recursively Enumerable (TM)
Each class strictly contains the class below it.
Enter a regular expression and click Convert
to visualize the automaton with physics simulation
to visualize the automaton with physics simulation