Regex → Automata Converter

Regular Expression

(a|b)*abb a*b+ (a|b)+ ab?c (0|1)*00 a(bc)*d (a|b|c)*

Options

Show ε-moves

Physics sim

Show labels

Curved edges

Simulate Input

States

State	Type	Transitions
No automaton built yet

Theory Reference

REGEX · ε-NFA · NFA · DFA · FORMAL LANGUAGE THEORY

Core Concepts

🔤

Regular Expressions (Regex)

A regular expression is a compact notation for describing sets of strings (formal languages). Every regex denotes exactly one regular language.

Core operations:
• a·b — Concatenation: string "ab"
• a|b — Union: "a" or "b"
• a* — Kleene star: zero or more "a"s

From these three primitives every regular language is expressible. Extensions like a+ (= aa*) and a? (= a|ε) are syntactic sugar.

🤖

Finite Automata (FA)

A finite automaton is a mathematical model of computation with a finite set of states. It reads an input string and either accepts or rejects it.

Formally: M = (Q, Σ, δ, q₀, F)
• Q — finite set of states
• Σ — input alphabet
• δ — transition function
• q₀ — start state
• F ⊆ Q — accept states

〰️

ε-NFA (Non-deterministic)

An ε-NFA allows:
1. Multiple transitions on the same symbol
2. ε-transitions (spontaneous moves without reading input)

The transition function becomes δ: Q × (Σ ∪ {ε}) → 2^Q (maps to a set of states). This makes Thompson's Construction elegant — each regex operator maps to a simple NFA fragment with ε-transitions gluing them together.

✅

DFA (Deterministic)

A DFA has exactly one transition per state per symbol — no ε-moves, no ambiguity. The transition function is δ: Q × Σ → Q.

DFAs are easier to implement and run in O(n) time in the input length. Despite being more restricted, DFAs and NFAs accept exactly the same class of languages — the regular languages (Kleene's theorem).

🔗

ε-Closure

The ε-closure of a state q is the set of all states reachable from q by following zero or more ε-transitions — without consuming any input.

Formally: ε-CLOSURE(q) = {q} ∪ {p | q →ε* p}

ε-closure is the core operation used in both NFA simulation and subset construction. It "expands" a set of active states to include all spontaneously reachable states.

⚖️

Kleene's Theorem

A language L is regular if and only if it is accepted by some finite automaton.

The three directions of proof:
1. Regex → NFA: Thompson's Construction
2. NFA → DFA: Subset (Powerset) Construction
3. DFA → Regex: State elimination / Arden's lemma

This tool demonstrates directions 1 and 2.

Thompson's Construction Algorithm

          INPUT: A regular expression in postfix form

          OUTPUT: An ε-NFA with exactly one start state and one accept state

          KEY PROPERTY: Each NFA fragment has exactly one accept state with no outgoing transitions

Base case — symbol a: Create two states s, a. Add transition s →ᵃ a. Fragment = (s, a).

Union — r|s: New start q, new accept f. Add ε-edges q→r.start, q→s.start, r.accept→f, s.accept→f.

Concatenation — r·s: Merge r.accept into s.start (or add ε-edge). Fragment = (r.start, s.accept).

Kleene Star — r*: New start q, new accept f. Add ε-edges q→r.start, q→f, r.accept→r.start (loop), r.accept→f.

Result: The NFA for the full expression is the fragment left on the stack, with guaranteed unique start and accept states.

Subset (Powerset) Construction

          Converts ε-NFA → DFA. Each DFA state corresponds to a set of NFA states. A DFA state is accepting if it contains any NFA accepting state. Worst case: 2^n DFA states for n NFA states.
        

Start: DFA start state = ε-CLOSURE({q₀})

For each unmarked DFA state S and each symbol a ∈ Σ: Compute MOVE(S,a) = ε-CLOSURE(⋃_{q∈S} δ(q,a))

Add transition: S →ᵃ MOVE(S,a). If MOVE(S,a) is new, add it as an unmarked DFA state.

Mark S as done. Repeat until no unmarked states remain.

Accept states: Any DFA state S where S ∩ F ≠ ∅ (F = NFA accept states).

Regex Operator Precedence & Associativity

Operator	Symbol	Precedence	Associativity	Description
Kleene Star	`*`	Highest (3)	Left	Zero or more repetitions
Plus	`+`	3	Left	One or more (= aa*)
Optional	`?`	3	Left	Zero or one (= a\|ε)
Concatenation	`·` (implicit)	2	Left	Sequence of two expressions
Union	`\|`	Lowest (1)	Left	Either left or right
Grouping	`( )`	N/A	N/A	Overrides precedence

Regular Language Properties

Closure Properties

Regular languages are closed under:
• Union: L₁ ∪ L₂
• Intersection: L₁ ∩ L₂
• Complement: Σ* \ L
• Concatenation: L₁·L₂
• Kleene star: L*
• Reversal: L^R
• Homomorphism: h(L)

Pumping Lemma

For any regular language L, there exists a pumping length p such that any string w ∈ L with |w| ≥ p can be split w = xyz where:
• |xy| ≤ p
• |y| ≥ 1
• ∀i ≥ 0: xy^i z ∈ L

Used to prove a language is not regular by contradiction.

Myhill-Nerode Theorem

A language L is regular iff the number of Myhill-Nerode equivalence classes is finite.

Two strings x, y are equivalent (x ≡_L y) if for all z: xz ∈ L ↔ yz ∈ L.

The minimum DFA for L has exactly one state per equivalence class — this is the theoretical foundation for DFA minimization.

Chomsky Hierarchy

Regular languages sit at the bottom of the Chomsky hierarchy:

Type 3: Regular (FA, regex)
Type 2: Context-Free (PDA, CFG)
Type 1: Context-Sensitive (LBA)
Type 0: Recursively Enumerable (TM)

Each class strictly contains the class below it.

◎

Enter a regular expression and click Convert
to visualize the automaton with physics simulation

Ready — enter a regex to begin |

States: — Transitions: — Σ: —