An In-Depth Look at Regular Expressions Part 3: Finite State Machines

by | Sep 22, 2016

So in part 1, we talked about different ways to represent languages in general, and in part 2, we talked about what a regular language is and how to use regular expressions to describe a regular language. Now I want to introduce another way to visualize a regular language.

A finite state machine, or FSM for short, is a type of theoretical machine or automaton, which can be represented by a basic flowchart. The diagrams I’ll be using as examples (which were made with the help of this nifty tool), will look something like this:

cr-regex-fsm-7

So S0, S1, S2, S3, S4, and S5 are all states, and S3 and S5 are accept states. The idea is, starting at S0, each possible path that takes you to an accept state represents an element in the language.

Let’s try tracing through this FSM. Starting a S0, we can move to S1, giving us “y” . Then to S2, giving us “ye” . Finally we move to the accept state S3, giving us “yes” . If we repeat these steps for the other path, we now have a set of strings {“yes”, “no”} . Since FSMs and regular expressions are both representations of regular languages, we can also write a regular expression to represent this language, which would look like this yes|no .

Let’s take a look at the three basic operations we discussed in part 2 in the form of FSM diagrams:

1) Concatination: (ab)

cr-regex-fsm-3

2) Alteration: (a|b)

cr-regex-fsm-2

3) Kleene Star: a*

cr-regex-fsm-1

And just like regular expressions you can combine these basic operations to represent more complex regular languages.

Take a+  for example. Which we know from part 2 is shorthand for a(a*) . If we look at the example FSM for (ab)  and replace the b  with (a*) , we can see the move from state S to state A will be the same. And if we replace states A and F with our Kleene star FSM, we’ll get something like this:

cr-regex-fsm-4

There’s also a useful FSM concept which I’ve heard referred to as “lambda moves”. If you recall from part 1, λ (lambda) is sometimes used to represent an empty character (you might refer to the concept as “epsilon moves” if you use ε (epsilon) to represent empty characters). If we look at another regex shortcut we discussed in part 2, a? , we can use the FSM for alteration to create an FSM for this shortcut, which we know is the same as (a|)  or (a|λ) . Simply replace b  in the alteration example with a lambda move.

cr-regex-fsm-5

Of course you can also shorten this if you use multiple acceptance states.

cr-regex-fsm-6

Recent Posts