0:03

So the final step in our regular refreshing pattern matching algorithm is

to construct and then determine the thick finite automaton. So how do we go ahead

and do that? and this is an integral part of the algorithm. but we pretty much have

got all the pieces, but really what makes it intricate is that if, an illustration

of what a programming line has to do to when trying understand your program, what

your programming does. What we need to do is somehow understand what's in the

regular expression and then take that information and use it to build the

machine. Now, that's what parson, that's called parsoning. to try to figure out the

structure of your program or regular expression and then, do something with it.

And this is a simple example of that but useful as well. So the first thing that

needs clear what to do. So we're going to have one state per character as we talked

about before, so that's easy to set up. and then the match transition edges. if a

state contains a character in the alphabet, we just put in a match

transition to the next state. and actually, that's implicit in our

algorithm. so now what about other things? Well, if we have for any parenthesis we

find, we'll just put in an epsilon transition to the next state. so our

machines all have that. now closure is one that you know, has quite a bit of action.

so, for every star let's look at the one that is just a one character closure. So

we have a single character closure. So this is A star. and, and what we need is

epsilon transitions for the star that allow the machine to go and pick up well,

there has, there has to be one, an epsilon transition that goes out to the star to

cover the case, so we have zero matches. and then after zero, then we want to go

back to have as many matches as we want before taking the sorry, take the match

transition. We're going to be able to go back and match as many as we want before

going up to the next. So for star, we have to add three epsilon transitions. The one

that goes if you have a character in I and a star in I + one , you have to add these

edges going both ways and then an edge out to the next character for, to get out of

the star. And that works also if there's a closure involving parentheses. If the

character before the star is a parenthesis then we want to add the same kind of

mechanism from the parenthesis, go out and skip the whole thing to cover the zero

match case or go back and match as many times as we need to match and then

finally, go out. So there's three edges have to be added for each star and defined

and well defined what they are. and then or there's two epsilon transition edges

that we have to add. and that is to allow the machine to skip the first part of the

expression and do the second or to skip the, do the first part of the expression

and skp the second. so if we keep track of where the left parenthesis is when we end

the or operator when we get to the right parenthesis, we have all the information

that we need in order to be able to add those two edges. so those are the edges

that we have to put together to build the NFA. and the trick is keeping track of the

information of where the previous operators are particularly since

parentheses can be nested. but this is not that difficult to do because we have a

mechanism for doing that. how to, to, remember where the left parentheses are

and, and the or and that's to maintain a push down stack. and so the, the algorithm

is to push left parenthesis in or onto the stack. And then when we hit right

parenthesis, then we can pop of course the corresponding left parenthesis and maybe,

maybe the or and that gives us all the information that we need to add the

epsilon transition edges and so the stack takes care of the nesting of the

parenthesis. and when you think about it, this is a very natural mechanism to use

very similar to the early programs that we wrote using Dexter's algorithms for medic

expressions. so let's look at a demo and you'll see how that works. So we're going

to build the machine corresponding to this regular expression and it's the one that

we've been working with. And so what we do is just go from left to right through the

regular expression and , take the appropriate action, for each character. So

for left parenthesis. there's always an epsilon transition from, left parenthesis

to the next state. and then the other thing is, if you remember that last

parenthesis on the push down stack. So we put the index zero onto the stack. so now

we got another left parenthesis again, we put the epsilon transition on, and we push

that one onto the stack. so now, if we have an alphabet symbol what we need to do

is add the match transition to the next state. And then there's a couple ways to

this, but one easy way, in this case, is to just look for the star and if you see

that the next one is a star then you've got everything you need for the epsilon

transitions. So, in this case the next one is a star so we'll add those epsilon

transitions from the from two to three and from three to two.

And adding epsilon transitions, that's just, adding edges to the phi graph. then

with closure that just takes us to the next state and we took care of the other

two earlier. now we have an alphabet symbol, that's the B, so we just put in

the transition to the next state. Now we have an or. All we do for an or is put it

on the stack. now it's got for A, we got the match transition, for C, we got the

match transition. and now we have the first right parenthesis. so this right

parenthesis so one thing we, the first thing we do is an epsilon period just to

get it over to the next state. but then we go to the put down stack and we pop. and

if the thing at the top of the stack is an or that's one thing, piece information

that we need. the other piece of information we need is the position of the

corresponding left parenthesis and that'll be the next thing on the stack. So we add

the transition we pop the or off the stack and we pop the or on and off the stack and

that gives us the information that we need to put in the epsilon transition. We're at

stage eight. We put one from the or to eight, and then

we put one from the one to the or + one. There's the, that's what we need to do.

and, look for a star the same way as we did for one character. now in this case we

have a no star. So we just do the finite alphabet symbol and we add the match

transition. and now we have the right parenthesis and so we pop the

corresponding left parenthesis. and it's not an or. so in this case you know,

there's nothing to do except do the epsilon transition to the accept state. so

that's the process for each character in the regular expression, it's well defined

what we do. left parenthesis and or we put onto the stack characters we do the match

transitions and right parenthesis we do a pop and if it's an or, put a numeric

transitions. otherwise we do the look at to check for the star. and that's the demo

of the construction for nondeterministic finite state of phenomena. So, it's

actually a remarkably simple process. we figured out what to do with each character

in the regular expression and this is the second part of the regular expression

pattern-matching algorithm, which is constructing the NFA. And again it's a

remarkably little code. So it's a routine that builds the epsilon transition. this

is a part of the NFA. So it's got the regular expression . yes, a useless

variable to refer to. and it's going to build a new diagraph with one state, one

extra state, the accept state M+11 if the rate description has M characters. so, the

and then we maintain a stack which is just integers. and for every character in a

regular expression we do the processing that we just described. if it's a left

parenthesis or an or we just put it on to the stack. and that's it. if it's a right

parenthesis then we pop. If that pop is an or, then we put in the two edges to skip

the or. and otherwise, we look ahead and do the closure exactly as described. If

it's any one of the metal symbols, we just put in a next line transition to the next

edge. And then a remarkably little code to go ahead and construct the NFA from a

given regular expression. and so, the final step is the analysis that's going to

take time and space proportional to M. and that's immediate, because for every

character we do most of two stack operations and add at most three epsilon

transitions. And, this is a generous upper bound, time and space proportional to the

number of characters in the regular expression. So that's how we get the NFA