I’ve begun to establish a reasonably sound design pattern for the lexical analyzer. Of course, this isn’t intended as an ideal solution to the general problem of writing a tokenizer for any language; it does not support any kind of shorthand for describing token structure, for example. But it isn’t overly complex, and at this stage it supports some of the common tokens seen in C, C++, or C#.
As I’ve been working on making the lexical analyzer (hereafter called the “tokenizer”) more complete and easier to understand, it has really begun to dawn on me why immutability is so important: by eliminating the traditional concept of assignment, we are left only with function invocation.
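To make that point concrete, here is a tiny illustration (my own, not code from the tokenizer): with no assignment available, the only way to “change” a value is to call a function that returns a new one.

```fsharp
// With immutable bindings, "change" happens only by calling a function
// that returns a new value; the old binding is never mutated.
let advance (position: int) (length: int) = position + length

let p0 = 0
let p1 = advance p0 3   // p1 = 3; p0 is still 0
```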
The initial version of the lexical analyzer represents the input file as a List, and it exposes tokens in a crude “one at a time” manner, burdening the token consumer with the responsibility of passing the previous token back into the tokenizer function when requesting the next one.
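A sketch of what that pull-style interface looks like may help; the names and token shape here are illustrative assumptions, not the tokenizer’s actual code. Each token remembers where it ended, so the consumer must hand the previous token back in to fetch the next one.

```fsharp
// Illustrative token: the text of the lexeme plus the position just past it.
type Tok = { Text: string; EndPos: int }

// Helpers to skip spaces and consume a run of non-space characters.
let rec skip (s: string) i = if i < s.Length && s.[i] = ' ' then skip s (i + 1) else i
let rec take (s: string) i = if i < s.Length && s.[i] <> ' ' then take s (i + 1) else i

// The consumer passes the previous token (or None) to request the next token.
let nextTok (input: string) (prev: Tok option) : Tok option =
    let start = match prev with Some t -> t.EndPos | None -> 0
    let i = skip input start
    if i >= input.Length then None
    else
        let j = take input i
        Some { Text = input.Substring(i, j - i); EndPos = j }
```

The burden the paragraph describes falls on the caller, who must thread each result into the next request:

```fsharp
let t1 = nextTok "let x" None   // Some { Text = "let"; EndPos = 3 }
let t2 = nextTok "let x" t1     // Some { Text = "x"; EndPos = 5 }
```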
In the previous post I presented a simple lexical analyzer created purely as a means to get deeper into working with the F# language; such projects are a valuable way of subjecting oneself to realistic pressures, compelling one to deliver some kind of working solution.
The term Lexical Analysis refers to the conversion of a raw character-by-character stream into a stream of what are termed tokens. A token is a simple abstraction that strives to decouple the lexical structure of something from its logical meaning.
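A discriminated union is a natural way to express that abstraction in F#; the definition below is an illustrative sketch of such a token type, not the one the tokenizer actually uses. Each case names the logical kind of the lexeme, decoupled from the raw characters that produced it.

```fsharp
// Illustrative token type: the consumer sees logical kinds,
// not the underlying character stream.
type Token =
    | Keyword of string      // e.g. "if", "while"
    | Identifier of string   // e.g. "count"
    | IntLiteral of int      // e.g. 42, built from the characters '4' and '2'
    | Symbol of char         // e.g. '{' or ';'
```

A parser matching on `IntLiteral 42` never needs to know whether the source spelled it as `42`, surrounded it with whitespace, or split it across a line.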