TDParseKit

About

TDParseKit is a Mac OS X Framework written by Todd Ditchendorf in Objective-C 2.0 and released under the MIT Open Source License. The framework is an Objective-C port of the tools described in "Building Parsers with Java" by Steven John Metsker. Some changes have been made to the designs in the book to match common Cocoa/Objective-C design patterns and conventions. However, the changes are relatively superficial, and the book is the best documentation available for this framework.

Xcode Project

The Xcode project containing this framework consists of 4 targets:

  1. Framework : the TDParseKit Framework.
  2. Tests : a Unit Test Bundle containing many unit tests (more precisely, interaction tests) for the framework, as well as some example classes that serve as real-world usages of the framework.
  3. DemoApp : A simple Cocoa demo app that gives a visual presentation of the results of tokenizing text using the TDTokenizer class.
  4. DebugApp : A simple Cocoa app that exists only to run arbitrary test code through GDB with breakpoints for debugging (which I was not able to do with the Unit Test Bundle).

TDParseKit Framework

Classes in the TDParseKit Framework offer 2 basic services of general use to Cocoa developers:

  1. Tokenization via a tokenizer class
  2. Parsing via a high-level parser-building toolkit

Tokenization

The API for tokenization is provided by the TDTokenizer class. Cocoa developers will be familiar with the NSScanner class provided by the Foundation Framework, which offers a similar service. However, the TDTokenizer class is much simpler, yet more configurable, flexible, and powerful.

Example usage:

NSString *s = @"\"It's 123 blast-off!\", she said, // watch out!\n"
              @"and <= 3.5 'ticks' later /* wince */, it's blast-off!";
TDTokenizer *t = [TDTokenizer tokenizerWithString:s];

TDToken *eof = [TDToken EOFToken];
TDToken *tok = nil;

while ((tok = [t nextToken]) != eof) {
    NSLog(@" (%@)", tok.stringValue);
}

outputs:

 ("It's 123 blast-off!")
 (,)
 (she)
 (said)
 (,)
 (and)
 (<=)
 (3.5)
 ('ticks')
 (later)
 (,)
 (it's)
 (blast-off)
 (!)

Each token produced is an object of class TDToken. TDTokens have a tokenType (Word, Symbol, Num, QuotedString, etc.) and both a stringValue and a floatValue.

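As you can see from the output, TDTokenizer is configured by default to handle several common tokenizing tasks:

  1. Single- and double-quoted strings are returned as single tokens ("It's 123 blast-off!", 'ticks').
  2. Common multi-character symbols are returned as single tokens (<=).
  3. Apostrophes and dashes inside words, and decimal points inside numbers, do not start a new token (it's, blast-off, 3.5).
  4. C- and C++-style comments are silently ignored (// watch out!, /* wince */).
  5. Whitespace is silently ignored.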
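To make those default rules concrete, here is a hypothetical miniature tokenizer in plain Objective-C with Foundation. It is not TDTokenizer's implementation and uses none of TDParseKit's API. The functions SketchTokenize and IsWordChar are invented for illustration. The sketch assumes ASCII input and ARC, and it recognizes only the one multi-char symbol <=. Run on the sample string above, it should produce the same 14 token strings.

```objc
#import <Foundation/Foundation.h>
#include <ctype.h>
#include <string.h>

// Hypothetical sketch, NOT TDTokenizer's implementation: it hard-codes the
// default rules visible in the sample output. Assumes ASCII input and ARC.
static BOOL IsWordChar(char c) {
    // After a leading letter, words may also contain digits, apostrophes and dashes.
    return isalnum((unsigned char)c) || c == '\'' || c == '-';
}

static NSArray *SketchTokenize(const char *s) {
    NSMutableArray *tokens = [NSMutableArray array];
    size_t n = strlen(s);
    size_t i = 0;
    while (i < n) {
        char c = s[i];
        size_t start = i;
        if (isspace((unsigned char)c)) {                       // whitespace: skipped
            i++;
            continue;
        }
        if (c == '/' && i + 1 < n && s[i + 1] == '/') {        // "//" comment: skipped to end of line
            while (i < n && s[i] != '\n') i++;
            continue;
        }
        if (c == '/' && i + 1 < n && s[i + 1] == '*') {        // "/* */" comment: skipped
            const char *end = strstr(s + i + 2, "*/");
            i = end ? (size_t)(end - s) + 2 : n;
            continue;
        }
        if (c == '"' || c == '\'') {                           // quoted string, quotes kept
            i++;
            while (i < n && s[i] != c) i++;
            if (i < n) i++;                                    // consume the closing quote
        } else if (isdigit((unsigned char)c)) {                // number with optional fraction
            while (i < n && isdigit((unsigned char)s[i])) i++;
            if (i + 1 < n && s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
                i++;
                while (i < n && isdigit((unsigned char)s[i])) i++;
            }
        } else if (isalpha((unsigned char)c)) {                // word
            while (i < n && IsWordChar(s[i])) i++;
        } else if (c == '<' && i + 1 < n && s[i + 1] == '=') { // the one multi-char symbol
            i += 2;
        } else {                                               // any other single-char symbol
            i++;
        }
        [tokens addObject:[[NSString alloc] initWithBytes:s + start
                                                   length:i - start
                                                 encoding:NSASCIIStringEncoding]];
    }
    return tokens;
}
```

This sketch hard-codes every rule in a single loop. The design in Metsker's book avoids that: each kind of token is handled by its own tokenizer state, and that separation is what makes the defaults configurable.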

All of those features are configurable.

Parsing

TDParseKit also includes a collection of token parser subclasses (of the abstract TDParser class), including collection parsers such as TDAlternation, TDSequence, and TDRepetition, as well as terminal parsers such as TDWord, TDNum, TDSymbol, TDQuotedString, etc. Also included are parser subclasses that work on individual chars, such as TDChar, TDDigit, and TDSpecificChar. These char parsers are useful for tasks like regular-expression parsing. Generally speaking, though, the token parsers will be more useful and interesting.

The parser classes implement the Composite pattern. Programs can build a composite parser in Objective-C itself (rather than in a separate language, as with lex and yacc) by composing terminal parsers into alternations, sequences, and repetitions, which together can represent an infinite number of languages.

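Parsers built with TDParseKit are non-deterministic, recursive descent parsers. Rather than committing to a single path, each parser matches against a set of assemblies and returns every possible result, and a method such as bestMatchFor: then selects the result that consumed the most tokens. This design trades some performance for ease of programming and simplicity of implementation.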
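The following hypothetical sketch illustrates both ideas without using TDParseKit itself. The Sketch* classes, SketchLit and SketchBestMatch are invented names, and ARC is assumed. Each "assembly" is reduced to a bare position, meaning the number of tokens consumed so far. Terminal parsers advance positions and composites combine their children. Because every parser maps a set of positions to a set of positions, all alternatives are explored at once, and the best match is simply the largest surviving position.

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch, NOT TDParseKit's classes. An "assembly" is reduced to a
// position (tokens consumed so far); every parser maps a SET of positions to the
// set of positions reachable after it matches, so all alternatives run at once.
@interface SketchParser : NSObject
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens;
@end
@implementation SketchParser
// The base class acts like an Empty parser: it succeeds without consuming tokens.
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens {
    return positions;
}
@end

// Terminal: advances every position whose next token equals `text`.
@interface SketchLiteral : SketchParser
@property (nonatomic, copy) NSString *text;
@end
@implementation SketchLiteral
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens {
    NSMutableSet *out = [NSMutableSet set];
    for (NSNumber *p in positions) {
        NSUInteger i = [p unsignedIntegerValue];
        if (i < [tokens count] && [[tokens objectAtIndex:i] isEqualToString:self.text]) {
            [out addObject:@(i + 1)];
        }
    }
    return out;
}
@end

// Composite: feeds each child's result set into the next child.
@interface SketchSequence : SketchParser
@property (nonatomic, copy) NSArray *children;
@end
@implementation SketchSequence
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens {
    NSSet *current = positions;
    for (SketchParser *child in self.children) {
        current = [child match:current tokens:tokens];
    }
    return current;
}
@end

// Composite: keeps the union of every child's results (no backtracking needed).
@interface SketchAlternation : SketchParser
@property (nonatomic, copy) NSArray *children;
@end
@implementation SketchAlternation
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens {
    NSMutableSet *out = [NSMutableSet set];
    for (SketchParser *child in self.children) {
        [out unionSet:[child match:positions tokens:tokens]];
    }
    return out;
}
@end

// Composite: zero or more matches of `child`; stops once no new positions appear.
@interface SketchRepetition : SketchParser
@property (nonatomic, strong) SketchParser *child;
@end
@implementation SketchRepetition
- (NSSet *)match:(NSSet *)positions tokens:(NSArray *)tokens {
    NSMutableSet *all = [NSMutableSet setWithSet:positions];
    NSSet *frontier = positions;
    while ([frontier count] > 0) {
        NSMutableSet *next = [NSMutableSet setWithSet:[self.child match:frontier tokens:tokens]];
        [next minusSet:all];          // only genuinely new positions keep the loop going
        [all unionSet:next];
        frontier = next;
    }
    return all;
}
@end

static SketchLiteral *SketchLit(NSString *text) {
    SketchLiteral *p = [SketchLiteral new];
    p.text = text;
    return p;
}

// "Best match": the surviving position that consumed the most tokens, or -1.
static NSInteger SketchBestMatch(SketchParser *parser, NSArray *tokens) {
    NSInteger best = -1;
    for (NSNumber *n in [parser match:[NSSet setWithObject:@0] tokens:tokens]) {
        best = MAX(best, [n integerValue]);
    }
    return best;
}
```

For example, build a sequence of the literals search and google, then an alternation of an empty parser and the literal for, then the literal iphone. That sequence reaches position 4 on the tokens search google for iphone and position 3 on search google iphone. The empty branch and the for branch are both carried forward until the next literal rules one of them out.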

Here is an example of how one might build a parser for a simple voice-search command language (note: TDParseKit does not include any kind of speech recognition technology). The language consists of:

search google for? <search-term>
...

	[self parseString:@"search google 'iphone'"];
...
	
- (void)parseString:(NSString *)s {
	TDSequence *parser = [TDSequence sequence];

	[parser add:[[TDLiteral literalWithString:@"search"] discard]];
	[parser add:[[TDLiteral literalWithString:@"google"] discard]];

	TDAlternation *optionalFor = [TDAlternation alternation];
	[optionalFor add:[TDEmpty empty]];
	// discard the optional "for" literal itself so it never lands on the assembly's stack
	[optionalFor add:[[TDLiteral literalWithString:@"for"] discard]];

	[parser add:optionalFor];

	TDParser *searchTerm = [TDQuotedString quotedString];
	[searchTerm setAssembler:self selector:@selector(workOnSearchTermAssembly:)];
	[parser add:searchTerm];

	TDAssembly *result = [parser bestMatchFor:[TDTokenAssembly assemblyWithString:s]];
	
	NSLog(@" %@", result);

	// output:
	//  ['iphone']search/google/'iphone'^
}

...

- (void)workOnSearchTermAssembly:(TDAssembly *)a {
	TDToken *t = [a pop]; // a QuotedString token with a stringValue of 'iphone'
	[self doGoogleSearchForTerm:t.stringValue];
}