The Building Blocks of an F# Markdown Parser

by Tomas Petricek, author of F# Deep Dives

Editor’s Note: The F# Deep Dives MEAP will be 50% off on December 18, 2012. Use code dotd1218au at checkout.

Markdown is a simple text-based markup language that can be used to produce clean HTML and is used by sites such as StackOverflow or GitHub. You can build your very own efficient parser that can be extended with custom features and that allows you to process the document after parsing. In this article, based on chapter 11 of F# Deep Dives, author Tomas Petricek describes the key elements of such a project, in particular, the representation of a Markdown document.

I’ve been writing a blog for a number of years now. Since the beginning, I wanted the website to use clean and simple HTML code. Initially, I just wrote articles in HTML by hand, but then I became a big fan of Markdown, a simple text-based markup language that can be used to produce clean HTML and is used by sites such as StackOverflow or GitHub. However, none of the existing Markdown implementations supported what I wanted: I needed an efficient parser that can be extended with custom features and that allows me to process the document after parsing. That’s why I decided to write my own parser in F#.

In this article, I’ll describe the key elements of the project. In particular, I’ll discuss the key subject from a functional perspective: the representation of a Markdown document.

You might not need to implement your own text-formatting engine, but you may often face a similar task. Text processing is not only useful when working with external files (test scripts, behavior specifications, or configuration files) but also when processing user inputs in an application (such as commands or calculations).

Introducing the Markdown format

The Markdown format is a markup language that has been designed to be as readable as possible in the plain text form. It is inspired by formatting marks, such as *emphasis*, that are often used in text files, emails, or README documents. It specifies the syntax precisely and thus it is possible to translate Markdown documents to HTML.

Formatting text with Markdown

The formatting of Markdown documents is based on whitespace and common punctuation marks. The document consists of block elements (such as paragraphs, headings, and lists). A block element can contain emphasized text, links, and other formatting. The following sample demonstrates some of the syntax:

Visual F#  
=========
F# is a **programming language** that supports _functional_, as 
well as _object-oriented_ and _imperative_ programming styles. 
Hello world can be written as follows: 

    printfn "Hello world!" 

For more information, see the [F# home page](http://fsharp.net) or
read [Real-World Functional Programming](http://manning.com/petricek) 
published by [Manning](http://manning.com).

The document above consists of four block elements. It starts with a heading. The separator "=" is used for first level headings. We can also create second level headings using "-" as a separator. An alternative style uses a certain number of "#" characters at the beginning of the line, so, for example, ## Example is a second-level heading.

The second block is a paragraph, followed by a code sample and one more paragraph. The text in paragraphs is formatted using ** (strong) and _ (emphasis). Both the asterisk and the underscore can be used for strong and emphasized text: one character means emphasis and two characters mean strong text. We can also create hyperlinks, as the last paragraph demonstrates.
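To make the "#"-style heading rule concrete, here is a minimal sketch of how such a line could be recognized. The function name tryParseHashHeading is hypothetical and is not taken from the parser described in this article; it only counts the leading "#" characters and treats the rest of the line as the heading text.

```fsharp
// Hypothetical helper (an assumption for illustration, not the article's parser):
// recognizes "#"-style headings and returns the level with the heading text.
let tryParseHashHeading (line: string) =
    // Count the '#' characters at the start of the line
    let level = line |> Seq.takeWhile (fun c -> c = '#') |> Seq.length
    if level = 0 then None
    else Some (level, line.Substring(level).Trim())

// "## Example" is a second-level heading with the text "Example"
let heading = tryParseHashHeading "## Example"    // Some (2, "Example")
// A line without leading '#' characters is not a heading of this style
let notHeading = tryParseHashHeading "Plain text" // None
```

A real parser would also need to handle the "=" and "-" underline style, which spans two lines and so cannot be decided from a single line.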

From a programming language perspective, formats such as Markdown can be viewed as domain specific languages, a concept explained in the following sidebar.

Meta: external domain specific languages

The term domain specific languages (DSLs) refers to programming languages that are designed to solve problems from a particular domain or field. DSLs are useful when you need to solve a large number of problems of the same class. In that case, the time spent on developing the DSL will be balanced out by the time that you save when using the DSL to solve particular problems.

DSLs can be categorized into two groups. Internal DSLs are embedded in another language (like F# or C#). For example, functions from the List module combined with the pipelining operator (|>) can be viewed as a DSL. They solve a specific problem (list processing) and solve it very well without other dependencies.

External DSLs are languages that are not constructed on top of other languages. They may be used as embedded strings (for example, regular expressions or SQL) or as standalone files (including Markdown, configuration files, Makefiles, or behavior specifications written in a language such as Cucumber).
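As a small illustration of the internal DSL mentioned in the sidebar, the following sketch chains functions from the List module with the pipelining operator. The input numbers are made up for the example; only standard List functions are used.

```fsharp
// Illustrative input; the values are arbitrary
let sumOfEvenSquares =
    [ 1 .. 10 ]
    |> List.filter (fun n -> n % 2 = 0)   // keep 2, 4, 6, 8, 10
    |> List.map (fun n -> n * n)          // square each: 4, 16, 36, 64, 100
    |> List.sum                           // add them up: 220
```

Each step reads like a sentence in a tiny list-processing language, yet everything is ordinary F# code with no separate syntax to parse.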

Now that I’ve introduced the Markdown format and domain specific languages in general, let’s look at a number of benefits that we can expect from a Markdown parser written in F#.

Why another Markdown parser?

Markdown is a well-established format and there are a number of existing tools that convert it to HTML. Most of these are written using regular expressions and there are some written for almost any platform, including .NET. So, why do we need yet another processor? Here are a few reasons:

  • Creating a custom syntax extension for Markdown is quite difficult when using an implementation based on regular expressions. It is hard to find where the syntax is being processed, and changing a regular expression can lead to various unexpected interactions.
  • Most of the tools transform Markdown directly to HTML. This makes it hard to add a custom-processing step, for example, to process all code samples in the document before generating HTML.
  • A related problem is that HTML is the only supported output. What if we wanted to turn Markdown documents into another document format, such as Word or LaTeX?
  • Finally, performing a single regular expression replacement may be quite efficient, but, if the processor performs a huge number of them, the code can become quite CPU-intensive. A custom implementation may give us better performance.

Let’s now look at how we can achieve these goals using F#. The key element of the solution is an elegant functional representation of the document structure.

Representing Markdown documents

When solving problems in functional languages, the first question we often need to answer is: "What data structures do we need to represent the data we work with?" In the case of a Markdown processor, the data structure represents a document. As discussed earlier, a document consists of blocks of different kinds. Some of the blocks (like paragraphs) may contain additional inline formatting and hyperlinks.


Figure 1 Here you can see how different MarkdownBlock elements and different MarkdownSpan elements are used to format the sample document. All other unmarked text is represented as Literal.

Listing 1 Representation of Markdown document

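type MarkdownDocument = list<MarkdownBlock>      // a document is a list of blocks

and MarkdownBlock =
  | Heading of int * MarkdownSpans               // heading level and its content
  | Paragraph of MarkdownSpans
  | CodeBlock of list<string>                    // lines of the code sample

and MarkdownSpans = list<MarkdownSpan>

and MarkdownSpan =
  | Literal of string                            // plain, unformatted text
  | InlineCode of string
  | Strong of MarkdownSpans
  | Emphasis of MarkdownSpans
  | HyperLink of MarkdownSpans * string          // link body and target URL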

The types that model Markdown documents are shown in listing 1. I defined the types as mutually recursive using the and keyword for two reasons. Firstly, the MarkdownSpans and MarkdownSpan types are mutually recursive: each references the other. Secondly, I wanted to start with a type that represents the entire document rather than starting from the span, which makes the explanation easier to follow.
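To see how these types fit together, the sketch below builds a value for an abbreviated version of the sample document by hand. The types are repeated so that the snippet runs on its own; the list element types (MarkdownBlock, string, and MarkdownSpan) are filled in from the way the article describes the document. The names heading, intro, code, links, and sample are chosen for this illustration only.

```fsharp
// Types repeated from listing 1 so that the snippet is self-contained
type MarkdownDocument = list<MarkdownBlock>
and MarkdownBlock =
  | Heading of int * MarkdownSpans
  | Paragraph of MarkdownSpans
  | CodeBlock of list<string>
and MarkdownSpans = list<MarkdownSpan>
and MarkdownSpan =
  | Literal of string
  | InlineCode of string
  | Strong of MarkdownSpans
  | Emphasis of MarkdownSpans
  | HyperLink of MarkdownSpans * string

// First-level heading of the sample document
let heading = Heading(1, [ Literal "Visual F#" ])

// Abbreviated first paragraph: plain text mixed with strong and emphasized spans
let intro =
  Paragraph [ Literal "F# is a "
              Strong [ Literal "programming language" ]
              Literal " that supports "
              Emphasis [ Literal "functional" ]
              Literal " programming." ]

// The code sample is stored as a list of lines
let code = CodeBlock [ "printfn \"Hello world!\"" ]

// Abbreviated last paragraph with one hyperlink
let links =
  Paragraph [ Literal "For more information, see the "
              HyperLink([ Literal "F# home page" ], "http://fsharp.net")
              Literal "." ]

// The whole document is simply a list of its four blocks
let sample : MarkdownDocument = [ heading; intro; code; links ]
```

Note that a span such as Strong contains a list of spans rather than a string, so nested formatting (for example, a link inside emphasized text) needs no extra cases.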

Summary

Broadly speaking, this article was about external domain specific languages. An external DSL is a small programming language or document format that has its own syntax and represents some script, document, or command. External DSLs can be used to configure an application, to provide scripting capabilities, customization, and various other tasks.

The domain specific language that we focused on was the Markdown document format. When working with external DSLs, we first write an F# representation of the language and then implement processing of the DSL.

The functional representation that I described in this article is the cornerstone of a new Markdown processor. Other components are all built around this representation. Chapter 3 of F# Deep Dives looks at three additional aspects: writing a parser that turns text into MarkdownDocument, writing an HTML generator that turns MarkdownDocument into an HTML file, and implementing the pre-processing of a document that generates the references section with all of the document links. All of these tasks are built on top of a simple representation using powerful F# features like pattern matching and active patterns.
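To give a flavor of how pattern matching works over this representation, here is a minimal sketch of an HTML formatter. It is not the generator from the book: the function names formatSpan, formatSpans, formatBlock, and formatDocument are assumptions made for this illustration, the types are repeated so that the snippet runs on its own, and HTML escaping is deliberately left out.

```fsharp
// Types repeated from listing 1 so that the snippet is self-contained
type MarkdownDocument = list<MarkdownBlock>
and MarkdownBlock =
  | Heading of int * MarkdownSpans
  | Paragraph of MarkdownSpans
  | CodeBlock of list<string>
and MarkdownSpans = list<MarkdownSpan>
and MarkdownSpan =
  | Literal of string
  | InlineCode of string
  | Strong of MarkdownSpans
  | Emphasis of MarkdownSpans
  | HyperLink of MarkdownSpans * string

// Hypothetical formatter: one case per kind of span.
// Note: no HTML escaping is performed in this sketch.
let rec formatSpan span =
  match span with
  | Literal text -> text
  | InlineCode code -> "<code>" + code + "</code>"
  | Strong spans -> "<strong>" + formatSpans spans + "</strong>"
  | Emphasis spans -> "<em>" + formatSpans spans + "</em>"
  | HyperLink (spans, url) -> "<a href=\"" + url + "\">" + formatSpans spans + "</a>"

// Formats a list of spans by concatenating the formatted pieces
and formatSpans spans =
  spans |> List.map formatSpan |> String.concat ""

// One case per kind of block
let formatBlock block =
  match block with
  | Heading (level, spans) -> sprintf "<h%d>%s</h%d>" level (formatSpans spans) level
  | Paragraph spans -> "<p>" + formatSpans spans + "</p>"
  | CodeBlock lines -> "<pre>" + String.concat "\n" lines + "</pre>"

// A document is formatted block by block, one block per line
let formatDocument (doc: MarkdownDocument) =
  doc |> List.map formatBlock |> String.concat "\n"

let html =
  formatDocument
    [ Heading(1, [ Literal "Visual F#" ])
      Paragraph [ Literal "F# is a "; Strong [ Literal "programming language" ] ] ]
```

Because the output format lives entirely in these functions, targeting another format such as LaTeX would mean writing a second set of functions over the same types, without touching the parser.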

1. The project can be found at https://github.com/tpetricek/FSharp.Formatting

2. For more information about Markdown, see http://daringfireball.net/projects/markdown


“Hello, World” Aspect

 

AOP in .NET
By Matthew Groves
Aspect-oriented programming (AOP) is a technique that is complementary to object-oriented programming (OOP). The goal of AOP is to reduce repetitive, boilerplate code. This article, based on AOP in .NET, walks you through a very basic "Hello, World" example of using AOP in .NET. Author Matthew Groves breaks apart that example and identifies the individual puzzle pieces and how they fit together into something called an "aspect."


If you’ve never done aspects, we’ll give you a taste of what’s in store. Don’t worry if you don’t fully understand what’s going on just yet. Follow along just to get your feet wet. I’ll be using Visual Studio 2010 and PostSharp. Visual Studio Express (which is a free download) should work too. I’m also using NuGet, which is a great package manager tool for .NET that integrates with Visual Studio. If you’ve never used NuGet, you should definitely take a few minutes to check it out at NuGet.org and install it: it will make your life as a .NET developer much easier.

Start by selecting File>New Project and then Console Application. Call it whatever you want, but I’m calling mine "HelloWorld". You should be looking at an empty console project like so:

class Program {
    static void Main(string[] args) {
    }
}

Next, install PostSharp with NuGet. NuGet can work from a PowerShell command-line within Visual Studio, called Package Manager Console. To install PostSharp via the Package Manager Console, just use the Install-Package command.

Listing 1 Installing PostSharp with NuGet PowerShell console
PM> Install-Package postsharp
Successfully installed 'PostSharp 2.1.6.17'.
Successfully added 'PostSharp 2.1.6.17' to HelloWorld.

Alternatively, you can do it via the Visual Studio UI by first right-clicking on References in Solution Explorer.


Figure 1 Starting NuGet with the UI

Select Online, search for PostSharp, and click Install.


Figure 2 Search for PostSharp and install with NuGet UI

You may get a PostSharp message that asks you about licensing. Accept the free trial and continue. The Starter Edition is free for commercial use, so you can use it for free at your job too. Now that PostSharp is installed, you can close out of the NuGet dialog. In Solution Explorer under References, you should see a new PostSharp reference added to your project.

Now you’re ready to start writing your first aspect.

Create a class with one simple method that just writes to Console. Mine looks like this:

public class MyClass {
    public void MyMethod() {
        Console.WriteLine("Hello, world!");
    }
}

Instantiate this class inside the Main method, and call the method. Here’s what the Program class should look like now:

class Program {
    static void Main(string[] args) {
        var myObject = new MyClass();
        myObject.MyMethod();
    }
}

Execute that program now (F5 or CTRL+F5 in Visual Studio), and your output should look like this:

Hello, world!

We’re not really pushing the limits of innovation just yet, but hang in there!

Now, create a new class that inherits from OnMethodBoundaryAspect, which is a base class in the PostSharp.Aspects namespace. Something like this:

Listing 2 The first step in using the PostSharp API – derive from OnMethodBoundaryAspect
using System;             // for [Serializable]
using PostSharp.Aspects;  // for OnMethodBoundaryAspect

[Serializable]
public class MyAspect : OnMethodBoundaryAspect {
}

PostSharp requires aspect classes to be serializable (this is because PostSharp instantiates aspects at compile time, so they can be persisted between compile time and run time).

Congratulations, you just wrote an aspect, even though it doesn’t do anything yet. As the name implies, this aspect allows you to insert code on the boundaries of a method. Let’s make an aspect that inserts code before and after a method gets called. Start by overriding the OnEntry method. Inside of that method, write something to Console, like this:

Listing 3 Override the OnEntry method to add some functionality
[Serializable]
public class MyAspect : OnMethodBoundaryAspect {
    public override void OnEntry(MethodExecutionArgs args) {
        Console.WriteLine("Before the method");
    }
}

Notice the MethodExecutionArgs parameter. It’s there to give information and context about the method being bounded. We won’t use it in this simple example, but argument objects like that are almost always used in a real aspect. Create another override, but, this time, override OnExit.

Listing 4 Override the OnExit to add more functionality
[Serializable]
public class MyAspect : OnMethodBoundaryAspect {
    public override void OnEntry(MethodExecutionArgs args) {
        Console.WriteLine("Before the method");
    }
    public override void OnExit(MethodExecutionArgs args) {
        Console.WriteLine("After the method");
    }
}

Now you have written an aspect that will write to Console before and after a method. But, which method? The most basic way to tell PostSharp which method (or methods) to apply this aspect to is to use the aspect as an attribute on the method. For instance, to put it on the boundaries of the earlier "Hello, World" method, just use it on the method like so:

Listing 5 Apply the aspect to a method by using an attribute
public class MyClass {
    [MyAspect]
    public void MyMethod() {
        Console.WriteLine("Hello, world!");
    }
}

Now, run the application again (F5 or CTRL+F5). You should see an output like this:

Before the method
Hello, world!
After the method

Figure 4 Output with MyAspect applied

That’s it. You’ve now written an aspect and told PostSharp where to use that aspect. This example may not seem that impressive, but notice that you were able to put code around the method without making any changes to MyMethod itself. Yeah, you did have to add that [MyAspect] attribute, but there are more efficient and/or centralized ways of applying PostSharp aspects.


The Dew Review – Visual Studio 2010 Best Practices by Peter Ritchie

I was recently given an eBook copy of Peter Ritchie’s new book, Visual Studio 2010 Best Practices, to review. I was excited to receive a copy because it was a title I had been planning to buy anyway. After reading it, I may order a print copy to keep within reach.

When I first read the title, I wondered why they were publishing a Visual Studio 2010 book right before the launch of Visual Studio 2012. I hope this does not turn off any potential customers because the majority of the recommendations Ritchie gives in the book apply to development with both VS 2010 and 2012. And contrary to the book’s title, he doesn’t like to call them best practices.

I call them "recommended practices" instead of "best practices." The superlative "best" implies some degree of completeness. In almost all circumstances, the completeness of these practices has a shelf-life. Some best practices have a very small shelf-life due to the degree to which technology and our knowledge of it changes.

While this is not an introduction to Visual Studio or the .NET Framework, most Visual Studio developers should find this book useful. Those who are less experienced with .NET will be able to take these recommended practices to get into the world of .NET on the right foot. Even those developers who consider themselves experts in Visual Studio will probably find some new nuggets of wisdom.

The practices discussed in the book range from architecture to C# language features to toolsets. Each recommendation is discussed with examples and then distilled down to two statements, a Context and a Practice. Here’s an example around data transfer and messaging:

Context: When dealing with data that needs to be actioned independently and asynchronously.
Practice: Consider command classes.

I enjoyed reading Visual Studio 2010 Best Practices. I recommend reading it cover-to-cover and then keeping it on hand as a reference guide.