Rich's latest talk is titled Design in Practice and clarifies a structure for building and recording a design that I feel is "just enough" to be helpful without being burdensome. Further, it elaborates on Architecture Decision Records (ADRs), which are the design structure I'm most familiar with. I still refer back to Mike Nygard's post on ADRs from 2011 regularly.
video: https://www.youtube.com/watch?v=c5QF2HjHLSE
full transcript: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/DesignInPractice.md
This talk is about formalizing the process of design. Given Rich's background, he's primarily talking about software design, but you may apply these concepts to other domains.
Some of the benefits and goals include:
He begins by talking about precision in words. Using precise words helps keep wording concise and makes understanding easier. An artifact of using precise words is a glossary.
Rich reminds us that questions and the Socratic method are powerful tools for building understanding. Exposing answers to find the truth helps everyone learn. Detach yourself from your ideas. There is an objective truth, and the goal is to discover it together.
A framework for questions:
These questions frame your current status and your direction, in terms of both progress and understanding. Rich suggests these questions can help with reflective inquiry: being aware of your own thinking.
He suggests that we should try to structure the record of our design in stories. A story should have these sections:
This is not a checklist. The goal is to record the decisions made and why.
The suggested phases that a design process follows are:
Rich continues to explain the 6 phases.
Describe
Diagnose
Delimit
Direction
Design
In my experience, root cause analysis often finds many factors, and explaining why I haven't updated the blog in a couple of years is no exception.
First, my personal life became busier. I started a new job. I had a second child. I haven't been doing a lot of personal programming. These are all excuses - if I wanted to, I could have found time to make a new post.
The real reason I stopped updating the blog was that I wasn't happy with it. Writing new posts involved some friction: copying & pasting DSL snippets to add new paragraphs or headings. I wanted to make some changes to reduce the friction, and didn't want to write code for more posts until I'd made those changes. In my mind, new posts hinged on completing these updates first, so I just didn't write any.
I think building the blog generator was a great experience. It may not have been a challenging pursuit, but it was enjoyable (even if it was a yak-shave). It gave me complete control over my site.
I had wanted to do major surgery on the blog generator to take the DSL in a new direction. As I thought about the mental burden of the changes, I realized that the benefits of building my own software no longer outweighed the costs.
If I want the blog to be an effective communication tool, or just a personal record of my thoughts, the most important feature is ease of writing. My own blog was not (yet) easy to write for.
The benefits of building a blog generator had run their course; I know I can do it, but is building and maintaining this tool how I want to spend my time? The answer came back a resounding no.
One important lesson I've been pondering lately is that priorities and judgement matter; it's orders of magnitude faster to get on an effective path sooner than to move quickly on an ineffective path.
Prioritizing is discussed so often that it's boring for me to mention it here, but it has been revelatory for me, including several moments of thinking "how could I have been so lost before?"
As a mid-career engineer, I've discovered that I can have an outsized impact by guiding efforts away from rabbit holes and pitfalls. For example, an early-career Adam would have spent many hours considering the dozens of blog generator tools available and the merits of each. Now, I'm able to first consider my priorities and criteria, then quickly filter the available options to make a decision.
Once I realized that I wanted to get out of the blog generator game, it became clear that I should just leverage an existing tool that makes it easy to write and publish content.
I considered many blogging tools before landing on quickblog. I've been enjoying Clojure for nearly a decade and quickblog is built on tools and a language that I already understand while fitting with my priorities.
Porting over the handful of previous posts and existing styles required a bit of work, but I expect the investment to pay for itself in the long run. And you're likely to read new content from me in the future.
My daughter's favorite reminder when we're late is to "remember the story of the tortoise and the hare? Slow and steady wins!" She's quite wise for a 5-year-old! Even at my age, the rushing mindset is an easy trap to fall into. I think it's actually quite rare that rushing is the optimal path; the downside of being late is often less than the risk of making mistakes by mindlessly rushing. My daughter may test my patience by being so frequently distracted, but does trying to rush her help? I'd venture "no".
I've probably written dozens of parsers over the years, of which I remember less than half. The following experience report and light introduction to the topics of parsing & grammars may lead to better decisions when building parsers.
We've got some input. It's a string. The string has some structure which follows a recognizable format. We want to turn that string into data we can use. We need a parser.
There are two primary approaches (that I know of) to writing parsers: hand built, or a parser generator with a grammar.
In my experience, parsers begin hand built. The input syntax is simple or you just want to get it done quickly. You write a small regular expression. You add an iterative loop or recursion. Suddenly, you've got a hand built parser.
You've got a string with a general syntax. You need code that finds the parts of every string matching the syntax and acts on them. You write code that finds matches, then directly calls the action code.
Hand built parsers can be fast. Being purpose built for the task, the code can be optimized for performance. Any abstraction would require more machine effort than a well chosen algorithm.
Time passes, and after a couple of updates or changes in syntax, the code gets messy. Each change brings accumulating pain. You've got difficult-to-follow recursion or incomprehensible clauses in your switch/cond statement. You long for a better abstraction or easier debugging, but you're deep in the sunk cost fallacy and can't bear to toss this significant subsystem. If you muster enough courage (or 20% time) then you go for the full refactor, but like an old back injury, the pain returns in time.
Whether explicit or not, hand built parsers perform 3 duties. First, they search the input for specific tokens. Input languages are often defined with mutually exclusive states. In the JavaScript programming language, for example, some characters are invalid in identifiers but valid in strings.
Second, they parse the token stream according to the rules of the domain specific language. In JavaScript, the var keyword must be followed by an identifier.
Third, hand built parsers (often) act on the rules of the domain specific language.
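As a concrete (if toy) illustration, here's a sketch in Clojure of a hand built parser for a single CSV row; the function name and shape are hypothetical, not taken from any real tool. Note how all 3 duties are tangled in one loop: inspecting characters (lexing), applying the fields-are-separated-by-commas rule (parsing), and building the result vector (acting).

;; A toy hand built parser: lexing, parsing and acting in one pass.
(defn parse-csv-line [line]
  (loop [chars (seq line), field [], fields []]
    (if-let [c (first chars)]
      (if (= c \,)
        (recur (rest chars) [] (conj fields (apply str field)))   ; rule: a comma ends a field
        (recur (rest chars) (conj field c) fields))               ; lex: accumulate characters
      (conj fields (apply str field)))))                          ; act: build the result

(parse-csv-line "a,b,c") ;; => ["a" "b" "c"]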
Let's use this information to find a better abstraction. As Rich Hickey would say "let's decomplect it".
A lexical analyzer (or lexer) scans the input and splits it into tokens. In a string, a token is a collection of characters (including a collection of size one). Tokens should have meaning: the meaning a parser needs in order to apply the rules of the domain specific language.
A lexer definition often looks like a set of regular expressions, each recognizing a specific character or sequence of characters. The lexer produces a series of tokens pulled from the input.
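For instance, a minimal standalone lexer in Clojure might look like this sketch (the token names and patterns are made up for illustration). Each token type is a regular expression anchored to the start of the remaining input, and the output is a flat stream of [kind text] pairs.

(def token-patterns
  ;; each token kind paired with a regex anchored to the start of the input
  [[:string #"^\"(?:\"\"|[^\"])*\""]
   [:comma  #"^,"]
   [:text   #"^[^,\r\n\"]+"]])

(defn lex [input]
  (loop [s input, tokens []]
    (if (empty? s)
      tokens
      (if-let [[kind text] (some (fn [[kind pattern]]
                                   (when-let [m (re-find pattern s)]
                                     [kind m]))
                                 token-patterns)]
        (recur (subs s (count text)) (conj tokens [kind text]))
        (throw (ex-info "unexpected input" {:remaining s}))))))

(lex "a,\"b,c\"") ;; => [[:text "a"] [:comma ","] [:string "\"b,c\""]]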
A common example of a lexical analyzer generator is Lex. Interestingly, Lex was originally written in 1975 by Mike Lesk and Eric Schmidt (the future CEO of Novell & Google).
Using the rules of a language, a parser takes a stream of tokens and produces a tree. Most languages are recursive so a tree data structure makes it clear which tokens are composed within the body of others.
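Continuing the lexer sketch from above, a parser applies a rule like row : field (',' field)* to the token stream and produces a tree node instead of acting directly:

(defn parse-row
  ;; rule: row : field (',' field)*
  [tokens]
  (loop [tokens tokens, fields []]
    (if-let [[kind text] (first tokens)]
      (if (= kind :comma)
        (recur (rest tokens) fields)                       ; separators are structural only
        (recur (rest tokens) (conj fields [:field text]))) ; everything else becomes a field node
      [:row fields])))

(parse-row (lex "a,\"b,c\""))
;; => [:row [[:field "a"] [:field "\"b,c\""]]]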
Yacc is a commonly used parser generator, often paired with Lex. This is what my university computer science courses required (15 years ago).
Grammars are an expressive language for describing the rules of a domain specific language. You write a grammar, then give it to a parser generator, which generates code for interpreting the input (usually a string).
Here's an example grammar for the common CSV (comma separated values) format. This grammar is defined in ANTLR 4 which combines both lexer and parser definitions in the same grammar.
csvFile: hdr row+ ;
hdr : row ;
row : field (',' field)* '\r'? '\n' ;
field
    : TEXT
    | STRING
    |
    ;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ;
ANTLR combines both lexer and parser rules in the same grammar. In its language, a lexer rule identifier begins with an upper case letter and a parser rule identifier does not. TEXT and STRING are both lexer rules which result in tokens. The field parser rule uses the tokens (including the inline ',' in the row rule) to build the higher level abstractions. In ANTLR rules that use alternatives (|), order matters; the field rule will prefer TEXT tokens over STRING tokens.
There are languages that cannot be specified in a grammar, so beware, but (in my experience) they are rare. More commonly, you're going to find languages that are ambiguous.
An ambiguous language can have more than one parser rule match a set of characters. For example, let's say you have a language with the following rules.
link: [[ STRING ]]
alias: [ STRING ]( STRING )
STRING: [a-zA-Z0-9 ]+
These two rules share the same left stop character. If a parser were to parse [[alias](target)] then it would be unable to determine which rule to follow. Likely, the parser would fail after trying to apply the link rule and not finding the ]] right stop characters.
There are ways to work around ambiguous rules, but it would be better to design the language to remove these ambiguities if possible. The best workaround I have discovered is to define each rule with optional characters to cover the other ambiguous rules. From our previous example, you could add an optional [ like so.
link: [[ STRING ]]
alias: [? [ STRING ]( STRING )
STRING: [a-zA-Z0-9 ]+
The parser can remove the ambiguity by matching the left stop characters of both rules. Note that this is ANTLR 4 specific, but you may be able to find a similar solution in other grammar definition languages.
I am a fan of ANTLR 4. I have found it to be powerful, easy to use, performant and well supported. A Clojure wrapper exists for its Java implementation. @aphyr even did some performance tests of it (specifically comparing it to Instaparse). If you want a deeper dive into using ANTLR then I'd recommend The Definitive ANTLR 4 Reference. There are plenty of helpful examples of ANTLR-based grammars for different languages available on GitHub.
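For a taste, here's what using the CSV grammar from earlier with the clj-antlr wrapper might look like. This is a sketch that assumes the grammar is saved at grammars/CSV.g4, and the printed tree shape is approximate:

(require '[clj-antlr.core :as antlr])

;; build a parser from the grammar file, then call it like a function
(def parse-csv (antlr/parser "grammars/CSV.g4"))

(parse-csv "name,age\nadam,39\n")
;; => a sexpr parse tree, roughly:
;; (:csvFile (:hdr (:row (:field "name") "," (:field "age") "\n")) ...)

Those few lines replace the entire hand built lexer and parser sketched earlier.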
I like building things. Software things in particular. I like the malleability of software. I like that new ideas can significantly improve expressiveness, performance, features and developer productivity. I love the feeling of a clean and cohesive code base. I spend hours on a good refactor like I'm engrossed in a pressure washing video. I love the quick feedback loops that few other disciplines can provide.
I'm really into this software stuff.
I'm also very particular about the software I use. I've built a number of web sites with some of the currently (they seem to change so often!) popular static site generators. They make simple things complex or just don't work the way I want. It's hard to build good abstractions that work for everyone.
I understand HTML and CSS. Most static site generators are built to translate other markup languages (like Markdown) into web site assets. Other markup languages are great for some people, but I don't need that abstraction when I can speak the destination markup language.
I need a different type of tool. A tool that makes it easy to manage the complexity of code re-use. A tool that gives me access to the full expressiveness of the destination data format. A tool that can be composed into a larger system. A tool that is simple to understand.
Instead of choosing a friendlier markup language, let's talk about a structured data representation of HTML and CSS. Once we have structured data, we can simply translate it into HTML and CSS. To get started, let's focus on just HTML with inline CSS.
The Clojure programming language is my weapon of choice. Its data manipulation primitives make building domain specific languages relatively easy. Clojure has a popular domain specific language for representing HTML and CSS. Hiccup is a simple translation of HTML elements and CSS properties into collections/arrays and maps/objects. Here's what Hiccup looks like in a Clojure REPL:
user=> (require '[hiccup.core :refer [html]])
nil
user=> (html [:span {:class "foo"} "bar"])
"<span class=\"foo\">bar</span>"
My theory on building static sites is that we can build the most complex static web assets with simple composition. Composition should give us the option to abstract away HTML and CSS (if we want) and build re-usable components (like layouts or common heading styles). I'm confident that this theory will pan out, as function composition is my primary tool for building any software in any language.
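As a small sketch of what that composition might look like (layout and home-page are hypothetical names, not part of any library):

(require '[hiccup.core :refer [html]])

(defn layout
  "A re-usable component: wraps page content in shared boilerplate."
  [title & body]
  [:html
   [:head [:title title]]
   (into [:body] body)])

(defn home-page []
  (layout "home" [:h1 "Hi. I'm Adam Tait."]))

(html (home-page))
;; => "<html><head><title>home</title></head><body><h1>Hi. I'm Adam Tait.</h1></body></html>"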
Clojure has built-in tools for reading EDN data, which employs the same syntax as Clojure data structures. Let's add the Aero library, which offers a set of tag literals for our EDN content. Aero also makes it easy to implement our own tag literals, which allows us to add composition to our static site language. We could build everything discussed here using just clojure.edn, but we'd end up re-implementing some of the code that Aero includes.
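Custom tags are defmethods on Aero's reader multimethod. Here's a toy example (the #upper tag and the config.edn path are made up for illustration):

(require '[aero.core :as aero]
         '[clojure.string :as str])

;; a toy custom tag literal: #upper upper-cases its string value
(defmethod aero/reader 'upper
  [_opts _tag value]
  (str/upper-case value))

;; given a config.edn containing {:greeting #upper "hello"}
(aero/read-config "config.edn") ;; => {:greeting "HELLO"}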
We're building static content for our website, so let's keep our structured data in static files. We can version static files with git, and our tool can guarantee deterministic output (the same data will always produce the same output).
Our tool should accept configuration for each of the site's assets, read Aero's tag literals, apply Hiccup rendering, then write the web site assets back to the filesystem.
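The core of such a tool might be little more than this sketch; build-asset! and the argument shapes are assumptions for illustration, not the tool's actual code:

(require '[aero.core :as aero]
         '[hiccup.core :as hiccup]
         '[clojure.java.io :as io])

(defn build-asset!
  "Read one asset's config (evaluating Aero tag literals), render the
  Hiccup content to HTML, then write it to the slug's path."
  [config-path out-dir]
  (let [{:keys [slug content]} (aero/read-config config-path)
        out-file (io/file out-dir (subs slug 1))]  ; drop the leading "/" from the slug
    (io/make-parents out-file)
    (spit out-file (hiccup/html content))))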
Let's get into what structured data for an Aero/Hiccup HTML+CSS document might look like. Here's the asset configuration for the index page of this website.
{
:type :html
:slug "/index.html"
:content #template ["pages/index.edn" [:content] {:path "/"}]
}
Let's talk about the #template ["pages/index.edn" [:content] {:path "/"}] section.

#template is a custom tag literal. Think of it like a function being called. It's very similar to Aero's built-in #include tag literal, except that it adds additional data into its render context. #template will be the basis of composition from which we build our static content.
["pages/index.edn" [:content] {:path "/"}]
are the three arguments to our function.
First, is the path to the template's definition. We'll keep another file with structured data for :slug "index.html"
at "pages/index.edn"
Side note: "slug" may have been better named "output-path".
Second, a pull selection to filter the output. In a finished version of this tool, you might consider implementing the EDN Query
Language but let's stick to a simple vector of keys that can be applied to Clojure's get-in
. I would set a sane default (like [:content]
) to provide some consistent structure to our template files.
Third, map of input variables. The input is a map so that each template can apply a similar pull syntax for extracting the data it expects. You can include any data structure as input, so it's extremely flexible.
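To make the tag concrete, here's a hypothetical sketch of how #template could be implemented as an Aero reader. The [:content] default matches the sane default mentioned above; the :inputs key is an assumption, and a real implementation would also need to thread the input map into the template's render context so tags like #ref can resolve against it:

(require '[aero.core :as aero])

;; hypothetical sketch of the #template tag literal
(defmethod aero/reader 'template
  [opts _tag [path pull inputs]]
  (let [pull (or pull [:content])           ; sane default pull selection
        data (aero/read-config path (assoc opts :inputs (or inputs {})))]
    (get-in data pull)))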
Let's take a look at a template definition.
{
:color #include "../styles/color.edn"
:content
#template ["../components/layout.edn"
[:content]
{:title "home"
:body #ref [:body]}]
:body
[:div
{:style
#css {:display :flex
:flex-direction :column
:justify-content :flex-start
:align-items :flex-start}}
[:div
{:style
#css {:font-size "50px"
:font-weight 700
:color #ref [:color :yellow]}}
"Hi. I'm Adam Tait."]]
}
This is some of the template definition for this site's (adamtait.com) home page. Hopefully, you first notice the resemblance to Hiccup or HTML. We have a body element with a flexbox column layout and a single text element.
In the site configuration, we said we would be pulling [:content] from the evaluated data of this template, so the :content section is the output. :content renders a layout template which uses :body as input. :body is the heart of our template definition.
Given what you've seen of our tool so far, you might extrapolate what a larger site would look like. You'll find ways of reducing complexity by adding sane defaults and refactoring out shared references and templates.
Rich Hickey has a talk titled Are We There Yet? where he talks about incidental complexity. Incidental complexity is hidden; it wasn't requested (or expected), it just comes along for the ride.
Seek simplicity, and distrust it (Alfred North Whitehead)
Our site and template definitions don't add incidental complexity, but they don't hide complexity either. The complexity is (mostly) laid bare. There's not much "magic" to this tool. You have full access to the base languages of the system (HTML and CSS), or you can abstract the Hiccup/HTML/CSS away in templates. You have the power to build your own tool. The tool you build is one that you deeply understand (you created most of it, after all) and one well adapted to your specific use case.
I built this static site generator because I wanted an honest tool. I wanted to easily understand what data was available at each point. I wanted to build up my abstractions and organize my content in the most intuitive structure for me. Most people would consider this a poor abstraction because it's too raw. What we have built is a tool to build static site generators.
As I grow this site, this as-yet-unnamed tool is also maturing. I may eventually open source it (and the code for this site) in its entirety. I'll post an update if it becomes available.
There are so many tools for generating static sites, why build another? In short, I am biased towards building my own software systems.
My thinking has grown up on Rich Hickey talks. I mention this not for the content of any single talk he has given but the general philosophy of software design he shares. Daniel Higginbotham has tried to capture some of Hickey's philosophy here.
I have developed a healthy fear of the risks of depending on systems I don't understand, written by someone whose goals and biases I'm unaware of, with an unknown future development path. Not to mention the real legal risks of different software licenses. I have been burned plenty after deeply investing in learning a new software system and building my own software on its abstractions, only to discover that the abstraction makes something simple difficult, or that the implementation has a bug that was difficult to anticipate and then to debug or fix. It's disempowering to need to dig into the guts of a new abstraction when your goal is just to move past it.
With SaaS services, there are risks at every moment of outages, network failures or data loss as your system has hidden dependencies on many teams of people, legacy architecture choices and their own SaaS dependencies. It's frustrating when your systems are down and you're left without recourse.
As David Foster Wallace discusses in his speech "This is Water", the longer and deeper you submerge your mind in an environment (or programming language, or software tool), the more your thinking begins to resemble the facts and limits of that environment. As you become more invested in someone else's software ideas, you may forget that some challenges you're facing are not common, or miss that entire classes of errors exist in your environment that don't exist elsewhere.
At this point in my journey, I think about software in terms of a composition of simple and reliable tools (ala Ken Thompson's UNIX philosophy). Investing in building with a small set of tools & abstractions is slow in the short term but carries significant long term benefit. Earlier in my career, I wasted many hours learning broadly but not deeply, chasing the newly popular language or framework but not sticking with any of them long enough to achieve mastery. Economies of scale exist in building deep knowledge (true in other disciplines, too!).
I'm not convinced that software mastery can result in perfection but practice does find improvements. The more small tools you build, the more ideas and code you have to draw upon when you need to compose something bigger. I haven't found that larger software systems have the same economies of scale; they tend to have their own specific quirks and norms.
I'd highly recommend reading or listening to Rich Hickey's talks, but if you're short on time then Daniel Higginbotham has captured some of Hickey's philosophy here.