[code] TruncateText and BreakText -- Macros Save The Day!

preaction · 1/21/2008 9:18 am
Macros in WebGUI remain a very powerful and flexible tool for displaying and altering markup. Instead of subclassing assets to perform some minor addition or change, we can make a Macro that will do what we need.

Enter my project. I needed to create an RSS feed using an existing dataset. Because the requirements for this RSS feed were so specific to this one use, I decided that an SQLReport was the simplest way to go. However, problems began cropping up when the SQLReport would not quite go far enough to satisfy the requirements.

First, I needed to display only a few lines of text from a field that contains HTML (and bad HTML at that: TinyMCE doesn't exactly produce well-formed HTML). I actually started by trying to parse the HTML with MySQL, which turned out something like this:

SELECT *,
    UNIX_TIMESTAMP(creationDate) AS creationDate,
    SUBSTRING_INDEX(
        SUBSTRING(
            bodyText,
            IF( LOCATE( '>', bodyText  ) + 1 > 150, 0, LOCATE( '>', bodyText ) + 1 ),
            IF( LOCATE( '<', bodyText, IF( LOCATE( '>', bodyText  ) + 1 > 150, 0, LOCATE( '>', bodyText ) + 1 ) ) - 1 = 0, LENGTH( bodyText ), LOCATE( '<', bodyText, IF( LOCATE( '>', bodyText  ) + 1 > 150, 0, LOCATE( '>', bodyText ) + 1 ) ) - 1 )
        ),  
        " ", 20) AS bodyText
    FROM ...

Can you understand it? Because I can't anymore. It actually worked for a while, until the input started giving it trouble. I was about to give up and write an asset when I had an epiphany: the TruncateText macro.

TruncateText

The TruncateText macro takes two arguments. The first is a plain-text string that describes what you want; the second is the text or HTML that should be truncated.

^TruncateText("20 words", <tmpl_var content>);
    # Get the first 20 words from the content
^TruncateText("2 paragraphs", <tmpl_var content>);
    # Get the first 2 paragraphs from the content
^TruncateText("4 sentences escape", <tmpl_var content>);
    # Get the first 4 sentences from the content and escape any HTML entities.

The macro parses the first argument with Perl regular expressions, so it can be a free-form string, which makes the interface easy to use.
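
To illustrate how a free-form first argument like this can be parsed, here is a rough Python sketch. It is not the attached Perl macro: the function name truncate_text, the regex that picks out the count and unit, and the rule that paragraphs are separated by blank lines are all assumptions made for illustration.

```python
import html
import re


def truncate_text(spec: str, content: str) -> str:
    """Hypothetical stand-in for TruncateText, not the attached Perl code."""
    # Pull a count and a unit out of the free-form spec, e.g. "20 words".
    match = re.search(r"(\d+)\s*(word|sentence|paragraph)", spec, re.IGNORECASE)
    if match is None:
        return content  # Assumption: an unparseable spec leaves content alone.
    count, unit = int(match.group(1)), match.group(2).lower()

    if unit == "word":
        # Each piece is one run of non-whitespace plus trailing whitespace.
        pieces = re.findall(r"\S+\s*", content)
    elif unit == "sentence":
        # Assumption: a sentence ends at ., ! or ? (or at the end of the text).
        pieces = re.findall(r".+?(?:[.!?]+\s*|$)", content, re.DOTALL)
    else:
        # Assumption: paragraphs are separated by blank lines.
        pieces = re.findall(r".+?(?:\n\s*\n|$)", content, re.DOTALL)

    result = "".join(pieces[:count]).rstrip()

    # The optional "escape" keyword turns markup characters into entities.
    if re.search(r"\bescape\b", spec, re.IGNORECASE):
        result = html.escape(result)
    return result


print(truncate_text("2 words", "one two three"))  # one two
```

Because the spec is matched with a regex rather than split into fixed positions, "20 words", "20 Words" and "4 sentences escape" all go through the same code path.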

This solved the first problem: how to display a couple of lines from the beginning of the main content in the RSS feed. A second problem remained. The main application had a special tag, "<cut>", that broke the content into two sections, and it would hide one of the sections until the user wanted it (like the synopsis/content in WebGUI Collaboration Systems).

BreakText

I wasn't about to try to figure it out with SQL again, especially not when I could make a solution that could be useful elsewhere. So I created the BreakText macro. This macro takes a regular expression as its first argument, and content as its last argument. Anything captured using parentheses in the regex will be returned by the macro, like so:

^BreakText("cut", "This is some cut text");
    # Will return nothing, since nothing is captured.
^BreakText("(.+)cut", "This is some cut text");
    # Will return "This is some "
^BreakText("cut(.+)", "This is some cut text");
    # Will return " text"
^BreakText("(.+)cut(.+)", "This is some cut text");
    # Will return "This is some  text"
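
The capture behavior in the examples above can be mimicked in a few lines of Python. This is only a sketch of the idea, not the attached Perl macro; the name break_text and the choice to let "." match newlines are assumptions.

```python
import re


def break_text(pattern: str, content: str) -> str:
    """Hypothetical stand-in for BreakText, not the attached Perl code."""
    # Assumption: "." is allowed to span newlines so multi-line content works.
    match = re.search(pattern, content, re.DOTALL)
    if match is None:
        return ""
    # Concatenate every captured group; no groups means an empty result.
    return "".join(group or "" for group in match.groups())


print(repr(break_text("cut", "This is some cut text")))          # ''
print(repr(break_text("(.+)cut", "This is some cut text")))      # 'This is some '
print(repr(break_text("(.+)cut(.+)", "This is some cut text")))  # 'This is some  text'
```

Returning only the captured groups is what lets a single macro serve both halves of the content: capture before the marker for the synopsis, after it for the rest.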

Armed with these two simple tools, I was able to fulfill all the requirements. I was also able to clean up some less-than-optimal parts of the code, since the "<cut>" marker was a rather ugly hack.

Most importantly, by taking an hour to think about a broader solution that would be useful in other places, I now have two more ways to create solutions for my clients.

Attached are the macros along with test suites for them. Feel free to look over them or use them in your own code.
Re: [code] TruncateText and BreakText -- Macros Save The Day!
preaction · 1/21/2008 7:23 pm

One addendum: Since the macro parser breaks when a " is left unescaped in any of the parameters, the only way to use these macros is to control the content that's put into them.

Usually this means escaping " by turning it into &quot; (the HTML entity). YMMV. 
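
If the content is prepared in code before it reaches the template, the escaping step might look like this Python sketch. The helper name escape_macro_param is hypothetical, and it only handles the double-quote case described here.

```python
def escape_macro_param(text: str) -> str:
    """Hypothetical helper: make text safe to place inside a quoted macro parameter."""
    # Replace each literal double quote with its HTML entity.
    return text.replace('"', "&quot;")


body = 'She said "hello" and left'
print(escape_macro_param(body))  # She said &quot;hello&quot; and left
```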
