unnali

you tell the computer to do it and it does it

Newlines and regular expressions

This has come up a few times recently, so a little note:

/./ will match any character—except a newline.

/[\s\S]/, /[\w\W]/ and so forth are the only way to portably match every character including newlines.
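
A quick way to see both claims for yourself, sketched in Python. Python is an assumption on my part here (it isn't one of the languages discussed in this note), but its . has the same default behaviour, and the variable names are made up for illustration:

```python
import re

text = "a\nb"

# By default . refuses to match the newline sitting in the middle.
dot_matches = re.findall(r".", text)

# A class combined with its own complement matches every character,
# newline included.
any_matches = re.findall(r"[\s\S]", text)

print(dot_matches)  # ['a', 'b']
print(any_matches)  # ['a', '\n', 'b']
```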

what about /./m?

/./m does not match a newline (usually—see below).

The m modifier does not alter . in any way (usually—see below).

m changes what ^ and $ match: normally they’ll match only the start and end of the string, but m alters them to also match the start and end of any line, i.e. immediately after and immediately before a newline.

Mnemonic: m for “multiline”.
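
Here is a rough sketch of that in Python, where the m modifier is spelled re.MULTILINE. Python stands in for a "usual" engine here, which is an assumption (Ruby, as covered further down, behaves differently), and the sample string is invented:

```python
import re

text = "first\nsecond"

# Without the flag, ^ only matches at the very start of the string.
plain = re.findall(r"^\w+", text)

# With m (re.MULTILINE), ^ also matches right after each newline.
multiline = re.findall(r"^\w+", text, re.MULTILINE)

# The flag leaves . alone: it still will not cross the newline.
dot_with_m = re.search(r"first.second", text, re.MULTILINE)

print(plain)       # ['first']
print(multiline)   # ['first', 'second']
print(dot_with_m)  # None
```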

what about /./s?

Yes, that will do what you want—unless you’re using Ruby or JavaScript.

In Ruby, the s modifier is inexplicably unused—in 1.8, it stays on the Regexp object, but in no way affects matching, and in 1.9, it doesn’t even stay on the object, it just disappears. Note that these behaviours are different to using a complete nonsense modifier, like f, which causes a SyntaxError[1].

Even more inexplicable, this feature has been rolled into m instead! So in Ruby, you can use /./m to match a newline, and you can also use /^a/m to match an a at the beginning of the string, or after any newline in the string.

In JavaScript, the s modifier is absent entirely, and it’s not rolled into anything else. Use /[\s\S]/.

Mnemonic: s for “single line” (the string is treated as one line, in a twisted sense).
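
For comparison, a small sketch of the s behaviour in Python, where the flag is called re.DOTALL (or re.S). Python's re module is not PCRE, so treat this as an illustration of the same idea under that assumption, not as a statement about any particular PCRE build:

```python
import re

text = "a\nb"

# Without s, . stops at the newline, so the pattern cannot match.
without_s = re.search(r"a.b", text)

# With s (re.DOTALL), . matches the newline as well.
with_s = re.search(r"a.b", text, re.DOTALL)

# s does not touch the anchors: ^ still matches only at the start
# of the string.
anchors_with_s = re.findall(r"^\w", text, re.DOTALL)

print(without_s)        # None
print(with_s.group(0))  # 'a\nb'
print(anchors_with_s)   # ['a']
```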

conclusion

Yay, regular expressions. Be sure what you mean.

To match any character including newlines:

  • In JavaScript or any other language lacking the s modifier, use /[\s\S]/.
  • In Ruby, use /./m, but be aware that this also modifies ^ and $. If unsure, use /[\s\S]/.
  • If you have true PCREs, you may safely use /./s.

Note that this is often what you really want—rarely do you want to explicitly match every character except newlines. If you do that on purpose, at least leave a comment to that effect, otherwise your coworkers will just assume you didn’t know what you were doing.


  1. ^ Or just gets thrown away if you use Regexp.new directly.

On real relationships

Writers of all stripes enjoy engaging in the most cynical readings of human behavior because they think it makes them appear hyper-rational. But in fact here is a perfect example of how trying to achieve that makes you irrational. Human emotion is real. It is an observable phenomenon. It observably influences behavior. Therefore to fail to account for it when discussing coupling and relationships is the opposite of cold rationality; it is in fact a failure of empiricism.

L’Hote on Kate Bolick’s “All the Single Ladies”.

What do you use to represent a set? A lesson from Perl.

This is a bit of an FAQ in the Perl community, but it’s a piece of advice that has come in handy time and time again when I go back to other languages.

Question to the reader:

What datatype should you use if you want to store a set (as in mathematical set) of objects?

Typical operations on the set will be searching to see if a given object is an element of a set, adding elements to the set (while ignoring dupes), and removing a given element from the set.

For instance, in a chess server, you may want to store the set of game IDs which a user is participating in. The game may wish to store the set of user IDs of all users watching this game.

Think about what you’d use in a few of your favourite languages before reading on.

Go on, actually have an answer. It’ll be more satisfying for me this way.

Encoding, escaping; unencoding, unescaping

As programmers, we spend a lot of time just carting data from one place to another. Sometimes that’s the entire purpose of a program or library (data conversion whatevers), but more often it’s just something that needs to happen in the course of getting a certain task done. When we’re sending a request, using a library, executing templates or whatever, it’s important to be 100% clear on the format of the data, which is a fancy way of saying how the data is encoded.

Let’s do the tacky dictionary thing:

encoding (plural encodings)

  1. (computing) The way in which symbols are mapped onto bytes, e.g. in the rendering of a particular font, or in the mapping from keyboard input into visual text.

  2. A conversion of plain text into a code or cypher form (for decoding by the recipient).

I think these senses are a bit too specific—if your data is in a computer in any form, then it’s already encoded. The keyboard doesn’t even have to come into it.

If you’re like me and you come from an English-speaking country, there’s a good chance that this might seem farfetched, or totally obvious but lacking in depth. The letter A is represented in ASCII by the integer 65, or hex 41.

From here on, if I refer to a number with regular formatting, it’s decimal unless specified otherwise; likewise with code formatting, it’s hexadecimal.

You are also probably aware that non-Latin characters like 恋 do not have any mapping in ASCII, and that people all tried to make their own ways to get around this, none of which interoperated particularly well. At some stage, a bunch of smart people decided to create Unicode, which assigns a unique integer codepoint to every character of every language (and then some), such that the character just mentioned is U+604b. There are also character encodings, like UTF-8, which are used to represent the codepoints in a bytestream, such that 恋 becomes e6 81 8b.
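
The numbers above are easy to check. A minimal sketch in Python, which is an assumption on my part (it is not one of the environments this post goes on to use) and whose variable names are purely illustrative:

```python
# The letter A: decimal 65, hex 41.
letter = "A"
ascii_value = ord(letter)
ascii_hex = format(ascii_value, "x")

# The character at codepoint U+604B, built from the codepoint itself.
kanji = chr(0x604B)

# ASCII has no mapping for it, so encoding to ASCII fails.
try:
    kanji.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 represents the same codepoint as three bytes.
utf8_bytes = kanji.encode("utf-8")
utf8_hex = utf8_bytes.hex(" ")

print(ascii_value, ascii_hex)  # 65 41
print(ascii_ok)                # False
print(utf8_hex)                # e6 81 8b
```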

This is all well and good. But what do you do with this stuff in your program?

Firstly, we need to straighten out what your environment does, or doesn’t do, with character encodings. I’m going to use PHP, Erlang and HTML as my examples, because they’re things I work with at work, and they each have slightly different ways of dealing with encoding[1] owing to their internal representation of strings.

Secondly, I’m going to expand this beyond character encodings to any encoding—which is ultimately what I want to talk about here. We’re not just encoding the textual content for decoding into codepoints; we’re also often encoding data to put it within other data in a demarcated way. In this case, we tend to refer to escaping, but escaping and encoding are different ways of talking about the same process.
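
To make the "escaping is encoding" point concrete, here is a small sketch using Python's standard library. The choice of Python and of these two target formats (HTML text and a URL component) is mine for illustration, not something the post itself specifies:

```python
import html
from urllib.parse import quote, unquote

raw = 'Tom & "Jerry" <3'

# Encoding the same data for two different containers.
in_html = html.escape(raw)    # safe to place inside HTML text or attributes
in_url = quote(raw, safe="")  # safe to place inside a URL component

print(in_html)  # Tom &amp; &quot;Jerry&quot; &lt;3
print(in_url)   # Tom%20%26%20%22Jerry%22%20%3C3

# Decoding (unescaping) gets the original data back in both cases.
print(html.unescape(in_html) == raw)  # True
print(unquote(in_url) == raw)         # True
```

The data never changes; only its representation does, depending on where it has to sit.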

Not what was meant, but

coworker: If it were a critical task, how long to get that UI up and running?
me: Probably about the same amount of time if it were non-critical, all things considered
me: Maybe a little longer
me: Stress tends to make tasks drag out

Absence of understanding in motion

Reference.

An employee of the company who maintains Mongo, who appears to be assigned Erlang driver maintenance lately, seems to deduce that the tests fail[1], and so changes an API which has been stable for 2 years in order to “make the tests pass”, despite also changing the tests.

WAT.


  1. ^ they don’t.

Bye, PHP. You won’t be missed.

Today marks the day I finish migrating from my old WordPress blog to Octopress, finally galvanised into action by Eevee’s amazing takedown of PHP. It’s not that I didn’t realise this – I have to do plenty of PHP as part of my day job, and I know just how bad it is – but, already using WordPress, I had little reason to move.

I’ve been doing some autumn cleaning on my server, replacing lighttpd with nginx, reorganising others’ sites into their home directories, and generally sanitising things, so it seemed like I really should get this over and done with.

And now I’m done!

A new proposal from the Google Japanese Input Team (translation)

Original article at http://googlejapan.blogspot.de/2012/04/google.html.

A new proposal from the Google Japanese Input Team
1st April, 2012
Posted by Google Japanese Input Team

Hi everyone! Have you used Google’s Japanese Input before? It supports Japanese text entry the way you think, boasting a huge vocabulary and powerful suggestion feature, and since our initial announcement we’ve had a huge uptake. At the end of last year we also announced the new Android version.

At the Google Japanese Input Team, we’ve been continuing our research every day, thinking about faster, more efficient ways to do input. Although it hasn’t been well known, we’ve also been conducting R&D in the field of keyboard design.

In 2010, we developed a drumset-type keyboard. It made a huge impact by tackling head-on the difficulty traditional keyboards have with direct Japanese text entry.

Image of drumset with a few thousand keys on pads, plus foot-pedals!

However, our market research turned up many points to reflect on: “I can’t remember the keys”; “I can’t use it in the car”; “It’s too novel”. Based on these, we’ve developed an input tool which excels in the traits: “Minimal key count”; “Can be used everywhere”; “Real applications in abundance”.

And so today, we are very proud to announce the Google Japanese Input, Morse-Edition Keyboard.

Morse code entry device. Text on it says "Google Japanese Input". Text on the entry key itself is the letter "A"

Book review: Practical Common Lisp, by Peter Seibel

Practical Common LISP by Peter Seibel

My rating: 4 of 5 stars

Just what Common Lisp needed: a book that doesn’t bubble at the mouth, frothing over how every other language is attempting to be CL but failing; a book that doesn’t tell you how macros mean you can write EVERY LANGUAGE EVER in CL; a book that doesn’t tell you how CL’s macros are the best thing since sliced bread, then follow it up with totally shit examples of what macros are actually used for; a book that actually tours the standard library in a semi-sensical fashion, and covers practical things you might actually want to do, and in the meantime does a pretty decent justice to the rather large language that is Common Lisp.

In short, a rather good book, suitable for total beginners to Lisp and Common Lisp.

I found that it lost traction at times—sometimes it degenerated into a little bit of a reference, and you felt like you were reading a dictionary—but fairly quickly it recovered and had your attention again (while still being didactic). Similarly, the practicals in the last few chapters were almost too well-architected; I really just felt like I was building the target application, sort of learning the techniques, but ultimately it wasn’t necessarily interesting.

Finally, the topics glossed over in the conclusion are probably more important to someone wanting to build real applications with Common Lisp (i.e. relating to “practical Common Lisp”) than some of the stuff that could probably be gleaned from an evening with the HyperSpec—finding libraries and deploying applications are two very big question marks for any Lisp developer. This is partly a result of developments in these areas being even more recent than the book (e.g. QuickLisp development began in 2010, after the book’s last copyright year of 2009).

All in all, an excellent introduction to the world of Common Lisp.
