The new HTTP server, part 2
Tuesday, April 29, 2008
It’s been a month and a half since the first part of this series. Why the long delay? I’ve been busy with other work: I implemented inheritance, various compiler optimizations, and more. In the last couple of weeks I’ve been working on the web framework again, tying up some loose ends and porting more existing web applications over (namely, the pastebin and Planet Factor).
In this entry I will talk about session management. Session management was one of the first things I implemented in the new framework when I started working on it, but recently I gave the code an overhaul.
Session management
The basic idea behind session management is that while HTTP is a stateless protocol, we can simulate state by sending the client a token, either as a hidden form field on the page or as a cookie, which the client sends back to the server with later requests. The server associates this token with an object that holds state between requests.
Another approach to session management is to store state entirely on the client: instead of sending the client a session ID identifying an object on the server, you send the session data itself. Traditionally this approach has only been used for user preferences and similar data where security is immaterial, but it can also be used for more sensitive data by encrypting and signing it with a key known only to the server. The client then receives an opaque blob of binary data which it cannot inspect or tamper with (unless the cryptographic algorithm being used is compromised).
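To make the client-side idea concrete, here is a minimal Python sketch of a tamper-evident client-side session, using only the standard library. It is a swapped-in technique: it signs the session data with an HMAC keyed by a server-only secret rather than using public-key encryption, and because it does not encrypt, the client can still read the data. SECRET_KEY, encode_session and decode_session are hypothetical names, not part of Factor.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-only secret; a real server would load it from secure storage.
SECRET_KEY = b"server-only-secret"


def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC-SHA256 tag (signed, NOT encrypted)."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + tag


def decode_session(token: str):
    """Return the session dict, or None if the token is malformed or was tampered with."""
    try:
        payload, tag = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))


token = encode_session({"count": 3})
print(decode_session(token))  # {'count': 3}
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
print(decode_session(tampered))  # None
```

Encryption would additionally hide the contents from the client. The signature alone is what makes tampering detectable.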
Currently Factor’s session manager does not support client-side sessions, but it will soon, using Doug Coleman’s public-key encryption code. Server-side sessions are supported, however.
The session manager uses two main strategies to pass state to the client:
- For GET and HEAD requests, a cookie is used. The cookie’s value is a randomly-generated session ID.
- For POST requests, the form must define a hidden field with the session ID. The value of the cookie is ignored to thwart cross-site request forgery (CSRF) attacks.
The idea is to strike a balance between security and convenience. We don’t want to add a session ID to every link, or start a new session whenever the user navigates to the site by typing in a URL directly. On the other hand, we don’t want potentially destructive POST requests to be accepted unless they were sent by a form generated within the session itself.
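The two strategies described above can be sketched in Python as a single lookup function. This is an illustration under assumptions, not Factor's implementation. The cookie name factorsessid appears later in this post, but the hidden field name and the session_id_for function are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional
from urllib.parse import parse_qs

COOKIE_NAME = "factorsessid"  # the cookie name mentioned later in this post
FIELD_NAME = "factorsessid"   # hypothetical name for the hidden form field


def session_id_for(method: str, headers: Dict[str, str], body: str = "") -> Optional[str]:
    """Choose where to read the session ID from, based on the request method."""
    if method in ("GET", "HEAD"):
        # Convenience: the browser sends the cookie automatically.
        morsel = SimpleCookie(headers.get("Cookie", "")).get(COOKIE_NAME)
        return morsel.value if morsel is not None else None
    if method == "POST":
        # Security: ignore the cookie. A form on another site cannot know the
        # random ID, so only forms generated within the session carry it.
        values = parse_qs(body).get(FIELD_NAME)
        return values[0] if values else None
    return None


print(session_id_for("GET", {"Cookie": "factorsessid=abc123"}))  # abc123
```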
In Factor, a session is simply a hashtable where values can be stored. Keys are known as “session variables”, and values can be read and written with the sget and sset words. There is also an schange combinator, which applies a quotation to the value of an existing session variable to yield a new value. This is all entirely analogous to the get/set/change words for dynamic variables.
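For readers who don't know Factor, here is a rough Python analogy of the three session words. It is only an illustration. Factor's sget, sset and schange operate on the current request's session implicitly, while these hypothetical Python functions take the session explicitly.

```python
from typing import Any, Callable, Dict

Session = Dict[str, Any]  # session variables -> values


def sget(session: Session, key: str) -> Any:
    """Read a session variable; an unset variable reads as None, like Factor's f."""
    return session.get(key)


def sset(session: Session, key: str, value: Any) -> None:
    """Write a session variable."""
    session[key] = value


def schange(session: Session, key: str, quot: Callable[[Any], Any]) -> None:
    """Apply quot to the current value and store the result back."""
    sset(session, key, quot(sget(session, key)))


session: Session = {}
sset(session, "count", 0)
schange(session, "count", lambda n: n + 1)
print(sget(session, "count"))  # 1
```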
Session namespaces are serialized and stored in a database using Doug’s db.tuples O/R mapper. I originally supported pluggable “session storage” backends, with database storage and in-memory storage as the two options, but I decided to simplify the code and hardcode database storage. As a side effect, you’ll need to set up a database to use the session management feature. However, SQLite is a lightweight option that requires no configuration, so I don’t think this is a big deal at all.
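The storage scheme can be sketched with Python's built-in sqlite3 module: each session ID maps to a row holding the whole serialized namespace. This is a minimal sketch under assumptions. Python's pickle stands in for Factor's serialization, and the table layout and function names are hypothetical, not what db.tuples actually generates.

```python
import pickle
import secrets
import sqlite3
from typing import Any, Dict, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a single sessions table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, namespace BLOB)")
    return db


def save_session(db: sqlite3.Connection, sid: str, namespace: Dict[str, Any]) -> None:
    """Serialize the whole namespace and insert or overwrite its row."""
    db.execute(
        "INSERT OR REPLACE INTO sessions (id, namespace) VALUES (?, ?)",
        (sid, pickle.dumps(namespace)),
    )
    db.commit()


def load_session(db: sqlite3.Connection, sid: str) -> Optional[Dict[str, Any]]:
    """Deserialize a stored namespace, or return None for an unknown ID."""
    row = db.execute("SELECT namespace FROM sessions WHERE id = ?", (sid,)).fetchone()
    # Unpickling is only acceptable here because the server wrote these bytes itself.
    return pickle.loads(row[0]) if row else None


db = open_store()            # an in-memory SQLite database needs no setup at all
sid = secrets.token_hex(16)  # a randomly generated session ID
save_session(db, sid, {"count": 0})
print(load_session(db, sid))  # {'count': 0}
```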
I will show a small example of a ‘counter’ web application, much like the counter example for the Seaside framework.
We start off with a vocabulary search path:
USING: math kernel accessors http.server http.server.actions
http.server.sessions http.server.templating.fhtml locals ;
IN: webapps.counter
Now, we define a symbol used to key a session variable:
SYMBOL: count
Next, we define a word which constructs actions that update the counter value using the schange combinator. The display slot of an action contains code to be executed upon a GET request; it is expected to output a response object. In our case, the word outputs an action which applies the given quotation to the current counter value and then outputs a response which redirects back to the main page:
:: <counter-action> ( quot -- action )
    <action> [
        ! A new session has no count yet (f), so start from 0
        count [ 0 or quot call ] schange
        "" f <standard-redirect>
    ] >>display ;
The action to decrement the counter is entirely analogous. Rather than writing a second word, we pass a different quotation to the same one:
[ 1- ] <counter-action>
Note that this word constructs actions rather than invoking them. This approach is more flexible than the old “furnace” web framework, where actions mapped directly to word execution, because it makes it easy to write “higher-order actions” parameterized by values.
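The same "construct, don't invoke" idea in Python terms: a factory returns a handler closed over its parameter, so one factory yields both the increment and decrement actions. This is an analogy only. make_counter_action, routes and the "redirect:/" string are hypothetical stand-ins, not Factor's API.

```python
from typing import Any, Callable, Dict

Session = Dict[str, Any]
Handler = Callable[[Session], str]


def make_counter_action(quot: Callable[[int], int]) -> Handler:
    """Construct (not run) a handler that updates the counter, then redirects."""

    def handler(session: Session) -> str:
        # Apply the captured quot to the current value; a new session starts at 0.
        session["count"] = quot(session.get("count", 0))
        return "redirect:/"  # hypothetical stand-in for a redirect response

    return handler


# One factory, two actions: the behavior is a parameter, not a separate function.
routes = {
    "inc": make_counter_action(lambda n: n + 1),
    "dec": make_counter_action(lambda n: n - 1),
}

session: Session = {}
routes["inc"](session)
routes["inc"](session)
routes["dec"](session)
print(session["count"])  # 1
```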
Here is the default action; it displays the counter value using a template:
: <display-action> ( -- action )
    <action> [
        "resource:extra/webapps/counter/counter.fhtml" <fhtml>
    ] >>display ;
Finally we put everything together in a dispatcher:
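! new-dispatcher takes a class, so we define one that inherits from dispatcher
TUPLE: counter-app < dispatcher ;

: <counter-app> ( -- responder )
    counter-app new-dispatcher
        [ 1+ ] <counter-action> "inc" add-responder
        [ 1- ] <counter-action> "dec" add-responder
        <display-action> "" add-responder
    <sessions> ;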
We create a dispatcher, add instances of our actions to it, and wrap the whole thing in a session manager.
Now, the template:
<% USING: io kernel math.parser http.server.sessions webapps.counter ; %>
<html>
<body>
<h1><% count sget 0 or number>string write %></h1>
<a href="inc">++</a>
<a href="dec">--</a>
</body>
</html>
Once we have all the parts, we can create the counter responder and start the HTTP server:
<counter-app> "test.db" sqlite-db <db-persistence> main-responder set
8888 httpd
Note that here we wrap the counter responder in another layer of indirection, this time for database persistence. While the counter web app itself doesn’t use persistence, the session manager does, and we chose SQLite since it requires no configuration or external services.
Navigating to http://localhost:8888/ should now display the counter app, and clicking the increment and decrement links should change the displayed value. Sessions persist between server restarts and, by default, time out after 20 minutes of inactivity. Looking at your web browser’s cookie manager will show that a factorsessid cookie has been set.
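The inactivity timeout works as a sliding window: every request pushes the deadline back. Here is a minimal Python sketch of that logic, assuming a 20-minute window. Unlike Factor's database-backed sessions, this table lives in memory and would not survive a restart. SessionTable and its methods are hypothetical names, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional

TIMEOUT = 20 * 60  # seconds of inactivity allowed, matching the default above


class SessionTable:
    """In-memory sessions with a sliding inactivity timeout (no persistence)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.sessions: Dict[str, Dict[str, Any]] = {}
        self.last_seen: Dict[str, float] = {}

    def create(self, sid: str) -> Dict[str, Any]:
        """Start a new, empty session and its inactivity timer."""
        self.sessions[sid] = {}
        self.last_seen[sid] = self.clock()
        return self.sessions[sid]

    def touch(self, sid: str) -> Optional[Dict[str, Any]]:
        """Return the live session and reset its timer, or None if unknown or expired."""
        now = self.clock()
        seen = self.last_seen.get(sid)
        if seen is None or now - seen > TIMEOUT:
            # Expired (or never existed): discard it so the caller starts afresh.
            self.sessions.pop(sid, None)
            self.last_seen.pop(sid, None)
            return None
        self.last_seen[sid] = now  # each request pushes the deadline back
        return self.sessions[sid]


now = [0.0]
table = SessionTable(clock=lambda: now[0])
table.create("abc")["count"] = 5
now[0] = 15 * 60
print(table.touch("abc"))  # {'count': 5}
now[0] = 40 * 60           # 25 minutes after the last request
print(table.touch("abc"))  # None
```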
As an aside, the Seaside version uses continuations to maintain state. The Factor version explicitly maintains state. Even though I ported Chris Double’s modal web framework over to the new HTTP server, I’m avoiding continuations in favor of explicit state for now. I am building up a form component framework with validation, easy persistence, and user authentication without resorting to continuations, and I plan on building a state-machine model with a page flow DSL, much like jBPM, to handle more complex multi-page flows such as shopping carts. While this will result in more work for me, I believe the benefits include transparent support for load-balancing and fail-over, readable URLs, and ultimately, simpler and more reusable web application code because page flow can be decoupled from logic and expressed in a custom DSL intended for that purpose.
The Seaside version is also somewhat shorter, since this example is easy to express in idiomatic Seaside (transparent session management, presentation logic mixed in with web app code). I will add better abstractions to make up for some of the difference, and for larger applications there should be no difference in code size. In fact, since the scope of Factor’s framework is wider than Seaside’s (it covers persistence, authentication and validation, and soon, versioning of persistent entities), you might even need less code to accomplish the same thing.
Virtual hosting
The other topic I promised to cover last time was virtual hosting.
Virtual hosting is done with dispatchers, much like a nested directory structure is. You create a virtual host dispatcher with <vhost-dispatcher> and add responders for various virtual hosts using add-responder. The default slot (set with >>default) holds the responder for the default virtual host. The key difference between the new approach and the old HTTP server's virtual hosting implementation, which relied on a global hashtable mapping virtual host names to responders, is flexibility: the virtual host dispatcher does not have to be your top-level responder.
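The dispatch logic itself is small. Here is a minimal Python sketch of a virtual host dispatcher with a default responder, assuming requests are plain dicts with "host" and "path" keys. VhostDispatcher, add_responder and the request shape are hypothetical, not Factor's API. The dispatcher is itself just a callable responder, so other layers can wrap it; that is where the flexibility comes from.

```python
from typing import Any, Callable, Dict, Optional

Request = Dict[str, Any]
Responder = Callable[[Request], str]


class VhostDispatcher:
    """Route requests by host name, falling back to a default responder."""

    def __init__(self, default: Optional[Responder] = None) -> None:
        self.responders: Dict[str, Responder] = {}
        self.default = default

    def add_responder(self, host: str, responder: Responder) -> "VhostDispatcher":
        self.responders[host.lower()] = responder
        return self  # return self so calls can be chained

    def __call__(self, request: Request) -> str:
        # The Host header may carry a port ("acme.com:8888"); drop it before lookup.
        host = request.get("host", "").split(":", 1)[0].lower()
        responder = self.responders.get(host, self.default)
        if responder is None:
            return "404 Not Found"
        return responder(request)


site = VhostDispatcher(default=lambda r: "main")
site.add_responder("store.acme.com", lambda r: "store " + r["path"])
print(site({"host": "store.acme.com:8888", "path": "/cart"}))  # store /cart
```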
For example, the <boilerplate> responder gives you a way of enforcing a common look and feel across a set of web apps by adding common headers and footers to every page. I will describe boilerplate responders and the template system in more detail in a later post; for now, here is an example:
! <online-store>, <support-site>, <main-site> and acme-db stand in for your own responders and database
<vhost-dispatcher>
    <online-store> "store.acme.com" add-responder
    <support-site> "support.acme.com" add-responder
    <main-site> "acme.com" add-responder
<boilerplate>
    "acme-site.xml" >>template
acme-db <db-persistence>
<sessions> main-responder set
Here, all virtual hosts share the same session management, database persistence, and common theme, and virtual host dispatch only happens after the request has passed through those layers. This would not have been possible with the old HTTP server without duplicating code.
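The layering can be mimicked in Python with plain function wrappers, in the same order as the Factor example: sessions on the outside, the shared theme inside that, and host dispatch innermost. This is a rough sketch. It omits the database layer, and every name in it (with_sessions, with_boilerplate, vhost, page) is hypothetical. It shows that a visitor's session is shared across virtual hosts because the session layer runs before dispatch.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Responder = Callable[[Request], str]


def with_sessions(inner: Responder) -> Responder:
    """Outermost layer: look up (or create) the session, then delegate."""
    store: Dict[str, Dict[str, Any]] = {}

    def responder(request: Request) -> str:
        request["session"] = store.setdefault(request["sid"], {})
        return inner(request)

    return responder


def with_boilerplate(inner: Responder) -> Responder:
    """Shared theme: wrap whatever the inner responder returns."""

    def responder(request: Request) -> str:
        return "<header/>" + inner(request) + "<footer/>"

    return responder


def vhost(routes: Dict[str, Responder], default: Responder) -> Responder:
    """Innermost layer: dispatch on the host name, ignoring any port."""

    def responder(request: Request) -> str:
        host = request["host"].split(":", 1)[0].lower()
        return routes.get(host, default)(request)

    return responder


def page(name: str) -> Responder:
    """A hypothetical site that counts visits in the shared session."""

    def responder(request: Request) -> str:
        session = request["session"]
        session["visits"] = session.get("visits", 0) + 1
        return f"{name} visit {session['visits']}"

    return responder


app = with_sessions(
    with_boilerplate(vhost({"store.acme.com": page("store")}, default=page("main")))
)
print(app({"host": "store.acme.com", "sid": "u1"}))  # <header/>store visit 1<footer/>
print(app({"host": "acme.com", "sid": "u1"}))        # <header/>main visit 2<footer/>
```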
Cookies
Finally, I promised to talk about cookies. The session management support is great, but sometimes you just want to get and set cookies directly. This can be done by reading the cookies slot of the request object and writing the cookies slot of the response object. The slot contains a sequence of cookie objects, which are parsed from and unparsed to their HTTP representation for you. A cookie object has slots such as the name, value, expiration date (as a Factor timestamp object), max-age (as a Factor duration object), path, and domain. Max-age is the preferred mechanism in the newer cookie specifications (RFC 2109 and RFC 2965). Even so, most sites still use the expiration date instead, because older browsers don’t support max-age. Factor’s HTTP server sets the Date header on each response so that expiration dates work correctly.
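For comparison, Python's standard http.cookies module performs the same parse/unparse round trip. This sketch uses that library, not Factor's cookie objects. Apart from factorsessid, the cookie names and values are made up. It sets both max-age and an HTTP-date expires attribute, for the browser-compatibility reason given above.

```python
import time
from email.utils import formatdate
from http.cookies import SimpleCookie

# Parsing: a Cookie request header becomes a mapping of name -> Morsel.
incoming = SimpleCookie("factorsessid=abc123; theme=dark")
print(incoming["factorsessid"].value)  # abc123

# Unparsing: build a Set-Cookie header. Send both attributes, because older
# browsers only understand expires, which must be an HTTP date in GMT.
lifetime = 3600  # seconds
outgoing = SimpleCookie()
outgoing["theme"] = "dark"
outgoing["theme"]["path"] = "/"
outgoing["theme"]["max-age"] = lifetime
outgoing["theme"]["expires"] = formatdate(time.time() + lifetime, usegmt=True)
print(outgoing.output())
```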
Here is an example of using the HTTP client (which shares the cookie code with the server) to look at Google’s ridiculously long-lived cookies:
( scratchpad ) "http://www.google.com" http-get-stream drop cookies>> first describe
cookie instance
"delegate" f
"name" "pref"
"value" "ID=c0f4c074cd87502e:TM=1209466656:LM=1209466656:S=_6gGEKtuTgP..."
"path" "/"
"domain" ".google.com"
"expires" T{ timestamp f 2010 4 29 10 57 36 ~duration~ }
"max-age" f
"http-only" f
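To put "ridiculously long-lived" in numbers, here is a quick Python check. It assumes that the TM field in the cookie value is a Unix timestamp recording when the cookie was issued, and that the printed expiry timestamp is in UTC. Under those assumptions, the cookie lives exactly two years.

```python
from datetime import datetime, timezone

# Assumptions: TM=1209466656 is the issue time as a Unix timestamp, and the
# expires timestamp shown by describe is in UTC.
issued = datetime.fromtimestamp(1209466656, tz=timezone.utc)
expires = datetime(2010, 4, 29, 10, 57, 36, tzinfo=timezone.utc)

lifetime = expires - issued
print(issued.isoformat())         # 2008-04-29T10:57:36+00:00
print(lifetime.days)              # 730
print(lifetime.total_seconds())   # 63072000.0
```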