Incompatible changes in Factor 0.79
Sunday, October 30, 2005
Factor 0.79 is almost ready for release. There is a minor prettyprinter bug, and a show-stopper OpenGL failure on Windows that no doubt will be resolved soon. After that, I have to update the documentation, and it will be released. This release shakes things up a little, so I thought it would be worthwhile to post an entry about the most noticable changes, now that we have a number of active contributors.
First of all, the ifte
combinator has been renamed to if
.
Next up, literal syntax for various types has changed:
Data type
Old syntax
New syntax
Vector
{ 1 2 3 }
V{ 1 2 3 }
Array
None
{ 1 2 3 }
Hashtable
{{ [[ key value ]] ... }}
H{ [[ key value ]] ... }
Tuple
<< class slots ... >>
T{ class slots }
Complex number
#{ real imaginary }#
C{ real imaginary }
In particular, note that in 0.79, arrays play a much larger role in the
language. Formely, arrays were an implementation detail; they lived in
the kernel-internals
vocabulary, did not perform bounds checking, and
were only used internally to implement vectors and hashtables. In 0.79,
arrays have a literal syntax, are fully safe, and live in the arrays
vocabulary. Arrays are not resizable like vectors are, but are slightly
more efficient. This is the only real difference; the same operations
work on both, since Factor’s sequences are fully generic. Arrays are now
the preferred data type for literal sequences. In fact, literal vectors
are extremely rare; there’s really only two cases where they are
needed:
V{ } clone
– make a new, empty vector
[ 1 , 2 , 3 , ] V{ } make
– implicit sequence construction
This is why arrays now have the old vector syntax, and vectors now have a slightly more verbose syntax.
In particular, these changes have made the inspector display much more readable.
Finally, I also wanted to touch on a topic that I have discussed on IRC,
but never put in writing. Lists and conses are being phased out over
time. This does not affect the upcoming release of 0.79, but eventually
I hope that quotations can become a first-class immutable type with
underlying array storage; they will reuse the [ ... ]
syntax. Code
that uses conses to store data will be refactored to use arrays and
vectors instead. There are several reasons for this:
- Conses entail a 2x space overhead, and make the inner interpreter needlessly slow as it traverses the ‘cdr’ pointer while interpreting code. While the compiler makes this irrelevant, not all code is compiled. During bootstrap, interpretation overhead accounts for a significant portion of run time.
- If quotations become arrayed, first-class types, then the debugger
will be radically improved, and return stack traces will become much
more readable due to extra information available at run time.
Anybody who has looked at
:r
output knows what I mean. Its hard to understand if it is just a quotation soup, without any idea of what word each quotation came from. - Conses require their own implementations of algorithms such as
each
,map
,head
, and so on. Already, some combinators like2map
are only implemented for arrays, and result in a quadratic performance degradation with conses. I realize the “iterator” design pattern is supposed to eliminate this type of duplication, but iterators have performance and complexity of implementation problems. - The Factor programming style, which tends to favor the creation of new sequences from old ones rather than direct mutation, virtual sequences, and using combinators instead of explicit recursion, does not really require conses. Most algorithms can be expressed effectively using arrays and vectors instead.
- Factor’s conses are already quite constrained by being immutable. Thus circular lists cannot be implemented, which the main application of conses that is difficult to achieve with arrays.
- If a specific problem calls for conses, then nothing prevents anybody from implementing a library for working with them.
- Removal of conses would remove the requirement for the object memory manager to handle headerless objects. This will allow switching to a generational mark-sweep-compact garbage collector, which uses less memory than the generational copying collector we have now. Also, giving each object a header allows incremental GC to be implemented.
The total removal of conses from the core library is quite a major change. However, to keep the language lean and mean, this is something that should be done. Some people might find it surprising that I am still considering major changes to Factor, even as it is more than two years old. However, I did not have the benefit of extensive research and experience in language design before I began this effort, so I’d rather streamline my design while I still can rather than accumulate a large body of historical cruft and scar tissue.