Is Composing More Readable Than Folding?

OK, I haven't posted since January. I just saw this post fly by about an F# Folding Challenge and started wondering what the most readable solution I could come up with would be.

Here it is...

(mapcat (fn [ns] [(count ns) (first ns)]) (partition-by identity [4 3 3 5 6 6 6 6 7 8 8]))

--------

One step better. I recently came across "juxt"...

(mapcat #((juxt count first) %) (partition-by identity [4 3 3 5 6 6 6 6 7 8 8]))
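For anyone reading along, here is a rough sketch of the same pipeline pulled apart into named steps. The names xs, runs, count-and-value, and encoded are mine, made up for illustration only. It also shows that the function returned by juxt can be handed to mapcat directly, without the #(... %) wrapper.

```clojure
;; Input from the example above.
(def xs [4 3 3 5 6 6 6 6 7 8 8])

;; Step 1: split the input into runs of equal adjacent elements.
(def runs (partition-by identity xs))
;; => ((4) (3 3) (5) (6 6 6 6) (7) (8 8))

;; Step 2: juxt builds a function that returns [(count run) (first run)].
(def count-and-value (juxt count first))

;; Step 3: mapcat applies that function to each run and concatenates
;; the resulting pairs into one flat sequence.
(def encoded (mapcat count-and-value runs))
;; => (1 4 2 3 1 5 4 6 1 7 2 8)
```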

 

--------

It was the following until Christophe Grand pointed out the mapcat function. Thanks.

(flatten (map (fn [ns] [(count ns) (first ns)])
              (partition-by identity [4 5 5 3 6 6 6 6 7 8 8])))

Type Theorists Can (Sometimes) Go Pound Salt In Rat Holes

I came across a comment on a blog discussing Clojure's "meta data"
mechanism. Here's the quote...

"Credit where credit is due: these are "Dual Values" in the sense of
Clifford (1873) who introduced them, with a type constraint (Dual
Numbers), in the formulation of bi-quaternions."

No. No. No. This is just so much mathematical, type-theoretical poo.

Now, mind you, I could be wrong. Rich Hickey very well may have had
Clifford's Dual Values from 1873 (mind you!) as the inspiration for
Clojure's meta data.

...

But I highly doubt it.

...

1873, indeed.

RDF

Joel Amoussou has a recent blog piece asking "Relational, XML, or
RDF?" He addresses all three categories well but here I'm quoting the
essence of his RDF coverage. RDF has been around a while, and I
believe its use will increase over the next few years. Good, lasting
technologies often take 10 years or more before they really hit their
stride because (1) their core developers have to learn about their own
babies, and (2) other developers, who are getting things done without
them, have to see repeated, mature evidence.

Here's a bit of what Joel has to say about RDF...

http://efasoft.blogspot.com/2009/09/relational-xml-or-rdf.html

The RDF Data Model

Semantic Web technologies like RDF, OWL, SKOS, SWRL, and SPARQL and
Linked Data publishing principles have received a lot of attention
lately. They are well suited for the following applications:

* Applications that need to infer new implicit facts based on existing
explicit facts.

* Applications that need to map concepts across domains such as a
trading network where partners use different e-commerce XML
vocabularies.

* Master Data Management (MDM) applications that provide an RDF view
and reasoning capabilities in order to facilitate and enhance the
process of defining, managing, and querying an organization's core
business entities.

* Applications that use a taxonomy, a thesaurus, or a similar concept
scheme such as online news archives and medical terminologies.

* Silo-busting applications that need to link data items to other data
items on the web, in order to perform entity correlation or allow
users to explore a topic further.

Editing PHP with TAGS and Emacs or Vim -ish Tools

I'm just starting to use PHP and after a bit of searching tonight found a way to generate pretty reasonable TAGS files for use with emacs. I edited a variant of a script found at the link below, which uses Exuberant CTAGS for generating Vim's tag format.

http://weierophinney.net/matthew/archives/134-exuberant-ctags-with-PHP-in-Vim...

Here's my variant if you can use it, or have suggestions...

#!/bin/bash
exec etags \
  --languages=PHP \
  --langmap=PHP:+.phpt \
  -h ".php" -R \
  --exclude="\.git" \
  --totals=yes \
  --tag-relative=yes \
  --PHP-kinds=+cdf \
  --regex-PHP='/abstract class ([^ ]*)/\1/c/' \
  --regex-PHP='/interface ([^ ]*)/\1/c/' \
  --regex-PHP='/(public |static |abstract |protected |private )+function ([^ (]*)/\2/f/'

I suppose I should put this on GitHub.

Ubuntu hosting and upgrading VirtualBox

This was a wild path to success. This may or may not apply to the open
source edition of virtualbox. My story begins with VirtualBox 3.0 and
ends with a successful upgrade to 3.1 on an Ubuntu host.

I installed VBox 3.0 from a deb package downloaded with a click on the VBox site.
When I tried installing the deb for 3.1, dpkg complained about a
conflict with 3.0. Makes sense I guess.

But how to remove 3.0? I had trouble with various tactics.

Here's what finally worked... may you find a simpler path...

1. dpkg -i --force-all
2. Complaints about vboxuser group existing - just delgroup
3. Complaints about a missing dependency on libqt4-opengl - aptitude install it
4. Whole bunch of other stuff falls out of resolving that dependency...
5. ...including the installed package being removed.
6. dpkg -i --force-all
7. Oops - of course, vboxuser group is back so delgroup it again
8. Oops - virtualbox shows up in the System Tools menu twice, so remove one
9. Oops - virtualbox 3.0 shows up in the packages as a broken package, so...
10. Use synaptic gui to filter for that broken package
11. Tell synaptic to fix all broken packages
12. Tell synaptic to, yes, really do apply the request to fix the
broken package by removing it

Now everything seems fine. I saw other approaches to doing such an
upgrade, but this one worked and actually seems simpler and more
complete than the others I found.

An Embarrassment of Riches - What's In Your Library

In "Clojure: Where's The Elegance" (
http://www.pointlessrants.com/2009/11/clojure-wheres-the-elegance/ )
the author states a preference for Python and "only one way to do it"
over Clojure's many various functions. The example given was the many
functions for accessing elements of a sequence.

Fair enough - I see this as a "to each one's own" kind of preference
rather than an "elegance" thing. Or maybe I should just come out and
say it. I see this the opposite way: those many various functions are
the result of elegance.

As for all those functions, that’s the Lisp way…

“It is better to have 100 functions operate on one data structure than
10 functions on 10 data structures.” -Alan Perlis

Clojure now takes a "best of both worlds" approach by applying these
functions to abstract sequences, rather than just Lisp lists.
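A small sketch of what "abstract sequences" means in practice: the same first and rest work on a list, a vector, a string, and a map. The literal values here are made up for illustration.

```clojure
;; The same two functions work across several concrete collection types.
(first '(1 2 3))   ; a list
(first [1 2 3])    ; a vector
(first "abc")      ; a string, seen as a sequence of characters
(first {:a 1})     ; a map, seen as a sequence of key/value entries

;; rest always hands back a sequence, whatever the input type was.
(rest [1 2 3])     ; => (2 3)
(rest "abc")       ; => (\b \c)
```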

The good news for a n00b is that they can ignore all that for the most
part. You just need to learn the base functions. Then you add more
over time.

For example, you can learn to use (first seq) and (rest seq) and combine
those to your heart's content. Then when you stumble upon (ffirst seq)
you realize you no longer have to write (first (first seq)). How often
do you need ffirst? Well, not nearly as much as first or nth.
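To make that concrete, here is a tiny sketch. The pairs vector is a made-up example value, not something from the post I'm responding to.

```clojure
;; A made-up nested collection for illustration.
(def pairs [[:a 1] [:b 2] [:c 3]])

;; The base functions you learn first.
(first pairs)          ; => [:a 1]
(rest pairs)           ; => ([:b 2] [:c 3])

;; Composing them by hand...
(first (first pairs))  ; => :a

;; ...and the library variation that says the same thing.
(ffirst pairs)         ; => :a
```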

Lisp programmers like a large library of variations of a base set of
functions that can all be composed together. That is kind of a
highlight of functional programming. But in OOP the Smalltalk language
takes a similar approach: many variations of base methods, e.g. the
collection API.

I am not sure why Python would prefer a “one way for one thing”
approach, but that would certainly not be the Lisp (or the Smalltalk)
way. They are simple languages but with rich libraries.

One Future (and present) of NoSQL

Around 1986-1988 I was programming applications supporting factory planning and automation (remember MRP, http://en.wikipedia.org/wiki/Material_Requirements_Planning , which led to ERP, which led to... gack, never mind)...

I was using Lisp Machines and a set of tools known as KnowledgeCraft. KC was *hugely* expensive. And the Lisp Machines were not cheap either. The tools on top of Lisp provided a frame-based "semantic network" aka "knowledge representation" language, an OPS5-based forward-chaining rules engine, a Prolog-like backward chaining logic engine, and assorted other tools for graphs, UI, etc.

Did I mention this setup was hugely expensive? These were the last years before the "AI Winter" ( http://en.wikipedia.org/wiki/AI_winter ). What happened to AI from the early 1980s and Japan's "5th Generation" effort ( http://en.wikipedia.org/wiki/Fifth_generation_computer ) up to the AI Winter was a more narrowly scoped boom/bust cycle, along the lines of the Web 1.0 boom/bust that ended around 2001.

Anyway...

Did I mention that setup was hugely *expressive*?

Only a very small percentage of the world's developers had access to these kinds of "knowledge representation" tools. They were all using Lisp or Smalltalk.

Fast-forward to 2009, several software revolutions later, and we have the coming "semantic web". Well, maybe. As an evolution of the web itself it will continue to be very messy.

But my point is that the core technologies of the "semantic web" are not at all unlike the core technologies of the "semantic network" 20 years ago. The logic has evolved to be more formal and to support the different aspects of data on the web vs. data in the office.

The main difference is that many of these tools are now open source, running on and/or accessible from many different platforms. Those that are not open source are also easily available and can be used in some cases at little or no cost.

One that really appeals to me is Franz's AllegroGraph (see http://www.franz.com/agraph/services/conferences_seminars/nosql-nyc_gwking_10-5-09.pdf ). A quote on AllegroGraph 4.0 from the recent NoSQL conference in NYC, on that system becoming more near real-time...

"In its newest release, AllegroGraph 4.0 totally breaks with this pattern. Multiple clients can concurrently add data in a transaction. A forward writing transaction log and a check-pointing mechanism provide complete recoverability. All triples are always completely indexed, enabling SPARQL queries, Prolog queries, and reasoning to happen at full speed while other processes are adding data. AllegroGraph does not use materialization: all reasoning is done dynamically and we still achieve industry leading query, reasoning, and loading speeds."

In addition to the database features and the logic reasoning features, other aspects...

* Per-predicate Lucene style text indexing
* 2D and 3D geo-temporal indexing for moving objects
* Social networking toolkit with path finding, importance measures, etc.
* REST protocol

I suspect some of the early NoSQL solutions will continue to serve their purpose, some will run out of gas, and others will evolve. Those that evolve will probably move in many of the directions of the "semantic web" to support easier integration and evolution, and higher-order searching and reasoning.

Software Development - Craft or Engineering? Actually, Neither

On the xpportland yahoo group someone complained that software development is not a "craft" - that it is an "engineering" discipline.

The problem with this dichotomy is that software development is neither craft nor engineering.

Engineering of the non-software variety has established a firmer set of principles and mathematics. And yet product development that applies those engineering disciplines has to bring other practices as well. Product development is N parts engineering and M parts creativity and collaboration (and throwing away any number of bad applications of the engineering practices).

Software development is N parts engineering, but our math is different from most other engineering disciplines. Our "physics" and "chemistry" and "biology" and "sociology" are very different.

Our product development practices are significantly different as well - we're not building another bridge across the same river as last time. At least CPU designers are aiming at pretty much the same instruction set, but with a new process and somewhat different materials.

Our materials are pretty much the same as those used in 1960. But the problems posed by the *product* are significantly different. This makes engineering more than a little important, but collaboration, communication, and creativity *really* important. That aspect still seems to be the essence of software development after my 29 years of software development.

Feeley: An Efficient and General Implementation of Futures on Large Scale Shared-Memory Multiprocessors

Marc Feeley (Gambit Scheme implementor) recently posted a link to his PhD thesis on the R6RS-discuss list. The link, in the text of his email, follows...


======== Begin Quotes ========
>> - whether or not the implementation decides that the arguments of /
>>  should be evaluated in parallel,
>
> This is all beating the air, because in general no known algorithm is
> smart enough to reliably decide when this kind of argument parallelism
> is a win and when the program will just thrash with thread-creation
> and destruction overhead.  Experiments in AND and OR parallelism in
> Prolog-like languages established that pretty conclusively.

Not true.  Lazy-task creation will do a very good job dynamically.
Read my PhD thesis for details:

http://www.iro.umontreal.ca/~feeley/papers/futures.ps.gz

The work of Katz and Weise on the semantics of continuations in a
parallel setting (which I describe in my thesis) is also relevant to
this discussion.

It feels like a big fat lie

Tim Bray writes about "tail call optimization"...

"It feels like a big fat lie. It looks like a subroutine call, but in the case where it occurs as the last thing in the routine, it magically, silently, and automatically gets turned into, now how did I put it? “A highly controlled and structured GOTO.”"

But this is just a perception rather than a reality.

Whether tail calls look like GOTOs or like a function call seems to depend on whether you come from a functional programming background or an imperative programming background.

From an FP POV, there is no such thing as a "loop"; there is only "apply these args to this function". There is no such thing as a "GOTO" in FP.

The "GOTO" perception is valid, and in fact that is often how tail calls are explained by FP programmers to imperative programmers. But that perception should change if you want to use an FP language. In FP you call functions. Period. Full stop. End of sentence.

In any relatively pure FP language TCO is not an "optimization" so much as a "requirement". There's just no getting around it. You call functions for everything, so function calls *have* to be cheap.
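Here is a minimal sketch of "calling a function for everything" where a loop would otherwise go. One caveat: Clojure on the JVM does not eliminate tail calls automatically, so the tail call is spelled out with recur. The function name sum-to and the accumulator argument are made up for illustration.

```clojure
;; Sum the integers 1..n by "calling the function again" with new args.
;; There is no loop construct here, only a call in tail position.
(defn sum-to
  ([n] (sum-to n 0))
  ([n acc]
   (if (zero? n)
     acc
     ;; recur re-enters this two-argument arity without growing the
     ;; stack, which is what keeps the "function call" cheap.
     (recur (dec n) (+ acc n)))))

(sum-to 10)       ; => 55
(sum-to 1000000)  ; => 500000500000
```

Replacing recur with a plain (sum-to (dec n) (+ acc n)) would read the same, but each call would then take a stack frame, and a large enough n would risk a StackOverflowError.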