There have been a lot of rough spots in D over the years, which is to be expected in a language that was, in its early days, developed by a one-man band. As more people have joined the development process, the wrinkles have been steadily ironed out. And as that has happened, I’ve been using D more and more. I love it, of course. It’s a wonderful language with a great deal of potential. But as I’ve used it more, I’ve found myself frustrated on occasion when dealing with Phobos.
The version of Phobos in D1 was heavily criticized as being subpar, resulting in the community-driven Tango project. With D2, that criticism disappeared. Personally, I’ll say one good thing about D1 Phobos that I miss: it’s intuitive.
A couple of days ago I was putting together a script to add some automation to my process of adding bindings to Derelict (yes, I’m moving into the 21st century). For a particular piece of code, I wanted to take a block of text and split it on a specific character. I know I’ve used std.string.split before. And I’m quite certain I’ve used a version that takes a parameter specifying the character to split on (it may have been D1). Well, times have changed.
The documentation for std.string contains the following:
IMPORTANT NOTE: Beginning with version 2.052, the following symbols have been generalized beyond strings and moved to different modules. This action was prompted by the fact that generalized routines belong better in other places, although they still work for strings as expected.
This notice is followed by a list of functions that have been moved to either std.algorithm or std.array. I had noticed it before when I needed to use std.string.insert, which is now std.array.insertInPlace. As it happens, the split function has also been moved to std.array.
So I go to the docs for std.array.split. And I see this:
Split the string s into an array of words, using whitespace as delimiter. Runs of whitespace are merged together (no empty words are produced).
WTF? I thought this was supposed to be a generalized function, hence the move to std.array. Yet it still operates on strings and splits on whitespace. If that’s the case, then doesn’t it make more sense to keep it in std.string? Well, whatever.
So, I look for the version that allows me to specify the character to split on. And… it doesn’t exist. Not in std.array, not in std.string. Given that I’ve already wasted enough time looking for it, I decide to take a different approach to implementing my script. No big deal.
Then today, I see this post by Chad J. in the D.learn newsgroup. My first thought is, “Good to see I’m not the only one who thinks this way.” He’s also looking for a string splitter that lets you specify the separator. simendsjo gives the answer:
See http://dlang.org/phobos/std_algorithm.html#splitter
Seriously? So if I want to split a string using whitespace, the split function that operates on strings using whitespace as the separator is std.array.split rather than std.string.split. And if I want to specify a separator, I need to use std.algorithm.splitter instead.
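To make the distinction concrete, here is a minimal sketch of the two calls side by side, assuming a reasonably recent Phobos rather than the release discussed in this post. The helper names `words` and `fields` are hypothetical and exist only for illustration.

```d
import std.algorithm : splitter;
import std.array : array, split;

// Hypothetical helper: eager split on runs of whitespace (std.array.split).
string[] words(string text)
{
    return text.split();
}

// Hypothetical helper: std.algorithm.splitter is lazy and yields one piece
// at a time; std.array.array collects the pieces into a string[].
string[] fields(string text, char sep)
{
    return text.splitter(sep).array;
}

void main()
{
    // Runs of whitespace are merged, so no empty words appear.
    assert(words("one  two\tthree") == ["one", "two", "three"]);

    // splitter keeps the empty piece between two adjacent separators.
    assert(fields("a,b,,c", ',') == ["a", "b", "", "c"]);
}
```

Note that the two functions treat repeated separators differently, which is one more thing to keep in mind when switching between them.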
I can’t think of any word to describe this other than ridiculous. And this sort of thing comes up time and again when working with ranges. While working on the same script, I wanted to use std.algorithm.find. The example in the documentation shows this:
auto a = [ 1, 2, 3 ];
assert(find(a, 5).empty); // not found
assert(!find(a, 2).empty); // found
OK, easy enough. But then I get this error when compiling:
undefined identifier ‘empty’
Huh? After more head scratching and keyboard banging, I realize I’m supposed to import std.range in order to get access to the ‘empty’ property of ranges. Given that the functions in std.algorithm all operate on ranges, shouldn’t it be importing std.range publicly so that I don’t have to?
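For reference, a version of the documentation example with the extra import in place might look like the sketch below. It assumes a reasonably recent Phobos, where `empty` for built-in arrays is reachable through std.range; the wrapper function `contains` is a hypothetical name used only for illustration.

```d
import std.algorithm : find;
import std.range : empty; // supplies the `empty` property for built-in arrays

// Hypothetical helper: true when `needle` occurs somewhere in `haystack`.
bool contains(int[] haystack, int needle)
{
    // find returns the rest of the range starting at the first match,
    // or an empty range when nothing matches.
    return !find(haystack, needle).empty;
}

void main()
{
    auto a = [1, 2, 3];
    assert(!contains(a, 5)); // not found
    assert(contains(a, 2));  // found
}
```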
Every time I want to do something simple with ranges, I always have to dig through not just the documentation, but the source to Phobos so that I can see exactly what’s happening and try to figure out why I’m getting the errors that inevitably pop up. For this reason, I try to avoid ranges as much as I can. But they keep intruding every time I want to do something simple like split a string.
What’s more, ranges pervade Phobos. Some time ago I was putting together a build script for Derelict and needed to use std.file.dirEntries to iterate the files in a directory tree. What I really wanted was an array of files. What I got was an “InputRange”. It took a bit of trial and error (several errors) and digging around the Phobos source before I could finally do something with it.
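One way to get from that input range to a plain array is sketched below. It assumes a reasonably recent Phobos, and the helper name `listFiles` is hypothetical; it is not the code from my build script.

```d
import std.algorithm : filter, map, sort;
import std.array : array;
import std.file : dirEntries, SpanMode;

// Hypothetical helper: collect the path of every regular file under `root`.
string[] listFiles(string root)
{
    // dirEntries is a lazy input range of DirEntry values; array() walks it
    // once and stores the results in an ordinary string[].
    auto files = dirEntries(root, SpanMode.depth)
        .filter!(entry => entry.isFile)
        .map!(entry => entry.name)
        .array;
    sort(files); // directory order is unspecified, so sort for stable output
    return files;
}

void main()
{
    import std.stdio : writeln;

    foreach (name; listFiles("."))
        writeln(name);
}
```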
A while back, I was quite proud of myself when I finally grokked the basics of what ranges are and how they operate. But as implemented in Phobos, they aren’t intuitive to use at all. With methods spread out across several modules, I don’t see how anybody keeps everything straight. I shouldn’t have to look all over the place to split a string. Yes, strings are arrays and arrays are ranges. But, conceptually, strings are strings. IMO, std.string should be the place to look for string operations. Furthermore, I shouldn’t need to import three different modules to do one operation, as you often have to do when you find yourself suddenly dealing with ranges in a module where you didn’t expect to find them.
As I work with this stuff more, it will eventually become second nature to me. I’ll know that I need to look in std.algorithm for this, or std.array for that. But for now, the learning curve is steep. And I don’t think it’s just a matter of documentation. I think the layout of Phobos needs to be reconsidered. At the very least, std.range, std.algorithm, and std.array all need to be available with one import since they are so tightly coupled. Also, range/array operations that specialize on strings have no business in std.array or std.algorithm. They belong in std.string. That’s the only intuitive place to put them. Otherwise, why have the ‘string’ alias at all?
This is definitely a problem, though I’d say that even with Java/C# I don’t browse the documentation for what I need; I search to get to the correct documentation page.
Also you have it wrong. std.array is imported so that arrays have the Range functions needed, not std.range. And std.array.array() will be your friend.
Ranges are very well done in D. They have their quirks but they really help to get things fitting together. I almost never wrote an iterator in Java/C#, they just don’t have much use. In D, I create ranges all the time since I will have all of the std.algorithm and std.range functions available after doing so. Actually I don’t use std.range much but a common import list is algorithm, string, array.
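As a rough illustration of that point, the sketch below defines a small input range and feeds it straight into std.algorithm. The `Countdown` type and the `evenSquares` function are hypothetical names invented for this example, and a reasonably recent Phobos is assumed.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.range : isInputRange;

// Hypothetical input range that counts down from `current` to 1.
struct Countdown
{
    int current;

    @property bool empty() const { return current <= 0; }
    @property int front() const { return current; }
    void popFront() { --current; }
}

// The three members above are all that an input range requires.
static assert(isInputRange!Countdown);

// Hypothetical helper: squares of the even numbers in the countdown.
int[] evenSquares(int start)
{
    return Countdown(start)
        .filter!(n => n % 2 == 0)
        .map!(n => n * n)
        .array;
}

void main()
{
    // 6, 5, 4, 3, 2, 1 -> 6, 4, 2 -> 36, 16, 4
    assert(evenSquares(6) == [36, 16, 4]);
}
```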
I have no problem with ranges. I think they’re just fine. It’s just that getting any work done with them is painful if you don’t already know the ins and outs of all of the functions across the different modules and how they tie together. I don’t always know where to look for a particular function.
The std.algorithm.splitter/std.array.split case is a perfect example. I’ve tried to find the sense in it, but, to me at least, there is none. It goes against everything I’ve ever learned. In Java, when I want to manipulate a string, I use the String class methods. In C, I use string.h. In D, I have to use one of std.string, std.array, or std.algorithm. How many people picking up D for the first time would intuitively realize they need std.algorithm to split a string on a specific character? The first place they’ll look is std.string. And then when they can’t find it, there’s nothing that indicates std.algorithm is the place to look.
Some people have no problem with this stuff. I see in the newsgroup all the time people answering questions, pointing to functions in std.algorithm or posting code samples using stuff I never would have thought to look for. What I want to know is, was it intuitive for them in the first place, or did they figure it out by trial and error? If it’s the latter, then definitely there’s a problem. Such a major component of the language shouldn’t be so cryptic.
@Aldacron: I would suspect that people learn this stuff either by trial-and-error or by reading/making posts about it.
The only reason I think we don’t see more complaints is due to the politeness/humility/laziness of the community members.
The promising thing to me is that Phobos does move forward. It’s not in lockdown like the language itself. These things can be fixed. I do think Phobos has been improving over time, too. Ranges are a really amazing concept in D, and the way they integrate with low/no-overhead templating to accomplish a bunch of expressive functional concepts is, AFAIK, unique and powerful.
I just think we need to post more about concrete instances of these “WTF?” moments, make bug reports, and make pull requests. Chances are, Andrei, like most of us, has a large degree of focus on whatever new feature he’s working on at the moment and will miss things in the older modules like std.string since he’s not focusing on them. It will probably help him a lot (and anyone else maintaining Phobos with commit access) if people provide concrete and actionable requests to fix these things that most people would agree need to be fixed.
I’m not sure if I misunderstood your post or missed something, but…
In http://dlang.org/phobos/std_array.html#split , right after the documentation for “splitter”, you can find the version of “split” that takes an arbitrary separator. I don’t think it works with splitting a string by a single character, but it works with splitting a string by another string. (The pattern str.split(delim) is common in my code.)
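A minimal sketch of the str.split(delim) pattern with a string separator follows, assuming a reasonably recent Phobos. The wrapper name `splitOn` is hypothetical and only there to make the example easy to call.

```d
import std.array : split;

// Hypothetical helper: eager split of `text` on the string `delim`.
string[] splitOn(string text, string delim)
{
    return text.split(delim);
}

void main()
{
    // The separator may be longer than one character.
    assert(splitOn("x=1; y=2; z=3", "; ") == ["x=1", "y=2", "z=3"]);

    // Unlike the whitespace version, adjacent separators give an empty piece.
    assert(splitOn("a,b,,c", ",") == ["a", "b", "", "c"]);
}
```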
Chad J’s post is a little different in that he seemed to be looking for something that works on ranges, not arrays. I agree that as it is, “splitter” definitely does not belong in std.array, though.
Oops. Thanks for pointing that out. I feel a bit silly now, as that’s the bit that I found the most annoying.
Well, then your gripe should be “the documentation isn’t well organized,” which I agree with.
What D really needs at this stage in its lifecycle is a usability nerd to focus all their attention on ironing out issues like this.