Download | Discuss | Source code | Reference | Official μSTL homepage
Last updated: April 20, 2012 - r2a (based on μSTL v1.6)
The C++ standard template library (STL) is a collection of common containers and algorithms in template form. Unfortunately its standard incarnation shipped with gcc is implemented without much concern for code size. Not only is the library itself large, the current version being over a megabyte in size, but with all the code you instantiate by using a vector for each of your containers, it is easy to become fearful and opt for using static arrays instead or, worse yet, abandon C++ altogether for C.
The problem is aggravated by the Nintendo DS' small RAM: just 4 MB. Not only gcc's STL implementation is frowned upon, it also eats up a lot of the preciously-limited RAM! Enter μSTL. Its FeOS port is only 11 KB in size, and a simple program that uses vectors, strings and UTF-8 is just 2 KB. This is why it's FeOS' recommended STL implementation.
The FeOS port of μSTL is very easy to install. All you need to do is
copy this folder to your sdk/userlib
directory, and issue
this command:
make install
In order to link to μSTL, you have to edit your project's Makefile in order to add several variables:
CONF_USERLIBS := ustl CONF_LIBS := -lustl
Finally, here's a simple hello world application:
#include <stdio.h> #include <ustl.h> using namespace ustl; int main() { string s("Hello world!"); printf("%s\n", s.c_str()); return 0; }
STL containers provide a generic abstraction to arrays, linked lists, and other methods of memory allocation. They offer the advantages of type-safety, the peace of mind that comes from never having to malloc anything again, and a standard access API called iterators. Each container's API is equivalent to that of a simple array, with iterators being the equivalent of pointers into the array. The uniform access API allows creation of standardized algorithms, discussed futher down, that work on any container. Here are some examples of using vector, the container representing a simple array:
vector<int> v; v.push_back(1); v.push_back(2); v[1] = 0; v.erase(v.begin() + 1); v.pop_back(); v.insert(v.begin(), 4); v.resize(15);
As you can see, a vector is basically the same thing as the arrays you use now, except that it is resizable. The function names ought to be self-explanatory with the exception of the addressing arguments. You can do index addressing and get free bounds checking with asserts. Incidentally, I highly recommend you work with a debug build when writing code; μSTL is chock full of various asserts checking for error conditions. In the optimized build, most such errors will be silently ignored where possible and will cause crashes where not. That is so because they are programmer errors, existing because you have a bug in your code, not because the user did something wrong, or because of some system failure. Programmer errors assert. User or system errors throw exceptions.
Vectors are addressed with iterators, which are just like pointers (and
usually are). Calling begin()
gives you the pointer to the first element,
calling end()
gives you the pointer to the end of the last element. No,
not the last element, the end of it, or, more accurately, the end of the
array. It's that way so you can keep incrementing an iterator until it
is equal to the end()
value, at which point you know you have processed
all the elements in the list. This brings me to demonstrate how you
ought to do that:
foreach(vector<int>::iterator, i, v) if (*i < 5 || *i > 10) *i = 99;
The foreach()
macro is a μSTL-only extension.
It is a great way to ensure you don't forget to increment
the counter or run past the end of the vector. The only catch to be aware
of, when inside an iterator loop, is that if you modify the container,
by adding or removing entries, you have to update the iterator, since the
container memory storage may have moved when resized. So, for example,
if you wish to remove certain types of elements, you'd need to do use
an index loop or something like:
foreach(vector<CEmployee>::iterator, i, employees) if (i->m_Salary > 50000 || i->m_Performance < 100) --(i = employees.erase(i));
This is pretty much all there is to say about containers. Create them,
use them, resize them, that's what they are for. There are other
container types, but you will probably not use them much. There's
set
, which is a perpetually sorted vector, useful when you
want to binary_search()
a large collection. There's map
which
is an associative container where you can look up entries by key. Its
utility goes down drastically when you have complex objects that need to
be searched with more than one parameter, in which cast you are better
off with vector
and foreach
. I have never needed the others, and do
not recommend their use. Their implementations are fully functional,
but do not conform to STL complexity guarantees and are implemented as
aliases to vector, which naturally changes their performance parameters.
Every program uses strings, and STL was kind enough to provide a
specification. μSTL deviates a bit from the standard by not implementing
wchar strings. There is only one string
class, which assumes
all your strings will be UTF-8 encoded, and provides some additional
functionality to make working with those easier. I did that for the same
reason I dropped the locale classes; bloat. It is simply too expensive to
implement the standard locale classes, as the enormous size of libstdc++
illustrates.
Anyway, back to strings. You can think of the string object as a char vector with some additional operations built-in, like searching, concatenation, etc.
string s("Hello"); s += ' '; s += "world?"; s.replace(s.find ('?'), 1, "!"); s[3] = s[s.find_first_of("lxy")]; s[s.rfind('w')] = 'W'; s.format("A long %zd number of 0x%08lX", 345u, 0x12345); printf("%s\n", s.c_str());
A non-standard behaviour you may encounter is from linked strings created by the string constructor when given a null-terminated const string. In the above example, the constructor links when given a const string and stays as a const link until the space is added. If you try to write to it, you'll get an assert telling you to use copy_link first to convert the link into a copy. Resizing the linked object automatically does that for you, so most of the time it is transparent. You may also encounter another instance of this if you try getting iterators from such an object. The compiler uses the non-const accessors by default for local objects, so you may need to declare it as a const string if you don't wish to copy_link. Why does μSTL string link instead of copying? To save space and time. All those strings are already in memory, so why waste heap space and processor time to copy them if you just want to read them? I thought it a good tradeoff, considering that it is trasparent for the most common uses.
Other non-standard extensions include a format()
function to
give you the functionality of sprintf for string objects. Another is
the UTF-8 stuff. Differing a bit from the standard, size()
returns the string length in bytes, length()
in characters.
You can iterate by characters instead of bytes with a special UTF-8
iterator:
for(string::utf8_iterator i = s.utf8_begin(); i < s.utf8_end(); ++ i) DrawChar(*i);
or just copy all the chars into an array and iterate over that:
vector<int> result(s.length()); copy(s.utf8_begin(), s.utf8_end(), result.begin());
A few words must be said regarding reading wide characters. The shortest possible rule to follow is "don't!". I have received a few complaints about the fact that all offsets given to and returned by string functions are byte offsets and not character offsets. The problem with modifying or even looking for specific wide characters is that you are not supposed to know what they are. Your strings will be localized into many languages and it is impossible for you to know how the translation will be accomplished. As a result, whenever you are hardcoding a specific character value, or a specific character length (like a three-character extension), you are effectively hardcoding yourself into a locale. The only valid operation on localized strings is parsing it via standard delimiters, treating anything between those delimiters as opaque blocks. For this reason, whenever you think you need to do something at a particular character offset, you should recognize it as a mistake and find the offset by the content that is supposed to be there.
If this philosophy is consistently followed, it becomes clear that
actual character boundaries are entirely irrelevant. There are only
two exceptions to this: first occurs if you are writing a text editor
and want to insert user data at a character position, the second occurs
if you are writing a font renderer and want to translate characters to
glyphs. In both cases you should make use of the utf8_iterator
to find
character boundaries and values. Given that these two cases apply to
just a handful of people who are involved in implementing user interface
frameworks, I believe that the opacity restriction is well justified by
the amount of code space it saves for the vast majority of library users.
Algorithms are the other half of STL. They are simply templated common
tasks that take iterator arguments, and as a result, work with any
container. Most will take an iterator range, like (v.begin(), v.end())
,
but you can, of course operate on a subset of a container by giving a
different one. Because the usual operation is to use the whole container,
μSTL provides versions of most algorithms that take container arguments
instead of the iterator range. Here are the algorithms you will actually
find useful:
copy(v1, v2.begin()); // Copies vector v1 to vector v2. fill(v, 5); // Fills v with fives. copy_n(v1, 5, v2.begin()); // Copies first five elements only. fill_n(v.begin() + 5, 10, 5); // Fills elements 5-15 with fives. sort(v); // Sorts v. find(v, 14); // Finds 14 in v, returning its iterator. binary_search(v, 13); // Looks up 13 with binary search in a sorted vector. lower_bound(v, 13); // Returns the iterator to where you want to insert 13. iota(v.begin(), v.end(), 0); // Puts 0,1,2,3,4,... into v. reverse(v); // Reverses all the elements in v.
The rest you can discover for yourself. There are obscure mathematical
operations, like inner_product()
, set operations, heap operations, and
lots and lots of predicate algorithms. The latter are algorithms that
take a functor (an object that can be called like a function) and were
supposed to help promote code reuse by encapsulating common operations.
For example, STL expects you to use the for_each()
algorithm and
write a little functor for all your iterative tasks:
class CCompareAndReplace { public: CCompareAndReplace (int minValue, int maxValue, int badValue) : m_MinValue (minValue), m_MaxValue (maxValue), m_BadValue (badValue) {} void operator(int& v) { if(v < m_MinValue || v > m_MaxValue) v = m_BadValue; } private: int m_MinValue; int m_MaxValue; int m_BadValue; }; for_each(v.begin(), v.end(), CCompareAndReplace(5, 10, 99));
And yes, it really does work. Doesn't always generate much bloat either,
since the compiler can often see right through all this trickery and
expand the for_each()
into a loop without actually creating the functor
object. However, the compiler has a much harder time when you start
using containers of complex objects or operating on member variables
and member functions. Since that is what you will most likely have in
any real code outside the academic world, the utility of predicate
algorithms is questionable. Their readability is even more so,
considering that the above fifteen line example can be written as a
three line iterative foreach loop. Finally, there is the problem of
where to put the functor. It just doesn't seem to "belong" anywhere in
the object-oriented world. (C++11 changes that somewhat with lambda
functions) Sorry, Stepanov, I just don't see how these things can be
anything but an ugly, bloated hindrance.
The STL specification is only about containers and algorithms, the stuff described from here on is totally non-standard.
The major difference between the standard STL implementation and μSTL is
that the former has memory management stuff all over the place, while
the latter keeps it all together in the memblock
class. Normally
STL containers are resized by calling new
to create more storage
and then copying the elements there from the old one. This method wastes
space by fragmenting memory, wastes time by copying all the existing data
to the new location, and wastes codespace by having to instantiate all
the resizing code for each and every container type you have. This method
is also absolutely necessary to do this resizing in a perfectly object-safe
way. The μSTL way is to manage memory as an opaque, typeless block, and
then use the container templates to cast it to an appropriate pointer type.
This works just fine, except for one little catch: there is one type of object you can't store in μSTL containers -- the kind that has pointers to itself. In other implementations, resizing actually creates new objects in the new location and destroys them in the old location. μSTL simply memcpys them there without calling the copy constructor. In other words, the object can not rely on staying at the same address. Most objects really don't care. Note that this is not the same thing as doing a bitwise copy, that you were rightly warned against before! It's a bitwise move that doesn't create a new object, but simply relocates an existing one.
What this one small concession does is allow aggregation of all memory
management in one place, namely, the memblock
class. All the
containers are thus converted mostly into typecasting wrappers that
exist to ensure type safety. Look at the assembly code and you'll see
mostly calls to memblock
's functions. This is precisely the feature
that allows reduction in code instantiated by container templates.
However, memblock
's usefulness doesn't end there! It can now replace
all your dynamically allocated buffers that you use for unstructured
data. Need to read a file? Don't use new to allocate memory; use a
memblock
!
memblock
is derived from memlink
, an object for linking to a memory
block. Now you get to store a pointer and the size of whatever it
points to, but with μSTL you can use a memlink
object to keep them
together, reducing source clutter and making your code easier to
read and maintain. You can link to constant blocks too with cmemlink
,
from which memlink
is derived. Because all three are in a single
hierarchy, you never need to care whether you're working on an
allocated block or on somebody else's allocated block. Pointers are
kept together with block sizes, memory is freed when necessary,
and you never have to call new or delete again. Who needs garbage
collection? memblock
gives you the same functionality at a fraction
of the cost.
Linking is not limited to memlink
. You can link memblock
objects.
You can link string
objects. You can even link containers! Now
you can use alloca to create a vector on the stack; use the
typed_alloca_link(v,int,99)
macro. All linked objects
will allocate memory and copy the linked data when you increase their
size. You can also do it explicitly by calling copy_link
.
Why link? It's cheaper than copying and easier than keeping track
of pointers. For example, here's a line parser (FIXME: implement read_file()
and write_file()
without using C++ streams)
string buf, line; buf.read_file("some_config_file.txt"); for(uoff_t i = 0; i < buf.size(); i += line.size() + 1) { line.link(buf.iat(i), buf.iat(buf.find('\n', i))); process_line(line); }
This way process_line()
gets a string object instead of a pointer and
a size. If you don't rely on the string being null-terminated, which
basically means not using libc functions on it, this is all you need.
Otherwise buf
will have to be writable and you can replace the newline
with a null. In either case you are using no extra heap. The overhead
of link is negligible in most cases, but if you really want to do this
in a tight loop, you can use relink()
, which expands completely
inline into one or two instructions, avoiding the virtual unlink()
call.
The C++ standard library specification defines global stream objects called cin
,
cout
, and cerr
to replace printf()
and friends for accessing stdin, stdout,
and stderr, respectively; as well as fstream
for accessing files.
Since their added value is small when it comes to basic IO functionality and
they are memory hungry, they are not included in the FeOS port of μSTL.
One last container I'll mention is a tuple
, which is a
fixed-size array of identical elements. No, it's not the same as the tuple
in boost, which is more like a template-defined struct. This one should
have been named "array", which is what it will be called in the next STL
standard, but I guess I'm stuck with the name now. What are they good
for? Graphical objects. Points, sizes, rectangles, triangles, etc. Any fixed size-array also works better as a tuple,
since it becomes a standard STL container, which you can use with any
algorithm, copy by assignment, initialize in the constructor, etc.
typedef int32_t coord_t; typedef tuple<2, coord_t> Point2d; typedef tuple<2, coord_t> Size2d; typedef tuple<2, Point2d> Rect; Rect r(Point2d(1,2), Point2d(3,4)); r += Size2d(4, 4); r[1] -= Size2d(1, 1); foreach(Rect::iterator, i, r) TransformPoint(*i); Point2d pt(1, 2); pt += r[0]; pt *= 2;
μSTL implements all the standard exception classes defined by the
C++ standard. The exception tree is always derived
from std::exception
.
Due to Nintendo DS constraints, libc_exception
, file_exception
and
stream_bounds_exception
are not included in the FeOS port.