
sds's Introduction

Simple Dynamic Strings

Notes about version 2: this is an updated version of SDS in an attempt to finally unify Redis, Disque, Hiredis, and the stand alone SDS versions. This version is NOT binary compatible with SDS version 1, but the API is 99% compatible so switching to the new lib should be trivial.

Note that this version of SDS may be slower with certain workloads, but uses less memory compared to V1 since the header size is dynamic and depends on the length of the string to allocate.

Moreover it includes a few more API functions, notably sdscatfmt which is a faster version of sdscatprintf that can be used for the simpler cases in order to avoid the libc printf family functions performance penalty.

How SDS strings work

SDS is a string library for C designed to augment the limited libc string handling functionalities by adding heap allocated strings that are:

  • Simpler to use.
  • Binary safe.
  • Computationally more efficient.
  • But yet... Compatible with normal C string functions.

This is achieved using an alternative design in which instead of using a C structure to represent a string, we use a binary prefix that is stored before the actual pointer to the string that is returned by SDS to the user.

+--------+-------------------------------+-----------+
| Header | Binary safe C alike string... | Null term |
+--------+-------------------------------+-----------+
         |
         `-> Pointer returned to the user.

Because of meta data stored before the actual returned pointer as a prefix, and because of every SDS string implicitly adding a null term at the end of the string regardless of the actual content of the string, SDS strings work well together with C strings and the user is free to use them interchangeably with other std C string functions that access the string in read-only.
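The layout above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the fixed-size hs_header structure and all the hs_* names are hypothetical and are not taken from the SDS source (SDS version 2 uses variable-size headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header stored right before the string bytes. */
struct hs_header {
    size_t len;   /* bytes in the string, null term excluded */
    char buf[];   /* string bytes followed by the null term */
};

/* Allocate header + data + null term as a single chunk and return a
 * pointer to the data, not to the start of the allocation. */
char *hs_new(const void *init, size_t initlen) {
    assert(init != NULL || initlen == 0);
    struct hs_header *h = malloc(sizeof(*h) + initlen + 1);
    if (h == NULL) return NULL;
    h->len = initlen;
    if (initlen) memcpy(h->buf, init, initlen);
    h->buf[initlen] = '\0';   /* implicit terminator, always present */
    return h->buf;
}

/* Step back from the user pointer to reach the header. */
struct hs_header *hs_hdr(char *s) {
    return (struct hs_header *)(s - offsetof(struct hs_header, buf));
}

/* Constant time: the length is read from the prefix. */
size_t hs_len(char *s) { return hs_hdr(s)->len; }

/* The chunk must be freed from its real start, not from s. */
void hs_free(char *s) { if (s != NULL) free(hs_hdr(s)); }
```

Since the returned pointer addresses plain null terminated bytes, it can be handed to read-only libc functions such as strlen or printf, while hs_len still reports the stored length even when the content includes zero bytes.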

SDS was a C string library I developed in the past for my everyday C programming needs. Later it was moved into Redis, where it is used extensively and where it was modified in order to be suitable for high performance operations. Now it has been extracted from Redis and forked as a stand alone project.

Because of its many years of life inside Redis, SDS provides both higher level functions for easy string manipulation in C and a set of low level functions that make it possible to write high performance code without paying a penalty for using a higher level string library.

Advantages and disadvantages of SDS

Normally dynamic string libraries for C are implemented using a structure that defines the string. The structure has a pointer field that is managed by the string function, so it looks like this:

struct yourAverageStringLibrary {
    char *buf;
    size_t len;
    ... possibly more fields here ...
};

SDS strings as already mentioned don't follow this schema, and are instead a single allocation with a prefix that lives before the address actually returned for the string.

There are advantages and disadvantages with this approach over the traditional approach:

Disadvantage #1: many functions return the new string as value, since sometimes SDS needs to create a new string with more space, so most SDS API calls look like this:

s = sdscat(s,"Some more data");

As you can see s is used as input for sdscat but is also set to the value returned by the SDS API call, since we are not sure if the call modified the SDS string we passed or allocated a new one. Not remembering to assign back the return value of sdscat or similar functions to the variable holding the SDS string will result in a bug.
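To see why the return value matters, here is a minimal sketch of an append function for a header-prefixed string. The hs_* names and the fixed-size header are hypothetical and only stand in for what SDS does internally: the point is that realloc() is free to move the block, so the pointer held by the caller may become stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header-prefixed string, not the SDS implementation. */
struct hs_header { size_t len; char buf[]; };

struct hs_header *hs_hdr(char *s) {
    return (struct hs_header *)(s - offsetof(struct hs_header, buf));
}

char *hs_new(const char *init) {
    size_t len = strlen(init);
    struct hs_header *h = malloc(sizeof(*h) + len + 1);
    if (h == NULL) return NULL;
    h->len = len;
    memcpy(h->buf, init, len + 1);   /* copies the null term too */
    return h->buf;
}

/* Append t to s. The block may be moved by realloc(), so the caller
 * must replace its pointer with the returned one. */
char *hs_cat(char *s, const char *t) {
    assert(s != NULL && t != NULL);
    struct hs_header *h = hs_hdr(s);
    size_t tlen = strlen(t);
    struct hs_header *nh = realloc(h, sizeof(*nh) + h->len + tlen + 1);
    if (nh == NULL) return NULL;     /* the old string is still valid */
    memcpy(nh->buf + nh->len, t, tlen + 1);
    nh->len += tlen;
    return nh->buf;
}

void hs_free(char *s) { if (s != NULL) free(hs_hdr(s)); }
```

With such a function, calling hs_cat(s, "more") without assigning the result back leaves s pointing at memory that may already have been released.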

Disadvantage #2: if an SDS string is shared in different places in your program you have to modify all the references when you modify the string. However most of the times when you need to share SDS strings it is much better to encapsulate them into structures with a reference count otherwise it is too easy to incur into memory leaks.

Advantage #1: you can pass SDS strings to functions designed for C strings without accessing a struct member or calling a function, like this:

printf("%s\n", sds_string);

In most other libraries this will be something like:

printf("%s\n", string->buf);

Or:

printf("%s\n", getStringPointer(string));

Advantage #2: accessing individual chars is straightforward. C is a low level language so this is an important operation in many programs. With SDS strings accessing individual chars is very natural:

printf("%c %c\n", s[0], s[1]);

With other libraries your best chance is to assign string->buf (or call the function to get the string pointer) to a char pointer and work with this. However since the other libraries may reallocate the buffer implicitly every time you call a function that may modify the string you have to get a reference to the buffer again.

Advantage #3: single allocation has better cache locality. Usually when you access a string created by a string library using a structure, you have two different allocations: one for the structure representing the string, and one for the actual buffer holding the string. Over time the buffer is reallocated, and it is likely that it ends up in a totally different part of memory compared to the structure itself. Since the performance of modern programs is often dominated by cache misses, SDS may perform better in many workloads.

SDS basics

The type of SDS strings is just the char pointer char *. However SDS defines an sds type as alias of char * in its header file: you should use the sds type in order to make sure you remember that a given variable in your program holds an SDS string and not a C string, however this is not mandatory.

This is the simplest SDS program you can write that does something:

sds mystring = sdsnew("Hello World!");
printf("%s\n", mystring);
sdsfree(mystring);

output> Hello World!

The above small program already shows a few important things about SDS:

  • SDS strings are created, and heap allocated, via the sdsnew() function, or other similar functions that we'll see in a moment.
  • SDS strings can be passed to printf() like any other C string.
  • SDS strings require to be freed with sdsfree(), since they are heap allocated.

Creating SDS strings

sds sdsnewlen(const void *init, size_t initlen);
sds sdsnew(const char *init);
sds sdsempty(void);
sds sdsdup(const sds s);

There are many ways to create SDS strings:

  • The sdsnew function creates an SDS string starting from a C null terminated string. We already saw how it works in the above example.

  • The sdsnewlen function is similar to sdsnew but instead of creating the string assuming that the input string is null terminated, it gets an additional length parameter. This way you can create a string using binary data:

    char buf[3];
    sds mystring;
    
    buf[0] = 'A';
    buf[1] = 'B';
    buf[2] = 'C';
    mystring = sdsnewlen(buf,3);
    printf("%s of len %d\n", mystring, (int) sdslen(mystring));
    
    output> ABC of len 3

    Note: the sdslen return value is cast to int because it returns a size_t type. You can use the right printf specifier instead of casting.

  • The sdsempty() function creates an empty zero-length string:

    sds mystring = sdsempty();
    printf("%d\n", (int) sdslen(mystring));
    
    output> 0

  • The sdsdup() function duplicates an already existing SDS string:

    sds s1, s2;
    
    s1 = sdsnew("Hello");
    s2 = sdsdup(s1);
    printf("%s %s\n", s1, s2);
    
    output> Hello Hello

Obtaining the string length

size_t sdslen(const sds s);

In the examples above we already used the sdslen function in order to get the length of the string. This function works like strlen of the libc except that:

  • It runs in constant time since the length is stored in the prefix of SDS strings, so calling sdslen is not expensive even when called with very large strings.
  • The function is binary safe like any other SDS string function, so the length is the true length of the string regardless of the content, there is no problem if the string includes null term characters in the middle.

As an example of the binary safeness of SDS strings, we can run the following code:

sds s = sdsnewlen("A\0\0B",4);
printf("%d\n", (int) sdslen(s));

output> 4

Note that SDS strings are always null terminated at the end, so even in that case s[4] will be a null term, however printing the string with printf would result in just "A" to be printed since libc will treat the SDS string like a normal C string.

Destroying strings

void sdsfree(sds s);

To destroy an SDS string just call sdsfree with the string pointer. Note that even empty strings created with sdsempty need to be destroyed, otherwise they'll result in a memory leak.

The function sdsfree does not perform any operation if instead of an SDS string pointer, NULL is passed, so you don't need to check for NULL explicitly before calling it:

if (string) sdsfree(string); /* Not needed. */
sdsfree(string); /* Same effect but simpler. */

Concatenating strings

Concatenating strings to other strings is likely the operation you will end up using the most with a dynamic C string library. SDS provides different functions to concatenate strings to existing strings.

sds sdscatlen(sds s, const void *t, size_t len);
sds sdscat(sds s, const char *t);

The main string concatenation functions are sdscatlen and sdscat. They are identical, the only difference being that sdscat does not have an explicit length argument since it expects a null terminated string.

sds s = sdsempty();
s = sdscat(s, "Hello ");
s = sdscat(s, "World!");
printf("%s\n", s);

output> Hello World!

Sometimes you want to cat an SDS string to another SDS string, so you don't need to specify the length, but at the same time the string does not need to be null terminated but can contain any binary data. For this there is a special function:

sds sdscatsds(sds s, const sds t);

Usage is straightforward:

sds s1 = sdsnew("aaa");
sds s2 = sdsnew("bbb");
s1 = sdscatsds(s1,s2);
sdsfree(s2);
printf("%s\n", s1);

output> aaabbb

Sometimes you don't want to append any special data to the string, but you want to make sure that there are at least a given number of bytes composing the whole string.

sds sdsgrowzero(sds s, size_t len);

The sdsgrowzero function will do nothing if the current string length is already at least len bytes, otherwise it will enlarge the string to len bytes, padding it with zero bytes.

sds s = sdsnew("Hello");
s = sdsgrowzero(s,6);
s[5] = '!'; /* We are sure this is safe because of sdsgrowzero() */
printf("%s\n", s);

output> Hello!
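The grow-and-pad behavior can be sketched on a plain heap buffer. This is an illustration only: grow_zero is a hypothetical helper that keeps the length in a caller-provided variable, whereas SDS stores it in the string header.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Make sure the heap buffer holds at least newlen bytes of content.
 * Bytes past the old length, null term included, are set to zero.
 * Returns the (possibly moved) buffer, or NULL on out of memory. */
char *grow_zero(char *buf, size_t *len, size_t newlen) {
    assert(buf != NULL && len != NULL);
    if (newlen <= *len) return buf;          /* already long enough */
    char *nb = realloc(buf, newlen + 1);
    if (nb == NULL) return NULL;             /* buf is still valid */
    memset(nb + *len, 0, newlen - *len + 1); /* padding + terminator */
    *len = newlen;
    return nb;
}
```

After the call, every index below the new length is safe to write, which is the guarantee the s[5] = '!' assignment in the example above relies on.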

Formatting strings

There is a special string concatenation function that accepts a printf alike format specifier and cats the formatted string to the specified string.

sds sdscatprintf(sds s, const char *fmt, ...);

Example:

sds s;
int a = 10, b = 20;
s = sdsnew("The sum is: ");
s = sdscatprintf(s,"%d+%d = %d",a,b,a+b);

Often you need to create an SDS string directly from printf format specifiers. Because sdscatprintf is actually a function that concatenates strings, all you need is to concatenate your string to an empty string:

char *name = "Anna";
int loc = 2500;
sds s;
s = sdscatprintf(sdsempty(), "%s wrote %d lines of LISP\n", name, loc);

You can use sdscatprintf in order to convert numbers into SDS strings:

int some_integer = 100;
sds num = sdscatprintf(sdsempty(),"%d\n", some_integer);

However this is slow and we have a special function to make it efficient.

Fast number to string operations

Creating an SDS string from an integer may be a common operation in certain kinds of programs, and while you may do this with sdscatprintf the performance hit is big, so SDS provides a specialized function.

sds sdsfromlonglong(long long value);

Use it like this:

sds s = sdsfromlonglong(10000);
printf("%d\n", (int) sdslen(s));

output> 5

Trimming strings and getting ranges

String trimming is a common operation where a set of characters are removed from the left and the right of the string. Another useful operation regarding strings is the ability to just take a range out of a larger string.

void sdstrim(sds s, const char *cset);
void sdsrange(sds s, int start, int end);

SDS provides both the operations with the sdstrim and sdsrange functions. However note that both functions work differently than most functions modifying SDS strings since the return value is void: basically those functions always destructively modify the passed SDS string, never allocating a new one, because both trimming and ranges will never need more room: the operations can only remove characters from the original string.

Because of this behavior, both functions are fast and don't involve reallocation.

This is an example of string trimming where newlines and spaces are removed from an SDS strings:

sds s = sdsnew("         my string\n\n  ");
sdstrim(s," \n");
printf("-%s-\n",s);

output> -my string-

Basically sdstrim takes the SDS string to trim as first argument, and a null terminated set of characters to remove from left and right of the string. The characters are removed as long as they are not interrupted by a character that is not in the list of characters to trim: this is why the space between "my" and "string" was preserved in the above example.
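The trimming rule can be sketched on a plain buffer as follows. The function name trim_in_place is hypothetical and the length is passed explicitly because the sketch has no header to read it from. Unlike the description above, this sketch never treats a zero byte as part of the set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return non-zero if c is one of the characters listed in cset. */
int in_set(const char *cset, char c) {
    return c != '\0' && strchr(cset, c) != NULL;
}

/* Remove from both ends of s (len bytes plus null term) every
 * character found in cset, stopping at the first character that is
 * not in the set. Works in place and returns the new length. */
size_t trim_in_place(char *s, size_t len, const char *cset) {
    assert(s != NULL && cset != NULL);
    char *sp = s;        /* first byte to keep */
    char *ep = s + len;  /* one past the last byte to keep */
    while (sp < ep && in_set(cset, *sp)) sp++;
    while (ep > sp && in_set(cset, ep[-1])) ep--;
    size_t newlen = (size_t)(ep - sp);
    if (sp != s) memmove(s, sp, newlen);  /* shift content to the front */
    s[newlen] = '\0';
    return newlen;
}
```

The operation only moves bytes toward the start of the buffer and shortens the string, so no allocation is ever needed.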

Taking ranges is similar, but instead of taking a set of characters, it takes two indexes, representing the start and the end as zero-based positions inside the string, to obtain the range that will be retained.

sds s = sdsnew("Hello World!");
sdsrange(s,1,4);
printf("-%s-\n", s);

output> -ello-

Indexes can be negative to specify a position starting from the end of the string, so that -1 means the last character, -2 the penultimate, and so forth:

sds s = sdsnew("Hello World!");
sdsrange(s,6,-1);
printf("-%s-\n", s);
sdsrange(s,0,-2);
printf("-%s-\n", s);

output> -World!-
output> -World-
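How negative indexes are resolved can be sketched like this. The helper range_in_place is hypothetical and works on a plain buffer with an explicit length. Its clamping of out of range indexes is a choice of this sketch, not something the text above specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keep only the bytes of s between start and end (both inclusive).
 * Negative indexes count from the end: -1 is the last byte.
 * Works in place and returns the new length. */
size_t range_in_place(char *s, size_t len, long start, long end) {
    assert(s != NULL);
    long n = (long)len;
    if (n == 0) return 0;
    if (start < 0) { start += n; if (start < 0) start = 0; }
    if (end < 0)   { end += n;   if (end < 0) end = 0; }
    if (end >= n) end = n - 1;   /* clamp to the last valid index */
    size_t newlen = (start > end) ? 0 : (size_t)(end - start + 1);
    if (start > 0 && newlen > 0) memmove(s, s + start, newlen);
    s[newlen] = '\0';
    return newlen;
}
```

A call such as range_in_place(buf, len, nwritten, -1) drops the first nwritten bytes and keeps everything else.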

sdsrange is very useful when implementing networking servers processing a protocol or sending messages. For example the following code is used implementing the write handler of the Redis Cluster message bus between nodes:

void clusterWriteHandler(..., int fd, void *privdata, ...) {
    clusterLink *link = (clusterLink*) privdata;
    ssize_t nwritten = write(fd, link->sndbuf, sdslen(link->sndbuf));
    if (nwritten <= 0) {
        /* Error handling... */
    }
    sdsrange(link->sndbuf,nwritten,-1);
    ... more code here ...
}

Every time the socket of the node we want to send the message to is writable we attempt to write as many bytes as possible, and we use sdsrange in order to remove from the buffer what was already sent.

The function to queue new messages to send to some node in the cluster will simply use sdscatlen in order to put more data in the send buffer.

Note that the Redis Cluster bus implements a binary protocol, but since SDS is binary safe this is not a problem. The goal of SDS is not just to provide a high level string API for the C programmer, but also dynamically allocated buffers that are easy to manage.

String copying

The most dangerous and infamous function of the standard C library is probably strcpy, so perhaps it is funny how in the context of better designed dynamic string libraries the concept of copying strings is almost irrelevant. Usually what you do is to create strings with the content you want, or concatenate more content as needed.

However SDS features a string copy function that is useful in performance critical code sections. I guess its practical usefulness is limited, as the function never managed to get called in the context of the 50k lines of code composing the Redis code base.

sds sdscpylen(sds s, const char *t, size_t len);
sds sdscpy(sds s, const char *t);

The string copy function of SDS is called sdscpylen and works like this:

s = sdsnew("Hello World!");
s = sdscpylen(s,"Hello Superman!",15);

As you can see the function receives as input the SDS string s, but also returns an SDS string. This is common to many SDS functions that modify the string: this way the returned SDS string may be the original one modified or a newly allocated one (for example if there was not enough room in the old SDS string).

The sdscpylen will simply replace what was in the old SDS string with the new data you pass using the pointer and length argument. There is a similar function called sdscpy that does not need a length but expects a null terminated string instead.

You may wonder why it makes sense to have a string copy function in the SDS library, since you can simply create a new SDS string from scratch with the new value instead of copying the value in an existing SDS string. The reason is efficiency: sdsnewlen will always allocate a new string while sdscpylen will try to reuse the existing string if there is enough room to hold the new content specified by the user, and will allocate a new one only if needed.

Quoting strings

In order to provide consistent output to the program user, or for debugging purposes, it is often important to turn a string that may contain binary data or special characters into a quoted string. Here by quoted string we mean the common format for string literals in programming source code. However today this format is also part of well known serialization formats like JSON and CSV, so it definitely escaped the simple goal of representing string literals in the source code of programs.

An example of quoted string literal is the following:

"\x00Hello World\n"

The first byte is a zero byte while the last byte is a newline, so there are two non alphanumerical characters inside the string.

SDS uses a concatenation function for this goal, that concatenates to an existing string the quoted string representation of the input string.

sds sdscatrepr(sds s, const char *p, size_t len);

The sdscatrepr function (where repr means representation) follows the usual SDS string function rules, accepting a char pointer and a length, so you can use it with SDS strings, normal C strings by using strlen() as len argument, or binary data. The following is an example usage:

sds s1 = sdsnew("abcd");
sds s2 = sdsempty();
s1[1] = 1;
s1[2] = 2;
s1[3] = '\n';
s2 = sdscatrepr(s2,s1,sdslen(s1));
printf("%s\n", s2);

output> "a\x01\x02\n"

These are the rules sdscatrepr uses for conversion:

  • \ and " are quoted with a backslash.
  • It quotes special characters '\n', '\r', '\t', '\a' and '\b'.
  • All the other characters that do not pass the isprint test are quoted in \x.. form, that is: backslash followed by x followed by a two digit hex number representing the character byte value.
  • The function always adds initial and final double quotes characters.
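The rules listed above can be sketched as a small standalone function. This is not the SDS implementation: quote_repr is a hypothetical helper that returns a freshly allocated C string instead of concatenating to an SDS string.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed quoted representation of the len bytes at p,
 * or NULL on out of memory. The caller frees the result. */
char *quote_repr(const char *p, size_t len) {
    assert(p != NULL || len == 0);
    /* Worst case: every byte becomes \xNN (4 chars), plus the two
     * double quotes and the null term. */
    char *out = malloc(len * 4 + 3);
    if (out == NULL) return NULL;
    char *w = out;
    *w++ = '"';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)p[i];
        switch (c) {
        case '\\':
        case '"':  *w++ = '\\'; *w++ = (char)c; break;
        case '\n': *w++ = '\\'; *w++ = 'n'; break;
        case '\r': *w++ = '\\'; *w++ = 'r'; break;
        case '\t': *w++ = '\\'; *w++ = 't'; break;
        case '\a': *w++ = '\\'; *w++ = 'a'; break;
        case '\b': *w++ = '\\'; *w++ = 'b'; break;
        default:
            if (isprint(c)) *w++ = (char)c;
            else w += sprintf(w, "\\x%02x", (unsigned int)c);
        }
    }
    *w++ = '"';
    *w = '\0';
    return out;
}
```

Because the input is described by a pointer and a length, zero bytes in the middle of the data are quoted like any other non printable byte.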

There is an SDS function that is able to perform the reverse conversion and is documented in the Tokenization section below.

Tokenization

Tokenization is the process of splitting a larger string into smaller strings. In this specific case, the split is performed specifying another string that acts as separator. For example in the following string there are three substrings that are separated by the |-| separator:

foo|-|bar|-|zap

A more common separator that consists of a single character is the comma:

foo,bar,zap

In many programs it is useful to process a line in order to obtain the sub strings it is composed of, so SDS provides a function that returns an array of SDS strings given a string and a separator.

sds *sdssplitlen(const char *s, int len, const char *sep, int seplen, int *count);
void sdsfreesplitres(sds *tokens, int count);

As usual the function can work with both SDS strings or normal C strings. The first two arguments s and len specify the string to tokenize, and the other two arguments sep and seplen the separator to use during the tokenization. The final argument count is a pointer to an integer that will be set to the number of tokens (sub strings) returned.

The return value is a heap allocated array of SDS strings.

sds *tokens;
int count, j;

sds line = sdsnew("Hello World!");
tokens = sdssplitlen(line,sdslen(line)," ",1,&count);

for (j = 0; j < count; j++)
    printf("%s\n", tokens[j]);
sdsfreesplitres(tokens,count);

output> Hello
output> World!

The returned array is heap allocated, and the single elements of the array are normal SDS strings. You can free everything by calling sdsfreesplitres as in the example. Alternatively you are free to release the array yourself using the free function and use and/or free the individual SDS strings as usual.

A valid approach is to set the array elements you reused in some way to NULL, and use sdsfreesplitres to free all the rest.

Command line oriented tokenization

Splitting by a separator is a useful operation, but usually it is not enough to perform one of the most common tasks involving some non trivial string manipulation, that is, implementing a Command Line Interface for a program.

This is why SDS also provides an additional function that allows you to split arguments provided by the user via the keyboard in an interactive manner, or via a file, network, or any other means, into tokens.

sds *sdssplitargs(const char *line, int *argc);

The sdssplitargs function returns an array of SDS strings exactly like sdssplitlen. The function to free the result is also identical, and is sdsfreesplitres. The difference is in the way the tokenization is performed.

For example if the input is the following line:

call "Sabrina"    and "Mark Smith\n"

The function will return the following tokens:

  • "call"
  • "Sabrina"
  • "and"
  • "Mark Smith\n"

Basically different tokens need to be separated by one or more spaces, and every single token can also be a quoted string in the same format that sdscatrepr is able to emit.

String joining

There are two functions doing the reverse of tokenization by joining strings into a single one.

sds sdsjoin(char **argv, int argc, char *sep, size_t seplen);
sds sdsjoinsds(sds *argv, int argc, const char *sep, size_t seplen);

The two functions take as input an array of strings of length argc and a separator and its length, and produce as output an SDS string consisting of all the specified strings separated by the specified separator.

The difference between sdsjoin and sdsjoinsds is that the former accepts C null terminated strings as input while the latter requires all the strings in the array to be SDS strings. Because of this, only sdsjoinsds is able to deal with binary data.

char *tokens[3] = {"foo","bar","zap"};
sds s = sdsjoin(tokens,3,"|",1);
printf("%s\n", s);

output> foo|bar|zap
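Joining can be sketched without the library as a two pass operation: compute the total size, then copy. The helper join_cstrings is hypothetical and returns a plain malloc()ed C string rather than an SDS string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join argc null terminated strings, placing sep between each pair.
 * Returns a malloc()ed string, or NULL on out of memory. */
char *join_cstrings(char **argv, int argc, const char *sep) {
    assert(argc >= 0 && sep != NULL);
    size_t seplen = strlen(sep);
    size_t total = 0;
    for (int j = 0; j < argc; j++) total += strlen(argv[j]);
    if (argc > 1) total += seplen * (size_t)(argc - 1);

    char *out = malloc(total + 1);
    if (out == NULL) return NULL;
    char *w = out;
    for (int j = 0; j < argc; j++) {
        size_t n = strlen(argv[j]);
        memcpy(w, argv[j], n);
        w += n;
        if (j != argc - 1) {      /* no separator after the last item */
            memcpy(w, sep, seplen);
            w += seplen;
        }
    }
    *w = '\0';
    return out;
}
```

The sketch relies on strlen for every element, which is exactly why a join over null terminated strings cannot carry binary data.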

Error handling

All the SDS functions that return an SDS pointer may also return NULL on out of memory, this is basically the only check you need to perform.

However many modern C programs handle out of memory by simply aborting the program, so you may want to do this as well by wrapping malloc and other related memory allocation calls directly.
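A minimal sketch of such wrappers follows. The names xmalloc and xrealloc are hypothetical and not part of SDS. The sketch treats a zero-size request as a caller bug, which is a choice made here for simplicity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* malloc() that never returns NULL: abort on out of memory. */
void *xmalloc(size_t size) {
    assert(size > 0);
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "Out of memory allocating %lu bytes\n",
                (unsigned long)size);
        abort();
    }
    return p;
}

/* realloc() that never returns NULL: abort on out of memory. */
void *xrealloc(void *ptr, size_t size) {
    assert(size > 0);
    void *p = realloc(ptr, size);
    if (p == NULL) {
        fprintf(stderr, "Out of memory reallocating %lu bytes\n",
                (unsigned long)size);
        abort();
    }
    return p;
}
```

With an allocator that behaves like this, the NULL checks after string functions become unnecessary because the process never continues past a failed allocation.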

SDS internals and advanced usage

At the very beginning of this documentation it was explained how SDS strings are allocated, however the prefix stored before the pointer returned to the user was classified as a header without further details. For advanced usage it is better to dig more into the internals of SDS and show the structure implementing it:

struct sdshdr {
    int len;
    int free;
    char buf[];
};

As you can see, the structure may resemble the one of a conventional string library, however the buf field of the structure is different since it is not a pointer but an array without any length declared, so buf actually points at the first byte just after the free integer. So in order to create an SDS string we just allocate a piece of memory that is as large as the sdshdr structure plus the length of our string, plus an additional byte for the mandatory null term that every SDS string has.

The len field of the structure is quite obvious, and is the current length of the SDS string, always computed every time the string is modified via SDS function calls. The free field instead represents the amount of free memory in the current allocation that can be used to store more characters.

So the actual SDS layout is this one:

+------------+------------------------+-----------+---------------\
| Len | Free | H E L L O W O R L D \n | Null term |  Free space   \
+------------+------------------------+-----------+---------------\
             |
             `-> Pointer returned to the user.

You may wonder why there is some free space at the end of the string, it looks like a waste. Actually after a new SDS string is created, there is no free space at the end at all: the allocation will be as small as possible to just hold the header, string, and null term. However other access patterns will create extra free space at the end, like in the following program:

s = sdsempty();
s = sdscat(s,"foo");
s = sdscat(s,"bar");
s = sdscat(s,"123");

Since SDS tries to be efficient it can't afford to reallocate the string every time new data is appended, since this would be very inefficient, so it uses the preallocation of some free space every time you enlarge the string.

The preallocation algorithm used is the following: every time the string is reallocated in order to hold more bytes, the actual allocation size performed is two times the minimum required. So for instance if the string currently is holding 30 bytes, and we concatenate 2 more bytes, instead of allocating 32 bytes in total SDS will allocate 64 bytes.

However there is a hard limit to the allocation it can perform ahead, defined by SDS_MAX_PREALLOC. SDS will never allocate more than 1MB of additional space (by default, you can change this default).
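One way to express this policy is the small function below. It is a sketch of the rule as described in this section, not a copy of the SDS source: alloc_size_for and MAX_PREALLOC are hypothetical names, and the size is the string size without header and null term.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the 1MB default mentioned in the text. */
#define MAX_PREALLOC ((size_t)1024 * 1024)

/* Given the minimum number of bytes the string needs after an
 * append, return how many bytes to actually allocate for it. */
size_t alloc_size_for(size_t needed) {
    assert(needed <= SIZE_MAX - MAX_PREALLOC);
    if (needed < MAX_PREALLOC)
        return needed * 2;            /* small strings: double */
    return needed + MAX_PREALLOC;     /* big strings: at most 1MB extra */
}
```

For the example in the text, a 30 byte string receiving 2 more bytes needs 32 bytes, so the function returns 64. A string that needs 2MB gets 3MB instead of 4MB.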

Shrinking strings

sds sdsRemoveFreeSpace(sds s);
size_t sdsAllocSize(sds s);

Some classes of programs need to use very little memory. After string concatenations, trimming, and ranges, a string may end up having a non trivial amount of additional space at the end.

It is possible to resize a string back to its minimal size in order to hold the current content by using the function sdsRemoveFreeSpace.

s = sdsRemoveFreeSpace(s);

There is also a function that can be used in order to get the size of the total allocation for a given string, and is called sdsAllocSize.

sds s = sdsnew("Ladies and gentlemen");
s = sdscat(s,"... welcome to the C language.");
printf("%d\n", (int) sdsAllocSize(s));
s = sdsRemoveFreeSpace(s);
printf("%d\n", (int) sdsAllocSize(s));

output> 109
output> 59

NOTE: the SDS low level API uses camelCase in order to warn you that you are playing with fire.

Manual modifications of SDS strings

void sdsupdatelen(sds s);

Sometimes you may want to hack with an SDS string manually, without using SDS functions. In the following example we implicitly change the length of the string, however we want the logical length to reflect the null terminated C string.

The function sdsupdatelen does just that, updating the internal length information for the specified string to the length obtained via strlen.

sds s = sdsnew("foobar");
s[2] = '\0';
printf("%d\n", (int) sdslen(s));
sdsupdatelen(s);
printf("%d\n", (int) sdslen(s));

output> 6
output> 2

Sharing SDS strings

If you are writing a program in which it is advantageous to share the same SDS string across different data structures, it is absolutely advised to encapsulate SDS strings into structures that remember the number of references of the string, with functions to increment and decrement the number of references.

This approach is a memory management technique called reference counting and in the context of SDS has two advantages:

  • It is less likely that you'll create memory leaks or bugs due to non freeing SDS strings or freeing already freed strings.
  • You'll not need to update every reference to an SDS string when you modify it (since the new SDS string may point to a different memory location).

While this is definitely a very common programming technique I'll outline the basic ideas here. You create a structure like that:

struct mySharedString {
    int refcount;
    sds string;
};

When new strings are created, the structure is allocated and returned with refcount set to 1. Then you have two functions to change the reference count of the shared string:

  • incrementStringRefCount will simply increment refcount by 1 in the structure. It will be called every time you add a reference to the string on some new data structure, variable, or whatever.
  • decrementStringRefCount is used when you remove a reference. This function is however special since when the refcount drops to zero, it automatically frees the SDS string, and the mySharedString structure as well.
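The two functions can be sketched as follows. To keep the example self-contained, a plain malloc()ed char pointer stands in for the sds field. The constructor createSharedString is a hypothetical name, and returning the remaining count from decrementStringRefCount is a choice of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mySharedString {
    int refcount;
    char *string;   /* an sds in a real program; char * in this sketch */
};

/* Create a shared string owning a copy of init, with refcount 1. */
struct mySharedString *createSharedString(const char *init) {
    struct mySharedString *ss = malloc(sizeof(*ss));
    if (ss == NULL) return NULL;
    size_t len = strlen(init);
    ss->string = malloc(len + 1);
    if (ss->string == NULL) { free(ss); return NULL; }
    memcpy(ss->string, init, len + 1);
    ss->refcount = 1;
    return ss;
}

/* Call when a new reference to the string is stored somewhere. */
void incrementStringRefCount(struct mySharedString *ss) {
    ss->refcount++;
}

/* Call when a reference is dropped. Frees both the string and the
 * structure when the last reference goes away. Returns the number
 * of references left. */
int decrementStringRefCount(struct mySharedString *ss) {
    assert(ss->refcount > 0);
    int left = --ss->refcount;
    if (left == 0) {
        free(ss->string);
        free(ss);
    }
    return left;
}
```

Every holder keeps a pointer to the structure, so when the string inside is reallocated only the string field has to be updated.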

Interactions with heap checkers

Because SDS returns pointers into the middle of memory chunks allocated with malloc, heap checkers may have issues, however:

  • The popular Valgrind program will detect SDS strings as possibly lost memory and never as definitely lost, so it is easy to tell if there is a leak or not. I used Valgrind with Redis for years and every real leak was consistently detected as "definitely lost".
  • OSX instrumentation tools don't detect SDS strings as leaks but are able to correctly handle pointers pointing to the middle of memory chunks.

Zero copy append from syscalls

At this point you should have all the tools to dig more inside the SDS library by reading the source code, however there is an interesting pattern you can build using the exported low level API, which is used inside Redis in order to improve the performance of the networking code.

Using sdsIncrLen() and sdsMakeRoomFor() it is possible to build the following schema, to cat bytes coming from the kernel to the end of an sds string without copying into an intermediate buffer:

oldlen = sdslen(s);
s = sdsMakeRoomFor(s, BUFFER_SIZE);
nread = read(fd, s+oldlen, BUFFER_SIZE);
... check for nread <= 0 and handle it ...
sdsIncrLen(s, nread);

sdsIncrLen is documented inside the source code of sds.c.

Embedding SDS into your project

This is as simple as copying the following files inside your project:

  • sds.c
  • sds.h
  • sdsalloc.h

The source code is small and every C99 compiler should deal with it without issues.

Using a different allocator for SDS

Internally sds.c uses the allocator defined in sdsalloc.h. This header file just defines macros for malloc, realloc and free, and by default libc malloc(), realloc() and free() are used. Just edit this file in order to change the name of the allocation functions.

The program using SDS can call the SDS allocator in order to manipulate SDS pointers (usually not needed but sometimes the program may want to do advanced things) by using the API exported by SDS in order to call the allocator used. This is especially useful when the program linked to SDS is using a different allocator compared to what SDS is using.

The API to access the allocator used by SDS is composed of three functions: sds_malloc(), sds_realloc() and sds_free().

Credits and license

SDS was created by Salvatore Sanfilippo and is released under the BSD two clause license. See the LICENSE file in this source distribution for more information.

Oran Agra improved SDS version 2 by adding dynamic sized headers in order to save memory for small strings and allow strings greater than 4GB.

sds's People

Contributors

antirez, dchest, frodsan, gavinwahl, juanitofatas, junlon2006, melo

sds's Issues

Memory leak?

In the function 'sdscatvprintf' I think the statement va_end(cpy) is missing after the vsnprintf call.
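For reference, the C standard requires every va_copy() to be matched by a va_end() in the same function; whether omitting it actually leaks anything depends on the platform. The sketch below shows the pairing in a small formatting helper. It is not the code of sdscatvprintf; `format_alloc_v()` and `format_alloc()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a freshly allocated buffer.
 * Returns NULL on error; the caller releases the result with free(). */
char *format_alloc_v(const char *fmt, va_list ap) {
    va_list cpy;
    assert(fmt != NULL);

    /* First pass: measure the output using a copy of the argument list. */
    va_copy(cpy, ap);
    int needed = vsnprintf(NULL, 0, fmt, cpy);
    va_end(cpy);                 /* each va_copy() is paired with va_end() */
    if (needed < 0) return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL) return NULL;

    /* Second pass: format for real, again on a private copy. */
    va_copy(cpy, ap);
    vsnprintf(buf, (size_t)needed + 1, fmt, cpy);
    va_end(cpy);
    return buf;
}

/* Variadic front end: starts and ends its own va_list. */
char *format_alloc(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    char *out = format_alloc_v(fmt, ap);
    va_end(ap);
    return out;
}
```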

sds and interrupts

I've been investigating a nasty firmware crash that, until now, was a huge mystery.

While debugging some new code that uses SDS, I single stepped into sdscmp(). There, I noticed that one of the two variables was totally different than the ones being compared. And, when sdscmp() returned, the stack pointer was apparently ruined, as I was taken back to a process that a 1-second interrupt handler uses for sdscmp().

I suppose it's possible this is an artifact of my debugger (Atmel Studio 7).

However, if not, could there be a case where the sdscmp() function (or for that matter any sds function) cannot be called by two different processes, one of which is an interrupt service routine?

realloc without checking the return value

I ran a static analyzer and got the following output, which I'd like to share.

[src/sds.c:159]: (error) Common realloc mistake: 'sh' nulled but not freed upon failure
[src/sds.c:717]: (error) Common realloc mistake: 'vector' nulled but not freed upon failure

Thanks
Carlos Tangerino

Unexpected behaviour

Hi Salvatore,

I came across some unexpected behaviour while using the sdsMakeRoomFor() and sdsrange() functions. Please have a look below.
PS: It might be my own misuse of the library, but I'm curious to hear what you think.

$ echo "test" >> test_file.txt

#include <stdio.h>
#include "sds/sds.h"

#define BUFSIZE 4096

static void expected_behaviour(FILE* const f) {
  sds s = sdsnew("abc");
  s     = sdsMakeRoomFor(s, BUFSIZE);

  while (fgets(s, BUFSIZE, f)) {
    sdsrange(s, 0, 2);
    printf("%s", s); // prints: tes
  }
  printf("\n");
  sdsfree(s);
}

static void unexpected_behaviour(FILE* const f) {
  // Can sdsempty() be used here?
  sds s = sdsMakeRoomFor(sdsempty(), BUFSIZE);

  while (fgets(s, BUFSIZE, f)) {
    // Why isn't sdsrange() working as expected?
    sdsrange(s, 0, 2);
    printf("%s", s); // prints: test\n
  }
  printf("\n");
  sdsfree(s);
}
int main(void) {
  FILE* const f = fopen("test_file.txt", "r");  // assume valid file
  expected_behaviour(f);
  fseek(f, 0, SEEK_SET);
  unexpected_behaviour(f);
  fclose(f);
  return 0;
}

Using sds in embedded environment, even on Arduino?

Google doesn't show any connection between sds and Arduino, STM32, etc. Yet sds, being lightweight and comfortable, seems like a welcome middle ground between string.h and the C++ string class, and it could be used even on an Arduino with 2 kB of RAM (or an STM32 with 528 kB). Are there any pitfalls in this claim?

Two long fields stored before the string data instead of one might be a significant obstacle with 2 kB of RAM, so that's one example of a pitfall.

Has anyone seen sds used in an embedded environment?

Pull requests

Why are there so many pull requests, even really old ones without any response to say if they'll be incorporated or not, or if you don't yet have time to review them?

I think leaving those PRs open discourages potential contributors. Am I missing something? If so, I'm sorry, but I'm really curious.

sdscat & sdscatsds in-place?

As I've read in the reference and the source, when you pass an sds string to these two functions you always have to assign the returned pointer back.

Is there any workaround to make these functions work in-place? That is: pass them an sds and have it modified? (without changing the object's pointer)

Maintenance?

I like the API of SDS but I see no (development) activity in this repository. Is it still maintained?

The sds source in Redis seems to be newer?

sds doesn't compile as C++

There exist platforms that use a C++ compiler even for C code. Regardless of that, I think it would be a good idea to make sds compile out of the box with C++ compilers...

why not use macro to get the address of sdshdr?

I find that in almost every function you write "struct sdshdr *sh = (void*)(s - sizeof *sh);" to get the address of sdshdr. I wonder why you don't make it a macro, like this:
#define SDS_ADDR(s) (void *)((s) - sizeof(struct sdshdr))
struct sdshdr *sh = SDS_ADDR(s);
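To make the suggestion concrete, here is a self-contained sketch of the header-before-pointer layout with such a macro. The struct below is an assumed version 1 style header (the field types are a guess), and `SDS_HDR_OF()`, `demo_new()`, `demo_len()` and `demo_free()` are hypothetical names, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed version 1 style header; the field types are an assumption. */
struct sdshdr {
    int len;     /* bytes in use */
    int free;    /* bytes still available after len */
    char buf[];  /* string data; the user-visible pointer points here */
};

/* Recover the header from the user-visible pointer (hypothetical macro). */
#define SDS_HDR_OF(s) ((struct sdshdr *)((s) - sizeof(struct sdshdr)))

/* Allocate header + data + terminator and return a pointer to the data. */
char *demo_new(const char *init) {
    assert(init != NULL);
    size_t len = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + len + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)len;
    sh->free = 0;
    memcpy(sh->buf, init, len + 1);
    return sh->buf;
}

/* Read the stored length through the macro, without calling strlen(). */
int demo_len(const char *s) { return SDS_HDR_OF(s)->len; }

/* Release the whole allocation, starting from the header address. */
void demo_free(char *s) { if (s != NULL) free(SDS_HDR_OF(s)); }
```

A macro like this keeps the pointer arithmetic in one place, which is roughly what version 2 does with its SDS_HDR and SDS_HDR_VAR macros.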

Potential undefined behavior when negating value in sdsll2str

When compiling with UndefinedBehaviorSanitizer and running the tests (CC=clang-6.0 CFLAGS='-fsanitize=undefined'), the following error is raised:

[...]
6 - sdscatprintf() seems working in the base case: PASSED
sds.c:446:23: runtime error: negation of -9223372036854775808 cannot be represented in type 'long long'; cast to an unsigned type to negate this value to itself
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior sds.c:446:23 in
[...]
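One common way to avoid this kind of overflow is to do the negation in the unsigned domain, as the sanitizer message hints. The sketch below is a stand-alone conversion routine written for illustration, not the code of sdsll2str; `demo_ll2str()` is a hypothetical name, and the caller is assumed to supply a buffer of at least 21 bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write the decimal form of value into out, which must
 * hold at least 21 bytes, and return the resulting string length. */
size_t demo_ll2str(char *out, long long value) {
    assert(out != NULL);

    /* Negate in the unsigned domain. Conversion to unsigned long long is
     * defined modulo 2^N, so even LLONG_MIN yields its magnitude safely. */
    unsigned long long v = (value < 0)
        ? 0ULL - (unsigned long long)value
        : (unsigned long long)value;

    /* Emit the digits in reverse order, then the sign. */
    char *p = out;
    do {
        *p++ = (char)('0' + (v % 10));
        v /= 10;
    } while (v);
    if (value < 0) *p++ = '-';

    size_t len = (size_t)(p - out);
    *p = '\0';

    /* Reverse the characters in place. */
    for (char *a = out, *b = p - 1; a < b; a++, b--) {
        char tmp = *a;
        *a = *b;
        *b = tmp;
    }
    return len;
}
```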

Dead Store v.2.0.0

The value written to &oldfree (sds.c, line 1240) is never used.

if (type != SDS_TYPE_5) {
         test_cond("sdsMakeRoomFor() free", sdsavail(x) >= step);
         oldfree = sdsavail(x);
}

One way to correct this error is to reverse the order of lines 1239 and 1240 and replace sdsavail(x) >= step with oldfree >= step.

Question, why in sds.h you have (static inline functions)?

Hi, why do you have static inline functions in sds.h? Also, why are the sdshdr* structs placed in the header? I tried to include sds.h in a C++ project with the -Wpedantic flag and received diagnostics like "warning: ISO C++ forbids zero-size array" for the sdshdr* structs.

memory leak in sdssplitlen() ?

In sdssplitlen(), slots starts with a default of 5. Then
tokens = malloc(sizeof(sds)*slots);
or generally 40 bytes.

However, if (len == 0) { *count = 0; return tokens;}

How do we free the 40 bytes? sdsfreesplitres() will use a count of 0.

Am I wrong that this is technically a memory leak? Of course, this is a situation that will probably never come up, since who splits a string with 0 chars, but still, it looks incorrect.

Why do we allocate memory to tokens before the test for len == 0?

Some confusions on sdshdr5 & sdshdr8 on key/value actual memory usage and MEMORY USAGE command

While investigating the Redis 4.0 source code for work, something about how sdshdr5 and sdshdr8 strings are stored raised my curiosity, and I ran into something that really confused me. Here are the steps to reproduce the scenario:

  1. Open redis-cli
  2. Type SET key value
  3. From my observation, the value is stored with sdshdr8 and the key is stored with sdshdr5 (through dbAdd->sdsdup)
  4. I expected MEMORY USAGE to analyse the key as sdshdr5. However, the key is passed via c->argv[2]->ptr, and it is first wrapped by createEmbeddedStringObject, which adopts sdshdr8 as the struct type. So it is analysed as sdshdr8 rather than sdshdr5.
    So, could anyone explain the actual reason for this, and whether it affects the accuracy of the MEMORY USAGE command? Much appreciated.

will not compile in vs2013

static inline size_t sdslen(const sds s) {
and
static inline size_t sdsavail(const sds s) {

google says to use __inline not inline

Potential integer overflow in sds.c

The sdsnewlen and sdsMakeRoomFor functions implemented in sds.c are quite similar to those in Redis. Thus, it's very likely that the integer overflow in CVE-2021-21309 also affects sds.
Could you help check whether this bug is real? If it is, I'd like to open a PR for it if necessary. Thank you for your effort and patience!
Here is the patch for CVE-2021-21309 for your reference, in case this issue needs to be fixed.
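To illustrate the general class of problem (this is not the patch referenced above), the sketch below checks the size computation `hdrlen + initlen + 1` for size_t wrap-around before allocating. `demo_alloc_size()` and `demo_alloc()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: compute hdrlen + initlen + 1 into *total.
 * Returns 0 on success, or -1 if the sum would not fit in a size_t. */
int demo_alloc_size(size_t hdrlen, size_t initlen, size_t *total) {
    assert(total != NULL);
    if (hdrlen > SIZE_MAX - 1) return -1;
    if (initlen > SIZE_MAX - 1 - hdrlen) return -1;
    *total = hdrlen + initlen + 1;
    return 0;
}

/* Allocate only when the requested size is representable. */
void *demo_alloc(size_t hdrlen, size_t initlen) {
    size_t total;
    if (demo_alloc_size(hdrlen, initlen, &total) != 0) return NULL;
    return malloc(total);
}
```

Without such a check, a very large length can wrap around to a small allocation size, and later writes then run past the end of the buffer.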

malloc/realloc only powers of two?

It might be worth considering only allocating strings with a capacity that is a power of two (including the header) to reduce memory fragmentation.

A simple code snippet to round up to the next power of two:

unsigned int v; // compute the next highest power of 2 of 32-bit v

--v;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
++v;

I can make the code contribution if you'd like.

A question in sdsIncrLen

In the function sdsIncrLen, I think the statement "assert(sh->free >= 0)" is redundant, because "assert(sh->free >= incr)" already makes sure this situation cannot happen.

void sdsIncrLen(sds s, int incr) {
    struct sdshdr *sh = (void*)(s - sizeof *sh);
    assert(sh->free >= incr);
    sh->len += incr;
    sh->free -= incr;
    assert(sh->free >= 0);
    s[sh->len] = '\0';
}

Errors in the Reference Manual

Use it like this:

sds s = sdsfromlonglong(10000);
printf("%d\n", (int) sdslen(s));

output> 5

Why did you get the number 5? This simply cannot be.

sds s1 = sdsnew("abcd");
sds s2 = sdsempty();
s[1] = 1;
s[2] = 2;
s[3] = '\n';
s2 = sdscatrepr(s2,s1,sdslen(s1));
printf("%s\n", s2);

output> "a\x01\x02\n"

Why are values assigned to the array s when it is never created? It is also not used anywhere else in this code.

Please correct these errors.

Memleak in sds.c:40

There's a possible memleak and NULL ptr usage in sds.c:40 in sdssplitargs:

vector = s_realloc(vector,((*argc)+1)*sizeof(char*));
vector[*argc] = current;

Realloc here might return NULL and thus set vector to NULL while losing the reference to the original block. I believe something like the following should be done:

char** new_vector = s_realloc(vector,((*argc)+1)*sizeof(char*));
if (new_vector == NULL) goto err;
vector = new_vector;
vector[*argc] = current;

C90 Compliance

I'm trying to get another library (https://github.com/redis/hiredis) to compile with an app that is shooting for C89/C90 compliance. I'm running into problems. It uses sds.

The errors in sds that are preventing compilation are things like the following:

sds.h:48:10: warning: ISO C90 does not support flexible array members [-Wpedantic]
char buf[];

sds.h:86:8: error: unknown type name ‘inline’
static inline size_t sdslen(const sds s) {

sds.c:640:52: warning: ISO C90 does not support ‘long long’ [-Wlong-long]

Is there any chance that sds (being a low level library) can be changed to become C90 compliant?

Related Issue:
redis/hiredis#494

Error in example

In the example:

sds s1 = sdsnew("abcd");
sds s2 = sdsempty();
s[1] = 1;
s[2] = 2;
s[3] = '\n';
s2 = sdscatrepr(s2,s1,sdslen(s1));
printf("%s\n", s2);

output> "a\x01\x02\n"

We are editing indices of s, but s is not declared; it should be s1.

realloc may return NULL

Hi @antirez,
according to the man page

realloc() returns a pointer to the newly allocated memory, which is suitably aligned for any kind of variable and may be different from ptr, or NULL if the request fails. If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned. If realloc() fails the original block is left untouched; it is not freed or moved.

in sds.c at line 717:

vector = realloc(vector, ((*argc)+1)*sizeof(char*));

If realloc() fails to allocate the requested memory, line 718 may segfault.
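A usual way to handle this is to store the result of realloc() in a temporary and only overwrite the original pointer on success. The sketch below applies that pattern to a growing vector of pointers; `demo_vector_push()` is a hypothetical helper, not a function from sds.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: append item to a growing vector of pointers.
 * On success *vec and *count are updated and 0 is returned. On allocation
 * failure -1 is returned and *vec is left valid and unchanged. */
int demo_vector_push(char ***vec, int *count, char *item) {
    assert(vec != NULL && count != NULL && *count >= 0);

    /* Keep the old pointer until the new block is known to be valid. */
    char **grown = realloc(*vec, ((size_t)*count + 1) * sizeof(char *));
    if (grown == NULL) return -1;   /* *vec is still owned by the caller */

    grown[*count] = item;
    *vec = grown;
    (*count)++;
    return 0;
}
```

On failure the caller still holds the original block and can free it or keep using it, so nothing is leaked and no NULL pointer is dereferenced.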

Generate libsds.so

Hello team, I am cross-compiling sds for the i.MX6UL SoC. I was able to build sds-test successfully, but now I need a ".so".
Can anyone kindly suggest what modifications need to be made to the Makefile to generate a .so file?

Error I am getting : arm-poky-linux-gnueabi/bin/ld: cannot find -lsds

Thanks

Question: Violation of Strict Aliasing Rule ?

When a new sds is created (as a char array), it is cast to a struct pointer type, e.g. SDS_HDR(8,s);. Isn't this violating the strict aliasing rule?
(Dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7 is undefined behavior)

Example:

sh = s_malloc(hdrlen+initlen+1);
...
s = (char*)sh+hdrlen;
...
SDS_HDR_VAR(8,s);
sh->len = initlen;
sh->alloc = initlen;

#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))

Null Dereferences v2.0.0

In many functions in file “sds.h”, the parameter “sds s” is dereferenced without checking if it is NULL. The same error is also present in some functions in file “sds.c”, such as: sdscat, sdsMakeRoomFor, sdsRemoveFreeSpace, sdsdup, sdsupdatelen, sdscatrepr, sdscmp, sdstoupper, sdstolower, sdsrange, sdstrim, sdscatfmt, sdsclear, sdslen, sdscatvprintf, sdscatprintf, sdscpy, sdscpylen, sdscatsds, sdscatlen, sdsgrowzero, sdsIncrLen, sdsAllocSize and sdsAllocPtr.

These functions should check for a NULL parameter and possibly return an error code in that case.

Minimal example:

int sdsTest(void) {
        sds x = NULL;
        test_cond("Create a string and obtain the length",
            sdslen(x) == 3 && memcmp(x,"foo\0",4) == 0)

    sdsfree(x);
    test_report();
    return 0;
}

Forcing the variable “sds s = NULL” while running the test programs generates a segmentation fault (due to the attempt to dereference NULL).

Use of _ in string causes sdssplitlen() to return no tokens

Thank you for a fantastic library!!

One minor possible issue:

I am using sdssplitlen() to parse out the elements of a path+file string, as in:
/DIR1/DIR2/this_isafile.txt <-- cTempString

tokens = sdssplitlen(cTempString,sdslen(cTempString),"/",1,&count);

With an underscore in the string, sdssplitlen() returns NULL.

Haven't single-step debugged it yet. For now, avoiding using filenames with _

Potential NULL pointer usage

In the function sdsnewlen we can see the test on a pointer that was already used as in:
sh = s_malloc(hdrlen+initlen+1);
if (!init)
    memset(sh, 0, hdrlen+initlen+1);
if (sh == NULL) return NULL; /* <-- if sh can be NULL, the memset above breaks */
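The ordering being asked for can be sketched as follows: test the allocation result first, and only then touch the memory. This is an illustrative stand-alone function, not the code of sdsnewlen; `demo_alloc_init()` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate total bytes. If init is NULL the block is
 * zero-filled, otherwise the first initlen bytes are copied from init. */
void *demo_alloc_init(const void *init, size_t initlen, size_t total) {
    assert(initlen <= total);
    void *p = malloc(total);
    if (p == NULL) return NULL;   /* check before any use of the pointer */
    if (init == NULL)
        memset(p, 0, total);
    else
        memcpy(p, init, initlen);
    return p;
}
```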

valgrind reports realloc-related memory leaks

There is something about sdsMakeRoomFor that leaks memory, at least according to valgrind:

==31090== HEAP SUMMARY:
==31090==     in use at exit: 40 bytes in 2 blocks
==31090==   total heap usage: 33 allocs, 31 frees, 526 bytes allocated
==31090== 
==31090== 13 bytes in 1 blocks are definitely lost in loss record 1 of 2
==31090==    at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31090==    by 0x401B51: sdsMakeRoomFor (sds.c:160)
==31090==    by 0x40132A: main (sds.c:926)
==31090== 
==31090== 27 bytes in 1 blocks are definitely lost in loss record 2 of 2
==31090==    at 0x4C2CE8E: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31090==    by 0x401B51: sdsMakeRoomFor (sds.c:160)
==31090==    by 0x401CF0: sdscatlen (sds.c:269)
==31090==    by 0x401EFA: sdscatvprintf (sds.c:342)
==31090==    by 0x401FA6: sdscatprintf (sds.c:367)
==31090==    by 0x4026AE: sdscatrepr (sds.c:602)
==31090==    by 0x40126A: main (sds.c:915)
==31090== 
==31090== LEAK SUMMARY:
==31090==    definitely lost: 40 bytes in 2 blocks
==31090==    indirectly lost: 0 bytes in 0 blocks
==31090==      possibly lost: 0 bytes in 0 blocks
==31090==    still reachable: 0 bytes in 0 blocks
==31090==         suppressed: 0 bytes in 0 blocks
