Parsing API

Parsing Context

The first thing you need to parse a document is a parsing context. Every parsing function takes a context as its first argument; the context holds the parsing state machine and the event callback mechanism. It also contains the optional user-defined memory functions and the parser configuration (limits, whether comments are allowed).

The following example initializes a new parser context with:

NULL: a NULL config, which selects the default parser configuration.
my_callback: a function that is called each time there is a JSON event.
my_callback_data: a void * value that is passed to my_callback every time it is called.

json_parser parser;

if (json_parser_init(&parser, NULL, my_callback, my_callback_data)) {
	fprintf(stderr, "something wrong happened during init\n");
}

Remember that the parser needs to allocate some data, so when the parser context is no longer used, it must be freed with the appropriate function:

json_parser_free(&parser);

Parsing Data

The only thing left is feeding data into the parser. This is done with the json_parser_string function, which takes the string data, its length, and an optional offset pointer. This function is completely incremental, which means you don't have to feed the whole document at once.

You are in charge of passing the data to the parsing function, whether it comes from a file, a socket, a pipe, or an in-memory string.

The following example takes an in-memory string and passes it to json_parser_string one character at a time:

char my_json_string[] = "{ \"key\": 123 }";
size_t i, len = strlen(my_json_string);
int ret;

for (i = 0; i < len; i += 1) {
	ret = json_parser_string(&parser, my_json_string + i, 1, NULL);
	if (ret) {
		/* an error happened: print a message or something */
		break;
	}
}

or four characters at a time:

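char my_json_string[] = "{ \"key\": 123 }";
size_t i, chunk, len = strlen(my_json_string);
int ret;

for (i = 0; i < len; i += 4) {
	/* the last chunk may be shorter than 4 bytes: never read past the string */
	chunk = (len - i < 4) ? len - i : 4;
	ret = json_parser_string(&parser, my_json_string + i, chunk, NULL);
	if (ret) {
		/* an error happened: print a message or something */
		break;
	}
}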

or from a file, reading 1024-byte blocks at a time:

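/* requires <fcntl.h> and <unistd.h>; file holds the path to read */
int fd, ret = 0;
ssize_t len;
char block[1024];

fd = open(file, O_RDONLY);
if (fd < 0) {
	perror("open");
} else {
	while ((len = read(fd, block, sizeof(block))) > 0) {
		ret = json_parser_string(&parser, block, len, NULL);
		if (ret) {
			/* an error happened: print a message or something */
			break;
		}
	}
	if (len < 0)
		perror("read");
	close(fd);
}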
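The examples above all follow the same pattern: split the input into chunks, hand each chunk to the parser, and stop at the first error. The following is a minimal sketch of that pattern as a reusable helper. It does not call libjson: feed_fn, feed_in_chunks, struct collector, and collect are hypothetical names, and feed_fn merely stands in for a call such as json_parser_string(&parser, data, length, NULL).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a parser entry point such as json_parser_string:
 * consumes `length` bytes and returns 0 on success, non-zero on error. */
typedef int (*feed_fn)(void *ctx, const char *data, uint32_t length);

/* Feed `len` bytes of `data` to `feed` in chunks of at most `chunk_size`
 * bytes. The final chunk is shortened so no byte past `len` is passed on.
 * Returns 0 on success or the first non-zero value returned by `feed`. */
int feed_in_chunks(feed_fn feed, void *ctx, const char *data, size_t len,
                   size_t chunk_size)
{
	size_t i, n = 0;
	int ret;

	assert(chunk_size > 0);
	for (i = 0; i < len; i += n) {
		n = (len - i < chunk_size) ? len - i : chunk_size;
		ret = feed(ctx, data + i, (uint32_t) n);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example feed function: appends each chunk to a fixed buffer and counts calls. */
struct collector {
	char buf[64];
	size_t used;
	int calls;
};

static int collect(void *ctx, const char *data, uint32_t length)
{
	struct collector *c = ctx;

	if (c->used + length >= sizeof(c->buf))
		return -1; /* no room left (including the NUL): report an error */
	memcpy(c->buf + c->used, data, length);
	c->used += length;
	c->buf[c->used] = '\0';
	c->calls++;
	return 0;
}
```

Feeding the 14-byte string "{ \"key\": 123 }" with a chunk size of 4 produces four calls (4, 4, 4, then 2 bytes), which is exactly the short-final-chunk case the four-at-a-time loop has to handle.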

Parsing events

Each time json_parser_string is called with data, the callback registered at context init time may be called.

The callback is called each time the parser has processed a JSON atom.

A JSON atom can be:

JSON_OBJECT_BEGIN : opening a new object
JSON_OBJECT_END : current object is closing
JSON_ARRAY_BEGIN : opening a new array
JSON_ARRAY_END : current array is closing
JSON_INT : a JSON int has been parsed
JSON_FLOAT : a JSON float has been parsed
JSON_STRING : a JSON string has been parsed
JSON_KEY : a JSON key has been parsed
JSON_TRUE : a JSON true constant has been parsed
JSON_FALSE : a JSON false constant has been parsed
JSON_NULL : a JSON null constant has been parsed
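Since every BEGIN atom is eventually matched by an END atom, a consumer can track how deeply it is nested from the event stream alone. The sketch below assumes a stand-in enum whose EV_* names mirror the atoms listed above; the real JSON_* constants come from the library header and their values are not assumed here. struct depth_tracker and track_depth are hypothetical.

```c
#include <assert.h>

/* Stand-in event codes mirroring the JSON_* atoms listed above.
 * Hypothetical: use the library's own constants in real code. */
enum event_type {
	EV_OBJECT_BEGIN, EV_OBJECT_END, EV_ARRAY_BEGIN, EV_ARRAY_END,
	EV_INT, EV_FLOAT, EV_STRING, EV_KEY, EV_TRUE, EV_FALSE, EV_NULL
};

/* Hypothetical state updated once per atom. */
struct depth_tracker {
	int depth;     /* current nesting depth */
	int max_depth; /* deepest nesting seen so far */
	int scalars;   /* value atoms: int, float, string, true, false, null */
};

void track_depth(struct depth_tracker *t, enum event_type type)
{
	switch (type) {
	case EV_OBJECT_BEGIN:
	case EV_ARRAY_BEGIN:
		t->depth++;
		if (t->depth > t->max_depth)
			t->max_depth = t->depth;
		break;
	case EV_OBJECT_END:
	case EV_ARRAY_END:
		assert(t->depth > 0); /* an END must follow its matching BEGIN */
		t->depth--;
		break;
	case EV_KEY:
		break; /* keys name values; they are not values themselves */
	default:
		t->scalars++;
		break;
	}
}
```

For a document such as {"a": [1, true]}, the atoms arrive as OBJECT_BEGIN, KEY, ARRAY_BEGIN, INT, TRUE, ARRAY_END, OBJECT_END, which leaves the tracker with a depth of 0, a maximum depth of 2, and 2 scalar values.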

A callback has the following prototype:

int my_callback(void *userdata, int type, const char *data, uint32_t length)

where:

userdata: the callback object registered at context init time.
type: the type of the atom parsed.
data: an optional string containing the data associated with this atom.
length: the length in bytes of the data associated with the atom.

The following example is a complete callback function that simply prints each atom it receives. Its callback object (userdata) can optionally be a FILE * to print to; if it is NULL, stdout is used.

int my_callback(void *userdata, int type, const char *data, uint32_t length)
{
	FILE *output = (userdata) ? userdata : stdout;
	switch (type) {
	case JSON_OBJECT_BEGIN:
	case JSON_ARRAY_BEGIN:
		fprintf(output, "entering %s\n", (type == JSON_ARRAY_BEGIN) ? "array" : "object");
		break;
	case JSON_OBJECT_END:
	case JSON_ARRAY_END:
		fprintf(output, "leaving %s\n", (type == JSON_ARRAY_END) ? "array" : "object");
		break;
	case JSON_KEY:
	case JSON_STRING:
	case JSON_INT:
	case JSON_FLOAT:
		/* %.*s prints exactly `length` bytes of data (%*s would only set a width) */
		fprintf(output, "value %.*s\n", (int) length, data);
		break;
	case JSON_NULL:
		fprintf(output, "constant null\n"); break;
	case JSON_TRUE:
		fprintf(output, "constant true\n"); break;
	case JSON_FALSE:
		fprintf(output, "constant false\n"); break;
	}
	return 0; /* the function is declared int, so it must return a value */
}
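Because each atom's data arrives as a pointer plus a length, a callback that needs to keep the data should copy exactly length bytes rather than rely on a terminating NUL. The following sketch stores a NUL-terminated copy of the most recent atom in a struct passed as userdata. struct last_atom and capture_last_atom are hypothetical; the type value is stored as-is without interpreting it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userdata: holds a NUL-terminated copy of the last atom's data. */
struct last_atom {
	int type;
	char text[128];
	int truncated; /* set when the data did not fit in `text` */
};

/* Same prototype as my_callback. It relies only on (data, length) and never
 * reads data[length], so it does not assume the data is NUL-terminated. */
int capture_last_atom(void *userdata, int type, const char *data, uint32_t length)
{
	struct last_atom *last = userdata;
	size_t n = length;

	assert(last != NULL);
	last->type = type;
	last->truncated = 0;
	if (data == NULL)
		n = 0; /* data is optional for some atoms */
	if (n >= sizeof(last->text)) {
		n = sizeof(last->text) - 1;
		last->truncated = 1;
	}
	if (n > 0)
		memcpy(last->text, data, n);
	last->text[n] = '\0';
	return 0;
}
```

Passing a struct as userdata keeps the callback free of global state, so several parser contexts can run side by side, each with its own struct.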

Parser configuration

Parser configuration can be set when initializing the parsing context. This is done by passing a non-NULL pointer to a valid json_config.

The configuration structure supports 7 different variables, which can be grouped into 3 categories:

User-defined memory functions.
Security.
Optional extensions.

User-defined memory functions

The library user can supply custom allocation functions in place of calloc and realloc; in that case, the parser allocates memory through those functions. This is controlled by user_calloc and user_realloc.
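A common reason to supply custom allocators is to observe how much allocation the parser does. The sketch below assumes that user_calloc and user_realloc take the same arguments as the standard calloc and realloc, which is an assumption to check against the json_config declaration in your header. counting_calloc, counting_realloc, and the two counters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters updated by the wrappers below. */
static size_t alloc_calls;
static size_t realloc_calls;

/* Assumed to match the signature expected for user_calloc. */
void *counting_calloc(size_t nmemb, size_t size)
{
	alloc_calls++;
	return calloc(nmemb, size);
}

/* Assumed to match the signature expected for user_realloc. */
void *counting_realloc(void *ptr, size_t size)
{
	realloc_calls++;
	return realloc(ptr, size);
}
```

Both wrappers delegate to the C library, so memory they return can still be released with free(); a wrapper that returned NULL on purpose could likewise be used to exercise the parser's out-of-memory paths.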

Security

There are two security settings available: max_nesting and max_data.

max_data controls the maximum size, in bytes, of a single int, float, string, or key. It directly bounds the size of the data buffer. If it is set to 0, there is no limit and the buffer grows whenever necessary.
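The max_data rule can be pictured as a data buffer that grows on demand up to an optional cap. The following is only an illustration of that policy, not the library's internal code: struct data_buffer, buffer_append, and the doubling growth strategy are assumptions, and size overflow during doubling is ignored for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. max_data == 0 means "no limit". */
struct data_buffer {
	char *data;
	size_t size;     /* allocated bytes */
	size_t used;     /* bytes currently holding data */
	size_t max_data; /* cap on `used`, 0 for unlimited */
};

/* Append n bytes. Returns 0 on success, -1 if the cap would be exceeded,
 * -2 if memory could not be allocated. */
int buffer_append(struct data_buffer *b, const char *src, size_t n)
{
	size_t new_size;
	char *p;

	assert(b != NULL);
	if (b->max_data && b->used + n > b->max_data)
		return -1; /* refuse oversized data instead of growing */
	if (b->used + n > b->size) {
		new_size = b->size ? b->size : 16;
		while (new_size < b->used + n)
			new_size *= 2; /* double until the new data fits */
		p = realloc(b->data, new_size);
		if (p == NULL)
			return -2;
		b->data = p;
		b->size = new_size;
	}
	memcpy(b->data + b->used, src, n);
	b->used += n;
	return 0;
}
```

With a cap, a peer that sends an endless string is rejected after max_data bytes; without one, memory use grows with the largest value in the input.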

max_nesting controls the maximum number of nested structures the parser allows. Each nesting level increases the parser's memory use by 1 byte (so 4096 nested structures need 4 KB of memory).
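One way to reach a cost of one byte per level is a stack holding one byte per open structure, which records whether it is an object or an array so the matching closing bracket can be checked. The sketch below assumes that layout; struct nest_stack, nest_push, nest_pop, and MAX_NESTING are hypothetical, and the stack uses a fixed 4096-byte array for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NESTING 4096 /* one byte per level: 4096 levels take 4096 bytes */

/* Hypothetical nesting stack: one byte per open object or array. */
struct nest_stack {
	unsigned char kind[MAX_NESTING]; /* '{' or '[' for each open level */
	size_t depth;
	size_t max_nesting; /* limit to enforce, at most MAX_NESTING */
};

/* Returns 0 on success, -1 if the nesting limit would be exceeded. */
int nest_push(struct nest_stack *s, unsigned char kind)
{
	assert(s->max_nesting <= MAX_NESTING);
	if (s->depth >= s->max_nesting)
		return -1;
	s->kind[s->depth++] = kind;
	return 0;
}

/* Returns 0 if `closing` ('}' or ']') matches the innermost open structure,
 * -1 on a mismatch or when nothing is open. */
int nest_pop(struct nest_stack *s, unsigned char closing)
{
	unsigned char expected;

	if (s->depth == 0)
		return -1;
	expected = (s->kind[s->depth - 1] == '{') ? '}' : ']';
	if (closing != expected)
		return -1;
	s->depth--;
	return 0;
}
```

A deeply nested input such as thousands of "[" characters then fails at a known depth instead of consuming memory without bound.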

For security, setting both max_nesting and max_data is strongly recommended when the parser is fed directly from a network stream.

Comments

C comments and YAML comments can be enabled independently of each other.

allow_c_comment enables C comments, which start at /* and end at */.

allow_yaml_comment enables YAML/Python comments, which start at # and end at the end of the line.

Comments cannot be nested. The following example uses both comment styles:

{
	# this is a YAML comment
	"key": /* this is a C comment */ 123
}
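Both comment styles can be recognized by a small state machine that also tracks string literals, so that /*, */, or # inside a JSON string is left alone. The following sketch strips comments from a complete in-memory buffer. It illustrates the rules above and is not the library's incremental implementation; strip_comments, its flag parameters, and the choice to replace each C comment with a single space (so the tokens on either side stay separated) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum strip_state { S_NORMAL, S_STRING, S_ESCAPE, S_C_COMMENT, S_YAML_COMMENT };

/* Copies `in` to `out` (which must hold at least strlen(in) + 1 bytes) with
 * comments removed. Returns 0 on success, -1 if a C comment is left open.
 * Comments do not nest: the first star-slash always ends a C comment. */
int strip_comments(const char *in, char *out, int allow_c, int allow_yaml)
{
	enum strip_state st = S_NORMAL;
	size_t o = 0;
	const char *p;

	assert(in != NULL && out != NULL);
	for (p = in; *p; p++) {
		switch (st) {
		case S_NORMAL:
			if (*p == '"') {
				st = S_STRING;
				out[o++] = *p;
			} else if (allow_c && p[0] == '/' && p[1] == '*') {
				st = S_C_COMMENT;
				p++; /* skip the '*' of the opening marker */
			} else if (allow_yaml && *p == '#') {
				st = S_YAML_COMMENT;
			} else {
				out[o++] = *p;
			}
			break;
		case S_STRING: /* copy string contents verbatim */
			if (*p == '\\')
				st = S_ESCAPE;
			else if (*p == '"')
				st = S_NORMAL;
			out[o++] = *p;
			break;
		case S_ESCAPE: /* the escaped character cannot end the string */
			st = S_STRING;
			out[o++] = *p;
			break;
		case S_C_COMMENT:
			if (p[0] == '*' && p[1] == '/') {
				st = S_NORMAL;
				p++; /* skip the '/' of the closing marker */
				out[o++] = ' ';
			}
			break;
		case S_YAML_COMMENT: /* runs to the end of the line; keep the newline */
			if (*p == '\n') {
				st = S_NORMAL;
				out[o++] = '\n';
			}
			break;
		}
	}
	out[o] = '\0';
	return (st == S_C_COMMENT) ? -1 : 0;
}
```

Because comments do not nest, stripping "/* a /* b */ c */" ends the comment at the first */ and leaves " c */" behind, which a JSON parser would then reject.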