-
Notifications
You must be signed in to change notification settings - Fork 0
Implements a lazily_parse option for simpleIon load/dump. #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| // TODO should append the c_buf to _value_stream directly. Not an Ion BLOB! | ||
| // E.g. Something like ION_PUT(writer->_typed_writer.binary._value_stream, c_buf); | ||
| IONCHECK(ion_writer_write_blob(writer, (BYTE*)c_buf, len)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good callout. We need to decide the best way to expose this functionality in ion-c. It's probably not an API we'd want to add to the public ion_writer.h header.
I wonder if we can just flush the writer before writing raw bytes, and then write those bytes directly to the underlying ION_STREAM...? If we can do that it may help us avoid having to add to the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it works. But I need to investigate why ion_writer_flush() didn't work in my sample code so the output is not in a right order...
Current logic parses a list Lazy object to an Ion list including child lazy objects, we don't know the total length when we append the cached bytes in the output ion_stream so it requires a little more calculation.
To know how to calculate the length, the question then turns to how we handle a lazy list. My idea is:
- Simply create a lazy object, cache the list's binary representation and set the type to
ION_TYPE.LISTlike other Ion types. - When users only need to append/delete objects, they can
partially_deserializethe Lazy object into a list holding all child lazy objects in order. They can add new Ion values or delete existing values. - When users need to modify a list and its child Ion values, they can
fully_deserializethe lazy object into a regular Ion list class holding all deserialized Ion values. They can add, delete or modify values.
The second point above partially deserialize the list so we can reuse the cached binary of its child lazy objects. In other words, user just need to deserialize desired Ion values instead of a full list. Later on if users want to modify a specific child lazy object within the List/Sexp, they can deserialize that lazy object, modify it, and replaced the original lazy object with the modified Ion value. This sounds doable but still need a little more experiment, if partial deserialization and the strategy of modifying lazy objects is faster (and correct too), there is no need to do a fully deserialization. In addition we should consider Ion struct situation since it is unordered and allows duplicated keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I need to investigate why ion_writer_flush() didn't work in my sample code so the output is not in a right order
Figured out, I forgot to call it in the top-level.
| printf("p_offset is: %" PRId64 "\n", p_offset); | ||
| printf("p_length is: %" PRId32 "\n\n", p_length); | ||
|
|
||
| // TODO is it possible to return a memory view pointing to the specific position of the original buffer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| ion_reader_get_value_offset(hreader, &p_offset); | ||
| ion_reader_get_value_length(hreader, &p_length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I forgot ion-c provided this functionality. It might not be well-tested, however... That is worth looking into. We want to make sure it's accurate in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will leave this comment unsolved first.
Description:
Some changes to make the lazily_parse method work.
Highlights:
Three main changes in this PR:
A little more explanation for the test file I uploaded:
[1,2][3],b'\xe0\x01\x00\xea\xb4\x21\x01\x21\x02\xb2\x21\x03'By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.