Implemented rle_fast Extension for Real Time Encoding #1
AshishS-1123 wants to merge 5 commits into tnwei:master from
Hello @AshishS-1123, this is wonderful! Any speedup is certainly welcome. Hope you don't mind me getting back to you in a week or so, when I'm able to take a closer look after vacation. Thank you and happy new year!
Coming back to take a closer look at this. I'm thinking of pointing this PR to a separate
I cannot redirect PRs, so I will need to close this one. If you are still interested, you are welcome to re-submit a PR to the newly created Once again, thanks and appreciate your contribution!
Changes Made
Why?
While the algorithm for performing encoding and decoding operations is pretty efficient, one of its major drawbacks is that it is written in Python, which is not a very fast language.
To work around this, Python provides a C API for writing extensions: modules that are written in C but can be called from Python.
The extension I wrote uses an algorithm similar to the one in the rle package, but it is almost 5x faster.
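A speedup figure like this can be checked locally with a small timing harness. The sketch below is only an illustration, not the benchmark behind the number above: `encode_py` is a stand-in pure-Python encoder written for this example, and `best_time` is a hypothetical helper. To compare, pass the extension's encode function to `best_time` in place of `encode_py`.

```python
import timeit
from itertools import groupby


def encode_py(seq):
    """Stand-in pure-Python run-length encoder (not the rle package's code)."""
    values, counts = [], []
    for value, run in groupby(seq):
        values.append(value)
        counts.append(sum(1 for _ in run))
    return values, counts


def best_time(func, data, number=5, repeat=3):
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Synthetic input: 20,000 numbers in runs of 50 identical values.
data = [i // 50 for i in range(20_000)]

baseline = best_time(encode_py, data)
print(f"pure Python encode: {baseline * 1e3:.3f} ms per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the two timings differ by a small constant factor.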
How?
The extension I built is present in the folder rle_fast.
Wrapper For Extension
1. The file rle_fast/rle_fast_extension.c contains the wrapper code for the extension. It contains the module definitions, method declarations, and two wrapper methods, one for encoding and one for decoding.
2. These methods are named encode_c and decode_c.
3. The first step in these functions is to get and parse the arguments.
4. The next step is to check the correctness of the given arguments and raise appropriate errors if any.
5. After that, we call the corresponding function from the header file rle_utils.h.
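The control flow in steps 3 to 5 can be sketched in Python for readability. This is not the C code from rle_fast_extension.c: `decode_wrapper` and `_decode_core` are hypothetical names, and the specific exception types are assumptions about what "appropriate errors" means here.

```python
def _decode_core(values, counts):
    """Stand-in for the utility function that does the real work."""
    result = []
    for value, count in zip(values, counts):
        result.extend([value] * count)
    return result


def decode_wrapper(*args):
    # Step 3: get and parse the arguments (exactly two are expected here).
    if len(args) != 2:
        raise TypeError(f"decode expects 2 arguments, got {len(args)}")
    values, counts = args

    # Step 4: check the arguments and raise an error if they are invalid.
    if not isinstance(values, (list, tuple)) or not isinstance(counts, (list, tuple)):
        raise TypeError("values and counts must be lists or tuples")
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    if any(not isinstance(c, int) or c < 0 for c in counts):
        raise ValueError("counts must be non-negative integers")

    # Step 5: hand the validated arguments to the utility function.
    return _decode_core(values, counts)


print(decode_wrapper([1, 2, 3], [2, 1, 3]))  # [1, 1, 2, 3, 3, 3]
```

Keeping validation in the wrapper means the utility function can assume well-formed input, which is the split the list above describes.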
Utility Functions for Encode and Decode
1. The functions in the file rle_fast/rle_utils.h are responsible for the actual encoding and decoding operations.
2. The algorithm used in these functions is the same as the one in the rle/__init__.py file. The only difference is how the individual operations are performed.
Python C API functions are used for operations such as creating a new empty list or getting an iterator. These functions are documented on the official Python site.
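For reference, the run-length algorithm itself can be sketched in pure Python. This is a minimal illustration, not a copy of the code in the rle package or in rle_utils.h; the (values, counts) return shape is an assumption about the interface.

```python
def encode(seq):
    """Collapse consecutive equal items into parallel value and count lists."""
    values, counts = [], []
    iterator = iter(seq)
    try:
        previous = next(iterator)
    except StopIteration:
        return values, counts  # empty input gives empty output

    run_length = 1
    for item in iterator:
        if item == previous:
            run_length += 1
        else:
            # The run ended: record it and start a new one.
            values.append(previous)
            counts.append(run_length)
            previous, run_length = item, 1

    # Record the final run, which the loop never closes.
    values.append(previous)
    counts.append(run_length)
    return values, counts


def decode(values, counts):
    """Expand value and count lists back into the original sequence."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    result = []
    for value, count in zip(values, counts):
        result.extend([value] * count)
    return result


values, counts = encode([1, 1, 1, 2, 3, 3])
print(values, counts)          # [1, 2, 3] [3, 1, 2]
print(decode(values, counts))  # [1, 1, 1, 2, 3, 3]
```

In the C version, each list append and iterator step in this loop would go through the corresponding C API call rather than Python syntax.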
Docstrings
No software is complete without documentation. In a C extension, the docstrings for methods and modules have to be attached at the module definitions, which here live in rle_fast_extension.c. The docstring text itself is kept in the file rle_fast/rle_docs.h.
You can either read the docs in that file or use the help function after installing the extension.
Installation
The code to build and install the extension has already been added to setup.py.
Run the following commands from the terminal.
To build the package
python setup.py build
To install the package
python setup.py install
TO-DO
The extension doesn't support encoding operations on sequences containing strings or characters. This is because the code assumes that we are operating on numbers only. When such an input is given, it raises a NotImplementedError.
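The behaviour described above amounts to a guard like the following, shown in Python for brevity. `require_numeric` is a hypothetical name, and testing against `numbers.Number` is an assumption; the C code may check types differently.

```python
from numbers import Number


def require_numeric(seq):
    """Raise NotImplementedError if any item is not a number."""
    for index, item in enumerate(seq):
        if not isinstance(item, Number):
            raise NotImplementedError(
                f"item {index} is {type(item).__name__}; "
                "only numeric sequences are supported"
            )
    return seq


require_numeric([1, 2.5, 3])  # passes silently

try:
    require_numeric([1, "a", 3])
except NotImplementedError as exc:
    print(exc)  # item 1 is str; only numeric sequences are supported
```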
For now, I have commented out the test that fails in the file tests/test_encode_rlefast.py.
Also, the README file needs to be updated to cover the extension.
And since this seems like a major improvement to the package, maybe we can release it as a major version?