Data science is a massive field. As I begin my career, there are a few tools I’d like to have under my belt. One is secure computing. In this article I explore Python’s cryptography library and the use of memory-bound helpers to deliver data to a simple algorithm without ever writing a key to file, and while keeping the data encrypted in memory until the moment it is used.
An Operational Overview
Get Data – Lock Data – Pass Data to Algorithm – Kill the Messenger
The data exists in bulk form somewhere and is accessible somehow. Let’s assume I can authorize a ‘bot’ or in this case an instance of class ‘helper’ to pick up a limited piece of the data. Here, I simulate fetching an email address with DocumentGenerator from the essential_generators library.
from essential_generators import DocumentGenerator

def genData():
    generator = DocumentGenerator()
    return generator.email()
I am simulating a single data item for two reasons:
1. Fetching just enough data, just in time, helps prevent unencrypted data lakes (or reasonably sized ponds in memory). An attacker would have to intercept many fetches instead of dumping a single, regularly accessed pool of memory. That pattern of repeated intercepts or leaks should be easier to detect.
2. Scalability and tuning are interests of mine. I am looking forward a bit and I know I’ll need separate objects to send on jobs for parallel operations.
Each helper worker is instantiated with its own key, so whatever it carries sits in memory only in encrypted form until the moment it is unlocked.
from cryptography.fernet import Fernet

def genKey():
    key = Fernet.generate_key()
    f = Fernet(key)
    return f
There are a lot of cryptographic methods, each with important features to consider, like level of security and computational cost. I used the Fernet generator because it was the easiest to set up for this example. In the future, genKey should be context aware, tailoring the key algorithm to the needs of the data and the processing pipeline.
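To see what a per-helper key buys, here is a minimal sketch. The variable names are mine and are not part of the helper class. It generates two independent Fernet keys, locks a made-up address under the first, and checks that the second cannot open it. Fernet reports that failure by raising InvalidToken.

```python
from cryptography.fernet import Fernet, InvalidToken

# Two independent keys, as two separate helpers would hold.
key_a = Fernet.generate_key()
key_b = Fernet.generate_key()
fernet_a = Fernet(key_a)
fernet_b = Fernet(key_b)

# Lock a made-up address under the first key.
token = fernet_a.encrypt(b"someone@example.com")

# The matching key round-trips the data.
recovered = fernet_a.decrypt(token)

# The other key is rejected instead of returning garbage.
try:
    fernet_b.decrypt(token)
    wrong_key_rejected = False
except InvalidToken:
    wrong_key_rejected = True

print(recovered, wrong_key_rejected)
```

So a helper can only ever open what it locked itself, which is the property I want when many helpers run side by side.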
To give the helper a little encryption engine:
def lockStore(self, token):
    self.data = self.key.encrypt(bytes(token, 'utf-8'))
If there’s one thing that tripped me up it’s that strings need to be converted to bytes.
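For anyone tripping over the same thing, a small sketch of the round trip using only built-ins. The address is made up for illustration.

```python
email = "someone@example.com"  # made-up address for illustration

# str -> bytes: the form used in lockStore, plus the equivalent method call.
as_bytes = bytes(email, "utf-8")
also_bytes = email.encode("utf-8")

# bytes -> str: the method call, plus the form used later when unlocking.
back_to_str = as_bytes.decode("utf-8")
also_str = str(as_bytes, "utf-8")

print(as_bytes, back_to_str)
```

Both spellings in each direction give the same result; which one to use is a matter of taste.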
Pass Data to Algorithm
Finally! Let’s do something with the data. First, the helper needs to decrypt what’s in its bag:
def unlock(self):
    return self.key.decrypt(self.data)
**See why I chose the Fernet key object. Someone did a lot of work for me 😊
I know I’m getting an email address. One useful thing to do with an email address is parse out its domain.
**Remember that the decrypted data is still in bytes. I cast it back to a string to use the wonderful string methods built into Python.
worker.lockStore(genData())
print(str(worker.unlock(), 'utf-8').split('@'))
The split breaks the email at the @ and gives me a list of two elements (before and after the @). The second element, at index 1, holds the domain; here I simply print the whole list.
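If only the domain is wanted, something like the following sketch could do it. The function name domain_of is hypothetical, and the address is made up. It uses rpartition rather than split so that an input with no @ gives an empty string instead of an IndexError.

```python
def domain_of(address):
    """Return the text after the last '@', or '' if there is no '@'."""
    _local, sep, domain = address.rpartition("@")
    # rpartition returns ('', '', address) when '@' is absent, so check sep.
    return domain if sep else ""

address = "someone@example.com"  # made-up address for illustration

# Indexing the split list and calling the helper agree on well-formed input.
from_split = address.split("@")[1]
from_helper = domain_of(address)

print(from_split, from_helper)
```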
Killing the Messenger
There are ways of destroying an object, but they are a bit tedious. Here, the data a helper holds is replaced whenever new data is loaded into it, so the helper no longer references the old encrypted token and Python can reclaim it. That seems like an easy way to retire a lot of helpers: just give each one a nonsense term at the end of the run.
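A minimal sketch of that overwrite, assuming a compact copy of the helper class so the snippet runs on its own, and using a made-up address. Note that Python does not promise to scrub the old bytes from memory; this only removes the helper's reference to them.

```python
from cryptography.fernet import Fernet


class helper():
    # Compact copy of the article's helper so this snippet is self-contained.
    def __init__(self):
        self.key = Fernet(Fernet.generate_key())
        self.data = 0

    def lockStore(self, token):
        self.data = self.key.encrypt(bytes(token, 'utf-8'))

    def unlock(self):
        return self.key.decrypt(self.data)


worker = helper()
worker.lockStore("someone@example.com")  # made-up address for illustration

# End of run: overwrite the payload with a nonsense term.
worker.lockStore("poop")
final_value = str(worker.unlock(), 'utf-8')
print(final_value)

# Dropping the last reference lets Python reclaim the helper and its key.
del worker
```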
**I use 'poop' as the nonsense term because I hate foo and bar. They are meaningless to me. Data, even in examples, should not be meaningless in my opinion.
Putting it all together
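from cryptography.fernet import Fernet
from essential_generators import DocumentGenerator


class helper():
    def __init__(self, key=None):
        # Use the supplied Fernet object, or generate a fresh one for this helper.
        self.key = genKey() if key is None else key
        self.data = 0

    def lockStore(self, token):
        # Fernet works on bytes, so encode the string before encrypting.
        self.data = self.key.encrypt(bytes(token, 'utf-8'))

    def unlock(self):
        # Returns the decrypted payload as bytes.
        return self.key.decrypt(self.data)


def genData():
    generator = DocumentGenerator()
    return generator.email()


def genKey():
    key = Fernet.generate_key()
    f = Fernet(key)
    return f


worker = helper()
for _ in range(5):
    worker.lockStore(genData())
    print(str(worker.unlock(), 'utf-8').split('@'))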
Python is very nice.
This is a naïve first attempt! If you have comments, tips, tricks, or thoughts on which direction to take next, please leave them here or contact me through my about page.