Background: I am working on a Python project where, given a set of input files (text/image/audio), it generates an executable game. The text files are there to describe the rules of the game.
Currently, the program reads and parses the files upon each startup, and builds a Python class that contains these rules, as well as links to image/audio files. This is fine for now, but I don’t want the end executable to have to bundle these files and re-parse them each time it gets run.
My question: Is there a way to persist the instance of my class to disk, as it exists in memory? Kind of like a snapshot of the object. Since this is a Python project, my question is specific to Python. But, I’d be curious if this concept exists anywhere else. I’ve never heard of it.
My aim is not to serialize/de-serialize the class to a text file, but instead load the 1’s and 0’s that existed before into an instance of a class.
The quick answer is to use a serialization/deserialization library like pickle. You can’t just dump a binary image and reload it in any simple way.
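For reference, a minimal sketch of the pickle route. GameRules, save_rules, and load_rules are hypothetical names standing in for whatever class your parser builds; nothing here comes from the original project.

```python
import pickle
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class GameRules:
    # Hypothetical stand-in for the class built from the parsed input files.
    title: str
    max_players: int
    assets: dict = field(default_factory=dict)  # asset name -> file path


def save_rules(rules: GameRules, path: Path) -> None:
    # Build step: parse once, then write the object out in pickle's binary format.
    with path.open("wb") as f:
        pickle.dump(rules, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_rules(path: Path) -> GameRules:
    # Startup: rebuild the object from the file. Only load files you created yourself.
    with path.open("rb") as f:
        return pickle.load(f)


rules = GameRules("Dungeon", 4, {"hero": "img/hero.png"})
snapshot = Path(tempfile.mkdtemp()) / "rules.pkl"
save_rules(rules, snapshot)
restored = load_rules(snapshot)
```

The restored object is a new instance with equal contents, not the same bytes mapped back into memory. Pickle stores the class by name, so the class definition still has to be importable when loading.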
You are describing pickle, but it does come with some serious risks, especially if the file can be modified by a third party.
https://arjancodes.com/blog/python-pickle-module-security-risks-and-safer-alternatives/
I’d suggest using protobuf or similar instead, but it’s a bit more work.
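Protobuf needs a schema file and a third-party package, so as a stand-in here is a sketch of the same idea (a data-only format that cannot run code on load) using the standard library's json module. This is not protobuf, and GameRules is a hypothetical class.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class GameRules:
    # Hypothetical rules container; every field must be JSON-friendly.
    title: str
    max_players: int
    assets: dict = field(default_factory=dict)


def save_rules(rules: GameRules, path: Path) -> None:
    # asdict() turns the dataclass into plain dicts, lists, and scalars.
    path.write_text(json.dumps(asdict(rules)), encoding="utf-8")


def load_rules(path: Path) -> GameRules:
    # Loading only produces plain data; the constructor call is explicit.
    data = json.loads(path.read_text(encoding="utf-8"))
    return GameRules(**data)


rules = GameRules("Dungeon", 4, {"hero": "img/hero.png"})
snapshot = Path(tempfile.mkdtemp()) / "rules.json"
save_rules(rules, snapshot)
restored = load_rules(snapshot)
```

The trade-off is that you have to map fields yourself, and anything that is not plain data (nested custom classes, bytes, sets) needs its own conversion.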
I think pickle is what you want.
Keep in mind that this might have a huge performance impact if you do it all the time - it’s still IO even when it’s not parsing.
My idea would be to load one larger file one time, not parse anything, and keep it in memory the entire time, versus what it does now, which is load the files, parse them, and keep everything in memory.
But three people have responded with “pickle” so far, so maybe that is the way.
You can stuff all the info into an object and use it this way, no problem. I just wanted to point out that this doesn’t have zero performance impact compared to what you currently have.
So (depending on how your OS caches files) you might not want to reload it repeatedly, for example inside a lambda that you pass to an iterator over a huge slice.
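One way to make sure the load happens only once, no matter where it gets called from, is to cache the loader. A rough sketch, assuming the snapshot is a pickled dict; SNAPSHOT and get_rules are made-up names.

```python
import functools
import pickle
import tempfile
from pathlib import Path

# Hypothetical snapshot file; a plain dict stands in for the rules object.
SNAPSHOT = Path(tempfile.mkdtemp()) / "rules.pkl"
SNAPSHOT.write_bytes(pickle.dumps({"title": "Dungeon", "max_players": 4}))


@functools.lru_cache(maxsize=None)
def get_rules() -> dict:
    # The disk read and unpickle run only on the first call.
    return pickle.loads(SNAPSHOT.read_bytes())


scores = [3, 9, 1, 7]
# Safe to call inside a hot loop or lambda: later calls return the cached object.
capped = list(map(lambda s: min(s, get_rules()["max_players"]), scores))
```

Every caller gets the same object back, so treat it as read-only or one caller's changes will be visible to all the others.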
“I don’t want the end executable to have to bundle these files and re-parse them each time it gets run.”
No matter how you persist data you will need to re-parse it. The question is really just if the new format is more efficient to read than the old format. Some formats such as FlatBuffers and Cap'n Proto are designed to have very efficient loading processes.
(Well technically you could persist the process image to disk, but this tends to be much larger than serialized data would be and has issues such as defeating ASLR. This is very rarely done.)
Lots of people are talking about pickle, but it isn’t particularly fast. That being said, with Python you can’t expect much to start with.
What is the “executable” in this context? I’m kinda confused as to what you are looking for.
What’s wrong with parsing the input files at runtime? Is it performance? Do you want one file to load instead of multiple?
Many have suggested pickle, which is kinda what you are asking for, but on some level it’s not much different from parsing the input files. Also, depending on your code, you may have to write custom serialization code as part of getting pickle to work.
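To illustrate the custom serialization point: some attributes (locks, open files, sockets) cannot be pickled, and the usual hook for that is __getstate__ / __setstate__. A small sketch with a hypothetical Game class that holds a lock.

```python
import pickle
import threading


class Game:
    # Hypothetical class holding one attribute that pickle cannot handle.
    def __init__(self, title: str):
        self.title = title
        self._lock = threading.Lock()  # lock objects are not picklable

    def __getstate__(self):
        # Copy the instance dict and drop whatever cannot be serialized.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved fields, then recreate the dropped attribute.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(Game("Dungeon")))
```

Without the two methods, pickle.dumps would raise a TypeError on the lock.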
Note that pretty much every modern game is a bundle of often multiple pieces of executable code alongside a whole bunch of separate assets.
Not anything more efficient than just serialising and deserialising the data you want to load.
I’m sure you could use pickle to store the entire object graph, but that opens you up to all kinds of exploits (arbitrary code execution etc.).
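To make the risk concrete, here is a sketch along the lines of the "restricting globals" approach from the pickle docs: an Unpickler subclass that refuses every global lookup, so only plain containers and scalars load. SafeUnpickler, safe_loads, and Exploit are illustrative names, and this reduces the attack surface rather than making untrusted pickles safe.

```python
import io
import pickle


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse every global lookup, so no class or function can be resolved.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()


class Exploit:
    def __reduce__(self):
        # A hostile pickle can ask the loader to call any importable function.
        return (print, ("arbitrary code ran during unpickling",))


# Plain data round-trips without ever touching find_class.
plain = safe_loads(pickle.dumps({"title": "Dungeon", "max_players": 4}))

try:
    safe_loads(pickle.dumps(Exploit()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

With plain pickle.loads, the second payload would have called print during loading; swap print for something like os.system and you have the exploit.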
The Zope Object Database (aka ZODB) exists for more complex persistence use cases. It’s been a long time, though; there are probably more modern options.
Pickle.
But, depending on the needs, writing to SQLite can be blazing fast, and you could store your data as a BLOB as needed.
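A rough sketch of the SQLite BLOB idea using the standard sqlite3 module. The table and column names are made up, the database is in-memory only to keep the example self-contained, and the payload here is pickled bytes (any serialized bytes would do).

```python
import pickle
import sqlite3

# Use a file path instead of ":memory:" for real persistence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (name TEXT PRIMARY KEY, data BLOB)")

rules = {"title": "Dungeon", "max_players": 4}

# Python bytes objects are stored as BLOB values.
conn.execute(
    "INSERT OR REPLACE INTO snapshots (name, data) VALUES (?, ?)",
    ("rules", pickle.dumps(rules)),
)
conn.commit()

row = conn.execute(
    "SELECT data FROM snapshots WHERE name = ?", ("rules",)
).fetchone()
restored = pickle.loads(row[0])
```

This gets you one file on disk that can hold the rules plus image and audio data as separate rows, each readable on demand.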
I took a closer look at what you are asking for and no, you cannot hand a reference to a Python structure to a library and have it write the binary data from memory out to disk, then read that same binary data back into living Python instances later. That’s just not how Python works. For one thing, any such structure is full of pointers, which would be invalid unless you reload it at the same address in memory, which is not practical. You have to serialize and de-serialize.