Hashing And Checksums To Validate Game Save Integrity
When developing video games that allow players to save their progress and return later, ensuring the integrity of save data is crucial. Game save data can easily become corrupted or altered through bugs, hardware issues, or even user tampering. Implementing validation checks using cryptographic hashing functions can detect cases of corrupted data and prevent errors.
Why Save Data Integrity Matters
Save games represent all player progression, unlockables, stats, configurations, and more. Corrupted save data can negate hours of progress, creating very negative experiences.
Data corruption can also lead to crashes, errors, and bugs that significantly hurt game stability and quality. Flawed save data means the resumption of gameplay is built on a damaged foundation.
In online and multiplayer games, lapses in save validity checks can enable cheating and hacks by allowing players to manipulate save files. This compromises competitiveness and fairness.
Adding hashing and checksums to validate save data before loading protects against many issues. The performance trade-offs are minimal compared to the major benefits.
Common Types of Save Data Corruption
Understanding the vulnerabilities of save game formats is essential for designing robust verification measures.
Software Bugs and Errors
Bugs in the game code itself are a leading cause of save data problems. Errors during the save or load process can write bad data or misinterpret data.
Resource leaks, buffer overflows, faulty pointers, and race conditions are examples of issues that may corrupt game state serialization or deserialization.
Storage Errors
Saved games are vulnerable to the same file system and storage integrity issues as any other data. Hard disk failures, disconnected devices, interrupted writes, and similar faults can modify bytes unintentionally.
Heavily used save games that undergo many rewrite cycles are especially prone to eventual storage medium degradation.
User Tampering
Players may be tempted to manually edit saved game files to cheat, gain advantages, or exploit bugs. While motivation varies, the end result is untrusted data.
Some players may modify files unintentionally trying to fix issues. Homebrew tools and hacks also pose tampering threats.
Memory Errors
In-memory game states planned for saving can be corrupted by various issues:
- Pointer errors dereferencing bad addresses
- Buffer overflows overwriting adjacent data
- Undefined behaviors causing unexpected changes
- Memory leaks keeping stale data
- Uninitialized values storing garbage
All these present dangers even before the data reaches storage mediums. Catching them early prevents cascading problems.
Transmission Errors
Saved games uploaded to servers or shared peer-to-peer among devices risk errors altering data in transit:
- Dropped or corrupted network packets
- Unexpected disconnections mid-transfer
- Router problems scrambling buffered data
Later sync checks can catch these errors, but only if a checksum was computed locally from a known-good copy before the transfer, so that the receiving side has a trusted value to compare against.
Using Cryptographic Hashes to Detect Corruption
Hash functions offer an ideal mathematical approach for identifying changed or corrupt game save data.
A cryptographic hash algorithm processes an arbitrary input then deterministically calculates a short fixed-length bit sequence digest called the hash value, hash code, hash sum, or simply hash.
Common one-way hash functions such as MD5, SHA-1, and SHA-256 repeatedly scramble, transpose, and combine the input data over many rounds, producing an avalanche effect that magnifies small changes.
The key defining property is that even minute changes in the source data produce wildly different hash values. Comparing hashes instead of full data therefore reveals corruption quickly: a mismatch always means the data (or the stored hash) has changed, and the chance that corrupted data still produces the same hash is negligible for digests of this size.
Key Characteristics
Cryptographic hash functions suitable for data integrity checks generally share key attributes:
- Deterministic – Identical inputs always yield identical hashes
- One-way – Infeasible to reconstruct input from output hashes
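- Avalanche effect – A single bit flip in the input changes roughly half of the output bits
- Puzzle friendly – No shortcut exists for finding an input that produces a chosen output pattern (relevant to proof-of-work schemes rather than integrity checks)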
- Non-linear – Outputs seem random compared to inputs
- High bit diffusion – Every input bit affects many output bits
Together these traits maximize the chance of detecting even slight changes in the data while preventing the input from being reverse engineered from the hash.
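As a rough illustration of determinism and the avalanche effect, the sketch below hashes two buffers that differ in a single bit, using Python's standard hashlib module, and counts how many digest bits differ. The save contents and the helper name bit_difference are made up for the example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"player=alice;level=12;gold=340"
# Flip the lowest bit of the final byte to simulate a single-bit error.
flipped = original[:-1] + bytes([original[-1] ^ 0x01])

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(flipped).digest()

# Deterministic: hashing the same input again gives the same digest.
same_again = hashlib.sha256(original).digest() == digest_a

# Avalanche: a one-bit input change alters many of the 256 output bits.
changed_bits = bit_difference(digest_a, digest_b)
print(same_again, changed_bits)
```

For a well-behaved 256-bit hash, the count of changed bits typically lands near half of the output bits, which is why comparing digests is such a sensitive test for change.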
Hashing Save Games
The choice of hash algorithm depends on priorities. Key factors include speed, security level, collision resistance, and digest size.
For save data checks, MD5 remains widely used even though practical collision attacks against it are well known. SHA-1 has also been broken for collision resistance. SHA-256 has no known practical attacks and costs only slightly more to compute.
Detecting accidental corruption does not require cryptographic strength. Even a basic checksum such as CRC32 catches random bit errors well, as long as the digest stored at save time is compared against one regenerated at load time. Resistance to deliberate tampering is a separate requirement that plain checksums do not meet.
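A basic checksum comparison might look like the following minimal sketch, which uses CRC32 from Python's standard zlib module. The save contents and the helper name crc32_of are invented for illustration; this detects accidental damage only and offers no protection against deliberate edits.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC32 checksum of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


save_bytes = b"level=3;hp=87;checkpoint=bridge"
stored_checksum = crc32_of(save_bytes)  # recorded when the save is written

# Simulate a storage fault that flips one bit in the middle of the file.
damaged = bytearray(save_bytes)
damaged[10] ^= 0x04
damaged = bytes(damaged)

intact_ok = crc32_of(save_bytes) == stored_checksum  # unchanged data passes
damaged_ok = crc32_of(damaged) == stored_checksum    # corrupted data fails
print(intact_ok, damaged_ok)
```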
Example Implementation of MD5 Hash Checking
MD5 is fast, well-tested, and sufficient for the data integrity needs of game saves in most single-player cases. The wide programming language support also helps.

    # MD5 comes from Python's standard hashlib module
    import hashlib


    class SaveGameValidator:
        def __init__(self):
            self.valid = False

        def generate_hash(self, save_data: bytes) -> bytes:
            # digest() returns the raw 16-byte MD5 hash
            return hashlib.md5(save_data).digest()

        def verify(self, loaded_data: bytes, stored_hash: bytes) -> bool:
            # Re-hash the untrusted data and compare with the trusted hash
            computed_hash = self.generate_hash(loaded_data)
            self.valid = computed_hash == stored_hash
            return self.valid

The class interface allows the necessary separation between untrusted loaded data and trusted known-good hashes created earlier. This handles corruption detection before acting upon any deserialized save game state.
The hashlib.md5() call runs the full MD5 algorithm on the supplied data buffer, and digest() returns the 16-byte hash. The input must be a bytes-like object, so strings need to be encoded first, and files or streams must be read into bytes or fed to the hash in chunks through update().
The regenerated hash is compared to the stored hash with a simple equality check, and any mismatch marks the data as invalid. The valid flag allows clean reactions to any failures.
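When the save lives in a file rather than an in-memory buffer, the hash can be built up chunk by chunk instead of loading everything at once. The following is a minimal sketch using hashlib's update() method; the function name hash_file, the chunk size, and the temporary demo file are assumptions for the example.

```python
import hashlib
import os
import tempfile


def hash_file(path: str, chunk_size: int = 65536) -> bytes:
    """Return the MD5 digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()


# Usage: write a throwaway save file, then hash it from disk.
payload = b"example save contents" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

file_digest = hash_file(demo_path, chunk_size=4096)
os.remove(demo_path)
```

Feeding the data in chunks yields the same digest as hashing the whole buffer in one call, so the two approaches can be mixed freely between saving and loading.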
Alternatives to MD5 like SHA-1
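MD5's weaknesses are no longer only theoretical: practical collision attacks have been known for years. SHA-1 and SHA-256 slot into the same interface with a one-line change, though SHA-1 has itself been broken for collision resistance, so SHA-256 is the safer choice when tampering matters.

    import hashlib


    class SaveGameValidator:
        def generate_hash(self, save_data: bytes) -> bytes:
            # Only the hash constructor changes; callers are unaffected
            return hashlib.sha1(save_data).digest()

        ...
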
The modular, abstract interfaces allow painless transitions between algorithms. This future-proofs save data handling as new hash standards emerge.
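SHA-512 is overkill for save games, and like the rest of the SHA-2 family it is designed to be fast. Cases that need deliberately slow key stretching call for a dedicated key derivation function such as PBKDF2, scrypt, or Argon2. BLAKE2, BLAKE3, and others also offer options with their own performance/security trade-offs.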
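One way to keep algorithm changes painless is to record the algorithm name next to the digest, so older saves remain verifiable after an upgrade. The sketch below assumes Python's hashlib.new() constructor and a made-up record layout (a dict with "algorithm" and "digest" keys); it is an illustration, not a prescribed format.

```python
import hashlib


def make_record(save_data: bytes, algorithm: str = "sha256") -> dict:
    """Hash save_data and remember which algorithm produced the digest."""
    digest = hashlib.new(algorithm, save_data).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify_record(save_data: bytes, record: dict) -> bool:
    """Recompute the digest using the algorithm named in the record."""
    expected = hashlib.new(record["algorithm"], save_data).hexdigest()
    return expected == record["digest"]


data = b"inventory=sword,shield;zone=4"
old_record = make_record(data, "md5")      # written by an older game version
new_record = make_record(data, "blake2b")  # written after an upgrade
```

Both records verify against the same data even though they were produced by different algorithms, because the verifier reads the algorithm name from the record itself.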
Adding Salt Values to Hash Inputs
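Salting involves combining unpredictable data with the input before hashing. The technique comes from password storage, where it defeats precomputed (rainbow table) lookups. Applied to save files, it ensures that identical save contents do not produce identical digests.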
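
    import hashlib
    import os


    def generate_hash(self, save_data: bytes) -> tuple[bytes, bytes]:
        salt = os.urandom(16)  # 16 random bytes from the operating system
        salted = salt + save_data
        # Return the salt too: verification needs it to rebuild the same hash
        return salt, hashlib.sha1(salted).digest()
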
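The 16-byte salt is prepended to the save data prior to hashing. The salt is then stored alongside the hash so that later verification can recreate the identical salted hash.
Because the salt travels with the save file, it does not by itself stop an attacker who has reverse engineered the save format from recomputing a valid hash after editing the data. Resisting that kind of spoofing requires a secret the attacker lacks, such as a key held on a server and used in a keyed hash (HMAC).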
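Where resistance to deliberate edits matters, a keyed hash is the usual tool, because producing a valid tag requires a secret that is not stored in the save file. The following is a minimal sketch using Python's standard hmac module; SECRET_KEY, sign_save, and verify_save are hypothetical names, and the hard-coded key is for illustration only.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real game would need to keep this
# out of the save file and ideally off the client entirely.
SECRET_KEY = b"example-key-not-for-production"


def sign_save(save_data: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Return an HMAC-SHA256 tag over the save data."""
    return hmac.new(key, save_data, hashlib.sha256).digest()


def verify_save(save_data: bytes, tag: bytes, key: bytes = SECRET_KEY) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign_save(save_data, key), tag)


save = b"gold=100"
tag = sign_save(save)
edited = b"gold=999"
# Without the key, an editor can only recompute a plain hash, not the tag.
forged_tag = hashlib.sha256(edited).digest()
```

A key embedded in the game client can still be extracted by a determined attacker, so this raises the bar for casual editing rather than providing a guarantee; keeping the key on a server gives stronger protection.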
Storing Hash Values Alongside Save Files
Save game formats often use binary serialization to compactly store structured game data into files or buffers.
Appending hash digest bytes directly is convenient but does increase nominal save sizes. Alternate storage options also work:
- JSON property
- Separate sibling file
- Remote server calls
- Encrypted compressed blobs
- Binary packing helpers
- Key-value stores
- Database integration
Hash storage faces the same challenges as storing the game data itself, so the same solutions apply. Server-client architectures can reuse existing authentication and security layers.
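The simplest of these options, appending the digest bytes directly to the serialized save, can be sketched as follows. The layout (payload followed by a 32-byte SHA-256 trailer) and the function names pack_save and unpack_save are assumptions made for this example.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def pack_save(payload: bytes) -> bytes:
    """Return the payload followed by its SHA-256 digest."""
    return payload + hashlib.sha256(payload).digest()


def unpack_save(blob: bytes) -> bytes:
    """Split off the trailing digest, verify it, and return the payload."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("save is too short to contain a digest")
    payload, stored = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != stored:
        raise ValueError("save failed integrity check")
    return payload


game_state = b"\x01\x02binary-game-state"
blob = pack_save(game_state)
```

The fixed-size trailer costs 32 bytes per save and keeps the data and its digest in a single file, at the price of the digest being exposed to the same storage faults as the payload.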
Checking Hashes Before Loading Saves
Verifying hashes occurs immediately after loading raw save data and before interpreting any contents.

    class SaveLoader:
        def load(self, filename, stored_hash):
            # stored_hash is the trusted digest recorded when the save was written
            data = storage.read(filename)
            validator = SaveGameValidator()
            if not validator.verify(data, stored_hash):
                logger.warning("Save file failed validation")
                return
            # Process valid data
            self.gamestate = GameState(data)

This cleanly separates checking from usage while logging issues. The game state only populates after validation passes.
Wrapping risky deserialization in try/except blocks is another wise precaution. A file can pass its hash check and still fail to parse, for example when a bug corrupted the game state before the hash was computed.
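Guarded deserialization might look like the sketch below. It assumes, purely for illustration, that the save payload is UTF-8 encoded JSON with an object at the top level; parse_save is a hypothetical helper name.

```python
import json
from typing import Optional


def parse_save(data: bytes) -> Optional[dict]:
    """Decode a JSON save, returning None instead of raising on bad input."""
    try:
        state = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    # Reject payloads that parse but are not the expected top-level object.
    if not isinstance(state, dict):
        return None
    return state
```

Returning None lets the caller fall through to the same recovery path used for a failed hash check, rather than crashing mid-load.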
For file formats allowing partial loads, consider incremental hash checking intermixed at logical record boundaries.
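Per-record checking can be sketched as below, under an assumed layout in which each record is stored as a 4-byte little-endian length, the payload, and a SHA-256 digest of that payload. The layout and function names are invented for the example; the reader stops at the first record that fails its check and keeps everything verified up to that point.

```python
import hashlib
import struct

DIGEST_SIZE = 32  # SHA-256 digest length in bytes


def write_records(records: list[bytes]) -> bytes:
    """Serialize each record as: 4-byte length, payload, SHA-256 digest."""
    out = bytearray()
    for payload in records:
        out += struct.pack("<I", len(payload))
        out += payload
        out += hashlib.sha256(payload).digest()
    return bytes(out)


def read_valid_records(blob: bytes) -> list[bytes]:
    """Return records in order, stopping at the first that fails its check."""
    good = []
    offset = 0
    while offset + 4 <= len(blob):
        (length,) = struct.unpack_from("<I", blob, offset)
        start = offset + 4
        end = start + length
        if end + DIGEST_SIZE > len(blob):
            break  # truncated record
        payload = blob[start:end]
        stored = blob[end:end + DIGEST_SIZE]
        if hashlib.sha256(payload).digest() != stored:
            break  # corrupted record; keep what was verified so far
        good.append(payload)
        offset = end + DIGEST_SIZE
    return good


records = [b"player", b"inventory", b"world"]
blob = write_records(records)
```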
Handling Detected Corruption Cases
Failed validation requires notification and handling. Severity levels determine responses:
Warnings
Treat minor data mismatches as warnings, allowing the game to continue while advising players of potential issues.

    warn_user("Save file corruption detected. Data errors may occur")

Errors
Major corruption warrants blocking gameplay but offering recovery options via reloads, default values, or pruned backups.

    error_and_rollback("Save failed checksum. Restoring from last week's backup...")
    issue_refund(user)

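Falling back to a backup can be as simple as walking a list of candidates and taking the first one that verifies. The sketch below assumes each candidate is a (data, stored_digest) pair hashed with SHA-256 and ordered newest first; first_valid_save and digest_of are hypothetical helper names.

```python
import hashlib
from typing import Optional


def digest_of(data: bytes) -> bytes:
    """Return the SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def first_valid_save(candidates: list[tuple[bytes, bytes]]) -> Optional[bytes]:
    """Return the first save whose digest matches, or None if none do.

    candidates holds (data, stored_digest) pairs ordered newest first,
    for example the primary save followed by older backups.
    """
    for data, stored_digest in candidates:
        if digest_of(data) == stored_digest:
            return data
    return None


primary = (b"level=9 (corrupted)", digest_of(b"level=9"))  # digest mismatch
backup = (b"level=8", digest_of(b"level=8"))               # intact
restored = first_valid_save([primary, backup])
```

A None result means every candidate failed, which is the point at which default values or a fresh start become the only remaining options.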
Fail-Secure
For sensitive contexts like leaderboards, halt immediately and enter a safe failure mode protecting game integrity.

    ban_user(user_id)
    invalidate_scores(user_id)
    trigger_manual_review()

Automated banning may require human verification before applying permanently.
User Experience
Carefully crafted user experience can mean the difference between uninstalled games and thoughtful damage control.
Explain the corruption clearly while guiding players through recovery. Highlighting the replay value of the lost progression can help retain players.
Example C++ Code for SHA-1 Hash Generation
Portable C++ SHA-1 implementations create digests from STL containers, streams, strings, and smart pointers:
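
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    #include <sha1.hpp>  // third-party SHA-1 implementation providing the SHA1 class

    // Hash a file by streaming its bytes through the digest
    std::string checksum_file(const std::string& filename) {
        std::ifstream file(filename, std::ios::binary);
        SHA1 checksum;
        checksum.process(file);
        return checksum.hexdigest();
    }

    // Hash an in-memory buffer
    std::string checksum_data(const std::vector<std::uint8_t>& buffer) {
        SHA1 checksum;
        checksum.process(buffer);
        return checksum.hexdigest();
    }
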
Member functions like process() iteratively digest data during multiple calls. The final hexdigest() outputs the cumulative 40-character hash string.
SHA1 classes of this kind readily accept the input types listed above or wrap raw memory blocks. Multithreading, hardware acceleration, and chained algorithms provide further extensions.
Extending Integrity Checks to Networked Games
Network play requires enhanced security to prevent tampering and exploits.
- Use lockstep verification between hosts comparing digest chains
- Introduce third-party auditing services to arbitrate disputes
- Log hash values along entire session replays as evidence packages
- Enable asynchronous verifications with failure notifications
- Securely transmit hashes over TLS connections
Appoint designated hosts to perform audits. Replay reviews can pinpoint unauthorized editing down to the frame. Penalize repeat offenders with timeouts or bans.
Relying purely on client-side save data invites manipulation. Server-authoritative backups provide a trusted reference for settling disputes.
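The digest chains mentioned above can be sketched as follows: each frame's digest also covers the previous digest, so two hosts can compare chains and locate the first frame at which their sessions diverge. The frame contents, the all-zero starting value, and the function names are assumptions for illustration.

```python
import hashlib
from typing import Optional


def chain_digests(frames: list[bytes]) -> list[bytes]:
    """Return one digest per frame, each covering the previous digest too."""
    digests = []
    previous = b"\x00" * 32  # fixed starting value for the chain
    for frame in frames:
        previous = hashlib.sha256(previous + frame).digest()
        digests.append(previous)
    return digests


def first_divergence(ours: list[bytes], theirs: list[bytes]) -> Optional[int]:
    """Return the index of the first differing digest, or None if none differ."""
    for index, (a, b) in enumerate(zip(ours, theirs)):
        if a != b:
            return index
    return None


host_frames = [b"move:1,0", b"move:0,1", b"fire", b"move:1,1"]
peer_frames = [b"move:1,0", b"move:0,1", b"fire+edited", b"move:1,1"]
host_chain = chain_digests(host_frames)
peer_chain = chain_digests(peer_frames)
divergence = first_divergence(host_chain, peer_chain)
```

Because every digest depends on all earlier frames, the chains stay different after the first altered frame even when later frames match, so a single final digest is enough to tell whether two sessions agree.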