This repository is a collection of coding challenges with one standout project: a custom compression system for cricket match data that achieves remarkable space savings by approaching the problem differently from traditional compression tools.

The main challenge is data compression, but not in the usual sense. Instead of using generic compression algorithms that look for repeated patterns in raw text, this project exploits the rules of cricket itself. Cricket matches follow strict, predictable laws. For example, the same bowler (pitcher) must bowl all six deliveries (pitches) in an over (round). Rather than storing the bowler's name or ID with every delivery and leaving a generic compressor to discover the pattern, this solution bakes the rule directly into the file format: since cricket law fixes the bowler for each over, the bowler is stored once in the "over header" and never repeated. The result shrinks the data from 2.87 GB to just 7.3 MB, roughly 400 times smaller.

The project compares this domain-aware approach against standard tools like gzip and 7-Zip. Those generic compressors achieve good results (the dataset compresses to around 45-50 MB), but the custom codec goes further by eliminating redundancy at the structural level rather than finding and squeezing it out statistically. The README calls this a "schema-driven binary codec", meaning it builds the cricket dataset's rules into how the data is physically laid out in bytes, so that illegal states cannot be represented in the first place.

This would interest data engineers, especially those working with large datasets that have predictable structures: sports analytics, sensor logs, financial records, or any domain where the constraints are understood upfront. It is also a good case study in how a deep understanding of the data's domain can lead to smarter solutions than applying generic tools.
The project includes detailed write-ups and downloadable compressed files, making it both an educational reference and a practical resource for anyone working with cricket data.
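To make the "over header" idea concrete, here is a minimal Python sketch. It is not the repository's actual format: the byte layout, field widths, and the names `encode_over`, `decode_over`, and `encode_naive` are all assumptions chosen for illustration. It only shows how writing the bowler once per over, instead of once per delivery, removes redundancy before any statistical compressor runs.

```python
import struct

# Hypothetical layout, NOT the repository's actual format:
#   over header: bowler_id (uint16), number of deliveries (uint8)
#   delivery:    one byte, runs in the low 3 bits, wicket flag in bit 3
OVER_HEADER = struct.Struct("<HB")


def encode_over(bowler_id, deliveries):
    """Pack one over; the bowler is written once, in the header."""
    out = bytearray(OVER_HEADER.pack(bowler_id, len(deliveries)))
    for runs, wicket in deliveries:
        if not 0 <= runs <= 7:
            raise ValueError("runs must fit in 3 bits")
        out.append(runs | (int(wicket) << 3))
    return bytes(out)


def decode_over(blob):
    """Unpack one over back into (bowler_id, [(runs, wicket), ...])."""
    bowler_id, count = OVER_HEADER.unpack_from(blob, 0)
    body = blob[OVER_HEADER.size:OVER_HEADER.size + count]
    deliveries = [(b & 0b111, bool(b & 0b1000)) for b in body]
    return bowler_id, deliveries


def encode_naive(bowler_id, deliveries):
    """Baseline for comparison: repeat the bowler ID with every delivery."""
    return b"".join(
        struct.pack("<HB", bowler_id, runs | (int(wicket) << 3))
        for runs, wicket in deliveries
    )


over = [(0, False), (4, False), (1, False), (0, True), (6, False), (2, False)]
packed = encode_over(42, over)
assert decode_over(packed) == (42, over)  # round trip is lossless
print(len(packed), len(encode_naive(42, over)))  # prints "9 18"
```

In this toy layout the six-delivery over takes 9 bytes instead of 18, and a delivery record has no field in which a different bowler could be written, which is the sense in which an illegal state becomes unrepresentable. The header carries a delivery count rather than assuming exactly six because wides and no-balls can add extra deliveries to an over.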
Verify against the repo before relying on details.