A technical question about CRC and md5 and whatnot...

Status
Not open for further replies.

hotdog003

Member
One article said:
A message digest (md5) is a compact digital signature for an arbitrarily long stream of binary data. An ideal message digest algorithm would never generate the same signature for two different sets of input, but achieving such theoretical perfection would require a message digest as long as the input file.
Another article said:
Given an input file and its corresponding message digest, it should be computationally infeasible to find another file with the same message digest value.
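To make the two quotes concrete, here is a minimal Python sketch (an illustration, not taken from either article) showing that md5 produces a 128-bit digest no matter how long the input is. The helper name `md5_hex` is hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the md5 digest of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()

# One byte of input versus a million bytes of input.
short_digest = md5_hex(b"a")
long_digest = md5_hex(b"a" * 1_000_000)

# Both digests are 32 hex characters, i.e. 128 bits.
print(len(short_digest), len(long_digest))
```

Since every input maps to one of a fixed number of 128-bit values, many different files necessarily share each digest, which is the point the first quote makes about "theoretical perfection".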

I've been looking into MD5 checks, and I have a question:
Couldn't you compress the size of a file to something ridiculously small by using a big huge md5 check and by putting a few bytes in various key points within the file? Using those 'key bytes', wouldn't you be able to piece together the rest of the file by using the md5?

Think of it as a game of minesweeper. Using the numbers and what you already know, you could tell what's in the other boxes by piecing together more bytes until you have the whole file. Would that work, and if not, then why not?

Sure, it would be slow. Compression time would take a day and decompression time wouldn't be so hot either, but if I'm right, we could have a full installation package of SRB2 under 2 megs or so.
*coughcoughsavesbandwidthcoughcough*
Gosh, I wish I knew more C.
 
Google for the 3D, pixel-shaded, first person shooter that's under 96k.
 
I don't remember what it's called.

It's the 'final version' of a contest entry a team did to make an FPS that's under 64k.

It's really cool.. everything is generated on the fly using an algorithm and some key values. Takes freaking forever to load, though.
 
I'm just saying, if we could harness the power of, oh, 'md5 compression' I'll call it, then we could have an insanely small srb2 install package.

Self Extractor > Install Package > SRB2 data

1. Unsuspecting user downloads and runs self extractor.
2. Self extractor works its magic (extracts SRB2 installer to temp folder - takes a little while)
3. Lowly installer does its dirty work (extracts and installs SRB2 data)
4. Installer finishes. Self extractor notices, so
5. Self extractor packs up and scrams (discards installer, tells user that it's done)
 
Yeah, that's it... kkrieger.

The way that compression works though, hotdog, is that you create the data from the algorithms... there is no 'compressing files'. I.e., I'd have to think up and create the mathematical formulas to generate all of the sprites, music, etc., which isn't possible because they're so varied. If you play that FPS you can tell everything has an 'order' to it.
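As a rough illustration of "creating the data from the algorithms", here is a small Python sketch that builds a 64x64 grayscale pattern from a one-line formula. The XOR formula and the name `xor_texture` are assumptions picked for the example; this is not how kkrieger actually generates its content.

```python
def xor_texture(size: int = 64) -> bytes:
    """Build a size x size grayscale image, one byte per pixel, from a formula."""
    pixels = bytearray()
    for y in range(size):
        for x in range(size):
            # The whole image is defined by this expression, not by stored pixels.
            pixels.append(((x ^ y) * 4) & 0xFF)
    return bytes(pixels)

texture = xor_texture()
print(len(texture))  # number of pixel bytes produced by the formula
```

The output is 4096 bytes of pixels from a few lines of code, but only because the picture has an obvious 'order' to it. An arbitrary hand-drawn sprite has no such formula waiting to be found.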
 
Well, frankly, I can't play it, since my card apparently can't even handle DirectX 7 properly (that I can tell; 9 is installed, but to think it can handle 9 is laughable). Damn Intel chipsets with their crappy integrated graphics cards and mobos with only PCI slots for upgrading... you make SH sad. Very sad. Watching HL2 plod along like a turtle on sleeping pills, all the while nullifying his flashlight... oh, how you disappoint me.

The sooner I get my awexome computer up and runnin', the better. But getting employment's a real pain in the rear...
 
'Compression files'?
What do you mean?

I'd say that the compressor works like this:

1. Calculate md5 of target file
2. Write md5 to compressed file
3. Write first few bytes of target file to compressed file
4. Compute what more bytes to write, if more are needed
5. Write those bytes along with the offset to compressed file.
6. Test decompressing the file. If it doesn't work, loop back to 4
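The steps above can be sketched at toy scale in Python. This is only a guess at what "decompressing" would have to mean here: the decompressor tries every possible value for the missing bytes until the md5 matches. The names `toy_compress` and `toy_decompress` are hypothetical, and only the "first few bytes" part of the scheme is modelled.

```python
import hashlib
from itertools import product

def toy_compress(data: bytes, keep: int):
    """Steps 1-3: store the md5, the length, and the first `keep` bytes."""
    return hashlib.md5(data).digest(), len(data), data[:keep]

def toy_decompress(digest: bytes, length: int, prefix: bytes):
    """Guess the missing bytes until a candidate has the stored md5."""
    missing = length - len(prefix)
    for tail in product(range(256), repeat=missing):
        candidate = prefix + bytes(tail)
        if hashlib.md5(candidate).digest() == digest:
            return candidate
    return None  # no candidate matched

original = b"SRB2"
packed = toy_compress(original, keep=2)   # two bytes are left out
restored = toy_decompress(*packed)
print(restored == original)
```

With two missing bytes the search covers at most 256**2 = 65,536 candidates. Each extra missing byte multiplies that by 256, so the guessing grows very quickly as more of the file is left out.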

The structure of a compressed file would look like this:
Code:
00-06: "md5Comp"
Loop for more than one file in archive
07: How long the name of compressed file is.
08-??: Name of target file
??-+32: Extended md5 (256-bit instead of 128-bit) of target file
+2: CRC check (for debug)
Begin byte data
+4: Byte increment offset
+1: Byte value
End byte data
End loop
Gosh, I'm confusing myself.
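For what it's worth, the layout above could be written out with Python's `struct` module roughly like this. Several things are assumptions made for the sketch: SHA-256 stands in for the "extended md5" (there is no standard 256-bit md5), `binascii.crc_hqx` stands in for the 2-byte CRC, integers are little-endian, and the file name `srb2.wad` and the function `pack_entry` are made up. The layout doesn't say how the byte data section ends, so this only writes a single entry and doesn't try to read it back.

```python
import binascii
import hashlib
import struct

MAGIC = b"md5Comp"  # bytes 00-06 in the layout

def pack_entry(name: str, data: bytes, key_bytes: list[tuple[int, int]]) -> bytes:
    """Pack one file entry: name, 256-bit digest, 2-byte CRC, then key bytes."""
    name_raw = name.encode("ascii")
    out = struct.pack("B", len(name_raw)) + name_raw       # name length, name
    out += hashlib.sha256(data).digest()                    # +32: stand-in digest
    out += struct.pack("<H", binascii.crc_hqx(data, 0))     # +2: 16-bit CRC
    for increment, value in key_bytes:
        out += struct.pack("<IB", increment, value)         # +4 offset, +1 value
    return out

data = b"example data"
# Key bytes as (offset increment, byte value) pairs: 'e' at 0, then 'm' 3 bytes on.
archive = MAGIC + pack_entry("srb2.wad", data, [(0, data[0]), (3, data[3])])
print(len(archive))
```

Note that each stored key byte costs five bytes in this format (four for the offset, one for the value).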
 
I said "compressing files" not, "compression files". Verb Noun, not Noun Noun.

What I mean is that when you use that super-compression thing you're talking about, the data can't just be anything. It has to have some kind of 'unity to it'... For example, in that FPS, if I remember right, there are only 90-degree turns in the level corridors, and a lot of things repeat themselves.
 
Really? Why is that?

I'm just talking about something similar to .ZIP Archiving, nothing more. Just a different, better way of doing it. Not some freakingly complex math equation for 'generating' the files from chicken scratch like graphing an equation. That would be hard.
 
Uh... yeah... why d'ya ask? It's not like it's some major revelation or something.

Just tried .kkrieger. Crashes instantly. That's, like, a new record. Tenebrae or FuhQuake's OpenGL component usually wait a good while before crashing; this does it right off the bat.
 