Not exactly sure if this is the right forum, but it's a start. I want to get into the ROM of Harvest Moon 64 in search of in-game text or dialogue from NPCs, then scan said dialogue or text for key words. Is it as easy as opening it up in a hex editor or do I need to go another way? Sorry for sounding like a total noob, I know nothing about ROM analysis or stuff like that.
In this case, unless I'm really terribly wrong about it, the text should be images which are revealed one fixed block at a time. In fact, the game is entirely uncompressed and consists in very large part of images stored (usually) in archives wrapped with one or more directories for lookup. Game's annoying though; except for sound files all the data banks are hardcoded in ASM. No nice, simple tables. That said, I haven't actually identified the script's text itself. I've found other examples (names, the credits roll, and a lot of other stuff) and ruled out the use of texrects or anything resembling a typical font writer. Also, there doesn't seem to be an archive filled with entries that match the expected size for the font. Sadly, palette selection for indexed images doesn't seem to follow a pattern, so not sure how much of a hassle it would be to automate ripping everything out.
I'll put you guys out of your misery. The font is stored as an I2 image with a width of 16 pixels. In other words, each 32-bit word is one row. The text is stored as encoded values; each one is decoded into an index within the font, which is then used to generate an image for the string. That means there are 663 characters to map out before you hit the unused section. Have fun with that ;*)

In addition to the font allowing multibyte sequences, there are also some control codes embedded. 0 seems to trigger the mouth movements. Text values are just slightly offset from actual index values.

Here's an example from the attract demo. Control chars are in brackets.
10094D B3 CA 0E [01] D8 F9 CC D4 [00] D3 CA F0 F0 F0 F0 F0 F0 [00] F9 [00] B4 D9 F9 C6 D1 D1 [00] F9 CD C6 D5 D5 CA D3 CA [00] C9 [00] D8 D4 F9 CB C6 D8 [00] D9 F0 F0 F0 F0 F0 F0 F0 [00] F0 F9 031E020000
-printed as: He's gone...... It all happened so fast........

Scripts, just like images, are two-part archives. One contains the scripts, the other a table of offsets to each one. I've only positively identified one's use:
0xE58160 bin attract.script
0xE583C0 bin attract.idx

Preliminary ASCII mapping:
AC 'A'  AD 'B'  AE 'C'  AF 'D'  B0 'E'  B1 'F'  B2 'G'  B3 'H'
B4 'I'  B5 'J'  B6 'K'  B7 'L'  B8 'M'  B9 'N'  BA 'O'  BB 'P'
BC 'Q'  BD 'R'  BE 'S'  BF 'T'  C0 'U'  C1 'V'  C2 'W'  C3 'X'
C4 'Y'  C5 'Z'
C6 'a'  C7 'b'  C8 'c'  C9 'd'  CA 'e'  CB 'f'  CC 'g'  CD 'h'
CE 'i'  CF 'j'  D0 'k'  D1 'l'  D2 'm'  D3 'n'  D4 'o'  D5 'p'
D6 'q'  D7 'r'  D8 's'  D9 't'  DA 'u'  DB 'v'  DC 'w'  DD 'x'
DE 'y'  DF 'z'
E0 '1'  E1 '2'  E2 '3'  E3 '4'  E4 '5'  E5 '6'  E6 '7'  E7 '8'
E8 '9'  E9 '0'
EA '?'  EB '!'  EC '-'  ED '~'  EE '・'  EF ','  F0 '.'  F1 '/'
F4 '&'  F9 ' '

Known text banks. Each script bank is listed first, followed by its offset table. (Everything between the font and the sound. Or, of the ~920 files, 284 of them.)
0xE13800 0xE13920 0xE13990 0xE13C30 0xE13C60 0xE167C0 0xE16A90 0xE19F60 0xE19FF0 0xE1B3A0 0xE1B4E0 0xE1C8C0 0xE1C9C0 0xE21150 0xE21700 0xE218A0 0xE218D0 0xE24E20 0xE24F60 0xE24F70 0xE24F80 0xE25170 0xE251D0 0xE28460 0xE28720 0xE2A230 0xE2A380 0xE2D080 0xE2D330 0xE2E500 0xE2E600 0xE2F650 0xE2F730 0xE30ED0 0xE31010 0xE33020 0xE33190 0xE34650 0xE34790 0xE36000 0xE36150 0xE39360 0xE39610 0xE3C700 0xE3C9A0 0xE3D600 0xE3D6D0 0xE40B90 0xE40E50 0xE42010 0xE42100 0xE42D50 0xE42E60 0xE43E80 0xE43F60 0xE45130 0xE45200 0xE461C0 0xE46280 0xE47140 0xE471E0 0xE485D0 0xE48720 0xE49480 0xE49540 0xE4A230 0xE4A2D0 0xE4B6E0 0xE4B7D0 0xE4C8D0 0xE4C9B0 0xE4E230 0xE4E320 0xE4EC70 0xE4ED10 0xE4F370 0xE4F3F0 0xE50520 0xE50600 0xE51A00 0xE51AE0 0xE52740 0xE527E0 0xE53450 0xE53540 0xE54C60 0xE54D60 0xE55AE0 0xE55BA0 0xE569C0 0xE56A80 0xE58010 0xE58160 bin attract.script 0xE583C0 bin attract.idx 0xE583F0 0xE58880 0xE588D0 0xE5A6D0 0xE5A8C0 0xE5D240 0xE5D4C0 0xE5E270 0xE5E340 0xE5F220 0xE5F300 0xE60080 0xE60170 0xE64680 0xE64AF0 0xE66260 0xE663F0 0xE677B0 0xE678E0 0xE68620 0xE68700 0xE68BA0 0xE68BF0 0xE6ADC0 0xE6AFE0 0xE6F1E0 0xE6F5C0 0xE714D0 0xE71690 0xE72680 0xE72780 0xE73070 0xE73110 0xE7C7B0 0xE7D1E0 0xE81660 0xE81F50 0xE82B30 0xE82C40 0xE83880 0xE83960 0xE84690 0xE84760 0xE84910 0xE84990 0xE84F50 0xE84FD0 0xE85730 0xE857A0 0xE87080 0xE871A0 0xE871B0 0xE871C0 0xE872C0 0xE87320 0xE87BB0 0xE87CE0 0xE88190 0xE88240 0xE88B20 0xE88C60 0xE89070 0xE89120 0xE89600 0xE896B0 0xE89B60 0xE89C00 0xE8A460 0xE8A590 0xE8AE00 0xE8AF30 0xE8B7A0 0xE8B8F0 0xE8BCE0 0xE8BD80 0xE8C000 0xE8C070 0xE8C400 0xE8C480 0xE8C750 0xE8C7D0 0xE8CBA0 0xE8CC40 0xE8CF90 0xE8D010 0xE8D3F0 0xE8D470 0xE8D7B0 0xE8D830 0xE8DA90 0xE8DB00 0xE8DDF0 0xE8DE60 0xE8E0E0 0xE8E160 0xE8E490 0xE8E520 0xE8E780 0xE8E7F0 0xE8EAC0 0xE8EB30 0xE8EE50 0xE8EEC0 0xE8F1E0 0xE8F250 0xE8F550 0xE8F5C0 0xE8F7A0 0xE8F810 0xE8FA00 0xE8FA70 0xE8FC90 0xE8FD00 0xE90000 0xE90080 0xE903D0 0xE90460 0xE90810 0xE90890 0xE90B00 0xE90B70 0xE90DD0 0xE90E40 0xE91210 
0xE91290 0xE914C0 0xE91520 0xE917E0 0xE91860 0xE918A0 0xE918D0 0xE91900 0xE91920 0xE91970 0xE919A0 0xE91A50 0xE91AB0 0xE91AD0 0xE91AE0 0xE91B50 0xE91B80 0xE91B90 0xE91BA0 0xE91BF0 0xE91C20 0xE91CE0 0xE91D30 0xE91DC0 0xE91E00 0xE91EF0 0xE91F60 0xE91F90 0xE91FB0 0xE91FF0 0xE92010 0xE92030 0xE92040 0xE92180 0xE92220 0xE922E0 0xE92370 0xE92380 0xE92390 0xE924A0 0xE92520 0xE92550 0xE92570 0xE925B0 0xE925F0 0xE92600 0xE92610 0xE92630 0xE92640 0xE92800 0xE928B0 0xE92910 0xE92950 0xE92B40 0xE92B90 0xE92C50 0xE92C70 0xE92DB0 0xE92DE0 0xE92E90 0xE92EC0 0xE92EE0 0xE92EF0 0xE92F90 0xE92FF0 0xE93060

It's actually a bit interesting how it draws the font. To save processing it draws each string out character by character once, then references this image. In that respect it's very much like Virtual Pro Wrestling, VPW2, and their alter egos. The only really annoying part is they didn't use a traditional character mapping--thankfully many other N64 titles do. Oh no, they take a string object stowed at 8030B000, then encode it as a series of 16-bit offsets into the font table in a temporary buffer at 80204BF0. There aren't even newlines. They use spaces until they hit an auto-linewrap at 16 chars.

Built a primitive extractor for images. Seems to be able to pull stuff out, presuming they used the image in a frame someplace. Granted, you have to know what blocks are what though ;*)
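For anyone who wants to check the preliminary mapping against the attract demo sample, here is a minimal Python sketch. It is not the extractor mentioned above. It assumes one byte per character, treats only 00 and 01 as control codes (because those are the values bracketed in the sample), and prints anything not in the listed mapping as a `<XX>` placeholder rather than guessing. Multibyte sequences are not handled. The leading `10094D` and the trailing `031E020000` are left out because their meaning isn't stated.

```python
import string

# Preliminary mapping from the post, rebuilt from its contiguous runs.
TABLE = {}
for i, ch in enumerate(string.ascii_uppercase):    # AC..C5 -> 'A'..'Z'
    TABLE[0xAC + i] = ch
for i, ch in enumerate(string.ascii_lowercase):    # C6..DF -> 'a'..'z'
    TABLE[0xC6 + i] = ch
for i, ch in enumerate("1234567890?!-~・,./"):      # E0..F1
    TABLE[0xE0 + i] = ch
TABLE[0xF4] = "&"
TABLE[0xF9] = " "

# Assumption: only the bracketed values from the sample count as controls.
CONTROL_CODES = {0x00, 0x01}

# Body of the attract demo sample, without the leading 10094D and the
# trailing 031E020000.
SAMPLE = bytes.fromhex(
    "B3 CA 0E 01 D8 F9 CC D4 00 D3 CA F0 F0 F0 F0 F0 F0 00 F9 00 "
    "B4 D9 F9 C6 D1 D1 00 F9 CD C6 D5 D5 CA D3 CA 00 C9 00 "
    "D8 D4 F9 CB C6 D8 00 D9 F0 F0 F0 F0 F0 F0 F0 00 F0 F9"
)


def decode(data, show_controls=True):
    """Map each byte through TABLE; controls become [XX], unknowns <XX>."""
    out = []
    for b in data:
        if b in CONTROL_CODES:
            if show_controls:
                out.append(f"[{b:02X}]")
        else:
            out.append(TABLE.get(b, f"<{b:02X}>"))
    return "".join(out)


print(decode(SAMPLE))
print(decode(SAMPLE, show_controls=False))
```

The 0E in the sample sits where the apostrophe is printed, but it isn't in the listed mapping, so it shows up as `<0E>`. The posted bytes also have no F9 between "happened" and "so", so a strict decode runs the two words together; presumably the auto-linewrap supplies that break.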
Wow dude, nice find! So I jumped into a hex editor and I see exactly what you are talking about, but how exactly could I use the image you created to translate the values to the characters? I'm trying to understand all of this, but like I said, I haven't much experience with this. Is there anything I should be doing, or anything I'm missing?
Sorry, I only have access to internet at work and it's been a tad bit busy lately. The character codes I listed are in hexadecimal, a base-16 counting system: each digit runs from 0 to 15 instead of 0 to 9, and the values 10-15 are written as letters (A=10, B=11, etc.). The order the codes go in happens to follow the font image. They're all in order, but special commands use some values too. If you want a shortcut, you can count backward or forward from the letters you do know--the ones listed, in other words. The first image isn't zero though.

The way you usually go about this is to make a table of all the image codes (like the one I listed), one line per value, followed by a space or tab, and then the character it displays. You could use a table like that, along with the image and the binary, in a tool like TheCheat or other font viewer. You'll probably have to make the table by hand though; images that small don't OCR properly. Personally, I'd just write out a Python script and directly map the table to binary and spit out text.

If I have time I'll dump the text for you. Like I said though, work has been murder lately. (Figuratively--well, so far.) I don't know the purpose of this, so would you need the commands encoded as well? Please say you don't need the Japanese too...
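To make the table idea concrete, here is a rough Python sketch of that approach. It is only an illustration: the function names (`parse_table`, `to_text`, `read_span`) and the tiny inline table are made up for the example. It assumes one byte per character and a table with one entry per line: hex code, a single space or tab, then the character.

```python
import re

# One table entry per line: hex code, one space or tab, then the character.
# Only the first separator is consumed, so "F9<tab><space>" maps F9 to a space.
ENTRY = re.compile(r"([0-9A-Fa-f]+)[ \t](.+)")


def parse_table(text):
    """Turn the table text into a dict of {byte value: character}."""
    table = {}
    for line in text.splitlines():
        m = ENTRY.fullmatch(line)
        if m:
            # int(..., 16) reads the code as hexadecimal, e.g. "B3" -> 179.
            table[int(m.group(1), 16)] = m.group(2)
    return table


def to_text(data, table):
    """Translate bytes to text; values missing from the table print as <XX>."""
    return "".join(table.get(b, f"<{b:02X}>") for b in data)


def read_span(path, start, end):
    """Read bytes [start, end) from a ROM file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


# Tiny hand-made table covering just enough codes for the demo below.
TABLE_TEXT = "CC\tg\nD4\to\nD3\tn\nCA\te\nF0\t.\nF9\t \n"
table = parse_table(TABLE_TEXT)
print(to_text(bytes.fromhex("CC D4 D3 CA F0"), table))
```

With a complete table saved to a file, something like `to_text(read_span("hm64.z64", 0xE58160, 0xE583C0), table)` would dump a bank. The file name is hypothetical, and this assumes both that the ROM image is in the same byte order as the listed offsets and that the script bank ends where its index begins.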
Sweet man, thanks. All I will need is the English, and having the commands would be nice if it isn't too much more work. If you do end up dumping the text I'll make sure your time is paid for properly.
Sorry, major holiday and all. Since I had some extra time I also dumped the sprites. Text is in the "dlg" folder. Also included the Python scripts used for the conversion. https://www.mediafire.com/?pc9vp88wxjhvh8h I seem to remember some of the characters share banks of text. I think the Karen and Popuri events do. To be honest I was sort of guessing on names since they never blasted tell you any. They're also annoyingly inconsistent with spellings when they do. "Cain" and "Kane", for instance. Not terribly professional. Keep your money for something more useful.