sqpat Posted November 22, 2023 (edited)

14 hours ago, viti95 said:
Another idea to reduce memory usage: compress the IWADs better. @fraggle has updated wadptr and fixed old issues; this reduces the amount of memory used for certain graphics, and especially the size of the sidedefs.

Hmm, my original intent was to make it compatible with the original commercial WADs. However, there's no reason I can't add some initialization code (in an overlay, which won't take up in-game runtime memory) that detects an uncompressed commercial IWAD at startup and runs the wadptr code on it to compress it... so for now, I will probably just use the compressed WADs and make a TODO to port wadptr into some 16-bit code that can optionally run at startup.

Thanks for the suggestion. Some Doom 2 sidedef lumps were over 64k and I didn't really want to write a kludge to deal with that.

Edited November 22, 2023 by sqpat
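For a sense of why sidedef lumps hit the 64 KB wall: an on-disk Doom sidedef record is 30 bytes, so one 64 KB segment holds at most 2184 of them. Below is a minimal C sketch of the general idea behind sidedef packing (not wadptr's actual code): byte-identical sidedefs are merged and each linedef is remapped to the surviving copy. The struct is the standard WAD layout; `pack_sidedefs` is a hypothetical helper. A real packer presumably also has to avoid merging sidedefs whose textures change at runtime (switches, for example), since every linedef sharing the record would change with it.

```c
#include <stdint.h>
#include <string.h>

/* On-disk SIDEDEFS record from the Doom WAD format: 2+2+8+8+8+2 = 30 bytes. */
typedef struct {
    int16_t textureoffset;
    int16_t rowoffset;
    char    toptexture[8];
    char    bottomtexture[8];
    char    midtexture[8];
    int16_t sector;
} mapsidedef_t;

/* How many sidedefs fit in one 64 KB segment. */
#define MAX_SIDEDEFS_PER_64K (65536 / (int)sizeof(mapsidedef_t))

/* Hypothetical packer: merges byte-identical sidedefs in place and writes the
   new index of each original sidedef to remap[] so linedefs can be updated.
   Returns the packed count. O(n^2) for clarity, not speed. */
static int pack_sidedefs(mapsidedef_t *sides, int count, int *remap)
{
    int packed = 0;
    for (int i = 0; i < count; i++) {
        int j;
        for (j = 0; j < packed; j++)
            if (memcmp(&sides[j], &sides[i], sizeof *sides) == 0)
                break;
        if (j == packed)            /* no duplicate yet: keep this one */
            sides[packed++] = sides[i];
        remap[i] = j;
    }
    return packed;
}
```

After packing, each linedef's sidedef index would be replaced with `remap[old_index]`.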
sqpat Posted November 27, 2023

Well, I spent a week cleaning up bugs but also trying out some optimizations.

There are some notes in the comments of the codebase about precalculating lineopenings, and I gave it a try. You can precalculate them in level setup, but you have to treat them as a cache and mark entries dirty whenever platforms move or sector floors/ceilings change, basically. It costs 6-7 KB and unfortunately it's hard to measure the improvement. I tried averaging ten runs' worth of timedemos and it's a tiny bit faster, maybe 0.25% on average or so? (Is there a better way?)

I tried 8-bit validcounts for linedefs to save some space, but it seems that's not enough uniqueness to avoid overlaps/collisions of values, and bugs happen like a thousand tics into demos.

Removing the back references in mobj_t (for sector lists and block lists) is not hard to do since they aren't used all that much. A few KB can be saved by rewriting a bit of code and removing the sPrev and bPrev fields. I think this ends up a tiny bit measurably slower, so I may revert it later.

I'm realizing that if I can get the EMS 4.0 improvements I mentioned earlier to work, then I will have way more memory than I need during the physics portion of the code, so all these tricks to reduce the sizes of certain fields won't really matter. What will matter is only speed during that phase, and memory usage becomes more important during rendering due to the amount of memory used by textures.

I'm going to clean up a couple of remaining broken features and known bugs, make a simple release build that can hopefully run all the shareware levels, then after that focus on allocating UMB space and EMS 4.0.
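A minimal C sketch of the lineopening cache idea, assuming one cache entry per line that is precalculated at level setup and recomputed lazily once marked dirty. The opening math follows vanilla P_LineOpening (lowest ceiling, highest floor, with the lower floor kept as lowfloor); the flat arrays and the names `line_opening` and `sector_heights_changed` are hypothetical. A real version would walk the moving sector's own line list instead of scanning every line.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;
#define FRACUNIT (1 << 16)

typedef struct { fixed_t floorheight, ceilingheight; } sector_t;
typedef struct { int frontsector, backsector; } line_t;   /* backsector -1 = one-sided */

typedef struct {
    fixed_t opentop, openbottom, openrange, lowfloor;
    bool    dirty;
} lineopening_t;

#define NUMSECTORS 2
#define NUMLINES   2

static sector_t      sectors[NUMSECTORS];
static line_t        lines[NUMLINES];
static lineopening_t openings[NUMLINES];

/* Same math as vanilla P_LineOpening, stored per line instead of in globals. */
static void compute_opening(int i)
{
    lineopening_t *o = &openings[i];
    o->dirty = false;
    if (lines[i].backsector < 0) {          /* one-sided: no opening */
        o->openrange = 0;
        return;
    }
    const sector_t *f = &sectors[lines[i].frontsector];
    const sector_t *b = &sectors[lines[i].backsector];
    o->opentop = f->ceilingheight < b->ceilingheight ? f->ceilingheight : b->ceilingheight;
    if (f->floorheight > b->floorheight) {
        o->openbottom = f->floorheight;
        o->lowfloor   = b->floorheight;
    } else {
        o->openbottom = b->floorheight;
        o->lowfloor   = f->floorheight;
    }
    o->openrange = o->opentop - o->openbottom;
}

/* Level setup: precalculate every line once. */
static void setup_openings(void)
{
    for (int i = 0; i < NUMLINES; i++)
        compute_opening(i);
}

/* Call whenever a plat, door, etc. changes a sector's floor or ceiling. */
static void sector_heights_changed(int sec)
{
    for (int i = 0; i < NUMLINES; i++)
        if (lines[i].frontsector == sec || lines[i].backsector == sec)
            openings[i].dirty = true;
}

/* Cached lookup used in place of calling P_LineOpening. */
static const lineopening_t *line_opening(int i)
{
    if (openings[i].dirty)
        compute_opening(i);
    return &openings[i];
}
```

Recomputing lazily means a platform that moves every tic only costs a recompute when something actually queries its lines.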
sqpat Posted December 1, 2023

I've managed to fix a half dozen or so known bugs (corrupted palettes, timedemo desyncs, readthis rendering, level intermission crashes), and all the demos are running correctly again with all the improvements from the past few weeks.

There is a weird random bug where the level begins to render as a mess: a lot of the walls suddenly render at the wrong angles, among other garbage. I've managed to dump all the level data (sectors, lines, segs, etc.) from memory to a file after this has happened, but all the data checked out as good. The automap also renders fine when this happens. I don't think this is a visplane bug either. I will probably check a few other things, like whether the trigonometry lookup tables are getting corrupted.

I think once this last bug is fixed, I will cut the 0.1 release. Basically, all the shareware content plays fine except for E1M6, which is much bigger than the other shareware levels; I would need to clear up another 30 KB or so to make it fit. That'll be easy once I have UMB allocation working, which should be another easy 64 KB (at least). I will probably follow up with another minor release after that, then begin the EMS 4.0 work, which feels like it will free up at least another 100 KB.

Stability is as good as it's ever been. Outside of the random render bug popping up every couple of minutes and other known issues (savegames, sound, nightmare respawns), I don't see too much wrong anymore.
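One cheap way to test the corrupted-trig-table theory, sketched in C: record a Fletcher-16 checksum of each lookup table right after it is loaded, then re-verify it when the garbage rendering shows up (or every tic in a debug build). The `guarded_table_t` helper is hypothetical, and `finesine` is only used as an example table name; if a table lives in an EMS page, that page has to be mapped in before checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over the raw bytes of a table. */
static uint16_t fletcher16(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + p[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Hypothetical guard: remembers a table's checksum right after loading. */
typedef struct {
    const char *name;
    const void *data;
    size_t      len;
    uint16_t    expected;
} guarded_table_t;

static void guard_init(guarded_table_t *g)
{
    g->expected = fletcher16(g->data, g->len);
}

/* Returns nonzero if the table still matches its load-time checksum. */
static int guard_ok(const guarded_table_t *g)
{
    return fletcher16(g->data, g->len) == g->expected;
}
```

A checksum only says that something changed, not what wrote to it, but it narrows the search quickly.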
sqpat Posted December 4, 2023 (edited)
0.1 Release

OK, the game has been working pretty cleanly in DOSBox again, which, for all its compatibility faults, at least makes development cycles much faster since I don't have to mount images and such. Thanks to that I got tons of work done in the past few days: I cleaned up a dozen or so bugs and also converted lots and lots of physics and render code to 16-bit logic (then fixed the bugs that caused).

Tomorrow I will figure out how GitHub releases work and update the project page there, but attached here is the exe I assume will go out as RealDOOM 0.1. This should work with all the maps in shareware except E1M6. I don't *think* it will crash anymore; I haven't seen funny memory bugs recently, and I think they are all fixed...

Some features aren't in there yet, like sound of course, savegames, and screen wiping. There are some render bugs, like overdrawing on the intermission screen, a minor fuzz draw bug in timedemo 3, and also some sprite masking bugs.

You need about 605-610k free for this one, a little less than the previous one, but if you use a standard, nearly blank MS-DOS 6.22 config with EMM386 loaded (there's a sample config.sys in the zip file), it should work. The zip also has the dstrings.txt file, which is necessary. Don't forget to include DOOM1.WAD.

The 286-25 bench (screenblocks 5, high quality, demo3) ran at 31821 realtics, compared to 36074 a month ago. That corresponds to about 12% faster, and the fps is around 2.37 now. This has pretty much all come from rewriting algorithms and refactoring code; there's still no ASM or anything. The Pentium times are around 10% faster over the same period... it's hard to say for sure, but I think some of the improvements aimed specifically at the 286 (like reducing shifts) have been beneficial.

EDIT: I've run some extra benches, and it seems like on real hardware (vs 86Box) I get 15% faster times on my 286 and 4-5% faster times on my 386 DX-40. My MMX machine meanwhile ran about 30% faster. Not really sure what the reason is, maybe memory access times or something, but it's a good sign anyway.

I may try to get it to run on the turbo XT tomorrow for funsies... I wonder if it will beat 100,000 realtics...

realdoom_0.1.7z

Edited December 4, 2023 by sqpat
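For reference, the timedemo arithmetic: both counters tick at 35 Hz and a timedemo draws one frame per gametic, so fps = gametics x 35 / realtics. Going from 36074 to 31821 realtics is 11.8% less time, which is the "about 12%"; in throughput terms the new build is about 1.13x as fast. A small C sketch of those formulas (the helper names are just for illustration):

```c
#define TICRATE 35   /* Doom's tic rate; realtics are counted at this rate too */

/* Frames per second for a timedemo, which draws one frame per gametic. */
static double timedemo_fps(long gametics, long realtics)
{
    return (double)gametics * TICRATE / (double)realtics;
}

/* Fraction of wall-clock time saved going from old to new realtics. */
static double time_saved(long old_realtics, long new_realtics)
{
    return (double)(old_realtics - new_realtics) / (double)old_realtics;
}

/* Throughput ratio: how many times as fast the new build runs. */
static double speedup(long old_realtics, long new_realtics)
{
    return (double)old_realtics / (double)new_realtics;
}
```

The two percentages differ because "time saved" is relative to the old run while "speedup" is relative to the new one; both are fair ways to quote the same result.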
realjohnmadden Posted December 4, 2023

Cool stuff. I'm surprised it runs, slow as it may be. I tried running it normally as well as on the highest cycle speed in DOSBox for funsies, and you can very well brute-force your way to a stable-ish 20fps through emulation, although that defeats the point.

Not sure how helpful this will be, but I'll post some of my findings after a lot of messing around:
- Doom 2 does not run, no matter what. I slimmed the IWAD down massively: only the textures used in MAP01 (as well as only MAP01), removing all of the sprites for the Doom 2 enemies and any enemies that aren't in MAP01 (e.g. Pinkies, Chaingunners), and removing some sounds. It's now at just 4,637 KB, and it runs fine in the original executable, but not here, giving a "ran out of refs" error, presumably something to do with memory.
- After beating E1M8 with the shareware IWAD, the text scroll doesn't show up. The game just hangs before it even appears; perhaps it's because of the screen wipe? I'm not super knowledgeable about Doom's inner workings.
- Compressing the truncated Doom 2 IWAD using wadptr didn't cause an "out of refs" error, but instead hung on Init Playloop State. The same happened with the shareware Doom IWAD. Both IWADs work just fine in the original executables, so it seems it has trouble with the compressed wadptr output.
sqpat Posted December 4, 2023

11 minutes ago, realjohnmadden said:
Cool stuff. I'm surprised it runs, slow as it may be. I tried running it normally as well as on the highest cycle speed in DOSBox for funsies, and you can very well brute-force your way to a stable-ish 20fps through emulation, although that defeats the point.

You can definitely get faster fps on 86Box, by a factor of 2 or 3 or so, but yeah, DOSBox should at least be playable and is generally more convenient.

11 minutes ago, realjohnmadden said:
- Doom 2 does not run, no matter what. I slimmed the IWAD down massively: only the textures used in MAP01 (as well as only MAP01), removing all of the sprites for the Doom 2 enemies and any enemies that aren't in MAP01 (e.g. Pinkies, Chaingunners), and removing some sounds. It's now at just 4,637 KB, and it runs fine in the original executable, but not here, giving a "ran out of refs" error, presumably something to do with memory.

Oh, yeah, sorry, I thought that was clear: only shareware Doom is supported right now. Doom 2 generally has larger levels and different content and will need more work. I haven't tested Doom 2 content at all and it may be buggy, especially viles and the last boss and stuff. Once I've freed up a lot more memory via some upcoming improvements, it'll make more sense to get things like Doom 2 or sound working.

11 minutes ago, realjohnmadden said:
- After beating E1M8 with the shareware IWAD, the text scroll doesn't show up. The game just hangs before it even appears; perhaps it's because of the screen wipe? I'm not super knowledgeable about Doom's inner workings.

Oh yeah, the finale... I've not tested it at all! I'm sure it's probably broken, haha. Maybe it is because of the wipe. Wipes were removed because they basically need 128 KB of free space to run (two VGA screen buffers), which is a huge chunk of your 640 KB. Once EMS multitasking is in there I can probably make it happen.

That reminds me: level restarts after game over also don't work, perhaps because of the screen wipe. You have to go to the menu and select a new game in that case.

11 minutes ago, realjohnmadden said:
- Compressing the truncated Doom 2 IWAD using wadptr didn't cause an "out of refs" error, but instead hung on Init Playloop State. The same happened with the shareware Doom IWAD. Both IWADs work just fine in the original executables, so it seems it has trouble with the compressed wadptr output.

Wadptr will crash without extra work on the codebase due to some of its optimizations. In order to save space on indexing the WAD, I calculate and store the sizes in a certain way, and when WAD entries overlap it creates 'negative size' entries, which leads to trouble. I think what's likely to happen down the road is that rather than using wadptr, I will basically generate the wadptr-style compressed WAD at runtime by finding duplicate entries and filtering them out.

Thanks for your testing and input!
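To illustrate the 'negative size' problem, here is a hypothetical C sketch. It assumes a compact index that stores only each lump's file offset and derives the size from where the next directory entry starts; that is an assumption about the scheme, not the actual RealDOOM code. Once wadptr points a later directory entry back at data it shares with an earlier lump, offsets are no longer ascending in directory order and the derived size goes negative.

```c
#include <stdint.h>

/* On-disk WAD directory entry (16 bytes). */
typedef struct {
    int32_t filepos;
    int32_t size;
    char    name[8];
} filelump_t;

/* Hypothetical compact index: keep only filepos and derive each lump's size
   from where the next lump starts, or from data_end (where lump data stops,
   typically the start of the directory) for the last one. This assumes lumps
   are stored back to back in directory order. */
static int32_t derived_size(const filelump_t *dir, int numlumps,
                            int32_t data_end, int i)
{
    int32_t end = (i + 1 < numlumps) ? dir[i + 1].filepos : data_end;
    return end - dir[i].filepos;
}

/* Index of the first lump whose derived size disagrees with the size stored
   in the WAD directory, or -1 if the assumption holds for every lump. */
static int first_bad_lump(const filelump_t *dir, int numlumps, int32_t data_end)
{
    for (int i = 0; i < numlumps; i++)
        if (derived_size(dir, numlumps, data_end, i) != dir[i].size)
            return i;
    return -1;
}
```

Detecting duplicates at runtime instead keeps the directory's own offsets and sizes intact, so the derived-size trick never sees overlapping entries.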
sqpat Posted December 6, 2023

Some updates: I got UMBs working, and now pull around 70k from there. I then went ahead and allocated a bunch more memory to level data, allowing E1M6 to be playable. The conventional memory usage for the build went from 615818 -> 543738 -> 569802 bytes, which is still pretty comfortably low.

I might fix a couple of bugs and make a quick 0.11 release before going heavy on EMS 4.0 multitasking prep work. EMS 4.0 hardware is somewhat more difficult to come by, especially a board that will be compatible with XT-class systems. The Lo-tech EMS boards won't work, and a lot of other ones are 16-bit only, I think. Later, faster 286es have chipset support and aren't a problem. I'm not sure how EMM386 will work out yet.

Meanwhile, I benched the 0.1 release on a few pieces of hardware over the past day. Below is a full-quality timedemo 3 playback on a 4.77 MHz 8088. It's a little under a million realtics, about a 7-hour video, and 0.0888 fps :)

A V20 at 9.5 MHz finished the typical screenblocks 5, high detail bench I tend to run at 162157 realtics, which is about 1/5th the speed of a fast 286. I have a 16 MHz turbo V20 board I can try it on too at some point.
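The numbers above, worked through as a small C snippet: moving data into UMBs saved 72080 bytes, re-growing the level data for E1M6 gave back 26064 of them, for a net saving of 46016 bytes. The V20 ratio assumes the "fast 286" is the 286-25 figure (31821 realtics) from the 0.1 release post; the macro names are only for illustration.

```c
/* Conventional memory in use (bytes) as reported for each build. */
#define MEM_0_1_RELEASE 615818L
#define MEM_WITH_UMBS   543738L
#define MEM_E1M6_FITS   569802L

/* Demo3 realtics at screenblocks 5, high detail, 0.1 release. */
#define REALTICS_V20_9_5MHZ 162157L
#define REALTICS_286_25MHZ   31821L   /* assumed to be the "fast 286" */

/* Relative speed of a slower machine versus a faster one (1.0 = same). */
static double relative_speed(long realtics_slow, long realtics_fast)
{
    return (double)realtics_fast / (double)realtics_slow;
}
```

With these values the V20 comes out at roughly 0.196 of the 286-25, which matches "about 1/5th".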
deathz0r Posted December 7, 2023

Now that is peak gaming content! Funnily enough, my Lo-Tech EMS card arrived yesterday (it's still sitting in the post office until I can pick it up), and I was going to do the exact same thing with my NuXT for shits and giggles. Great to see the results regardless :D
sqpat Posted December 11, 2023

The GitHub readme has been updated with a basic roadmap (and more timedemo scores).

Release 0.11 will come soon. It's not really too different from 0.10: I've fixed a few render bugs and also added UMB support to free enough memory to make the last remaining shareware level (E1M6) playable.

Upcoming releases will all have to do with varying levels of EMS 4.0 and multitasking support, so this will be the last version playable with EMS 3.2. This isn't a problem for late 286 machines, whose chipsets should support these features, or for machines running EMM386, but earlier 286es without advanced ISA memory cards, and XT machines dependent on Lo-tech EMS cards and other simple EMS boards, won't be able to run later versions. Maybe a repro card will come around that makes this easier to do, oh well.
Mike Chambers Posted December 23, 2023

Thanks for doing this! I've wanted to run Doom on an 8088 for a long time. Yes I'm weird.
Mike Chambers Posted December 23, 2023

I tried to run it in 86Box in 8088 mode with an Everex EMS card, but it says it needs 64 KB of UMB and won't run. config.sys is pretty bare, just loading the EMS driver and FILES=30. Nothing is being loaded in autoexec. DOS 6.22.
shroomie Posted December 23, 2023

This project seems impressive and all. I don't know that much about how DOS works, but are you aware that this project shares a name with one of Eric Harris' WADs? I don't want anything messy or generally awkward to happen as this project moves forward.
Plerb Posted December 23, 2023

22 minutes ago, shroomie said:
This project seems impressive and all. I don't know that much about how DOS works, but are you aware that this project shares a name with one of Eric Harris' WADs? I don't want anything messy or generally awkward to happen as this project moves forward.

Considering how obscure that WAD is (I hadn't even heard of it until now), especially since it was apparently never even released, I highly doubt anyone will confuse the two. This project gets its name from Real Mode, a CPU operating mode which is limited to 640k of RAM and is what anything designed to run on an 8088 runs in.
shroomie Posted December 23, 2023

1 minute ago, Plerb said:
This project gets its name from Real Mode, a CPU operating mode which is limited to 640k of RAM and is what anything designed to run on an 8088 runs in.

I skimmed the original post, I know that much.
sqpat Posted December 23, 2023

1 hour ago, Mike Chambers said:
I tried to run it in 86Box in 8088 mode with an Everex EMS card, but it says it needs 64 KB of UMB and won't run. config.sys is pretty bare, just loading the EMS driver and FILES=30. Nothing is being loaded in autoexec. DOS 6.22.

The 0.11 release needs UMBs; try the 0.10 release, which didn't use them (that's basically the only difference between the two). You will probably have to make sure your config leaves as much free memory as absolutely possible. It's definitely easier if you have UMBs and DOSMAX or something like that available.
The Doommer Posted January 6

On 12/23/2023 at 7:53 AM, Plerb said:
since it was apparently never even released

I'm guessing at least someone has it but they don't fess up, probably because of the chance of being accused as an accomplice or the shame of it. You're telling me no one sent an email to him? That being said, it's a good thing that name is being used for something better.
sqpat Posted January 6

Happy new year! Time for a big post on technicals and progress since it's been a while.

A lot of big work has been going into RealDOOM recently, mostly having to do with the EMS 4.0 rework. It's somewhat feature complete, with a few big nasty bugs I have to track down, but I'm sure a meaningful release isn't too far away.

I went ahead and created a diagram of how the previous versions work versus how the current version works. It's not 100% exact, but it should get the point across. The diagrams are pretty big and hidden behind spoiler text.

Here is how RealDOOM up to version 0.11 has handled extra memory using the 64 KB EMS page frame. Compare this with vanilla Doom, where all this extra data is just freely accessible in memory at any time. Things had to be paged into the EMS page frame, and a cache system using an LRU (least-recently-used) algorithm replaced the most stale data with newer data as required. Any time new data was needed, it'd just get allocated to the next spot it fit in the logical EMS pages. This page frame system is basically the way EMS worked up until EMS version 3.2, and it was one of the main ways the 640k memory limit on IBM PCs and DOS was overcome.

EMS 4.0 eventually came along and made multitasking operating systems possible by allowing a big portion of main memory to be swapped, similar to the page frame. Multitasking operating systems would page out entire programs and large regions of memory (384 KB worth, the region from 256 KB to 640 KB) at a time. Using this strategy in RealDOOM, it's no longer necessary for each variable to be paged in and out individually. Instead, we can have a fairly small number of memory setups we switch between.

In the old system, I had a bunch of code that had to figure out what data to page out to make room to page more data in. This code originally ran thousands of times per game tic, but I got it down to several dozen eventually.
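A minimal C sketch of the old page-frame scheme: four 16 KB physical pages in the 64 KB frame, with the least-recently-used slot evicted on a miss. Names are hypothetical, and a real implementation would issue the EMS map call (e.g. INT 67h, AH=44h) at the point where this sketch only updates bookkeeping.

```c
#include <stdint.h>

/* EMS 3.2 page frame: four 16 KB physical pages in a 64 KB window. */
#define PHYS_PAGES 4

typedef struct {
    int16_t  logical[PHYS_PAGES];   /* logical page mapped in each slot, -1 = empty */
    uint32_t last_use[PHYS_PAGES];  /* timestamp of last access per slot */
    uint32_t clock;
    uint32_t hits, misses;
} pageframe_t;

static void pf_init(pageframe_t *pf)
{
    for (int i = 0; i < PHYS_PAGES; i++) {
        pf->logical[i] = -1;
        pf->last_use[i] = 0;
    }
    pf->clock = pf->hits = pf->misses = 0;
}

/* Returns the physical slot holding `page`, mapping it in over the
   least-recently-used slot on a miss. Empty slots have last_use 0, so they
   are filled before anything is evicted. */
static int pf_access(pageframe_t *pf, int16_t page)
{
    int victim = 0;
    pf->clock++;
    for (int i = 0; i < PHYS_PAGES; i++) {
        if (pf->logical[i] == page) {
            pf->last_use[i] = pf->clock;
            pf->hits++;
            return i;
        }
        if (pf->last_use[i] < pf->last_use[victim])
            victim = i;
    }
    pf->misses++;
    pf->logical[victim] = page;     /* real code: EMS map call here */
    pf->last_use[victim] = pf->clock;
    return victim;
}
```

Every access pays for the search and timestamp update even on a hit, which is the kind of per-access LRU overhead the new design avoids.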
In the last release (0.11), I think there were about 60000 cache misses resulting in page swaps (out of 800,000 accesses to the page frame) to run timedemo 3 on screenblocks 5 and high quality. Each of these cache misses was a relatively slow operation, mostly due to all the LRU updating and overhead. The EMS page swap isn't exactly fast, but it's not so terrible to do on its own either.

There are a few other cool aspects to this system. I am loading data for trig tables, strings, mobj states, and stuff like that dynamically from data files at runtime and placing it in memory. In fact, every variable in the 256-640 KB range is being placed at exact locations I determine at runtime. So once ASM code starts to get written, I can load that ASM code dynamically too, and then modify it in memory to use exact data addresses and immediates instead of loading pointers from variables, which should produce faster code. I can also place relevant data into the same segments. That ASM code can also be paged in and out as necessary.

And now there's also the fact that there's very little static data and a lot of dynamically loaded data, leading to a very small binary. The binary is under 256k, and a lot of that is overlaid startup code. WLink reports the conventional memory usage is under 190,000 bytes right now. Most of that is code; there is less than 64k of data in the binary now, which means additional memory models have become possible. Which may also mean another compiler?

Now in the new system, for the most part all the page swaps are being done in a predetermined fashion: we know what memory we want active in a given part of the runtime and call functions with preloaded arguments for the page swap interrupts. Where there used to be about 60000 cache misses + page swaps on 800,000 calls to the EMS memory access code, there are now 50000 page swaps in total that do not involve any sort of LRU maintenance.
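The predetermined-setup approach can be sketched like this in C, assuming the 256-640 KB region is split into 24 physical pages of 16 KB and each phase of the frame has a fixed list of logical pages. The `memsetup_t` table and `switch_setup` are hypothetical; real code would hand the changed entries to a single EMS 4.0 multiple-page map call (function 50h of INT 67h) rather than update an array.

```c
#include <stdint.h>

/* EMS 4.0 remappable conventional region: segments 0x4000-0x9FFF
   (256 KB-640 KB), i.e. 384 KB = 24 physical pages of 16 KB. */
#define FIRST_SEGMENT   0x4000u
#define END_SEGMENT     0xA000u
#define PAGE_PARAGRAPHS 0x0400u   /* 16 KB in 16-byte paragraphs */
#define NUM_PHYS_PAGES  ((END_SEGMENT - FIRST_SEGMENT) / PAGE_PARAGRAPHS)

/* Hypothetical fixed layout for one phase of the frame (physics, render...). */
typedef struct {
    const char *name;
    int16_t     logical[NUM_PHYS_PAGES];   /* logical page per physical page */
} memsetup_t;

static int16_t       current[NUM_PHYS_PAGES];
static unsigned long page_maps;            /* physical pages actually remapped */

static void reset_mapping(void)
{
    for (unsigned i = 0; i < NUM_PHYS_PAGES; i++)
        current[i] = -1;                   /* nothing mapped yet */
    page_maps = 0;
}

/* Switch to a precomputed layout: no LRU and no searching for victims.
   Only pages that differ from the current layout are counted as remapped. */
static void switch_setup(const memsetup_t *s)
{
    for (unsigned i = 0; i < NUM_PHYS_PAGES; i++) {
        if (current[i] != s->logical[i]) {
            current[i] = s->logical[i];
            page_maps++;
        }
    }
}
```

Because the layouts are known ahead of time, the cost of a switch is fixed and predictable, and pages shared between phases never move.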
There is one exception, which basically has to do with all the texture stuff. Even in shareware Doom, many levels need 500-750 KB of textures each, which won't fit in memory at once. So some space is allocated to a texture cache, and we do run an LRU algorithm in there that manages the caches for patches, flats, composite textures, and sprites. The current architecture has some empty space and room to grow, and I'm pretty sure it will be able to handle the biggest levels in DOOM 1 and DOOM 2.

I have also completely removed any use of the page frame, so technically I have another 64 KB I am not using for anything. I am sort of imagining this will eventually be used for sound data, but maybe I will allocate it as extra texture cache in the meantime.

There's been a noticeable performance improvement, sort of. Funnily enough, the performance improvement is more noticeable on faster machines. Pentiums are 5-10% faster, and it's like 1% for slower machines like the 286-386. I think that just boils down to the fact that Pentiums do not waste a lot of time on things like multiplications and divisions while 286 processors do, and there aren't a ton of slow mult/div calls going on in this memory management code. So Pentiums were wasting a comparatively larger share of their cycles in this code.

Aside from this, I also got screen wipes working again, and the finale screen is working. I have to re-add dynamic visplanes, which were recently removed, but 60 is fine for all the timedemos, so I haven't bothered to fix this yet.

I'm also upping the memory requirement from 2 MB of EMS to 4 MB, which hopefully will stay there even for Doom 2. Partly out of laziness, I have not implemented cache eviction for the texture cache, which made the memory numbers balloon a bit. The texture cache clears after a level, of course, but I don't bother clearing unused textures to save space mid-level.

I will probably do a release for version 0.15 with these EMS 4.0 features once bugfixing is complete.
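The no-eviction texture cache behaves much like a bump allocator that is reset between levels. A minimal C sketch with hypothetical names; rounding to 16 bytes keeps every entry on a paragraph boundary, so (assuming the arena itself starts on one) each entry could be addressed as segment:0000 in real mode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical texture cache arena: allocations only grow during a level,
   and the whole arena is released at level start (no mid-level eviction). */
typedef struct {
    uint8_t *base;
    size_t   size;
    size_t   used;
} texcache_t;

static void texcache_reset(texcache_t *c) { c->used = 0; }

/* Returns NULL when the level's textures no longer fit. */
static void *texcache_alloc(texcache_t *c, size_t bytes)
{
    bytes = (bytes + 15u) & ~(size_t)15u;   /* round up to a 16-byte paragraph */
    if (bytes > c->size - c->used)
        return NULL;
    void *p = c->base + c->used;
    c->used += bytes;
    return p;
}
```

The trade-off is the one mentioned in the post: allocation is trivially cheap, but every texture touched during the level stays resident, so peak usage grows with the number of distinct textures seen rather than with the working set.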
The high-level roadmap going forward:

Version 0.16 will mostly revolve around a medium memory model build. I may investigate another compiler, but OpenWatcom overlays save me 30 KB right now...
Version 0.20 will mostly revolve around savegame support and commercial DOOM 1 support.
Version 0.21 will mostly revolve around commercial DOOM 2 support.

Heavy optimizations and ASM work will begin after that, with some small tasks first, but eventually the goal is a full handwritten ASM rewrite of the render codepath (RenderPlayerView). The ASM stuff sounds like a lot of fun and I can't wait to get to it. I have a lot of ideas . . .
slowfade Posted January 7

Wow, you've done a lot! Good to hear it's going so well.
sqpat Posted January 11 (edited)

Version 0.15 is now released. You can read the release notes for more details. A few features were re-implemented or fixed, but mostly it was an architectural rework. It makes Doom 1 and Doom 2 support much more straightforward, which should come up soon.

- EMS 4.0 support is mandatory now, of course. I think you technically need 2.8 MB or so, but let's just say 4 MB going forward. I don't think DOOM 1 will require more. DOOM 2, maybe not either.
- The requirements on memory and UMBs are pretty tight. You need about 90 KB of UMB space in addition to the EMS page frame, so basically C800-EFFF all needs to be free. Until commercial Doom 1/Doom 2 support is added and I know exactly how much memory I need for certain fields, I'm going to leave things this way.
- You need about 604k or so free in conventional space: the full 384 KB region between 256K and 640K, then about 220k free below that to fit the binary, stack, etc. This isn't a big deal on EMM386 machines, but it's tighter than before on a 16-bit machine with an EMS driver in low memory.
- There is a 286-optimized binary now; it's about 6-7k smaller than the 8088 binary. I'm not sure why; it might just be because of shift instructions (the 8088 can only shift one bit at a time, which is funny).

The conventional memory requirements should become less tight with 0.16, as the goal of that is a medium memory model compilation, which should naturally result in smaller code.

EDIT: I quickly tossed together a medium memory model build, which of course isn't really bugfixed or running right (it will take some time; it seems I need to write far versions of fread/fwrite among other things), but immediately there is another 12 KB of memory savings in the binary. If a decent amount of memory frees up, I may be able to move the more critical items into the main data segment to speed things up down the road.

Edited January 11 by sqpat
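The address math behind those requirements, as a small C sketch: a real-mode linear address is segment x 16 + offset, so segments C800-EFFF span 0xC8000-0xEFFFF, which is 160 KB. Taking the 64 KB page frame out of that leaves 96 KB, enough for the roughly 90 KB of UMB space. Likewise the 256K-640K region is exactly segments 0x4000-0x9FFF (384 KB), and 384 + 220 gives the ~604k total. The helper names are hypothetical.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Bytes covered by segments first_seg..last_seg inclusive, i.e. from
   first_seg:0000 through the last byte of last_seg's paragraph. */
static uint32_t span_bytes(uint16_t first_seg, uint16_t last_seg)
{
    return linear(last_seg, 0x000F) + 1u - linear(first_seg, 0);
}
```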
Frenkel Posted January 11

On 1/7/2024 at 12:18 AM, sqpat said:
The ASM stuff sounds like a lot of fun and I can't wait to get to it. I have a lot of ideas . . .

I use some assembly in Doom8088, but it doesn't speed up the game that much compared to the gcc-ia16 generated code.

In most cases FixedDiv(a, b) can be replaced by FixedMul(a, FixedDiv(0x10000, b)) or actually FixedMul(a, 0xffffffff / b). (Except for P_InterceptVector(), which needs FixedDiv() to keep demo 3 in sync.) To calculate the reciprocal of a fixed_t value, gcc-ia16 generates generic code that divides two int32_t values. Since one of the input values is always 0xffffffff when calculating the reciprocal, I made my own reciprocal-specific assembly code.

I got a speed boost of 4% by programming the loop in R_DrawColumn() in assembly. Then I noticed it could be faster if the segment registers could be set once at the start of the loop, instead of switching during every iteration between pointing to the texture source, the video memory destination, and the colormap. So I put the colormap in near memory. This change also sped up the C code variant, so now the assembly version is only 0.8% faster. :|
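A C sketch comparing the two forms on a modern host. FixedMul and FixedDiv here are integer equivalents of vanilla Doom's 16.16 routines (64-bit intermediates, with FixedDiv's overflow clamp); FixedDivRecip is a hypothetical helper for the reciprocal form. It is not exact: the reciprocal is truncated first and the multiply truncates again, so for 7.0 / 3.0 the results are 152915 versus 152917, two units apart in the low bits. That is enough to desync a demo wherever the exact quotient feeds back into game state. The sketch only handles positive a and b >= 2.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int32_t fixed_t;
#define FRACBITS 16

/* 16.16 multiply with a 64-bit intermediate, like the Linux Doom source. */
static fixed_t FixedMul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FRACBITS);
}

/* 16.16 divide with Doom's overflow clamp; the quotient truncates toward zero. */
static fixed_t FixedDiv(fixed_t a, fixed_t b)
{
    if ((abs(a) >> 14) >= abs(b))
        return ((a ^ b) < 0) ? INT32_MIN : INT32_MAX;
    return (fixed_t)(((int64_t)a * (1 << FRACBITS)) / b);
}

/* Hypothetical reciprocal form: FixedMul(a, 0xffffffff / b).
   Only valid for a >= 0 and b >= 2 so the reciprocal fits in a fixed_t. */
static fixed_t FixedDivRecip(fixed_t a, fixed_t b)
{
    fixed_t recip = (fixed_t)(0xFFFFFFFFu / (uint32_t)b);
    return FixedMul(a, recip);
}
```

For 1.0 / 3.0 both give 21845, so the difference only shows up once the numerator is large enough for the truncated reciprocal's error to be multiplied up.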
sqpat Posted January 12 (edited)

1 hour ago, Frenkel said:
In most cases FixedDiv(a, b) can be replaced by FixedMul(a, FixedDiv(0x10000, b)) or actually FixedMul(a, 0xffffffff / b). (Except for P_InterceptVector(), which needs FixedDiv() to keep demo 3 in sync.)

I'm guessing this is a good approximation but not necessarily 100% accurate? There are a number of these 'good enough but not perfect' types of optimizations (for example, all the hassle with sine table special cases for off-by-one errors). For now I'm leaving them out, but once the project is further along, I can imagine making a branch with a number of changes that break timedemo compatibility but aren't noticeable in normal gameplay, saving a bunch of memory and speeding things up. But maybe at the same time there is a way to do the FixedDiv thing in a lossless way that does not break timedemos?

Quote:
I got a speed boost of 4% by programming the loop in R_DrawColumn() in assembly. Then I noticed it could be faster if the segment registers could be set once at the start of the loop, instead of switching during every iteration between pointing to the texture source, the video memory destination, and the colormap. So I put the colormap in near memory. This change also sped up the C code variant, so now the assembly version is only 0.8% faster. :|

Yes, I don't want to do too much assembly optimization too early myself, because I think a better compiler and optimizer might do most of the work. But I think there is a lot to be saved with data locality within segments.

I also really want to investigate some other crazy ideas, for instance hacking the data segment at startup. For example, if we know all the initial values for the variables in the data segment at compile time, we can output them to a file and then say "let's set DS to 0x4000", which is a pageable EMS region.
And we load the default variable values into that segment from the file, but now it's in a pageable EMS segment, and it could address not only 64k of memory, but maybe 128k or 192k of effective memory, depending on paging and what code was running at a particular time.

I have already done some things with far variables, such as (not an exact example, but for illustrative purposes):

#define thinker_list ((mobj_t far *) 0x90000000)

This is possible because at run time I am placing this variable at that memory address, and paging the necessary EMS pages in whenever it needs to be accessed. However, if we change the location of this variable to segment 0x4000, offset 0x4000, maybe we could do:

#define thinker_list ((mobj_t near *) 0x4000)

Now, thinkers might be really useful in P_Ticker but not in D_Drawer, so in that case we page out the thinkers from there and instead put seg_t, sector_t, line_t, etc. in that region. Then we could at the same time have alternative __far versions of these defines, pointing to a different address range, and page these variables into those areas in regions of code where they are needed but less frequently accessed.

This whole idea might require a custom compiler solution, but that's not out of the question either, considering the toolsets are open source.
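A small C sketch of the address arithmetic behind the two defines: a far pointer stores segment:offset in 32 bits, so 0x90000000 is 9000:0000, linear 0x90000 (576 KB), while a near pointer 0x4000 with DS = 0x4000 resolves to linear 0x44000, inside the second 16 KB EMS page of that segment. The helpers are hypothetical and just do the math portably; the appeal of the near form is a 2-byte pointer with no segment register reload.

```c
#include <stdint.h>

/* A far pointer packs segment:offset into 32 bits, segment in the high word. */
static uint16_t far_seg(uint32_t farptr) { return (uint16_t)(farptr >> 16); }
static uint16_t far_off(uint32_t farptr) { return (uint16_t)(farptr & 0xFFFFu); }

static uint32_t far_linear(uint32_t farptr)
{
    return ((uint32_t)far_seg(farptr) << 4) + far_off(farptr);
}

/* A near pointer is only a 16-bit offset; the segment comes from DS. */
static uint32_t near_linear(uint16_t ds, uint16_t off)
{
    return ((uint32_t)ds << 4) + off;
}

/* Which 16 KB EMS page within a 64 KB segment an offset falls in (0-3). */
static unsigned ems_page_of(uint16_t off) { return off >> 14; }
```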
sqpat Posted Tuesday at 01:15 AM

I managed to save another 15-20k of conventional memory today by pulling the lumpinfo data (basically the WAD directory) into EMS pages and only pulling it in dynamically when necessary. To make this work I had to save a more accurate picture of what was currently in memory at a given time, so I could page certain areas of memory back and forth like a stack. This was really necessary because with DOOM1/DOOM2 looming, this structure would grow from 16k or so to 25 or 30k or more. In practice it only increased EMS page swaps by a couple percent and speed wasn't affected.

I will extend this idea of having a clearer picture of what's in memory at a given time, and see if I can reduce some instances where I page in something that is already in memory, reducing some delay. (But I sort of have a feeling I'm not doing it too much.)

Additionally, I'm going to keep hammering at the DGROUP/data segment, which now stands at about 20,000 bytes (including 3000 bytes allocated to the stack). Moving DS to 0x4000 will be much easier if the actual near data fits in just one EMS page and I can dynamically move things in and out of the other three pages of the 0x4000 segment freely.

Some other crazy ideas:
- OpenWatcom overlays have proven to be buggy, so that's like 40k of memory savings I have lost by taking them out. Basically, there's just a whole bunch of initialization code that runs once and never again (or other things like credits code, intermission code, etc. that in theory could be paged in and out). Anyway, if I can't use overlays, then what I'll try is just packing those game initialization functions near each other in the game binary, then at runtime, once initialization is done, just zero out that memory region and use it for something else.
- Move the sine tables back into low memory (< 0x4000, but not near allocations) if I can free up 48k down there, which doesn't look too hard.
I'm already around 25-30k free, and the medium memory model already gives me 7k more. This will free up almost another 48k in EMS pageable memory, which can easily be used for more texture cache.
- Move a few more static data fields to files: things like sprnames, switchlists, animdefs... these add up to a few KB of the 20 KB I have allocated, and really they can be in files and loaded into temporary variables instead of always being in memory.

Otherwise, the medium memory model builds but fails during certain file opens. I had to write a far version of fread, which works fine, but my far version of read is not working. I may be better off just replacing open/read/write with fopen/fread/fwrite anyway, but I have to examine their use more closely; I think maybe it keeps the file handle to reopen the WAD file over and over, so it's a little more complicated.
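The "like a stack" bookkeeping might look roughly like this in C: track which logical page sits in each physical slot, push the old mapping before temporarily mapping something like the WAD directory, pop it to restore, and skip remaps where the page is already in place. Everything here is hypothetical, and the EMS call itself is replaced by a counter.

```c
#include <stdint.h>

#define NUM_SLOTS   24   /* 16 KB physical pages in the 256 KB-640 KB region */
#define STACK_DEPTH 8

static int16_t       mapped[NUM_SLOTS];        /* logical page in each slot */
static int16_t       saved_page[STACK_DEPTH];
static uint8_t       saved_slot[STACK_DEPTH];
static int           depth;
static unsigned long remaps;                   /* EMS map calls actually made */

/* Map a logical page into a slot, skipping the call if it is already there. */
static void map_page(int slot, int16_t page)
{
    if (mapped[slot] != page) {
        mapped[slot] = page;
        remaps++;                              /* real code: EMS map call */
    }
}

/* Temporarily map `page` into `slot`, remembering what was there.
   Returns 0 if the stack is full. */
static int push_mapping(int slot, int16_t page)
{
    if (depth == STACK_DEPTH)
        return 0;
    saved_slot[depth] = (uint8_t)slot;
    saved_page[depth] = mapped[slot];
    depth++;
    map_page(slot, page);
    return 1;
}

/* Restore the mapping saved by the matching push_mapping. */
static void pop_mapping(void)
{
    if (depth == 0)
        return;
    depth--;
    map_page(saved_slot[depth], saved_page[depth]);
}
```

The redundant-remap check in `map_page` is where the "page something that is already in memory" cases would disappear for free.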
sqpat Posted Wednesday at 01:42 AM

I tried idea 1 above (zeroing out init code memory) and it worked fine, after I made sure to remove all functions that were truly "game setup" and not "level setup". This creates an empty area of about 13k. I haven't figured out what to put in there yet. While this is a low memory region, it's still a far data segment. Maybe I can use it to reduce UMB usage.

However, I tried moving a few other regions (idea 3 above) into files and loading them, and it seems that because of the way the initialization code is ordered... once I do an EMS pagination involving segment 0x4000 (0x4400 is fine), all later fopens fail. It's not even a problem with fread or fwrite; the fopen itself fails. I think this is because far malloc returns an address barely inside the 0x4000 segment area, and something internally must be using farmalloc when I call fopen in a large memory model, so I need to reduce memory usage a bit more.

I think what's going on is this: while my binary has 190k or so used, and 16k or so of that is from the data segment, the near data segment (at the end of the binary) is considered to extend to 64k, so the first far segment address for a far malloc is going to come after the near data segment is extended to its maximum, which pushes into that 0x4000 region. Maybe after I cut memory usage by several thousand more bytes, this problem will disappear.

But I've actually had a lot of these weird errors pop up (including when I use overlays) where everything just breaks after I EMS-paginate that memory area. I may have to look into the OpenWatcom source to understand things more.
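A back-of-the-envelope C version of that hypothesis, assuming the far heap begins immediately after a DGROUP that is treated as a full 64 KB (0x1000 paragraphs). The DGROUP segment values used with it are made up for illustration; the point is that once DGROUP starts at or above segment 0x3000 (192 KB), the first far allocation lands in the pageable 0x4000 region, whereas reserving only the ~16 KB actually used would leave plenty of headroom.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGEABLE_SEG 0x4000u   /* first segment of the remapped 256 KB-640 KB region */

/* First far-heap paragraph if the near data segment is assumed to occupy a
   full 64 KB (0x1000 paragraphs) starting at dgroup_seg. */
static uint32_t far_heap_start_full(uint16_t dgroup_seg)
{
    return (uint32_t)dgroup_seg + 0x1000u;
}

/* Same, if only the bytes actually used by DGROUP were reserved. */
static uint32_t far_heap_start_used(uint16_t dgroup_seg, uint32_t used_bytes)
{
    return (uint32_t)dgroup_seg + (used_bytes + 15u) / 16u;
}

static bool in_pageable_region(uint32_t seg)
{
    return seg >= PAGEABLE_SEG;
}
```

With a hypothetical DGROUP at 0x3400, the full-64 KB assumption puts the far heap at 0x4400 (inside the pageable area), while counting 16 KB of used data puts it at 0x3800 (below it).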