You found it --- this tells you how to use the JIT compiler. First things first: This is not an explanation on how to use UAE under linux, or how to use UAE in general. There is other documentation about that, and if you are not familiar with it, PLEASE read it! This document only ever mentions stuff specific to the JIT compiler versions! More disclaimers: This stuff is still definitely in a pre-Beta stage. It works for me, and I am trying to make sure it works for others, too, but it is impossible to test even a significant portion of the possible configurations. Most of my testing is done on 8 bit screens. Other bit depths *should* work, but have seen little to no testing. Things might crash at any time, and in interesting ways, and while you can curse me for it, that's the worst you can do. No liability whatsoever! There are two executables, uae_Xwin and uae_DGA2. Normally, you will want to use uae_Xwin, it is the much more mature and less experimental one. Both will connect to the X display in your DISPLAY environment variable, bring up the GTK Gui (unless you disable it in the config file), and start the emulation. If you have a working configuration file for UAE/linux, then you can use it as-is with these executables. However, be aware that the default settings for the new configuration options are extremely conservative, and to get best performance, you should really change them (see below). [Update: This is no longer true. The default settings are now pretty much optimal, and you probably won't have any reason to change them!] If you don't have a working configuration file, each executable comes with a sample config file. Of course, you'll have to change a lot of options, because your setup and mine are different, but it is a start. I recommend, however, that you first get and install pristine UAE 0.8.15, and make sure you *do* have *that* working correctly. UAE-JIT will have no chance whatsoever of working correctly otherwise. There are several new options in the config file. PLease take the time to read through this, so you know what you are dealing with! comptrustbyte: comptrustword: comptrustlong: Possible values are direct, indirect, indirectKS, and afterPic. ***** These options are obsolete now! Leave them at their ***** ***** default value of "direct" unless you really, really ***** ***** have a good reason for changing them! ***** These describe how aggressive to be when it comes to accessing Amiga memory. If you choose "direct", the emulation will be very aggressive. If you choose "indirect", the emulation will always use the slower but safe method. "indirectKS" will use the aggressive method for all code except Kickstart code, and "afterPic" uses the safe method until the first time a Picasso96 mode is switched on, and the aggressive method from then on. I usually use "afterPic" for all of them; If this fails (you get a core dump and UAE exits suddenly --- for me that happens when starting SysInfo or GeneticSpecies2), it usually is enough to set comptrustbyte to "indirect". Defaults are "indirect" for all three. Unless you are not using P96 graphics (why not?), there isn't much point setting this to "direct". During the startup, weird and wonderful things happen in the Amiga, and only having faith in the aggressive method once that difficult time is over is certainly a wise thing to do. comptrustnaddr: Same as above. ***** This option is obsolete now! Leave it at its ***** ***** default value of "direct" unless you really, really ***** ***** have a good reason for changing it! ***** I have yet to find any software that can't handle "afterPic", and I'd be very surprised if there is any. If you find something that works with "indirect", but not with "afterPic", please tell me! compnf: "yes" or "no". Whether to optimize away flag generation when it isn't needed. There really shouldn't be any reason why you'd want to set this to "no"; If you find something that works with "no" and doesn't with "yes", that's a bug and I need to know about it! The reverse is a bug, too, but hopefully I squashed that one before the release ;-) cachesize: The size (in kb) the JIT compiler uses to store pretranslated code. When this becomes full, or when the OS issues a flush icache instruction, this gets completely emptied, and then refilled during execution. Setting it to 0 will disable the JIT compiler. comp_flushmode: *NEW* "hard" or "soft". If this is set to soft (the default), an OS induced icache flush doesn't actually empty the cache, but instead checksumming will be used to check whether blocks have to be discarded. You'll probably want to leave this at its default (otherwise lots of stuff, like the OS, gets translated over and over). comp_constjump: *NEW* If this is "yes" (the default), unconditional branches will not end a block; Effectively, UAE-JIT compiles "through" them. Generally, that's a good idea, as it improves performance. However, it makes soft cache flushing impossible for some blocks, so if you experience lots and lots of soft cache flushes (e.g. when using a Mac emulator), you might try "no" and see whether it does any better. compfpu: If this is "yes" (the default), the JIT compiler will be used for the most commonly used FPU instructions. Setting it to "no" will disable JIT-compiling for the FPU. [Note: The "unroll" option is no longer supported. You should remove it from your config files if it's still in there] [Note2: Setting some of those options to sub-optimal values will cause UAE-JIT to exit with a message pointing at README.JIT-tuning] ================= All of the above can be set from the GTK GUI, too =========== ================= The options below are one-time, config-file only ============ avoid_cmov: "yes" or "no". If you have a processor that doesn't support the P6-class CMOV instructions, you have to set this to "yes". The JIT compiler will then not try to translate any instructions for which it would generate code with CMOV in it. Better slower than "illegal instruction", right ;-) avoid_dga: If you use the Xwin executable, setting this to "yes" will stop it from even looking for the DGA extension. Obviously, it won't use it, either. avoid_vid: If you use the Xwin executable, setting this to "yes" will stop it from even looking for the Vidmode extension. Obviously, it won't use it, either. [Note: the following options are not available in the "sanitized" versions of UAE-JIT. The executables made available on byron@csse.monash.edu.au are not sanitized, but if you compile your own from the patches, you need to include the "extra options" patch to get these. And don't take my use of the plural in this paragraph to mean anything --- it is generic ;-) ] override_dga_address: If you use the DGA2 executable, this will allow you to override the linear frame buffer address DGA2 detects. Try it first without this, but if you just get a blank grey screen (and F12-S gets you a window with the right content), your XServer might get it wrong (seems fairly common, in fact). Find out the linear frame buffer address (preferably by looking at /proc/nnnnn/maps, with nnnnn the pid of the X server --- look for a mapping of /dev/mem with the right size; The offset of that mapping is the value you are looking for). In this option, you provide the *upper 16 bits* of that address. So if your linear frame buffer is at 0xd5000000, you set override_dga_address to 0xd500. Yes, the config file will take hex numbers. ============================ End of Options ============================= Many of these options can be changed through the GTK UI. However, as many of them influence code *generation*, changes will only take effect when code is newly translated; The already translated code in the cache is uneffected. In order to make your changes take effect, you need to force a hard cache flush. The easiest way to do so is to change the cache size by some small amount. Remember this step if you try to benchmark the result of various option settings on performance, otherwise results will be rather inconclusive ;-) How to get the maximum performance: ----------------------------------- Here are a few tips on how to get the best possible performance, and to avoid common pitfalls. * Use a 2.3.*, or even better a 2.4test* kernel. Without it, you might not be able to do aggressive memory modes (see README.JIT-tuning) * The really aggressive memory modes use sysv_shm. By default, the largest sysv_shm block you can allocate at one time is 32M, so if you have a larger Z3Mem, allocation will fail and the aggressive modes get disabled. You can change the max size through /proc/sys/kernel/shmmax, the first parameter is the max size. * Use Picasso96 modes! * Use DGA for your actual display! (If you don't, you CANNOT make any comments about sluggish gfx performance. Understood?) * Alternatively, use CGX3 with direct access to an S3 Virge PCI card (see README.pci) * Set as many of the comptrust* options as possible as aggressively as you can without creating a crash [*** obsolete ***] * For the adventurous: If you use the DGA2 executable with an XFree86 4.0x server, AND select a Picasso video mode that has the same width as your X virtual screen[1], AND haven't done anything else to prevent you from using aggressive memory access (like setting comptrust* to indirect), you *should* end up with vastly faster gfxmem access. This is still buggy, occasional display corruption when using blits occurs. But for seeing how fast Doom can go on an "Amiga", this is the ticket ;-) * If your app comes in versions for different CPUs, try all of them. I have had good experiences with using 040 versions, particularly of RC5 (use "-c 2" to select the 040 core). Of course, this only works if the 040 apps don't use 040-specific features, or if you have enabled 040 support for UAE Feedback: ========= I need to know about remarkable experiences you have, but I really don't need to know about unremarkable things. Here is a little guide as to what is what: Remarkable: * Something that works with the compiler disabled, but fails with it enabled * Any occurrence of "illegal instruction" (from Linux) on a P6 class machine, or a P5 class machine with avoid_cmov=yes * Any failure to boot with a config file that does boot "normal" UAE/linux * Anything else that you can clearly identify as an emulation bug, rather than as a configuration, hardware or user problem * Any patches you can come up with * Any offers of sponsorship for further work on it ;-) Unremarkable: * Any failures attributable to memory shortage * Any problems you might have with linux, UAE or the Amiga in general, not specific to the JIT compiler version * Any statements to the effect that I am a traitor, a lamer, a wannabe, a loser, a demigod, a guru, a procrastinator, or anything else along those lines * Any non-constructive criticism of my coding style. Remember: The only valid form of criticism is a patch! (Of course, certain people are excepted from this, notably everyone who would be involved with integrating this code into other UAE versions ;-) If you think you found something remarkable, PLEASE let me know. And please describe the circumstances as precisely as possible --- only if I can recreate the fault can I have a real shot at figuring out what went wrong. Good luck, and looking forward to your feedback, Bernie (bmeyer@csse.monash.edu.au) P.S.: There is some output to stdout/stderr while running (and some directly to the tty). The lines that pop up every second have a number of fields. Here are short explanations of each: * compiled: The total number of bytes the compiled code (and the related bookkepping information) takes up * soft: Number of soft cache flushes done in the last second * hard: Number of hard cache flushes done in the last second * trans: Number of 68k blocks translated in the last second * check: Number of 68k blocks that had their checksums check in the last second as a result of a soft cache flush * lost: Time "lost" during the last second of emulation time. This should be 0, but if the emulation can't keep up for some reason (like file I/O happening), it can be larger. The output is in seconds; Keep an eye on this if you do self-timed benchmarks! * debug/2/3/4: Internal counters I use for debugging. If you have software that can make debug3 and/or debug4 reach more than 100,000, please tell me about it --- these are counting non-compiled FPU instructions executed. [1] In reality, things are even more complex --- what you need to match is the pitch of the mode. Normally, that matches the virtualwidth, but my Trident 3DImage975 uses a pitch of 1024 for a 640 wide mode....