Saturday, April 18, 2020

Help us preserve the original Furby!



Update 2020-04-27

Given submissions have tapered off we've taken down the server. Thanks to all that have contributed so far! We'll let everyone know when more information is available.

Update 2020-04-26

Thanks to all of those who have contributed! We've been running for about a week, and results leveled off around 80% complete and are currently around 84% complete. We now have some statistics and a few proposals.

Some basic statistics:
  • Pages: 297
  • Lines: 19510
  • Page submissions: 744
  • Line changes (roughly): 10297
  • Change 2/3 agree: 9191
  • Can't 2/3 agree: 1106
Of those 297 pages, all have at least two submits and about 50% have three submits. Combining these results flagged about 50% of lines for adjustment. Of those suggested changes, about 89% (9191 of 10297) have 2/3 agreement. Based on existing data, getting all 3 sets of challenges completed will reduce the remainder from 1106 to about 600 still requiring manual review.
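As a rough illustration of the 2/3 agreement rule above, here is a minimal sketch of per-line voting. The function name `vote_line` and the sample lines are invented for this example; the real post processing may differ.

```python
from collections import Counter

def vote_line(candidates):
    """Return (text, agreed) for one line, given each submitter's version of it."""
    text, votes = Counter(candidates).most_common(1)[0]
    # Assumed rule: at least two matching submissions making up 2/3 or more of the votes.
    agreed = votes >= 2 and votes * 3 >= len(candidates) * 2
    return text, agreed

# Three made-up submissions for the same three-line page.
submissions = [
    ["LDA #$00", "STA $80", "JMP LOOP"],
    ["LDA #$00", "STA $8O", "JMP LOOP"],
    ["LDA #$00", "STA $80", "JMP L00P"],
]
results = [vote_line(versions) for versions in zip(*submissions)]
needs_review = [i for i, (_, agreed) in enumerate(results) if not agreed]
```

With only two submits for a page, any disagreement fails the rule, which is why a third submit shrinks the manual review pile.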

A few more advanced heuristics were also tried (ex: partial line matching, weighting user results based on how much we trust their results), but ultimately we weren't convinced any of these are the right approach.

So, where does this leave things? Two main options are being considered:
  • Push the annotated source to github or gitlab as is. We estimate that it would take someone about 6-12 hours to fix, which is not intractable. Default would have been the furby-source repository on github, but they have stopped responding
  • Restart the crowdsource server using the best result with annotated conflicts. Users would need to delete the extra lines and submit. However, we suspect users need a break, so at a minimum we would probably hold off a few months to regain momentum
Note we suspect additional fixes will be required upon eventual manual review, whichever path is taken. Generally the first option seems like the best. A few dedicated users could knock this out fairly quickly without too much coordination. If we get a few volunteers (or one very dedicated volunteer), we'll figure out where to push this and move the project forward. Ideally one of these people would also be interested in coordinating other community contributions.

So we're asking if people are interested in the first option, and we'll likely default to the second if we don't get traction. Please let us know here in the comments or on Twitter!

Update 2020-04-20


Higher quality .pngs have been swapped in after reports that compression is swapping letters (!). Special thanks to Video Game Preservation Collective for the above image! The old set was from the text annotated version while the new set is believed to be the original scan. Unfortunately these images are about 5x larger, but should improve accuracy.

We've also now done a very crude analysis of the existing submits and used them to make a quick guess at better default text to present. This affects about 85% of entries. So going forward you'll typically get higher quality defaults. But please still be attentive and look for errors!

There have also been a few backend tweaks, notably favoring showing pages with fewer submissions. However these generally should not be visible externally.

Update 2020-04-19

We're up to 197 submissions! Thanks to all of you that have posted so far! We need to meet a minimum of 297, so we're making great progress. Our goal is to get 3 submissions to help correct errors, for a total of 891.

We will briefly bring down the site for maintenance at 2020-04-21 6:00 AM. We will use this window to improve the default text based on submissions so far. This should make challenges much easier as mostly you'll only need to do small corrections instead of large edits. We will also fix the overall progress indicator, which currently says 1485 required, but it should be 891.

Once again, thanks for your help and please let us know if you have any feedback!

Micro update: the progress indicator fix has been pushed out (it was not necessary to bring the server down)

Background

The Furby is an iconic talking toy from the late 90s. A couple of years ago scans of the original Furby source code were acquired. Unfortunately the scans are noisy and automatic image to text conversion is difficult. So we're asking the community to help preserve game history by proofreading computer generated transcripts. Generating a proper copy of the Furby source code will be enormously valuable to understanding how it works!

Project TLDR:
  • Complete using your web browser
  • You need a large screen (laptop or desktop)
  • Scanned image at left, noisy text interpretation at right
  • Fix errors in the image to text translation and submit
  • Remove headers and footers (ex: "Page 6", "A-121", "Diag7.asm" ) 
  • Unreadable: put best guess if possible, or random characters as last resort (will flag for review)

Although the crowdsourcing system wasn't a good fit for Great Swordsman, it spurred some conversations on what it could be used for. It has been revived and adapted to work on improving pdf image to text conversion.

Join the effort by signing up for an account! If you had an account on the previous TGP project, it likely is still available. Additional instructions are available after creating an account. If you have some time, please try a few images!

Finally, the person who gets the most pages accepted (ie with acceptable accuracy) will get early blog access for 3 months! Note however you must provide your e-mail address to qualify so that we can actually send it to you.

Sounds good? Sign up here! Instructions are available after logging in.

Note: due to various issues we are unable to split the pages into smaller tasks. So the images are relatively large and this is best completed on systems with a large screen such as a laptop or a desktop. So apologies if you only have mobile, but you may not be able to help with this specific project.

Special thanks to Andrew Gardner for writing the original tool and John McMaster for recent modifications!

FAQ

We'd also love if you have suggestions for improving the work flow. These are things already on our mind:


Q: What happened after the last crowd sourcing project? (Fujitsu DSPs / TGPs)

A: Post processing took a while, but it ultimately led to massive improvements on how well the community understands these games. However we've been doing a poor job at communicating those results and still need to write a post about it. See for example this MAME post which mentions recovering "...the Sega Model 1 coprocessor TGP programs for Star Wars Arcade and Wing War, making these games fully playable."


Q: Can you make the challenges smaller?

A: Not easily. The pages aren't well aligned, so we'd need to figure out both correct straightening and cropping.


Q: Can you align the text editor to the images better? Maybe rich text features like find and replace?

A: While the chip community can unlock the secrets of the micro universe, we can't code websites for beans. Really it's a miracle that the site is running at all. If you can help with improving text entry, please reach out! FYI it's written in Python/Django and could use some cleanup. If you haven't been scared off, more info is here



Q: What happens after it's captured?

A: First we'll post process to remove errors. After that we'll use the CPU manual to make a special 6502 assembler to create a binary. Ideally we'll also combine this with the Furby 70-800 ROM microscope images (sample above) at some point.


Q: Where did the source come from?

A: Not sure exactly, but some information is available at the Internet Archive


Q: Can I edit my result after submission?

A: It is not possible to modify it at this time. But don't worry, most of the time we can detect errors by combining a few results.


Q: Can you reset my password?

A: Yes, but it requires manual admin intervention. We suggest creating a new account if you aren't really tied to your old one


Q: Isn't that Furby image for the Furby 2012, not the original Furby?

A: Maybe... Actually we have a 70-800 image now

Prologue

More questions? Type them below, or reach out to us on Twitter. Thanks again for your help!

Tuesday, April 14, 2020

You are great swordsman!


Great Swordsman (not to be confused with Hiro Protagonist) is a Taito arcade game where you engage in various styles of sword play ranging from fencing to samurai combat.


The game firmware comprises Z80 EPROMs, AA-013, AA-016, and AA-017. The EPROMs are easy as the Z80 architecture is well understood and EPROMs are trivial to extract. However, little was known about the last three. Collectively though they handle things like getting player inputs, reading DIP switches, and tracking coins.



Previous decapping showed that AA-013 is an Intel D8741A.



Unfortunately it was received with severe damage which discouraged us from looking at it.



We then decapped AA-016 (#8) and AA-017 (#9) which are both NEC D8041AH. Fortunately neither NEC D8041AH nor Intel 8741A have protection schemes, so in theory we can simply read the data out. Unfortunately we were unable to activate the test interface. After some analysis we suspected that the algorithm we tried to dump them with (as 8741 IIRC) might have over-voltaged EA and damaged them. More on that later.


Unfortunately the EPROM based 8741A is difficult to read as is. But D8041AH are contact ROMs which traditionally we've been reasonably successful with (example). So we attempted to visually read them but got a lot of errors. It was hard to read the bits and attempting to disassemble them resulted in something only vaguely resembling a valid program.


So due to the combination of noisy bits and severely damaged chips the project essentially got shelved some time ago. However somewhat recently we got another chip set and a little later there was a forum post asking about the state of the project. With the general lockdown giving us a little more time right now, this prompted us to take a second look. These acquisitions ultimately gave us 3 ROM sets to work with: the original STARRIDER set via Guru (8/9/10), a set from STARRIDER via Smitdogg (C030/C031/C032), and a set that was separately acquired.

With these extra sets, the first priority was to analyze the test interface and assess if it was healthy. We used small test currents to characterize the ESD diodes on sample chips and compared them to 8741A and 8041AH chips from Great Swordsman. This showed the chips from Great Swordsman consistently have different responses on EA pins vs samples, indicating this pin was likely intentionally damaged to prevent read out.


This may have been a common practice at one time as commercial systems from companies like RunFei have a "special protect" option that does exactly this. We've also seen it on other chips like the NEC D8748D EA pin shown above.

So a few options. One is that we may be able to repair or bypass the blown pad. Repair would be easier if we had FIB access but this isn't easily available. We could bypass it but there were misc complications at the time and this wasn't seriously considered. We do however plan on attempting this for AA-013.

That said we figured there was a chance that the test interface *might* still work even if it was damaged. To our surprise we managed to get a plausible dump out of one of the new AA-016s! The interface only worked once or twice and then rapidly deteriorated. Unfortunately due to the test interface instability and some disassembly errors we weren't confident we had a good dump. Finally it didn't remotely match our earlier attempts to decode the mask ROM into binaries. This gave us low confidence that the EPROM dump was correct.

So anyway we at least had an answer: the test interface is not reliable and probably won't yield anything more. So we decided to revisit brute force ROM capture by photographing bits. How could we improve the accuracy? Let's say the existing capture has about 100 bad bits out of 8192 => about 1% error rate. If you took two independent captures like this and compared them, any bit where they disagree gets flagged for review, so the only errors that slip through are bits that are wrong in both. The expected number of those is about 8192 * 0.01 * 0.01 = 0.8. So while it might not be perfect (say a few bit errors might be expected), it would drastically improve the accuracy to something usable.
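The back-of-the-envelope estimate above can be written out as a small sketch. It assumes the two captures make independent errors at the same rate, which is an idealization; the function name is made up for illustration.

```python
def expected_errors(n_bits, error_rate):
    """Expected outcomes when comparing two independent captures of the same ROM.

    Returns (flagged, missed):
      flagged: bits where exactly one capture is wrong, so the captures disagree
      missed:  bits where both captures are wrong, so they agree on a bad value
    """
    flagged = n_bits * 2 * error_rate * (1 - error_rate)
    missed = n_bits * error_rate * error_rate
    return flagged, missed

ROM_BITS = 8192     # bits per ROM, from the text
ERROR_RATE = 0.01   # the rounded ~1% figure from the text

flagged, missed = expected_errors(ROM_BITS, ERROR_RATE)
print(f"disagreements to review: ~{flagged:.0f}, undetected errors: ~{missed:.1f}")
```

The trade is that a couple hundred disagreements need a manual look, but fewer than one bad bit is expected to survive unnoticed.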



With this in mind, a few weeks ago we decapped the second ROM set as C031 (AA-016) and C032 (AA-017). And for one reason or another the contrast was considerably better!


We then asked the community to help convert these images into bits. This was broadcast here on this blog, on Twitter, and on mameworld. We suggested using rompar, a specialized tool for this task, although in general it wasn't easy enough for people to set up. There is an open ticket about easier Windows support which the rompar team has been working on addressing.


That said, we got a combination of submissions in rompar, typed as .txt files, or even as colorful spreadsheets (AA-017 above, other images are AA-016).

One lesson learned is that we should have aligned all of the image sets (or at least C031 and C032). This would have made some of the post processing easier as sometimes we were trying to resolve bits by comparing several different image sets.


Anyway, once we got around 3 submits for each set we did a cursory inspection of each one to gauge the submission quality. If a submission was reasonable (say 99%+ accurate), we added it to the submission pool. Then all of the locations where the submissions in the pool didn't fully agree were flagged for review and displayed in rompar. After reviewing these we got ROMs that we think are probably within a few bits of being correct.
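The pooling step can be sketched roughly like this. It is only an illustration: the submission format (lists of equal-length "0"/"1" row strings) and the function name are assumptions, and the sample rows are just the first 8 bits shown elsewhere on this blog with one bit deliberately flipped.

```python
def flag_disagreements(pool):
    """Return (row, col) for every bit position where the pooled submissions differ.

    pool: list of submissions, each a list of equal-length '0'/'1' row strings.
    """
    flagged = []
    for row, row_versions in enumerate(zip(*pool)):
        for col, bits in enumerate(zip(*row_versions)):
            if len(set(bits)) > 1:  # not unanimous, so a human should look at it
                flagged.append((row, col))
    return flagged

pool = [
    ["11101011", "01001000"],
    ["11101011", "01011000"],  # hypothetical typo at row 1, column 3
    ["11101011", "01001000"],
]
to_review = flag_disagreements(pool)
print(to_review)
```

Requiring full agreement rather than a majority means more locations to review, but nothing gets silently outvoted.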

D8041AH datasheet

But unfortunately we have a problem: the ROMs still don't disassemble well. So next we read up a bit on MCS-48 architecture and learned that the interrupt vectors are at the start of the chip: 0, 3, and 7. Usually each of these holds either a jump (typically JMP, 0xX4 0xXX) or a return (RET, 0x83). Here's the start of a sample keyboard BIOS ROM:

00000000  04 08 00 83 00 00 00 83  15 23 f0 90 85 95 22 14  |.........#....".|

Here you can see that at 0x0000 (reset) there's JMP 0x008, which skips over the rest of the vector table. Similarly there's RET on the other vectors to basically ignore them.

With that in mind, here's the start of our old AA-016 microscope based submit:

00000000  40 d9 96 a9 fa 03 1f aa  e8 a8 04 13 04 d8 04 e0  |@...............|

Hmm there are some 4's in there, but doesn't really look valid. For comparison though, here is the AA-016 EPROM submit:

00000000  04 08 00 83 00 00 00 83  15 23 f0 90 85 95 22 14  |.........#....".|

Aha! This looks much better. So we started thinking: maybe the ROM decoding script doesn't really work? It is producing mostly valid disassembly, but maybe we missed something? The scheme was relatively complicated and it's entirely possible something was missed.


So after some munging, we came up with a new physical address space layout. Now AA-016 starts with:

00000000  04 08 00 83 00 00 00 83  15 23 f0 90 85 95 22 14  |.........#....".|

Aha! Now this matches the EPROM dump. In fact we verified against the original EPROM dump and decided it is 100% accurate.
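The vector table sanity check we were doing by eye can be sketched in a few lines. This is a simplified illustration rather than a real disassembler: it only recognizes the two patterns mentioned earlier (JMP encoded as 0xX4 0xXX, and RET as 0x83), and the function names are made up.

```python
RET = 0x83

def describe_vector(rom, addr):
    """Decode the instruction at a vector address if it is a JMP or RET, else None."""
    op = rom[addr]
    if (op & 0x1F) == 0x04:
        # JMP: the top three opcode bits are address bits 10..8,
        # and the following byte is address bits 7..0.
        target = ((op >> 5) << 8) | rom[addr + 1]
        return f"JMP 0x{target:03x}"
    if op == RET:
        return "RET"
    return None

def looks_like_vector_table(rom):
    """True if all three vectors (0, 3 and 7) hold a JMP or a RET."""
    return all(describe_vector(rom, addr) is not None for addr in (0, 3, 7))

# First 16 bytes of the two decodings quoted above.
new_layout = bytes.fromhex("04 08 00 83 00 00 00 83 15 23 f0 90 85 95 22 14")
old_layout = bytes.fromhex("40 d9 96 a9 fa 03 1f aa e8 a8 04 13 04 d8 04 e0")

print(describe_vector(new_layout, 0), looks_like_vector_table(new_layout))
print(describe_vector(old_layout, 0), looks_like_vector_table(old_layout))
```

A check like this is cheap enough to run against every candidate address layout, which would have pointed at the decoding script much sooner.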

But there's still one more problem: if the EPROM dump is good, why didn't it disassemble properly? Why did we get told the submitted dump was unusable? First, the unusable dump was probably someone talking about the earlier AA-016 dump vs the newer EPROM dump. Second, although we tried several ways to disassemble the dumps (notably MAME,  Ghidra, but also some others), they generally were biased towards MCS-48 (classic 8048) and not some of the finer points of UPI-41, the family D8041AH is from. One source described it as “The 8042 and 8041 is code compatible with the 8048, except that there are no external program memory instructions, and that data bus register instructions have been added.” For example, Ghidra 8048 gave:

CODE:0008 15              DIS        I
CODE:0009 23 f0           MOV        A,#0xf0
CODE:000b 90              MOVX       @R0,A
CODE:000c 85              CLR        F0
CODE:000d 95              CPL        F0
CODE:000e 22              ??         22h    "


MAME mcs48 gave:

unidasm -arch mcs48 great_swordsman_aa-016_d8041ah_decap-c031.bin
...
0:008: 15     dis  i
0:009: 23 f0  mov  a,#$F0
0:00b: 90     movx @r0,a
0:00c: 85     clr  f0
0:00d: 95     sel  an1
0:00e: 22     illegal


But really should have been upi41:

unidasm -arch upi41 great_swordsman_aa-016_d8041ah_decap-c031.bin
...
008: 15     dis  i
009: 23 f0  mov  a,#$F0
00b: 90     mov  sts,a
00c: 85     clr  f0
00d: 95     sel  an1
00e: 22     in   a,dbb

Which looks good!
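To make the difference concrete, here is a tiny lookup built only from the two MAME listings above. It is not a disassembler and covers just the two opcodes that decoded differently in this snippet; the names are invented for illustration.

```python
# Opcodes that decode differently in the mcs48 and upi41 listings above.
VARIANT_DECODINGS = {
    0x90: {"mcs48": "movx @r0,a", "upi41": "mov  sts,a"},
    0x22: {"mcs48": "illegal", "upi41": "in   a,dbb"},
}

def variant_specific(opcodes):
    """Pick out (address, opcode) pairs whose meaning depends on the chip family."""
    return [(addr, op) for addr, op in opcodes if op in VARIANT_DECODINGS]

# (address, opcode) pairs taken from the listing above.
listing = [(0x008, 0x15), (0x009, 0x23), (0x00B, 0x90),
           (0x00C, 0x85), (0x00D, 0x95), (0x00E, 0x22)]

for addr, op in variant_specific(listing):
    names = VARIANT_DECODINGS[op]
    print(f"{addr:03x}: {op:02x}  mcs48: {names['mcs48']:<12} upi41: {names['upi41']}")
```

Both hits are the data bus register instructions the quote above mentions, which is a decent hint that the part is a UPI-41 rather than a plain 8048.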

So to summarize, the hurdles were:

  • Intentionally damaged test interface
  • Possibly unintentionally damaged test interface
  • Noisy microscope images
  • Not using the right disassembler
  • Getting people to look at the data
  • Incorrect address decoding

Finally, there were a lot of people that helped with this project. Some of them include:

  • Our Patreon contributors
  • STARRIDER: chips, ROM capture
  • rompar team (John McMaster et al): software support
  • EdHunter: ROM layout decoding
  • Guru: logistics
  • Smitdogg: logistics
  • f205v: ROM capture
  • sadikyo: ROM capture
  • belegdol: ROM capture

Enjoy this post? Please support us on Patreon or follow us on Twitter

Note: with the Indiegogo campaign over we unfortunately don't currently have a way to accept one time donations.

Thursday, April 2, 2020

Help us preserve Great Swordsman!

UPDATE 2020-04-06: we tentatively have enough submissions to decode the ROMs, assuming a few people we know are working on them finish. Thanks to all of those that have submitted and we'll try to post an update in the near future!

Arcade Game: Great Swordsman (1984 Taito) - YouTube

Previously we decapped a few NEC D8041AH MCUs from Great Swordsman in order to better document the game. Unfortunately the images were a little hard to read. However we recently decapped a new AA-016 (C031) and a new AA-017 (C032) and the contrast is much better! Specific cause hasn't been investigated.


Anyway, we are looking for help digitizing the firmware microscope images into bits. This can be done either by manually typing out all 8192 bits or using the rompar utility (preferred)

If you're interested, here is the raw data:

  • AA-016
    • Suggested: nec_8041ah_gswm_aa-016_decap-c031_xpol.jpg
  • AA-017:
    • Suggested: nec_8041ah_gswm_aa-017_decap-c032_xpol.jpg

Specifically:
  • We are especially looking for help with AA-016
  • Multiple people submitting improves accuracy
  • There are some stitching artifacts. If they get in the way of digitizing we can revisit stitching
  • If applicable, please provide rompar project file
  • We will take care of post processing into binary
  • rompar_decap-8_rom_mit20x_xpol is provided as a reference project. Note the image contrast wasn't great, so there were a lot of errors
  • By convention, brighter bits are generally typed as "1" and dark as "0". But we can accept either
  • Advanced rompar users: you can use the reference project as a template, but you'll need to re-align the images. We did this to produce the above rompar image while checking AA-017 results
Please let us know if you have any questions!

Update 2020-04-04


We are starting to process submissions. Thanks to everyone who has submitted so far!

It seems there's some confusion as to where the ROM starts and ends. The above image shows the first 4 rows. When this is exported from rompar it looks like this:

11101011...
01001000...

01101011...
11001101...
...

This is because the rows are designed in pairs, but the paired bits have some space between them. That is, the bits that actually are adjacent are not from the same pair. This has caused some people to skip the first row:


And give it as:

01001000...
01101011...

11001101...

So with that in mind, we suggest you type it up closer to this if you want to preserve a rough visual layout:

11101011...

01001000...
01101011...

11001101...
...

This won't match rompar output, but this doesn't affect post processing. Hope that helps clarify!
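To show why the blank line placement doesn't matter, here is a minimal sketch of how a typed submission could be parsed. It assumes post processing simply ignores blank lines, and it uses only the first 8 bits of each row as shown above; the function name is hypothetical.

```python
def parse_rows(text):
    """Turn a typed submission into a list of bit rows, ignoring blank lines."""
    rows = []
    for line in text.splitlines():
        bits = line.strip()
        if not bits:
            continue  # blank lines are visual grouping only
        if set(bits) - {"0", "1"}:
            raise ValueError(f"unexpected character in row: {bits!r}")
        rows.append(bits)
    return rows

rompar_style = "11101011\n01001000\n\n01101011\n11001101\n"
visual_style = "11101011\n\n01001000\n01101011\n\n11001101\n"

print(parse_rows(rompar_style) == parse_rows(visual_style))
```

Either grouping yields the same four rows in the same order. What does matter is not dropping the first row.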

Prologue

Enjoy this post? Please support us on Patreon or follow us on Twitter

Note: with the Indiegogo campaign over we unfortunately don't currently have a way to accept one time donations.