Flash testing learning

References:
[1] Yang Chao, Zhang Jinfeng, Ma Chengying. Discussion on NAND FLASH Test Design and Use[J]. Electronic World, 2018, No.551(17):116-118. DOI:10.19353/j.cnki.dzsj.2018.17.063.

NAND FLASH is a non-volatile memory organized into blocks, each made up of a number of pages. The page is the smallest unit for reading and writing (programming), and the block is the smallest unit for erasing, so the block containing a page must be erased before that page can be programmed. NAND FLASH leaves the factory with a small number of bad blocks, which the manufacturer marks so that users can identify them. New bad blocks can also appear during testing and in service. The organization performing reliability screening of the component should therefore identify newly found bad blocks and mark them, and the user should manage bad blocks by skipping them, thereby avoiding data loss.

The main suppliers of NAND FLASH are SAMSUNG and MICRON, and their devices have similar internal structures. They come in 16-bit and 8-bit data widths; addresses, data, and commands are time-multiplexed over the same I/O lines and transferred one after another, so every operation requires a specific command sequence. The MT29F64G08AJABAWP-IT selected in this article has 16,384 blocks of 128 pages each. Each page contains (4096 + 224) bytes, of which the 224 redundant (spare) bytes store configuration information. The main array therefore holds 64 Gb (8 GB) of data in an 8-bit format. The device comes in a 48-pin TSOP package with 8 multiplexed I/O pins plus control and power pins; the remaining pins are not connected.
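As a quick check of the geometry above, the following Python sketch multiplies out the block, page, and byte counts given in the text and splits a linear page index into a block number and a page-within-block number. The constant names and `split_page_index` are illustrative, and the plain linear numbering is an assumption; the real row-address layout sent in the address cycles must come from the datasheet.

```python
# Device geometry as stated in the text for the MT29F64G08AJABAWP-IT.
BLOCKS = 16_384          # blocks per device
PAGES_PER_BLOCK = 128    # pages per block
MAIN_BYTES = 4_096       # data bytes per page
SPARE_BYTES = 224        # redundant (spare) bytes per page

PAGES = BLOCKS * PAGES_PER_BLOCK
MAIN_CAPACITY_BYTES = PAGES * MAIN_BYTES
RAW_CAPACITY_BYTES = PAGES * (MAIN_BYTES + SPARE_BYTES)


def split_page_index(page_index: int) -> tuple[int, int]:
    """Split a linear page index into (block, page-within-block).

    Assumes plain linear numbering (hypothetical helper); the actual
    row-address bit layout must be taken from the datasheet.
    """
    if not 0 <= page_index < PAGES:
        raise ValueError(f"page index {page_index} out of range")
    return divmod(page_index, PAGES_PER_BLOCK)


if __name__ == "__main__":
    print(f"pages:           {PAGES:,}")
    print(f"main array:      {MAIN_CAPACITY_BYTES:,} bytes "
          f"= {MAIN_CAPACITY_BYTES * 8 // 2**30} Gbit")
    print(f"including spare: {RAW_CAPACITY_BYTES:,} bytes")
    print(split_page_index(300))  # (2, 44): block 2, page 44
```

The main array works out to 8,589,934,592 bytes, i.e. 64 Gbit (8 GiB); the spare area adds 224 bytes per page on top of that.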

During testing, first read the chip ID and compare it with the datasheet. If they match, testing continues; if they do not, the chip fails and testing stops. The manufacturer marks each factory bad block by writing 00 to the first redundant (spare) byte of the block's first or second page. During testing, this byte must be checked for every block. If it is 00, the block was marked bad at the factory: increment the bad block counter and move on to the next block. Factory-marked bad blocks must not be erased, because an erase would clear the manufacturer's bad block mark. If it is FF, the block was good when it left the factory and must be tested.
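The ID check and factory bad-block scan described above might be sketched as follows. The chip access functions are hypothetical callbacks (`read_id`, `read_first_spare_byte`) standing in for whatever the test system provides, the ID bytes are placeholders rather than the real device ID, and the toy device in the demo is invented for illustration.

```python
from typing import Callable, List

FACTORY_BAD_MARK = 0x00  # value written by the manufacturer to flag a bad block


def id_matches(read_id: Callable[[], bytes], expected_id: bytes) -> bool:
    """Compare the leading ID bytes read from the chip with the datasheet."""
    return read_id()[: len(expected_id)] == expected_id


def scan_factory_bad_blocks(
    num_blocks: int, read_first_spare_byte: Callable[[int, int], int]
) -> List[int]:
    """Return blocks whose first spare byte on page 0 or page 1 reads 00.

    Such blocks are counted and skipped; they are never erased, because
    erasing would wipe the factory bad-block mark.
    """
    bad_blocks = []
    for block in range(num_blocks):
        for page in (0, 1):
            if read_first_spare_byte(block, page) == FACTORY_BAD_MARK:
                bad_blocks.append(block)
                break  # already known bad; move on to the next block
    return bad_blocks


if __name__ == "__main__":
    # Toy stand-in for the chip: 8 blocks, blocks 3 and 6 carry the mark.
    EXPECTED_ID = bytes([0xAA, 0xBB, 0xCC, 0xDD])  # placeholder, not a real ID
    spare = {(b, p): 0xFF for b in range(8) for p in (0, 1)}
    spare[(3, 0)] = 0x00
    spare[(6, 1)] = 0x00

    if not id_matches(lambda: EXPECTED_ID + b"\x00", EXPECTED_ID):
        raise SystemExit("ID mismatch: chip fails, stop testing")
    bad = scan_factory_bad_blocks(8, lambda b, p: spare[(b, p)])
    print(f"factory bad blocks: {bad} (count = {len(bad)})")
```

Only blocks reading FF on both candidate pages go on to the erase/write/read test; the rest are counted and left untouched.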

During testing, every block of the NAND FLASH storage array is verified with block-level erase, write, and read operations. Whenever a bad block that is not factory-marked is found, the test system stores its address. After the whole chip has been tested, the bad blocks are counted; if the total is within the limit given in the datasheet, the chip passes. Any newly found bad blocks are then marked in the same way as factory bad blocks, by writing 00 to the first redundant byte of the block's first or second page, so that users can identify all bad blocks in a uniform way.
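A minimal sketch of this block test loop, run against a hypothetical in-memory `ToyNand` model rather than real hardware: factory-marked blocks are skipped without erasing, every other block is erased, checked for FF, programmed with one pattern, and read back; failing blocks are recorded, counted against an assumed limit `max_bad`, and finally marked with 00 in the first spare byte of their first page. The model's "programming can only clear bits" behaviour matches NAND, but the fault injection and all sizes are assumptions.

```python
class ToyNand:
    """Hypothetical in-memory NAND model, for illustration only.

    Programming can only clear bits (1 -> 0), as on real NAND, and erase
    returns a whole block to 0xFF. `stuck` injects read faults:
    {(block, page, byte_offset): value_always_read}.
    """

    def __init__(self, blocks, pages, page_bytes, spare_bytes, stuck=None):
        self.blocks, self.pages, self.page_bytes = blocks, pages, page_bytes
        self.size = page_bytes + spare_bytes
        self.stuck = stuck or {}
        self.mem = [[bytearray(b"\xff" * self.size) for _ in range(pages)]
                    for _ in range(blocks)]

    def erase(self, block):
        for page in self.mem[block]:
            page[:] = b"\xff" * self.size

    def program(self, block, page, offset, data):
        buf = self.mem[block][page]
        for i, value in enumerate(data):
            buf[offset + i] &= value  # programming can only clear bits

    def read(self, block, page, offset, length):
        out = bytearray(self.mem[block][page][offset:offset + length])
        for (b, p, off), value in self.stuck.items():
            if (b, p) == (block, page) and offset <= off < offset + length:
                out[off - offset] = value
        return bytes(out)


def screen_chip(nand, factory_bad, pattern, max_bad):
    """Erase, program, and read back every good block; mark new bad blocks."""
    erased = b"\xff" * nand.page_bytes
    data = bytes([pattern]) * nand.page_bytes
    new_bad = []
    for block in range(nand.blocks):
        if block in factory_bad:
            continue  # never erase a factory-marked bad block
        nand.erase(block)
        ok = all(nand.read(block, p, 0, nand.page_bytes) == erased
                 for p in range(nand.pages))
        for p in range(nand.pages):
            nand.program(block, p, 0, data)
            ok = ok and nand.read(block, p, 0, nand.page_bytes) == data
        if not ok:
            new_bad.append(block)  # store the address of the new bad block
    for block in new_bad:
        # Mark it the way the factory does: 00 in the first spare byte.
        nand.program(block, 0, nand.page_bytes, b"\x00")
    passed = len(factory_bad) + len(new_bad) <= max_bad
    return passed, new_bad


if __name__ == "__main__":
    nand = ToyNand(blocks=8, pages=4, page_bytes=16, spare_bytes=4,
                   stuck={(5, 2, 7): 0x00})  # injected fault in block 5
    passed, new_bad = screen_chip(nand, factory_bad={3}, pattern=0x55, max_bad=2)
    print(passed, new_bad)  # True [5]
```

In the demo, one stuck byte turns block 5 into a new bad block; together with factory bad block 3 the total is 2, which is within the assumed limit.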

Note that, to keep the above figure simple, it shows the read, write, and erase operations for only one test pattern algorithm. In practice, the choice of test algorithm must weigh fault coverage against time complexity, and several algorithms may need to be combined. Common test pattern algorithms fall into two categories: algorithms with time complexity O(N), such as all 0, all 1, random data, checkerboard, anti-checkerboard, and diagonal; and algorithms with time complexity O(N²), such as stepping, walking, and jumping (galloping). Each algorithm detects particular kinds of faults, and algorithms with higher time complexity generally achieve higher fault coverage.
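The O(N) patterns listed above can be generated per page as in the following sketch. Treating each page as one row of bit cells (bit 7 of byte 0 as the leftmost column) is an assumption, since real devices may scramble the logical-to-physical mapping, and the diagonal shown is a simple stripe that repeats every 8 columns. The random pattern is seeded per page so that the expected data can be regenerated at read-back time instead of being stored. The hypothetical helper `run_pattern` writes and reads each page exactly once, which is what makes these algorithms O(N).

```python
import random

PATTERNS = ("all0", "all1", "random", "checkerboard",
            "anti-checkerboard", "diagonal")


def page_pattern(name: str, page: int, size: int, seed: int = 1) -> bytes:
    """Return the data written to one page for an O(N) test pattern.

    Each page is treated as one row of bit cells, with bit 7 of byte 0 as
    the leftmost column (an assumption; real cell mapping may be scrambled).
    """
    if name == "all0":
        return bytes(size)
    if name == "all1":
        return b"\xff" * size
    if name == "random":
        # Seeded per page so the expected data can be regenerated on read-back.
        rng = random.Random(seed * 1_000_003 + page)
        return bytes(rng.getrandbits(8) for _ in range(size))
    if name == "checkerboard":
        # 0101... on even rows, 1010... on odd rows.
        return bytes([0x55 if page % 2 == 0 else 0xAA]) * size
    if name == "anti-checkerboard":
        return bytes([0xAA if page % 2 == 0 else 0x55]) * size
    if name == "diagonal":
        # One 1 per 8 columns, shifting right by one column on each row.
        return bytes([0x80 >> (page % 8)]) * size
    raise ValueError(f"unknown pattern {name!r}")


def run_pattern(name, pages, size, write, read):
    """One O(N) pass: write every page once, then read every page once."""
    for page in range(pages):
        write(page, page_pattern(name, page, size))
    return [p for p in range(pages) if read(p) != page_pattern(name, p, size)]


if __name__ == "__main__":
    mem = {}  # fault-free stand-in memory: page index -> bytes
    for name in PATTERNS:
        failing = run_pattern(name, 16, 8, mem.__setitem__, mem.__getitem__)
        print(f"{name:18s} failing pages: {failing}")
```

Against the fault-free dictionary every pattern reports no failing pages; the point is the access pattern, not the result.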

Because the selected chip transfers data serially through an 8-bit port and holds 64 Gb of storage, a single full-chip write with an O(N) algorithm is estimated to take about 400 s. An O(N²) algorithm would take roughly N times as long, so despite its higher fault coverage it is not practical for a reliability screening organization in mass production. The final choice was four O(N) algorithms: all 0, random data, checkerboard, and anti-checkerboard. Together they cover common memory faults such as stuck-at-0, stuck-at-1, address decoding faults, shorts, and opens. This maximizes fault coverage within the allowed test time, gives good test results, and effectively screens out defective chips.
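For scale: taking the quoted ~400 s O(N) pass as the cost of touching each of N ≈ 8.6 × 10⁹ bytes once, an O(N²) algorithm would need on the order of 400 s × N ≈ 3.4 × 10¹² s, roughly 10⁵ years. The following sketch illustrates why the four chosen patterns complement each other. It injects one fault at a time into a tiny toy array and reports which patterns expose it; the stuck-at, wired-AND short, and address-alias fault models and their positions are simplified assumptions, and opens are not modelled. With these positions, all 0 misses the stuck-at-0 and short faults, checkerboard and anti-checkerboard between them catch both stuck-at faults and the short, and none of those three catches two pages of the same parity aliasing onto one row; the random pattern, which gives each page different data, exposes that alias with overwhelming probability.

```python
import random

PAGES, PAGE_BYTES = 8, 4  # tiny toy array, not the real chip geometry


def pattern(name, page, seed=7):
    """Page data for the four chosen O(N) patterns (bit 7 = leftmost cell)."""
    if name == "all0":
        return bytes(PAGE_BYTES)
    if name == "random":
        rng = random.Random(seed * 1000 + page)
        return bytes(rng.getrandbits(8) for _ in range(PAGE_BYTES))
    if name == "checkerboard":
        return bytes([0x55 if page % 2 == 0 else 0xAA]) * PAGE_BYTES
    if name == "anti-checkerboard":
        return bytes([0xAA if page % 2 == 0 else 0x55]) * PAGE_BYTES
    raise ValueError(name)


class FaultyArray:
    """Toy memory with one injected fault (simplified, hypothetical models).

    fault = ("sa0" | "sa1", page, byte, bit)  stuck-at-0 / stuck-at-1 cell
          = ("bridge", page, byte, bit)       wired-AND short of bit, bit+1
          = ("alias", src, dst)                decoder maps page src onto dst
    """

    def __init__(self, fault):
        self.fault = fault
        self.cells = [bytearray(b"\xff" * PAGE_BYTES) for _ in range(PAGES)]

    def _row(self, page):
        kind, *args = self.fault
        return args[1] if kind == "alias" and page == args[0] else page

    def write(self, page, data):
        self.cells[self._row(page)][:] = data

    def read(self, page):
        buf = bytearray(self.cells[self._row(page)])
        kind, *args = self.fault
        if kind != "alias" and page == args[0]:
            byte, bit = args[1], args[2]
            if kind == "sa1":
                buf[byte] |= 1 << bit
            elif kind == "sa0":
                buf[byte] &= ~(1 << bit) & 0xFF
            elif kind == "bridge":
                mask = 0b11 << bit
                both_one = (buf[byte] & mask) == mask
                buf[byte] = (buf[byte] & ~mask & 0xFF) | (mask if both_one else 0)
        return bytes(buf)


def detects(name, fault):
    """Write every page with the pattern, read all back, report any mismatch."""
    mem = FaultyArray(fault)
    for page in range(PAGES):
        mem.write(page, pattern(name, page))
    return any(mem.read(page) != pattern(name, page) for page in range(PAGES))


CHOSEN = ("all0", "random", "checkerboard", "anti-checkerboard")
FAULTS = {
    "stuck-at-1": ("sa1", 2, 1, 0),
    "stuck-at-0": ("sa0", 2, 1, 0),
    "short (bridge)": ("bridge", 3, 0, 4),
    "address alias": ("alias", 0, 2),
}

if __name__ == "__main__":
    for label, fault in FAULTS.items():
        hits = [name for name in CHOSEN if detects(name, fault)]
        print(f"{label:15s} detected by: {hits}")
```

Whether the random pattern also catches the stuck-at and short faults depends on the random bits that happen to land on the faulty cells, which is why it is used alongside the deterministic patterns rather than instead of them.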