Master of Science (MS)
With a selection of biomedical literature available for open access, a natural pairing seems to be the use of open source software to analyze its content automatically, in particular the content of figures. Given the large number of possible tools and approaches, we choose to focus on the recognition of printed characters. Because optical character recognition (OCR) under reasonable conditions is considered a solved problem, and open source software is fully capable of locating characters and identifying most of them accurately, we instead use OCR as an application area for the relatively recent development of compressive sampling, in particular a fast implementation called compressive sampling matching pursuit (CoSaMP). Compressive sampling enables recovery of a sparse signal from noisy measurements provided that the sampling matrix satisfies certain rigorous mathematical conditions, which essentially require the sampling vectors to be nearly orthogonal (perpendicular) to one another. For OCR, we investigate approximating such nearly orthogonal samples by selecting random curves, then use CoSaMP to determine a sparse set of samples approximating character shapes. We compare the accuracy of three different methods of applying CoSaMP to the problem of matching a blurred character to one of a set of previously sampled characters. We show numerically that random curves do not satisfy the strict mathematical conditions under which compressive sampling theory guarantees optimal solutions. However, character-matching strategies using CoSaMP-transformed characters can be developed whose accuracy is roughly comparable to a baseline comparison of blurred characters with original characters. This suggests that OCR is an example where the performance of compressive sampling methods declines gracefully as the conditions on the sampling matrix are weakened.
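The following minimal pure-Python sketch shows generic CoSaMP as described by Needell and Tropp: form a signal proxy from the residual, merge the 2s largest proxy entries with the current support, solve a least-squares problem on that support, prune to the s largest coefficients, and update the residual. It is illustrative only. It assumes a dense random Gaussian sampling matrix rather than the random curves used in this project, it solves the least-squares step through the normal equations for brevity, and every function name here (`cosamp`, `gaussian_matrix`, `least_squares_on`, and so on) is hypothetical. It does not reproduce the project's three character-matching methods.

```python
import math
import random


def gaussian_matrix(m, n, seed=0):
    """Random m x n sampling matrix with i.i.d. N(0, 1/m) entries.

    Assumption for illustration: a Gaussian matrix stands in for the
    random-curve samples described in the abstract.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def mat_vec(A, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_t_vec(A, v):
    """Multiply the transpose of A by a length-m vector."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]


def solve(M, rhs):
    """Solve the square system M x = rhs by Gaussian elimination with partial pivoting."""
    k = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))) / aug[r][r]
    return x


def least_squares_on(A, cols, u):
    """Least-squares fit of u using only the given columns of A (normal equations)."""
    sub = [[row[j] for j in cols] for row in A]
    k = len(cols)
    gram = [[sum(r[p] * r[q] for r in sub) for q in range(k)] for p in range(k)]
    rhs = [sum(r[p] * ui for r, ui in zip(sub, u)) for p in range(k)]
    return solve(gram, rhs)


def top_k(values, k):
    """Indices of the k entries with the largest magnitude."""
    return sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]


def cosamp(A, u, s, max_iter=20, tol=1e-10):
    """Recover an s-sparse vector x from measurements u = A x (CoSaMP sketch)."""
    n = len(A[0])
    a = [0.0] * n          # current sparse estimate
    v = u[:]               # current residual
    for _ in range(max_iter):
        proxy = mat_t_vec(A, v)                      # signal proxy A^T v
        omega = set(top_k(proxy, 2 * s))             # 2s strongest components
        support = sorted(omega | {i for i in range(n) if a[i] != 0.0})
        coeffs = least_squares_on(A, support, u)     # estimate on merged support
        b = [0.0] * n
        for j, c in zip(support, coeffs):
            b[j] = c
        keep = set(top_k(b, s))                      # prune to s largest entries
        a = [b[i] if i in keep else 0.0 for i in range(n)]
        v = [ui - pi for ui, pi in zip(u, mat_vec(A, a))]
        if math.sqrt(sum(r * r for r in v)) < tol:   # residual small enough: stop
            break
    return a
```

For example, with `A = gaussian_matrix(50, 100)` and a 3-sparse `x`, calling `cosamp(A, mat_vec(A, x), 3)` should return `x` up to rounding. The merged support has at most 3s columns, so the sketch needs at least 3s measurements for the normal equations to be well posed. A production version would use a numerically stabler least-squares solver.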
Shao, David, "Open Source Analysis of Biomedical Figures" (2010). Master's Projects. 62.