Sunday, 14 February 2016

signal analysis - How to remove text regions from a document image, or extract the images from it?


Given a document image (e.g. a newspaper page), how can I extract the photos in it or remove the text regions?


I think traditional OCR methods may not be suitable here: I don't need to recognize the text, and OCR is slow and not always accurate. I believe text regions (i.e. text blocks) and image regions should be distinguishable by some threshold-based image-processing method. Any suggestions or example code in OpenCV would be appreciated. Thanks!


BTW, what if the background color is not white, or the background of certain blocks is not white?


Example image:


[example image not preserved]




