Tuesday, 1 November 2016

python - Counting the number of groups of 1s in a boolean map of numpy.array


I am right now dealing with some image processing in Python via PIL (Python Image Library). My main aim is counting the number of colored cells in an immunohistochemistry image. I know that there are relevant programs, libraries, functions and tutorials about it, and I checked almost all of them. My major aim is writing the code manually from scratch, as much as possible. Hence I am trying to avoid using lots of exterior libraries and functions. I have written the most of the program. So here is what's going on step by step:


Program takes in the image file:example


And processes it for the red cells (basically, it turns off the RGB values below a certain threshold for red): enter image description here


And creates the boolean map of it, (gonna paste a part of it since it is big) which basically just puts 1 wherever it encounters with a red pixel in the processed second image above.


22222222222222222222222222222222222222222
20000000111111110000000000000000000000002
20000000111111110000000000000000000000002
20000000111111110000000000000000000000002
20000000011111100000000000000000001100002

20000000001111100000000000000000011111002
20000000000110000000000000000000011111002
20000000000000000000000000000000111111002
20000000000000000000000000000000111111102
20000000000000000000000000000001111111102
20000000000000000000000000000001111111102
20000000000000000000000000000000111111002
20000000000000000000000000000000010000002
20000000000000000000000000000000000000002
22222222222222222222222222222222222222222


I intentionally generated that frame like thing on the borders with 2s to help me with counting the number of groups of 1s in that boolean map.


My question to you guys is, how come I can efficiently count the number of cells (groups of 1s) in that kind of boolean map? I have found http://en.wikipedia.org/wiki/Connected-component_labeling which look extremely related and similar, but as far as I see, it is at the pixel level. Mine is at the boolean level. Just 1s and 0s.


Thanks a lot.



Answer



Something of a brute force approach, but done by inverting the problem to index over collections of pixels to find regions, instead of rasterizing over the array.


data = """\
000000011111111000000000000000000000000
000000011111111000000000000000000000000
000000011111111000000000000000000000000

000000001111110000000001000000000110000
000000000111110000000011000000001111100
000000000011100000000000100000011111100
000000000000000000000000000000011111100
000000000000000000000000000000011111110
000000000000000000000000000000111111110
000000000000000000000000000000111111110
000000000000000000000000000000011111100
000000000000000000000000000000001000000
000000000000000000000000000000000000000"""


from collections import namedtuple
Point = namedtuple('Point', 'x y')

def points_adjoin(p1, p2):
# to accept diagonal adjacency, use this form
#return -1 <= p1.x-p2.x <= 1 and -1 <= p1.y-p2.y <= 1
return (-1 <= p1.x-p2.x <= 1 and p1.y == p2.y or
p1.x == p2.x and -1 <= p1.y-p2.y <= 1)


def adjoins(pts, pt):
return any(points_adjoin(p,pt) for p in pts)

def locate_regions(datastring):
data = map(list, datastring.splitlines())
regions = []
datapts = [Point(x,y)
for y,row in enumerate(data)
for x,value in enumerate(row) if value=='1']
for dp in datapts:

# find all adjoining regions
adjregs = [r for r in regions if adjoins(r,dp)]
if adjregs:
adjregs[0].add(dp)
if len(adjregs) > 1:
# joining more than one reg, merge
regions[:] = [r for r in regions if r not in adjregs]
regions.append(reduce(set.union, adjregs))
else:
# not adjoining any, start a new region

regions.append(set([dp]))
return regions

def region_index(regs, p):
return next((i for i,reg in enumerate(regs) if p in reg), -1)

def print_regions(regs):
maxx = max(p.x for r in regs for p in r)
maxy = max(p.y for r in regs for p in r)
allregionpts = reduce(set.union, regs)

for y in range(-1,maxy+2):
line = []
for x in range(-1,maxx+2):
p = Point(x, y)
if p in allregionpts:
line.append(str(region_index(regs, p)))
else:
line.append('.')
print ''.join(line)
print



# test against data set
regs = locate_regions(data)
print len(regs)
print_regions(regs)

Prints:


4
........................................

........00000000........................
........00000000........................
........00000000........................
.........000000.........1.........33....
..........00000........11........33333..
...........000...........2......333333..
................................333333..
................................3333333.
...............................33333333.
...............................33333333.

................................333333..
.................................3......
........................................

No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...