Post by Richard Damon
Post by Don Y
Post by Don Y
Post by Don Y
- reduce the overall frame rate such that N cameras can
be serviced by the USB (or whatever) interface *and*
the processing load
- reduce the resolution of the cameras (a special case of the above)
- reduce the number of cameras "per processor" (again, above)
- design a "camera memory" (frame grabber) that I can install
multiply on a single host
- develop distributed algorithms to allow more bandwidth to
effectively be applied
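
The first three options in the list above trade off against one another through a simple bandwidth budget. A rough sketch in Python, assuming uncompressed frames and a hypothetical usable link rate (the 35 MB/s figure, the 640x480 resolution, and the function name are illustrative assumptions, not measurements from this thread):

```python
def max_frame_rate(link_bytes_per_s, n_cameras, width, height, bytes_per_pixel):
    """Highest per-camera frame rate a shared link can carry (uncompressed frames)."""
    frame_bytes = width * height * bytes_per_pixel
    return link_bytes_per_s / (n_cameras * frame_bytes)


# Hypothetical numbers: ~35 MB/s usable on the shared link,
# 640x480 frames at 2 bytes per pixel.
LINK_BYTES_PER_S = 35_000_000

if __name__ == "__main__":
    for n in (1, 2, 4, 11):
        fps = max_frame_rate(LINK_BYTES_PER_S, n, 640, 480, 2)
        print(f"{n:2d} cameras: {fps:5.1f} frames/s each")
```

Halving the resolution in both dimensions buys a factor of four in frame rate (or in camera count), which is why reducing resolution is a special case of reducing frame rate. The budget covers only the link; the processing load needs its own estimate.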
The fact that you are starting from the concept of using "USB Cameras" sort
of starts you with that sort of limit.
My personal thought on your problem is you want to put a "cheap" processor
right on each camera using a processor with a direct camera interface to
pull in the image and do your processing and send the results over some
comm-link to the center core.
If I went the frame-grabber approach, that would be how I would address the
hardware. But, it doesn't scale well. I.e., at what point do you throw in
the towel and say there are too many concurrent images in the scene to
pile them all onto a single "host" processor?
That's why I didn't suggest that method. I was suggesting each camera has its
own tightly coupled processor that handles the need of THAT
My existing "module" handles a single USB camera (with a fairly heavy-weight
But, being USB-based, there is no way to look at *part* of an image.
And, I have to pay a relatively high cost (capturing the entire
image from the serial stream) to look at *any* part of it.
Yep, having chosen USB as your interface, you have limited yourself.
Doesn't matter. Any serial interface poses the same problem;
I can't examine the image until I can *look* at it.
Post by Richard Damon
Since you say you have a fairly heavy-weight processor, that frame grab likely
isn't your limiting factor.
It becomes an issue when the number of cameras increases
significantly on a single host. I have one scene that requires
11 cameras to capture, completely.
Post by Richard Damon
Post by Don Y
*If* a "camera memory" was available, I would site N of these
in the (64b) address space of the host and let the host pick
and choose which parts of which images it wanted to examine...
without worrying about all of the bandwidth that would have been
consumed deserializing those N images into that memory (which is
a continuous process)
But such a camera would almost certainly be designed for the processor to be on
the same board as the camera (or be VERY slow in access), so much less apt
to allow you to add multiple cameras to one processor.
Yes. But, if the module is small, then siting the assembly "someplace
convenient" isn't a big issue. I.e., my modules are smaller than most
Post by Richard Damon
Post by Don Y
Post by Don Y
ISTM that the better solution is to develop algorithms that can
process portions of the scene, concurrently, on different "hosts".
Then, coordinate these "partial results" to form the desired result.
I already have a "camera module" (host+USB camera) that has adequate
processing power to handle a "single camera scene". But, these all
assume the scene can be easily defined to fit in that camera's field
of view. E.g., point a camera across the path of a garage door and have
it "notice" any deviation from the "unobstructed" image.
And if one camera can't fit the full scene, you use two cameras, each with
their own processor, and they each process their own image.
That's the above approach, but...
The only problem is if your image processing algorithm needs to compare parts
of the images between the two cameras, which seems unlikely.
Consider watching a single room (e.g., a lobby at a business) and
tracking the movements of "visitors". It's unlikely that an individual's
movements would always be constrained to a single camera field. There will
be times when he/she is "half-in" a field (and possibly NOT in the other,
HALF in the other or ENTIRELY in the other). You can't ignore cases where
the entire object (or, your notion of what that object's characteristics
might be) is not entirely in the field as that leaves a vulnerability.
Sounds like you aren't overlapping your cameras enough or have insufficient
coverage. Maybe your problem is wrong field of view for your lens. Maybe you
need fewer but better cameras with wider fields of view.
Distance from camera to target means you have to play games with optics
that can distort images.
I also can't rely on "professional installers" *or* for the cameras to remain
aimed in their original configurations.
Post by Richard Damon
This might be due to trying to use "stock" inexpensive USB cameras.
Post by Don Y
For example, I watch our garage door with *four* cameras. A camera is
positioned on each side ("door jam"?) of the door "looking at" the other
camera. This because a camera can't likely see the full height of the door
opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
and I'll watch *its* side!).
Right, and if ANY see a problem, you stop. So no need for inter-camera
But you don't know there is a problem until you can identify *where*
the obstruction exists and if that poses a problem for the vehicle
or the "obstructing item". Doing so requires knowing what the
object likely is.
E.g., SWMBO frequently stands in the doorway as I pull the car in or
out (not enough room between vehicles *in* the garage to allow for
ease of entry/egress). I'd not want this to be flagged as a
problem (signalling an alert in the vehicle).
Likewise, an obstruction on one vehicle-side of the garage shouldn't
interfere with access to the other side.
Post by Richard Damon
Post by Don Y
[The other two cameras are similarly positioned on the overhead *track*
onto which the door rolls, when open]
An object in (or near) the doorway can be visible in one (either) or
both cameras, depending on where it is located. Additionally, one of
those manifestations may be only "partial" as regards to where it is
located and intersects the cameras' fields of view.
But since you aren't trying to ID, only Detect, there still isn't a need for
camera-camera processing, just camera-door controller
The cameras need to coordinate to resolve the location of the object.
A "toy wagon" would present differently, visually, than a tall person.
Post by Richard Damon
Post by Don Y
Post by Don Y
When the scene gets too large to represent in enough detail in a single
camera's field of view, then there needs to be a way to coordinate
multiple cameras to a single (virtual?) host. If those cameras were just
"chunks of memory", then the *imagery* would be easy to examine in a single
host -- though the processing power *might* need to increase geometrically
(depending on your current goal)
Yes, but your "chunks of memory" model just doesn't exist as a viable camera
Apparently not -- in the COTS sense. But, that doesn't mean I can't
build a "camera memory emulator".
The downside is that this increases the cost of the "actual camera"
(see my above comment wrt amortization).
Yep, implementing this likely costs more than giving the camera a dedicated
moderate processor to do the major work. Might not handle the actual ID problem
of your Door bell, but could likely process the live video, take a snapshot of
a region with a good view of the visitor coming, and send just that to your
master system for ID.
But, then I could just use one of my existing "modules". If the
target fits entirely within its field of view, then it has everything
that it needs for the assigned functionality. If not, then it
needs to consult with other cameras.
Post by Richard Damon
Post by Don Y
The CMOS cameras with addressable pixels have "access times" significantly
longer than your typical memory (and are read once), so they don't really meet
that model. Some of them do allow for sending multiple small regions of
interest and downloading just those regions, but this then starts to
require moderate processor overhead to be loading all these regions and
updating the grabber to put them where you want.
You would, instead, let the "camera memory emulator" capture the entire
image from the camera and place the entire image in a contiguous
region of memory (from the perspective of the host). The cost of capturing
the portions that are not used is hidden *in* the cost of the "emulator".
Yep, you could build your system with a two-port memory buffer between the frame
grabber loading with one port, and the decoding processor on the other.
Yes. But large *true* dual-port memories are costly. Instead, you would
emulate such a device either by time-division multiplexing a single
physical memory *or* sharing alternate memories (fill one, view the other).
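
The "fill one, view the other" arrangement is ordinary double (ping-pong) buffering. A minimal software sketch of the idea, assuming fixed-size frames and a single capture side; the class and method names are hypothetical, and a hardware emulator would do the swap in logic rather than in Python:

```python
class PingPongFrameBuffer:
    """Two equal-sized buffers: capture fills one while the host views the other."""

    def __init__(self, frame_bytes):
        self._buffers = [bytearray(frame_bytes), bytearray(frame_bytes)]
        self._fill = 0  # index of the buffer the capture side writes next

    def write_frame(self, data):
        """Store one complete frame, then swap the roles of the two buffers."""
        target = self._buffers[self._fill]
        if len(data) != len(target):
            raise ValueError("frame size mismatch")
        target[:] = data
        self._fill ^= 1

    def view(self):
        """Read-only view of the most recently completed frame."""
        return memoryview(self._buffers[self._fill ^ 1]).toreadonly()


if __name__ == "__main__":
    buf = PingPongFrameBuffer(4)
    buf.write_frame(b"\x01\x02\x03\x04")
    print(bytes(buf.view()[1:3]))  # the host examines just part of the frame
```

The host pays nothing for the parts of the frame it never reads, which is the point of the "camera memory" model. The catch is that a view held across two subsequent writes sees its buffer refilled, so a host that lingers needs either a copy or a third buffer.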
Post by Richard Damon
The most cost effective way to do this is likely a commercial frame-grabber
with built-in "two-port" memory that sits in a slot of a PC type computer. These
would likely not work with a "USB Camera" (why would you need a frame grabber
with a camera that has it built in) so would be totally changing your cost models.
Yes, I have a few of these intended for medical imaging apps.
Way too big; way too expensive. Designed for the wrong type of "host"
Post by Richard Damon
IF your current design method is based on using USB cameras, trying to do a
full custom interface may be out of your field of operation.
Post by Don Y
And yes, it does mean that there might be some cases where you need a core
module that has TWO cameras connected to a single processor, either to get a
wider field of view, or to combine two different types of camera (maybe a
high res black and white to a low res color if you need just minor color
information, or combine a visible camera to a thermal camera). These just
become another tool in your tool box.
I *think* (uncharted territory) that the better investment is to develop
algorithms that let me distribute the processing among multiple
(single) "camera modules/nodes". How would your "two camera" exemplar
address an application requiring *three* cameras? etc.
The first question is: what processing are you thinking of that needs images
from 3 cameras?
Note, my two camera example was a case where the processing needed to be done
did need data from two cameras.
If you have another task that needs a different camera, you just build a system
with one two-camera module and one one-camera module, relaying back to a central
control, or you nominate one of the modules to be central control if the load
there is light enough.
Your garage door example would be built from 4 separate and independent 1
camera modules, either going to one as the master, or to a 5th module acting as
the master.
Yes, but they have to share image data (either raw or abstracted)
to make deductions about the targets present.
Post by Richard Damon
1) a system stitching images from 3 cameras and generating a single image out of
it, but that totally breaks your concept of needing only bits of the images,
since it inherently uses most of each camera's image and does some stitching
processing on the overlaps.
2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from
the three cameras and producing a merged "false-color" image from them. Again,
this also breaks your partial image model.
Or, tracking multiple actors in an "arena" -- visitors in a business,
occupants in a home, etc. In much the same way that the two garage
door cameras conspire to locate the obstruction's position along the
line from left doorjamb to right, pairs of cameras can resolve
a target in an arena and *sets* of cameras (freely paired, as needed)
can track all locations (and targets) in the arena.
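
Resolving a position from a pair of cameras comes down to intersecting two bearing rays. A small sketch of that geometry, assuming each camera knows its own floor-plan position and can report a bearing to the target (the function name and the coordinate convention are assumptions for illustration; nothing here is taken from the actual modules):

```python
import math


def locate(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two bearing rays to estimate a target's (x, y) position.

    cam_a, cam_b: (x, y) camera positions in a shared floor-plan frame.
    bearing_a, bearing_b: angles in radians, measured from the +x axis.
    Returns None when the rays are (nearly) parallel and give no fix.
    """
    ax, ay = cam_a
    bx, by = cam_b
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)
    denom = dax * dby - day * dbx  # 2-D cross product of the two directions
    if abs(denom) < 1e-9:
        return None
    # Distance along ray A at which it crosses ray B.
    t = ((bx - ax) * dby - (by - ay) * dbx) / denom
    return (ax + t * dax, ay + t * day)


if __name__ == "__main__":
    # Cameras on opposite sides of a 4-unit opening, both sighting one object.
    print(locate((0.0, 0.0), math.radians(45), (4.0, 0.0), math.radians(135)))
```

Only two numbers per camera cross the network here, not pixels. The sketch does not check that the intersection lies in front of both cameras, and cameras that drift from their original aim would need their positions and headings re-estimated before the fix means anything.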
Post by Richard Damon
Post by Don Y
I can, currently, distribute this processing by treating the
region of memory into which a (local) camera's imagery is
deserialized as a "memory object" and then exporting *access*
to that object to other similar "camera modules/nodes".
But, the access times of non-local memory are horrendous, given
that the contents are ephemeral (if accesses could be *cached*
on each host needing them, then these costs diminish).
So, I need to come up with algorithms that let me export abstractions
instead of raw data.
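
One possible shape for such an exported abstraction, sketched in Python. The record fields, the JSON encoding, and the names are all assumptions for illustration; the point is only that a node can publish a few dozen bytes describing what it sees, including whether the target runs off the edge of its field, instead of exposing the frame itself:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TargetSummary:
    """What one camera node tells its peers about one target (hypothetical schema)."""
    camera_id: str
    frame_no: int
    bbox: tuple          # (x, y, width, height) in pixels
    clipped_edges: list  # frame edges the target touches, e.g. ["right"]


def summarize(camera_id, frame_no, bbox, frame_width, frame_height):
    """Build a summary, noting which frame edges the bounding box reaches."""
    x, y, w, h = bbox
    edges = []
    if x <= 0:
        edges.append("left")
    if y <= 0:
        edges.append("top")
    if x + w >= frame_width:
        edges.append("right")
    if y + h >= frame_height:
        edges.append("bottom")
    return TargetSummary(camera_id, frame_no, bbox, edges)


def encode(summary):
    """Serialize a summary for transmission to other nodes."""
    return json.dumps(asdict(summary)).encode("utf-8")


if __name__ == "__main__":
    s = summarize("garage-left", 7, (600, 100, 40, 200), 640, 480)
    print(encode(s), "vs", 640 * 480, "bytes of raw 8-bit imagery")
```

A non-empty `clipped_edges` list is the cue that the target is "half-in" this field and that a neighbouring camera should be consulted.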
Sounds like your current design is very centralized. This limits its scalability,
The current design is completely distributed. The only "shared component"
is the network switch through which they converse and the RDBMS that acts
as the persistent store.
If a site realizes that it needs additional coverage to track <whatever>
it just adds another camera module and lets the RDBMS know about its general
location/functionality (i.e., how it can relate to any other cameras
covering the same arena)
Post by Richard Damon
Post by Don Y
Post by Don Y
My first feeling is you seem to be assuming a fairly cheap camera and then
doing some fairly simple processing over the partial image, in which case
you might even be able to live with a camera that uses a crude SPI
interface to bring the frame in, and a very simple processor.
I use A LOT of cameras. But, I should be able to swap the camera
(upgrade/downgrade) and still rely on the same *local* compute engine.
E.g., some of my cameras have Ir illuminators; it's not important
in others; some are PTZ; others fixed.
Doesn't sound reasonable. If you downgrade a camera, you can't count on it
being able to meet the same requirements, or you over-specced the initial camera.
Sorry, I was using up/down relative to "nominal camera", not "specific camera
previously selected for application". I'd *really* like to just have a
single "camera module" (module = CPU+I/O) instead of one for camera type A
and another for camera type B, etc.
That only works if you are willing to spend for the sports car, even if you
just need it to go around the block.
If the "extra" bits of the sports car can be used by other elements,
then those costs aren't directly borne by the camera module, itself.
E.g., when the garage door is closed, there's no reason the modules
in the garage can't be busy training speech models or removing
commercials from recorded broadcast content.
If, OTOH, you detect objects with a photo-interrupter across the door's
path, there's scant little it can do when not needed.
Post by Richard Damon
It depends a bit on how much span you need of capability. A $10 camera is
likely having a very different interface to a $30,000 camera, so will need a
different board. Some boards might handle multiple camera interface types if it
doesn't add a lot to the board, but you are apt to find that you need to make
some choice.
I don't ever see a need for a $30,000 camera. There may be a need for a
PTZ model. Or, a low lux model. Or, one with longer focal length. Or,
shorter (I'd considered putting one *in* the mailbox to examine its
contents instead of just detecting that it had been "visited").
Instead of a 4K device, I'd opt for multiple simpler devices better
But, not radically different in terms of cost, size, etc.
If you walk into a bank lobby, you don't see *one* super-high resolution,
wide field camera surveilling the lobby but, rather half a dozen or more
watching specific portions of the lobby. Similarly, if you use the
self-check at the store, there is a camera per checkout station instead
of one "really good" camera located centrally trying to take it all in.
This gives installers more leeway in terms of how they cover an arena.
Post by Richard Damon
Then some tasks will just need a lot more computer power than others. Yes, you
can just put too much computer power on the simple tasks, (and that might make
sense to early design the higher end processor), but ultimately you are going
to want the less expensive lower end processors.
I can call on surplus processing power from other nodes in the system
in much the same way that they can call on surplus capabilities from
a camera module that isn't "seeing" anything interesting, at the moment.
There will always be limits on what can be done; I'm not going
to be able to VISUALLY verify that you have the right wrench in
your hand as you set about working on the car. Or, that you
are holding an eating utensil instead of a random piece of
plastic as you traverse the kitchen.
But, I'll know YOU are in the kitchen and likely the person whose
voice I hear (to further reinforce the speaker identification
Post by Richard Damon
Post by Don Y
You put on a camera a processor capable of handling the tasks you expect out
of that set of hardware. One type of processor likely can handle a variety
of different camera setup with
Exactly. If a particular instance has an Ir illuminator, then you include
controls for that in *the* "camera module". If another instance doesn't have
this ability, then those controls go unused.
Yes, Auxiliary functionality is often cheap to include the hooks for.
But, it often requires looking at your TOTAL needs instead of designing
for specific (initial) needs. E.g., my camera modules now include
audio capabilities as there are instances where I want an audio
pickup in the same arena that I am monitoring. Silly to have to add
an "audio module" just because I didn't have the foresight to
include it with the camera!
Post by Richard Damon
Post by Don Y
Post by Don Y
Watching for an obstruction in the path of a garage door (open/close)
has different requirements than trying to recognize a visitor at the front
door. Or, identify the locations of the occupants of a facility.
Yes, so you don't want to "Pay" for the capability to recognize a visitor in
your garage door sensor, so you use different levels of sensor/processor.
Exactly. But, the algorithms that do the scene analysis can be the same;
you just parameterize the image and the objects within it that you seek.
Actually, "Tracking" can be a very different type of algorithm than
"Detecting". You might be able to use a Tracking-based algorithm to Detect, but
likely a much simpler algorithm can be used (needing less resources) to just
My current detection algorithm (e.g., garage) just looks for deltas between
"clear" and "obstructed" imagery, conditioned by masks. There is some
image processing required as things look different at night vs. day, etc.
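
A bare-bones sketch of that kind of masked delta test, written in plain Python over small grayscale grids. The thresholds and names are illustrative assumptions, and none of the day/night conditioning mentioned above is attempted here:

```python
def obstructed(reference, current, mask, pixel_threshold=30, min_changed=4):
    """Compare a frame against a "clear" reference, looking only where mask is set.

    reference, current: equal-sized 2-D lists of grayscale values (0..255).
    mask: 2-D list of 0/1 flags marking the pixels that matter.
    Returns True when at least min_changed masked pixels differ by more
    than pixel_threshold.
    """
    changed = 0
    for ref_row, cur_row, mask_row in zip(reference, current, mask):
        for ref, cur, use in zip(ref_row, cur_row, mask_row):
            if use and abs(cur - ref) > pixel_threshold:
                changed += 1
    return changed >= min_changed


if __name__ == "__main__":
    clear = [[100] * 4 for _ in range(4)]
    frame = [row[:] for row in clear]
    for r in range(2):
        for c in range(2):
            frame[r][c] = 200  # something bright appears in one corner
    everything = [[1] * 4 for _ in range(4)]
    print(obstructed(clear, frame, everything))
```

The mask is what lets one algorithm serve different installations: the same comparison runs everywhere, and only the reference image and the region of interest change per site.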
I don't have to "get it right". All I have to do is demonstrate "proof of
concept". And, be able to indicate why a particular approach is superior
to others/existing ones.
E.g., if you drive a "pickup-on-steroids", you'd need to locate a
photointerrupter "obstruction detector" pretty high up off the ground
to catch the case where the truck bed was in the way of the door.
Or, some lumber overhanging the end of the bed that you forgot you'd
brought home! And, you'd likely need *another* detector down low
to catch toddlers or toy wagons in the path of the door.
OTOH, doing the detection with a camera catches these use conditions
in addition to the "nominal" one for which the photointerrupter was
Tracking two/four occupants of a home *suggests* that you can track
6 or 8. Or, dozens of employees in a business conference room, etc.
I have no desire to spend my time perfecting any of these
technologies (I have other goals); just lay the groundwork and the
framework to make them possible.
Post by Richard Damon
Post by Don Y
There will likely be some combinations that exceed the capabilities of
the hardware to process in real-time. So, you fall back to lower
frame rates or let the algorithms drop targets ("You watch Bob, I'll
watch Tom!")
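
The "You watch Bob, I'll watch Tom!" fallback can be pictured as a small assignment problem. A greedy sketch, assuming each node advertises how many targets it can follow in real time and which targets are in its field (the data layout and names are hypothetical, and a real system might prefer to lower frame rates before dropping anyone):

```python
def assign_targets(visible_from, capacity):
    """Split targets among camera nodes without exceeding any node's capacity.

    visible_from: dict mapping target name -> list of nodes that can see it.
    capacity: dict mapping node name -> number of targets it can follow.
    Returns (assignment, dropped): a target -> node dict, plus the targets
    that no node had room for.
    """
    load = {node: 0 for node in capacity}
    assignment = {}
    dropped = []
    # Place the most constrained targets first (fewest cameras can see them).
    for target in sorted(visible_from, key=lambda t: len(visible_from[t])):
        candidates = [n for n in visible_from[target] if load[n] < capacity[n]]
        if not candidates:
            dropped.append(target)
            continue
        node = min(candidates, key=lambda n: load[n])  # least-loaded node wins
        assignment[target] = node
        load[node] += 1
    return assignment, dropped


if __name__ == "__main__":
    seen = {"Bob": ["cam1", "cam2"], "Tom": ["cam1", "cam2"]}
    print(assign_targets(seen, {"cam1": 1, "cam2": 1}))
```

Anything that lands in `dropped` is the signal to fall back to a lower frame rate on some node, or to borrow surplus processing from a module that isn't seeing anything interesting at the moment.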