DSP like MCUs, or MCU like DSPs?

Post by Rick C
I don't recall the TI designator, but they make some DSP parts that
have peripherals like MCUs. I know that some time back, ARM made a
push into DSP territory by adding some DSPish instructions to I
believe it was the CM3 devices, or maybe CM4.
Anyone here use these crossover devices? What sort of apps? Why did
you pick that device over others?

You are maybe thinking of the TMS320F family of DSP/MCU's from TI.
These have a traditional DSP-style processor core - 16-bit "char" (no
8-bit byte access at all), gruesome assembly where each instruction does
several different things in a single cycle, multiple memory buses for
simultaneous accesses, hardware support for cyclic buffers, FFT
twiddling, etc. It lets you make very efficient DSP-style algorithms
but is a pain for more microcontroller-style control code. The chips
have typical microcontroller-style peripherals such as timers, UARTs,
CAN controllers, etc.
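
To make the "FFT twiddling" point a little more concrete: on many DSPs this kind of support takes the form of bit-reversed addressing, where the address generator reorders the samples of a radix-2 FFT for free. That reading is an assumption here, not something stated above. A minimal sketch in portable C of the work such hardware saves (the function names are made up for illustration):

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of index `i`.
   For an N = 2^bits point radix-2 FFT this is the reordered index. */
static unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder n = 2^bits samples in place into bit-reversed order.
   Assumes bits is small enough that 1u << bits does not overflow. */
static void bit_reverse_permute(int16_t *x, unsigned bits)
{
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {               /* swap each pair only once */
            int16_t t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

On a general-purpose core this loop costs real cycles per sample; a DSP with bit-reversed addressing would typically fold it into the load or store.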

So they are a hybrid. They are popular for high-temperature
electronics, as they are one of the few families of microcontrollers
that are available for 175 °C and above.

These days, true DSP's are much less common. On the one side, once
FPGA's started having multiplier blocks they could outcompete DSP's in
parallel and pipelined MAC-based algorithms, and have much more
flexibility for memory and operand organisation. On the other side,
microcontrollers and processors gained single-cycle MAC instructions and
SIMD instructions, giving them similar performance to DSP's for many
algorithms while being far easier to use in other situations. True
DSP's are now usually found only in very specialised systems, or so
deeply embedded that you never see their programmability (i.e., you buy
a "video converter" chip and don't care how its insides work).

The Cortex-M4 is basically a Cortex-M3 with DSP instructions added -
MACs in various formats, saturating arithmetic, and 8-bit and 16-bit
SIMD instructions (within 32-bit registers). They don't have all the
features of DSP's, but they have enough to make common DSP algorithms
quite efficient, and ARM provides optimised libraries. The latest
Cortex-M55 core has additional vector/SIMD instructions, but I don't
know if any microcontrollers are available yet.
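
As a rough illustration of what "16-bit SIMD within 32-bit registers" with saturation means, here is a portable C model of a dual 16-bit saturating add, in the spirit of the M4's QADD16 instruction. It is only a sketch of the arithmetic: the helper names are invented, and real code would normally use the compiler's or CMSIS's intrinsics rather than anything like this.

```c
#include <stdint.h>

/* Extract a 16-bit lane from a 32-bit word as a signed value. */
static int32_t lane_s16(uint32_t w, unsigned shift)
{
    int32_t u = (int32_t)((w >> shift) & 0xFFFFu);
    return (u >= 0x8000) ? u - 0x10000 : u;
}

/* Clamp to the int16_t range and return the 16-bit two's complement pattern. */
static uint32_t sat_pack16(int32_t v)
{
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (uint32_t)v & 0xFFFFu;
}

/* Model of a dual 16-bit saturating add: the high and low halves of the
   32-bit operands are added as two independent signed 16-bit lanes. */
static uint32_t qadd16_model(uint32_t a, uint32_t b)
{
    uint32_t hi = sat_pack16(lane_s16(a, 16) + lane_s16(b, 16));
    uint32_t lo = sat_pack16(lane_s16(a, 0)  + lane_s16(b, 0));
    return (hi << 16) | lo;
}
```

The hardware does all of this in one instruction on one register pair, which is where the speed-up for 16-bit audio-style data comes from.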

As for anyone using them, I think you'll have a very hard job finding
anyone who does embedded development with microcontrollers and has
/not/ used Cortex-M4 devices. They are everywhere.

And as for why I pick a given device for a given project, it will depend
entirely on the project - as well as other projects I have done and
other projects other colleagues have done. There are thousands of
Cortex-M4 devices available, not including variations of memory sizes,
chip packages, or speeds. The common reasons are the same as for any
other type of chip - price, support, familiarity, peripherals, package, etc.

The biggest reason for any choice these days, however, is availability -
many designs start off by asking what microcontrollers our suppliers
have in stock with the given minimum requirements, because we rarely
have time to wait for 52 week lead times.

Grant Edwards

2022-12-22 15:54:16 UTC

Post by David Brown
You are maybe thinking of the TMS320F family of DSP/MCU's from TI.
These have a traditional DSP-style processor core - 16-bit "char"
(no 8-bit byte access at all), gruesome assembly where each
instruction does several different things in a single cycle,
multiple memory buses for simultaneous accesses, hardware support
for cyclic buffers, FFT twiddling, etc.

IIRC, branches were also delayed. The later 320's (C30/C40 and on)
were all 32-bit (in C: char, int, long int, float, double were all
"one byte" which contained 32-bits). And the floating point format
wasn't IEEE.

That combination made supporting byte-oriented serial protocols that
used IEEE FP extra fun.

The dev tools from TI were a bit clunky, but worked OK and were
available for Solaris (including the in-circuit emulators).

But, compared to what else was available 20+ years ago, they were damn
fast (especially for the price).

--
Grant

David Brown

2022-12-22 20:56:19 UTC

IIRC, branches were also delayed.

If you say so - I don't remember. (Delayed branches are not uncommon in
processors designed for single-cycle instruction throughput - they are
also found in several RISC architectures.)

Post by Grant Edwards
The later 320's (C30/C40 and on)
were all 32-bit (in C: char, int, long int, float, double were all
"one byte" which contained 32-bits). And the floating point format
wasn't IEEE.

I did not know they were part of the TMS320F family, though I know Texas
Instruments made other DSP's with 32-bit "char".

Post by Grant Edwards
That combination made supporting byte-oriented serial protocols that
used IEEE FP extra fun.

I had enough fun with a byte-oriented UART protocol on a 16-bit TMS320
with very little ram (so little that I could not afford to waste it on
unpacked buffers). Combine that with a UART peripheral that didn't
actually work correctly (the "receive" flag was never set) and a
toolchain with plenty of "undocumented features" (and some barely
documented critical non-conformances). I did not pick the device for
any other projects.
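
For anyone who has not met a machine without 8-bit access: a "packed" buffer here means two protocol octets stored per 16-bit word, with every access done by shift and mask. A minimal sketch, assuming low octet first (an arbitrary choice) and with invented function names:

```c
#include <stdint.h>
#include <stddef.h>

/* Store octet `value` at octet index `i`, two octets per 16-bit word,
   low octet first. Only the low 8 bits of `value` are used. */
static void packed_put(uint16_t *buf, size_t i, unsigned value)
{
    size_t   w     = i >> 1;               /* which 16-bit word */
    unsigned shift = (i & 1u) ? 8u : 0u;   /* which half of it  */
    unsigned word  = buf[w];
    word &= ~(0xFFu << shift);             /* clear the old octet */
    word |= (value & 0xFFu) << shift;      /* insert the new one  */
    buf[w] = (uint16_t)word;
}

/* Fetch the octet at octet index `i` from the packed buffer. */
static unsigned packed_get(const uint16_t *buf, size_t i)
{
    unsigned shift = (i & 1u) ? 8u : 0u;
    return ((unsigned)buf[i >> 1] >> shift) & 0xFFu;
}
```

The unpacked alternative (one octet per 16-bit "char") is simpler and faster but wastes half the RAM, which is exactly the trade-off described above.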

Post by Grant Edwards
The dev tools from TI were a bit clunky, but worked OK and were
available for Solaris (including the in-circuit emulators).
But, compared to what else was available 20+ years ago, they were damn
fast (especially for the price).

Grant Edwards

2022-12-23 01:40:11 UTC

IIRC, branches were also delayed.

If you say so - I don't remember. (Delayed branches are not uncommon in
processors designed for single-cycle instruction throughput - they are
also found in several RISC architectures.)

Post by Grant Edwards
The later 320's (C30/C40 and on) were all 32-bit (in C: char, int,
long int, float, double were all "one byte" which contained
32-bits). And the floating point format wasn't IEEE.

I did not know they were part of the TMS320F family, though I know Texas
Instruments made other DSP's with 32-bit "char".

Ah, I overlooked the "F" in your original post. I don't remember any F
parts. Interestingly the Wikipedia page on TMS320 doesn't mention the
F parts at all. I did find this page about the TMS320F28335, but it's
a 32-bit part also:

https://www.ti.com/product/TMS320F28335

--
Grant

David Brown

2022-12-23 08:38:00 UTC

IIRC, branches were also delayed.

If you say so - I don't remember. (Delayed branches are not uncommon in
processors designed for single-cycle instruction throughput - they are
also found in several RISC architectures.)

Post by Grant Edwards
The later 320's (C30/C40 and on) were all 32-bit (in C: char, int,
long int, float, double were all "one byte" which contained
32-bits). And the floating point format wasn't IEEE.

I did not know they were part of the TMS320F family, though I know Texas
Instruments made other DSP's with 32-bit "char".

I think the "F" part might just have meant "flash". It was long ago
when I used the part, so I suppose they have expanded to 32-bit since then.

Dimiter_Popoff

2022-12-22 17:45:54 UTC

Post by David Brown
...
The Cortex-M4 is basically a Cortex-M3 with DSP instructions added -
MACs in various formats, saturating arithmetic, and 8-bit and 16-bit
SIMD instructions (within 32-bit registers). ...
...

Just a word of caution for Rick re this portion.
Make sure that a 32 bit accumulator will be enough for what you are
doing; it can easily fall short in many cases. "Normal" DSPs have
40 or so bits for this reason; or, you can pick some processor with
64 bit FPU MAC ability. A 32 bit FPU will fall a lot shorter even than
the 32 bit integer regs David is mentioning, since single precision
carries only a 24 bit mantissa.
David said it all, I am just cautioning because this is the kind of
"oh shit" factor which comes at the end of the project (a friend once
told me of that "oh shit", you either say it at the beginning or at
the end :).
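
A quick way to see how little headroom there is: a full-scale 16 x 16 bit product already needs about 30 bits, so a 32 bit accumulator has room for only a couple of worst-case terms. The sketch below (function names invented, Q15 data assumed) computes the exact sum in 64 bits and checks whether it would have fitted in 32:

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of two 16-bit vectors using a wide (64-bit) accumulator,
   so the result is exact for any realistic length. */
static int64_t dot_q15_wide(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* each product fits in 32 bits */
    return acc;
}

/* Returns 1 if the exact sum would have fitted in a 32-bit accumulator. */
static int fits_in_32(int64_t acc)
{
    return acc >= INT32_MIN && acc <= INT32_MAX;
}
```

With both inputs at 32767, two terms still fit but the third overflows. Real signals are rarely that hostile, but a long filter with no guard bits gets there sooner than you would like. Many cores, the Cortex-M4 among them, also have multiply-accumulate forms with a 64 bit accumulator, which avoids the problem at some cost in registers.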

Rick C

2022-12-22 19:57:00 UTC

Post by Dimiter_Popoff

I'm not selecting a DSP part. I typically use FPGAs for what I do. Not because they are required for speed, but because they work well and have complete flexibility. I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore. The new design will still use an FPGA. If I need an MCU in the design, it will be a custom design in the FPGA. I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs. Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt. ~600 LUTs for 8 processors running at 20 MIPS each. Not bad.

I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".

--
Rick C.


Dimiter_Popoff

2022-12-22 20:24:59 UTC

Post by Dimiter_Popoff

I'm not selecting a DSP part. I typically use FPGAs for what I do. Not because they are required for speed, but because they work well and have complete flexibility. I used a $10 FPGA in a product I designed in 2008 and have to refresh the design for a couple of parts that are not made anymore. The new design will still use an FPGA. If I need an MCU in the design, it will be a custom design in the FPGA. I have one I've been pushing around in my head that would have one CPU, pipelined to work like 8 CPUs. Interrupt response of 1 clock cycle and no need to save registers, because all context is switched with the interrupt. ~600 LUTs for 8 processors running at 20 MIPS each. Not bad.
I was just curious about what people have used for DSP applications, but in particular if anyone had used one of the "crossover" parts. So far, the answer has been "no".

I have used a "real" DSP just once, 20+ years ago: the TI 5420, on
which I did our first DSP based MCA module.
The 5420 had two cores clocked at 100 MHz, some dual access RAM
(meaning an address can be accessed twice in one clock cycle) and
multiple serial ADC interfaces, *very* flexible ones, allowed me
to serially push an (almost) 10Msps 16 bit wide stream sequentially
using 3 of these (one had just 1/3 the speed I needed). A CPLD
was doing the serialization, the 3 streams were getting into the
DSP memory in a large FIFO, in the correct sequence, all this
could be just programmed into their serial interfaces.
Then one core had just one job, to detect an event and pass it
to the other core which would do the filtering etc. processing,
there was a nice FIFO connecting the two cores on chip.
A decade or so later I did the same - with some more sophistication
though - using a 400 MHz power architecture part with DDRAM,
single core. The sampling rate was half that of the former version
(had been somewhat overkill) and it was all done by the processor
using 64 bit FP for the filtering (2 cycles per MAC, was hard to
get at that but this is another story, it did work once I figured
out how to do it). And this uses up to half the CPU resources
under real load so it still manages to maintain the user interface,
support VNC over tcp/ip etc.
Like David said, with processors getting faster the need for a
"real" DSP goes down and down.
As for those other, mixed sort of TI DSP/MCU I have no experience,
never even needed to consider any of them.

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/

David Brown

2022-12-22 21:03:24 UTC

Post by Rick C
I was just curious about what people have used for DSP applications,
but in particular if anyone had used one of the "crossover" parts.
So far, the answer has been "no".

I don't know exactly how you are defining a "crossover" part. But if it
is "a DSP with microcontroller features", then the answer so far is
"yes". Both Grant and I have used TMS320F parts - but I would not
choose to use one again if I could avoid it. (I can't answer for Grant
there.) I have also used a "DSP with microcontroller features" from
Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned
that at all.

And if you mean "a microcontroller with DSP features", then as I said
almost everyone who works with embedded software has used Cortex-M4
devices. I have lost count of the number of different ones I have used
(plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP
features).

So I don't quite see how you could have interpreted the posts as "no".

Rick C

2022-12-23 01:11:02 UTC

Post by Rick C
I was just curious about what people have used for DSP applications,
but in particular if anyone had used one of the "crossover" parts.
So far, the answer has been "no".

I don't know exactly how you are defining a "crossover" part.

Please read the first post in this thread for that.

Post by David Brown
But if it
is "a DSP with microcontroller features", then the answer so far is
"yes". Both Grant and I have used TMS320F parts - but I would not
choose to use one again if I could avoid it. (I can't answer for Grant
there.) I have also used a "DSP with microcontroller features" from
Freescale (from the MC56000 family, IIRC) - though I hadn't mentioned
that at all.
And if you mean "a microcontroller with DSP features", then as I said
almost everyone who works with embedded software has used Cortex-M4
devices. I have lost count of the number of different ones I have used
(plus Cortex-M7, ColdFire, and PPC based microcontrollers that had DSP
features).
So I don't quite see how you could have interpreted the posts as "no".

I was looking for some insight into their experiences with such devices for DSP work, and I'm counting both DSP like MCUs and MCU like DSPs. I don't see in your post that you talk about any particular experience, rather offer a 10,000 foot overview of the state of the market. Thanks for that, but this is not new to me. So your post was pretty much a "no", to me.

I guess I was not quite explicit enough in my initial post. I was asking about specific experiences where a crossover part was chosen for a project with a significant DSP content, which would have required a DSP chip, if these devices were not available.

I am fully aware that MCUs are getting faster and more capable, but that doesn't mean DSPs are not needed. It simply means they are used in other applications that require more horsepower. Sometimes, it's not even the horsepower, but the performance to power consumption ratio. There are application specific DSPs for hearing aids that run on very low power, much better than any MCU could do.

Years ago DSP split into two categories based on the cell phone market. The high performance devices needed their own power plants, but cranked out some serious MIPS/MFLOPS. The much smaller, lower power, fixed point devices gained in speed, without sucking all the juice from mobile batteries, while serving in hand sets. Now the hand sets have dedicated CPU chips with built in DSP sections for the front end processing of cell phones, rather than separate DSP chips.

There's no shortage of DSP cores in the world, we just don't see all of them because they are part of system chips.

--
Rick C.


David Brown

2022-12-23 09:48:36 UTC

Post by Rick C
I was just curious about what people have used for DSP
applications, but in particular if anyone had used one of the
"crossover" parts. So far, the answer has been "no".

I don't know exactly how you are defining a "crossover" part.

Please read the first post in this thread for that.

I did. That's why I said I don't know exactly how you are defining your
personal meaning of "crossover part". But I see you've given more
information below, so maybe people can give you more helpful feedback
(or at least say that they don't have the relevant experience).

Post by David Brown
But if it is "a DSP with microcontroller features", then the answer
so far is "yes". Both Grant and I have used TMS320F parts - but I
would not choose to use one again if I could avoid it. (I can't
answer for Grant there.) I have also used a "DSP with
microcontroller features" from Freescale (from the MC56000 family,
IIRC) - though I hadn't mentioned that at all.
And if you mean "a microcontroller with DSP features", then as I
said almost everyone who works with embedded software has used
Cortex-M4 devices. I have lost count of the number of different
ones I have used (plus Cortex-M7, ColdFire, and PPC based
microcontrollers that had DSP features).
So I don't quite see how you could have interpreted the posts as "no".

Of course it is an overview. Do you want detailed information about
everything I have done for the past 15 years or so since Cortex-M
devices took over the embedded world?

I can give a bit more insight into my experience with the TMS320F24x
device. That was over 20 years ago, and lots will have changed since
then. The device was horrible to use. The assembly was impenetrable,
and extremely complicated to do well. The C compiler was hopelessly
inefficient, meaning you /had/ to use assembly for critical parts. The
hardware debugging tools were absurdly overpriced (some $1500 for what
was basically a couple of 74-series logic chips), and broke easily. The
software tools had annoying quirks. But the sensorless BLDC motor
control worked well in the end.

I would not willingly choose to do development on these parts again -
there are simply too many alternatives that are vastly easier to work
with for most purposes. But I know TI sell various pre-programmed parts
as dedicated motor control peripherals, and I'd be quite happy to
consider them.

As I said, the great majority of embedded microcontroller work is now
done with Cortex-M microcontrollers - they dominate the industry. At
the low end you have Cortex-M0 and M0+ devices for the very cheapest,
but the most popular are M3 or M4 parts (and the M7 at the high end).
The M4 is like an M3 but with added "DSP" instructions - MAC's of
various types, simple SIMD, saturating arithmetic. In reality,
relatively few people actually do anything that could be called "DSP"
work - it's usually more general control code. And when you want a
digital filter or FFT, you typically use ARM's optimised libraries.
Your code runs the same whether the device has DSP optimisation
instructions or not - only the speed is different.

So when you ask about "experience using these devices", you are really
asking "experience doing microcontroller development".

Post by Rick C
I guess I was not quite explicit enough in my initial post. I was
asking about specific experiences where a crossover part was chosen
for a project with a significant DSP content, which would have
required a DSP chip, if these devices were not available.

That is a different question, and more specific.

I've only done quite limited DSP algorithms (such as simple filters) in
my own code, and these devices are absolutely fine for that. As always,
you have to be careful about your scalings when working with fixed-point
numbers.
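
To show what "being careful about scalings" looks like in practice, here is a minimal sketch of a Q15 multiply (values in [-1, 1) stored as int16_t). The function name is invented, and the code assumes that right-shifting a negative value is an arithmetic shift, which is implementation-defined in C but true of the usual embedded compilers:

```c
#include <stdint.h>

/* Multiply two Q15 values, rounding to nearest and saturating.
   ASSUMPTION: >> on a negative int32_t is an arithmetic shift. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product, fits in 32 bits */
    p += (int32_t)1 << 14;                 /* add half an LSB for rounding */
    p >>= 15;                              /* rescale Q30 back to Q15      */
    if (p > INT16_MAX) p = INT16_MAX;      /* only -1 * -1 can get here    */
    if (p < INT16_MIN) p = INT16_MIN;
    return (int16_t)p;
}
```

The three things that usually go wrong are all visible here: forgetting the rescaling shift, truncating instead of rounding (which adds a small bias), and the single overflow case of -1 times -1.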

If you want floating point, some Cortex-M4 have single-precision
floating point (Cortex-M4F). You need to be careful to avoid
accidentally using double precision operations in your C code - there
are gcc flags to help warn you about this. If you want double
precision, it's worth going for an M7 microcontroller like an NXP RT10xx
device (ironically called a "crossover microcontroller" by NXP), since
these have double precision floating point in hardware.
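
The classic way to get accidental double precision is an unsuffixed constant. A small sketch (function names invented); gcc's -Wdouble-promotion is one of the flags that should point at the first function:

```c
/* `0.5` is a double constant, so x is promoted and the multiply is done
   in double precision, then converted back. On a core with only a
   single-precision FPU this typically ends up in software routines. */
static float scale_promoted(float x)
{
    return (float)(x * 0.5);
}

/* `0.5f` keeps the whole expression in single precision. */
static float scale_single(float x)
{
    return x * 0.5f;
}
```

Both return the same value here, which is why the problem tends to go unnoticed until someone looks at the timing or the map file. The same applies to calling sqrt() rather than sqrtf(), and similar library functions.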

I have been involved in a project that was more relevant, using wavelet
transformations, but I did not work directly on the wavelet code. I did
help out on some of the optimising and translation from the original
code (from a PC). Working that way is not optimal, but it was good
enough - we required a certain number of transformations per second, and
got that from the chip we had on the board, and did not see any point in
going further.

There is no doubt that dedicated DSP cores have instruction types and
features that can make a significant difference to the efficiency of
some algorithms. A good DSP can do "x += *p++ * *q++;" in a single
operation, once per cycle. They generally support cyclic buffers
directly, which can save a fair bit of code. And they have the
specialised bit manipulation instructions useful for FFT's.
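
For comparison, this is roughly what the same job looks like in plain C on a general-purpose core: an FIR step with a software circular buffer. It is a sketch only, with an invented state struct and a hypothetical tap count chosen as a power of two so that masking can do the wrap-around:

```c
#include <stdint.h>
#include <stddef.h>

#define FIR_TAPS 4u   /* hypothetical length; must be a power of two here */

typedef struct {
    int16_t hist[FIR_TAPS];   /* circular delay line        */
    size_t  head;             /* index of the newest sample */
} fir_state;

/* Push one input sample and return the raw accumulated sum
   (the caller rescales it to match the coefficient format). */
static int64_t fir_step(fir_state *s, const int16_t *coef, int16_t in)
{
    s->head = (s->head + 1u) & (FIR_TAPS - 1u);   /* advance, wrapping */
    s->hist[s->head] = in;

    int64_t acc = 0;
    size_t  idx = s->head;
    for (size_t k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)coef[k] * (int32_t)s->hist[idx];   /* the MAC */
        idx = (idx - 1u) & (FIR_TAPS - 1u);                /* step back, wrapping */
    }
    return acc;
}
```

The index masking and the loop bookkeeping are explicit instructions here. On a DSP with modulo addressing and zero-overhead loops they disappear, leaving just the one MAC per tap per cycle.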

However, it is all about getting the results out in the time (and power
and cost budget) you need. And if your code runs fast enough on the
device you have, it really doesn't matter if a different device could do
it faster.

A lot of the choice will, as so often, come down to experience and
familiarity. Getting decent DSP algorithm performance from an M4 is not
too hard if you are already a good embedded programmer. It comes down
to knowing your toolchain, knowing how to write efficient code, and
knowing how to work with the vendor's libraries. And since you have good
toolchains, easy and cheap debugging (usually), and peripherals such as
serial ports, USB, and Ethernet, you often have a much nicer development
environment. If you develop appropriately, the same code will also
compile directly on a PC making simulation and testing vastly easier.

On a DSP, getting optimal performance is very difficult - there is a
/lot/ you need to track, and you are often making use of so many
compiler extensions, intrinsics, etc., that you are really programming
in assembly. Getting the same code running on a PC for testing is
hugely harder. Accidentally getting significantly poorer efficiency is
very easy - you might find that writing "while (--n)" gives you
extremely fast specialised loop modes while "while (n--)" gives you
explicit decrements, comparisons and jumps. Toolchains are often poor
quality and very expensive (that is not universal, however). And
non-DSP code is much harder than in a microcontroller - you often don't
have access to 8-bit bytes, and portability between the DSP and other
processors is poor.
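
Note that the two loop forms are not interchangeable to begin with, which makes "just try the other one" less innocent than it sounds. A small sketch (function names invented) that counts the iterations of each:

```c
/* Pre-decrement: the body runs n - 1 times.
   Requires n >= 1; with n == 0 the unsigned counter wraps around first. */
static unsigned count_predec(unsigned n)
{
    unsigned iterations = 0;
    while (--n)
        iterations++;
    return iterations;
}

/* Post-decrement: the body runs n times, and n == 0 is handled. */
static unsigned count_postdec(unsigned n)
{
    unsigned iterations = 0;
    while (n--)
        iterations++;
    return iterations;
}
```

So switching forms to please the compiler also means adjusting the initial count and checking the zero case. Which form, if either, a given toolchain turns into a hardware loop is entirely toolchain-specific.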

We haven't talked much about peripherals or hardware, but DSP's usually
have fewer "general" peripherals, and their interfaces can be more
specialised.

Post by Rick C
I am fully aware that MCUs are getting faster and more capable, but
that doesn't mean DSPs are not needed. It simply means they are used
in other applications that require more horsepower. Sometimes, it's
not even the horsepower, but the performance to power consumption
ratio. There are application specific DSPs for hearing aids that run
on very low power, much better than any MCU could do.

Yes, that is correct.

DSP's are still very much an important technology, but they are getting
more niche. There are few people that develop with them - the majority
of companies that have a DSP on their boards will buy the code ready
made, often just as a binary blob or pre-programmed. In many cases, the
code is written by the companies that develop the DSP.

This is not just because getting maximal efficiency from a DSP is
technically hard and requires knowledge and experience (and if you don't
need maximal efficiency, why are you bothering with the DSP in the first
place?). IP and patent licensing is a nightmare in many of the
applications where DSPs really shine, such as in audio and video codecs.
If you are Sony or Sonos, you can afford a big development team and an
even bigger lawyer team and make your own audio codecs. For most
companies, it is a fraction of the overall price if you buy your DSP's
with licenses for codec binary blobs all in one.

Standalone DSP chips are also getting rarer - it is more common to see
them as accelerators alongside a "host" processor that handles the
non-DSP functionality, all within the same die.

Post by Rick C
Years ago DSP split into two categories based on the cell phone
market. The high performance devices needed their own power plants,
but cranked out some serious MIPS/MFLOPS. The much smaller, lower
power, fixed point devices gained in speed, without sucking all the
juice from mobile batteries, while serving in hand sets. Now the
hand sets have dedicated CPU chips with built in DSP sections for the
front end processing of cell phones, rather than separate DSP chips.
There's no shortage of DSP cores in the world, we just don't see all
of them because they are part of system chips.

Agreed.

Most (in terms of numerical quantities) are probably generated
specifically for the ASIC or dedicated chip they are used in. There are
parametrized DSP cores available that are often used with 24-bit or
18-bit "bytes" - TMS320's with 16-bit or 32-bit "char" are
programmer-friendly in comparison. And sometimes it is not easy to draw
the line between hardware filters with very programmable state machines,
and limited DSPs.

But a lot is changing. At the high end, processors with SIMD are able
to do many of the tasks that DSP's used to do. Other kinds of
accelerators such as found in graphics card cores can do a better job
than traditional DSPs, while also being easier to work with. At the
lower end, normal microcontrollers, possibly augmented with a few
DSP-friendly instructions, can do a better job. For your hearing aids,
when you have a Cortex-M device that takes less power than the leakage
current of the smallest battery while doing all the filtering fast
enough, the DSP has lost its advantage.

Rick C

2022-12-23 11:44:03 UTC

Post by Rick C
I was just curious about what people have used for DSP
applications, but in particular if anyone had used one of the
"crossover" parts. So far, the answer has been "no".

I don't know exactly how you are defining a "crossover" part.

Please read the first post in this thread for that.

I did. That's why I said I don't know exactly how you are defining your
personal meaning of "crossover part".

You read my description of what I called a crossover part, and you don't know how I define it??? I don't know how to respond to that.

But I see you've given more
information below, so maybe people can give you more helpful feedback
(or at least say that they don't have the relevant experience).

If they don't have relevant experience, there's no need to reply. I'd prefer that not everyone in the group chimes in to say they have nothing to say.

Post by David Brown
But if it is "a DSP with microcontroller features", then the answer
so far is "yes". Both Grant and I have used TMS320F parts - but I
would not choose to use one again if I could avoid it. (I can't
answer for Grant there.) I have also used a "DSP with
microcontroller features" from Freescale (from the MC56000 family,
IIRC) - though I hadn't mentioned that at all.
And if you mean "a microcontroller with DSP features", then as I
said almost everyone who works with embedded software has used
Cortex-M4 devices. I have lost count of the number of different
ones I have used (plus Cortex-M7, ColdFire, and PPC based
microcontrollers that had DSP features).
So I don't quite see how you could have interpreted the posts as "no".

Of course it is an overview. Do you want detailed information about
everything I have done for the past 15 years or so since Cortex-M
devices took over the embedded world?

I asked a simple question about a flavor of DSP products. You wrote about the entire gamut. I'm not asking about all of your experience. I don't seem to be able to communicate this to you.

I can give a bit more insight into my experience with the TMS320F24x
device. That was over 20 years ago, and lots will have changed since
then. The device was horrible to use. The assembly was impenetrable,
and extremely complicated to do well. The C compiler was hopelessly
inefficient, meaning you /had/ to use assembly for critical parts. The
hardware debugging tools were absurdly overpriced (some $1500 for what
was basically a couple of 74-series logic chips), and broke easily. The
software tools had annoying quirks. But the sensorless BLDC motor
control worked well in the end.

I'm a bit lost. Was the TMS320F24x a crossover part, with MCU like features?

I guess I stopped paying much attention to the DSP market some 10 years or so ago. I don't recall these parts.

I would not willingly choose to do development on these parts again -
there are simply too many alternatives that are vastly easier to work
with for most purposes. But I know TI sell various pre-programmed parts
as dedicated motor control peripherals, and I'd be quite happy to
consider them.
As I said, the great majority of embedded microcontroller work is now
done with Cortex-M microcontrollers - they dominate the industry.

I'm not asking about MCU work. I'm asking about DSP work.

At
the low end you have Cortex-M0 and M0+ devices for the very cheapest,
but the most popular are M3 or M4 parts (and the M7 at the high end).
The M4 is like an M3 but with added "DSP" instructions - MAC's of
various types, simple SIMD, saturating arithmetic. In reality,
relatively few people actually do anything that could be called "DSP"
work - it's usually more general control code. And when you want a
digital filter or FFT, you typically use ARM's optimised libraries.
Your code runs the same whether the device has DSP optimisation
instructions or not - only the speed is different.
So when you ask about "experience using these devices", you are really
asking "experience doing microcontroller development".

No, I'm asking about DSP work. I clarified that. Otherwise, there would have been no point in mentioning DSP like devices. If not using them for DSP work, the DSP aspects are not relevant.

That is a different question, and more specific.

Not different, simply a clarification.

I've only done quite limited DSP algorithms (such as simple filters) in
my own code, and these devices are absolutely fine for that. As always,
you have to be careful about your scalings when working with fixed-point
numbers.
If you want floating point, some Cortex-M4 have single-precision
floating point (Cortex-M4F). You need to be careful to avoid
accidentally using double precision operations in your C code - there
are gcc flags to help warn you about this. If you want double
precision, it's worth going for an M7 microcontroller like an NXP RT10xx
device (ironically called a "crossover microcontroller" by NXP), since
these have double precision floating point in hardware.
I have been involved in a project that was more relevant, using wavelet
transformations, but I did not work directly on the wavelet code. I did
help out on some of the optimising and translation from the original
code (from a PC). Working that way is not optimal, but it was good
enough - we required a certain number of transformations per second, and
got that from the chip we had on the board, and did not see any point in
going further.
There is no doubt that dedicated DSP cores have instruction types and
features that can make a significant difference to the efficiency of
some algorithms. A good DSP can do "x += *p++ * *q++;" in a single
operation, once per cycle. They generally support cyclic buffers
directly, which can save a fair bit of code. And they have the
specialised bit manipulation instructions useful for FFT's.
However, it is all about getting the results out in the time (and power
and cost budget) you need. And if your code runs fast enough on the
device you have, it really doesn't matter if a different device could do
it faster.
A lot of the choice will, as so often, come down to experience and
familiarity. Getting decent DSP algorithm performance from an M4 is not
too hard if you are already a good embedded programmer. It comes down
to knowing your toolchain, knowing how to write efficient code, and
knowing how to work with the vendor's libraries. And since you have good
toolchains, easy and cheap debugging (usually), and peripherals such as
serial ports, USB, and Ethernet, you often have a much nicer development
environment. If you develop appropriately, the same code will also
compile directly on a PC, making simulation and testing vastly easier.
On a DSP, getting optimal performance is very difficult - there is a
/lot/ you need to track, and you are often making use of so many
compiler extensions, intrinsics, etc., that you are really programming
in assembly. Getting the same code running on a PC for testing is
hugely harder. Accidentally getting significantly poorer efficiency is
very easy - you might find that writing "while (--n)" gives you
extremely fast specialised loop modes while "while (n--)" gives you
explicit decrements, comparisons and jumps. Toolchains are often poor
quality and very expensive (that is not universal, however). And
non-DSP code is much harder than in a microcontroller - you often don't
have access to 8-bit bytes, and portability between the DSP and other
processors is poor.
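
Note that the two loop forms mentioned above are not interchangeable:
for the same starting n they run a different number of times. A small
sketch (function names invented here) that counts iterations makes this
visible. Which form a given DSP compiler turns into a hardware loop is
toolchain-specific and is not something this example can show.

```c
#include <assert.h>

/* while (--n): decrement first, then test, so n - 1 iterations. */
unsigned count_pre_decrement(unsigned n)
{
    assert(n >= 1u);   /* n == 0 would wrap around to UINT_MAX */
    unsigned runs = 0;
    while (--n) {
        runs++;
    }
    return runs;
}

/* while (n--): test first, then decrement, so n iterations. */
unsigned count_post_decrement(unsigned n)
{
    unsigned runs = 0;
    while (n--) {
        runs++;
    }
    return runs;
}

/* do { } while (--n): body first, so n iterations for n >= 1. */
unsigned count_do_while(unsigned n)
{
    assert(n >= 1u);
    unsigned runs = 0;
    do {
        runs++;
    } while (--n);
    return runs;
}
```

So switching forms to please the compiler also means adjusting the
starting count, which is easy to get wrong by one.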
We haven't talked much about peripherals or hardware, but DSP's usually
have fewer "general" peripherals, and their interfaces can be more
specialised.

Yes, that's why they have these "crossover" devices, that have more MCU like peripherals. At least they did have them...

Yes, that is correct.
DSP's are still very much an important technology, but they are getting
more niche. There are few people that develop with them - the majority
of companies that have a DSP on their boards will buy the code ready
made, often just as a binary blob or pre-programmed. In many cases, the
code is written by the companies that develop the DSP.
This is not just because getting maximal efficiency from a DSP is
technically hard and requires knowledge and experience (and if you don't
need maximal efficiency, why are you bothering with the DSP in the first
place?). IP and patent licensing is a nightmare in many of the
applications where DSPs really shine, such as in audio and video codecs.
If you are Sony or Sonos, you can afford a big development team and an
even bigger lawyer team and make your own audio codecs. For most
companies, it is a fraction of the overall price if you buy your DSP's
with licenses for codec binary blobs all in one.
Standalone DSP chips are also getting rarer - it is more common to see
them as accelerators alongside a "host" processor that handles the
non-DSP functionality, all within the same die.

Yes, that's what I said.

Agreed.
Most (in terms of numerical quantities) are probably generated
specifically for the ASIC or dedicated chip they are used in. There are
parametrized DSP cores available that are often used with 24-bit or
18-bit "bytes" - TMS320's with 16-bit or 32-bit "char" are
programmer-friendly in comparison. And sometimes it is not easy to draw
the line between hardware filters with very programmable state machines,
and limited DSPs.
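
When code has to run both on such a core and on a PC, one common
approach is to treat each array element as carrying one octet, whatever
the width of char. A minimal sketch, with helper names invented for
this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as four octets, least significant first.
   Each element holds exactly one octet, even when CHAR_BIT is 16 or
   more and unsigned char is wider than 8 bits. uint_least32_t is used
   because uint8_t and friends need not exist on such targets. */
void put_u32_le(unsigned char out[4], uint_least32_t value)
{
    for (size_t i = 0; i < 4; i++) {
        out[i] = (unsigned char)((value >> (8u * i)) & 0xFFu);
    }
}

/* Rebuild the value, masking each element in case the upper bits of a
   wide char contain anything. */
uint_least32_t get_u32_le(const unsigned char in[4])
{
    uint_least32_t value = 0;
    for (size_t i = 0; i < 4; i++) {
        value |= (uint_least32_t)(in[i] & 0xFFu) << (8u * i);
    }
    return value;
}
```

This wastes memory on a 16-bit-char device, but the same source gives
the same octet stream on every platform, which is usually what matters
for protocol and file-format code.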
But a lot is changing. At the high end, processors with SIMD are able
to do many of the tasks that DSP's used to do. Other kinds of
accelerators such as found in graphics card cores can do a better job
than traditional DSPs, while also being easier to work with. At the
lower end, normal microcontrollers, possibly augmented with a few
DSP-friendly instructions, can do a better job. For your hearing aids,
when you have a Cortex-M device that takes less power than the leakage
current of the smallest battery while doing all the filtering fast
enough, the DSP has lost its advantage.

Thanks for your comments.

--
Rick C.

+- Get 1,000 miles of free Supercharging
+- Tesla referral code - https://ts.la/richard11209

Paul Rubin

2022-12-23 06:31:49 UTC

Post by Rick C
I was just curious about what people have used for DSP applications,
but in particular if anyone had used one of the "crossover" parts. So
far, the answer has been "no".

I've done some audio stuff on ordinary CPU's, that in an embedded system
would probably go on something like a Cortex M4, if that's what you call
a crossover part. The next thing after that is probably a GPU or FPGA,
either of which contains a stupendous amount of parallel MAC's. As
others have said, dedicated DSP's are now pretty niche.

FPGA's may have displaced general purpose processors for some realtime
applications as well, since you get low latency and deterministic timing
without having to go crazy worrying about caches and interrupts.

I didn't personally work on it, but spent a while studying a
cryptography app that ran on the now ancient Motorola DSP 56000 series.
The model number came from the architecture's 24 bit words and 56 bit
MAC accumulator. The app wasn't particularly connected with realtime or
with signal processing. Rather, the 24*24->56 MAC came in handy for
high precision arithmetic used by the crypto algorithm.
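
To illustrate why a wide accumulator helps with that kind of arithmetic,
here is a sketch of schoolbook multi-precision multiplication using
24-bit limbs. It is not the code from that app. The names are invented,
a uint64_t stands in for the 56-bit accumulator, and the limbs are
treated as unsigned integers, whereas the real 56000 multiplier works on
signed fractional operands and would need extra care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 24u
#define LIMB_MASK 0xFFFFFFu

/* result[0 .. 2n-1] = a[0 .. n-1] * b[0 .. n-1].
   Limbs are stored least significant first, 24 bits per element. */
void mp_mul24(uint32_t *result, const uint32_t *a, const uint32_t *b, size_t n)
{
    /* Each product is below 2^48, so a 56-bit accumulator has 8 guard
       bits: up to 255 products plus a carry fit without overflow. */
    assert(n >= 1 && n <= 255);

    uint64_t acc = 0;
    for (size_t col = 0; col < 2 * n - 1; col++) {
        /* Range of i for which both a[i] and b[col - i] exist. */
        size_t lo = (col < n) ? 0 : col - n + 1;
        size_t hi = (col < n) ? col : n - 1;
        for (size_t i = lo; i <= hi; i++) {
            acc += (uint64_t)a[i] * b[col - i];   /* one 24x24 MAC */
        }
        result[col] = (uint32_t)(acc & LIMB_MASK);
        acc >>= LIMB_BITS;                        /* carry to next column */
    }
    result[2 * n - 1] = (uint32_t)(acc & LIMB_MASK);
}
```

The point is that a whole column of partial products can be summed
before any carry handling, which is much cheaper than propagating a
carry after every multiply.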

Rick C

2022-12-23 08:14:07 UTC

At that time there were generally 16 bit fixed point DSPs and 32 bit floating point DSPs. Neither was appropriate for audio work. 16 bits is not enough resolution for high quality audio and 32 bit floating point was overkill, using extra power and burning extra dollars. Motorola came out with 24 bit devices as the sweet spot for high quality audio work.
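
For a rough sense of the numbers, the usual textbook figure for an ideal
N-bit quantiser driven by a full-scale sine wave is about 6.02 N + 1.76
dB of signal-to-quantisation-noise ratio. The helper below (name
invented here) just evaluates that approximation. It says nothing about
any particular converter or DSP.

```c
/* Ideal SQNR in dB for a full-scale sine and an N-bit quantiser,
   using the standard approximation 6.02 * N + 1.76. */
double ideal_sqnr_db(unsigned bits)
{
    return 6.02 * (double)bits + 1.76;
}
```

That gives roughly 98 dB for 16 bits and roughly 146 dB for 24 bits. The
16-bit figure leaves little margin once intermediate filter results lose
a few bits to rounding and headroom, which is one way to read the "not
enough resolution" remark.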


David Brown

2022-12-23 09:54:59 UTC

Yes. There are many manufacturers of 24-bit DSPs, and they almost all
have a background in audio.

Motorola (then Freescale, now NXP) also had a peripheral they called the
TPU (Timer Processing Unit), found in microcontrollers aimed at engine
control and advanced motor control usage. The original version was
16-bit and programmed in a weird kind of assembly, but the later
versions were 24-bit and had a specialised C compiler. It turns out
that 16 bits is often not quite enough for many high resolution timing
tasks, and again 32-bit would have been overkill.

(Now, of course, you just use the 32-bit - the millidollar difference in
hardware costs is worth it for the added convenience.)
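
As a back-of-envelope check on the 16-bit versus 24-bit timing point,
the sketch below computes how long a free-running counter takes to wrap.
The function name and the 10 MHz tick rate used in the note are
assumptions for illustration, not figures for any particular TPU.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps,
   when it is clocked at tick_hz ticks per second. */
double wrap_period_s(unsigned width_bits, double tick_hz)
{
    assert(width_bits < 64u && tick_hz > 0.0);
    return (double)((uint64_t)1 << width_bits) / tick_hz;
}
```

At an assumed 10 MHz tick, 16 bits wraps after about 6.6 ms, 24 bits
after about 1.7 s, and 32 bits after about 7 minutes. An engine turning
at 600 rpm takes 100 ms per revolution, so at that resolution a 16-bit
count cannot span one revolution while a 24-bit count does so easily.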

Clifford Heath

2023-01-17 01:15:09 UTC

NXP still make DSC56xx and MC56xx (16 and 32 bit) families. No 24/56 bit
ones as far as I can see.

<https://www.nxp.com/products/processors-and-microcontrollers/additional-mpu-mcus-architectures/digital-signal-controllers/32-bit-56800ef-dsc-core:56F8XXXX-32BIT>

dalai lamah

2022-12-23 14:07:24 UTC