Discussion:
Patch fixed strings in .hex file
pozz
2024-01-16 12:19:27 UTC
In one project I have many quasi-fixed strings that I'd like to keep in
non-volatile memory (Flash) to avoid wasting precious RAM.

static const char s1[] = "/my/very/long/string/of/01020304";
static const char s2[] = "/another/string/01020304";
...

The substring "01020304" is a serial number that changes for each device
during production. It always has the same length in bytes (it's a simple
hex representation of a 32-bit integer).
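For reference, a minimal C sketch of turning the 32-bit serial number into the 8 ASCII characters that take the place of "01020304". It assumes upper-case hex digits (the post doesn't say which case is used), and `format_serial` is a hypothetical helper, not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write `serial` as exactly 8 upper-case hex digits into out[0..7]
   (upper case is an assumption). No NUL terminator is written: the
   result is meant to overwrite the 8 placeholder characters inside an
   existing string. */
void format_serial(uint32_t serial, char out[8])
{
    char tmp[9];                 /* 8 digits + the NUL that snprintf adds */
    int n = snprintf(tmp, sizeof tmp, "%08" PRIX32, serial);
    assert(n == 8);              /* a 32-bit value never needs more than 8 digits */
    (void)n;                     /* avoid an "unused" warning when NDEBUG is set */
    memcpy(out, tmp, 8);
}
```

For example, format_serial(0x01020304u, buf) fills buf with the characters of "01020304".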

Of course, rebuilding the firmware during production just to pass the
real serial number to the compiler is too cumbersome and slow. I think a
better solution is to patch the .hex file generated by the compiler.

I'm wondering how to find the exact positions (addresses) of the serial
numbers to patch.

The build system is gcc, so I could look up s1 in the ELF file. Do
you know of a tool that returns the address of a symbol from the ELF or
map file?

Could you suggest a better approach?
David Brown
2024-01-16 12:51:39 UTC
In the source code, put the serial number in as "PQRXYZ" or some other
distinct string of characters. Generate bin files, not hex (or convert
with objcopy). Then do a simple search for the special string to find
its position and replace it with the serial number using a simple Python
script or your other favourite tool (awk, sed, perl, whatever).
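A minimal C sketch of the search-and-replace step, working on the whole .bin loaded into memory (a Python script would do the same job). Assumptions: the placeholder is a hypothetical 8-character string such as "SN_PLACE", because it must be exactly as long as the 8-character serial ("PQRXYZ" above is only 6 characters), and `patch_placeholder` is an illustrative name. Returning the match count lets the production script refuse to continue if the placeholder was found more or fewer times than there are strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of `placeholder` in the image with `value`
   (both exactly `len` bytes). Returns the number of replacements, so the
   caller can check it against the number of strings it expects. */
size_t patch_placeholder(unsigned char *image, size_t image_len,
                         const char *placeholder, const char *value, size_t len)
{
    assert(image != NULL && placeholder != NULL && value != NULL && len > 0);
    size_t count = 0;
    size_t i = 0;

    while (len <= image_len && i <= image_len - len) {
        if (memcmp(image + i, placeholder, len) == 0) {
            memcpy(image + i, value, len);
            count++;
            i += len;            /* continue after the bytes just written */
        } else {
            i++;
        }
    }
    return count;
}
```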

Oh, and in the source code, don't forget to make the string "volatile".
pozz
2024-01-16 14:42:29 UTC
I thought about this approach, but isn't there a risk that the exact same
sequence of bytes appears somewhere else in the output?
Post by David Brown
Oh, and in the source code, don't forget to make the string "volatile".
Why?
David Brown
2024-01-16 15:36:27 UTC
Post by pozz
I thought about this approach, but isn't there a risk that the exact same
sequence of bytes appears somewhere else in the output?
Try it and see.
Post by pozz
Post by David Brown
Oh, and in the source code, don't forget to make the string "volatile".
Why?
If you have:

    static const char s1[] = "PQRXYZ";

and your code later does, say:

    const int last_digit = s1[5] - '0';

the compiler will optimise it to:

    const int last_digit = '*';

i.e., it will calculate 'Z' - '0' at compile time - and if I remember my
ASCII codes correctly, that matches '*'.

You will be messing with the string behind the compiler's back. Make it
volatile. "volatile const" might be unusual, but it is useful in
exactly this kind of circumstance.
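A sketch of the "volatile const" version of the example above; `last_digit` and `copy_volatile_string` are hypothetical helper names. Because s1 is volatile, each read of s1[5] has to be performed at run time, so a byte patched into the binary is actually seen. One practical side effect: library functions such as strlen() or printf() take a plain const char *, so the string is best copied into an ordinary buffer byte by byte before using them.

```c
#include <assert.h>
#include <stddef.h>

/* Patched in the binary after the build, so the compiler must not
   assume it knows the contents. */
static volatile const char s1[] = "PQRXYZ";

/* s1 is volatile, so s1[5] is read from memory every time. */
int last_digit(void)
{
    return s1[5] - '0';
}

/* strlen(), printf() etc. take a plain const char *, so copy the
   volatile string into an ordinary buffer first. Returns its length. */
size_t copy_volatile_string(char *dst, size_t dst_len, volatile const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && dst_len > 0);
    while (i + 1 < dst_len && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Unpatched, last_digit() still returns 42, which is indeed '*' in ASCII; the difference is that a patched value is picked up.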
pozz
2024-01-16 15:57:36 UTC
Oh yes, I got the point now.
dalai lamah
2024-01-16 15:47:51 UTC
Post by pozz
I thought about this approach, but isn't there a risk that the exact same
sequence of bytes appears somewhere else in the output?
Extremely unlikely, especially since you use text strings and therefore you
actually use 64 bits (eight ASCII characters) to represent a 32-bit number.
Besides, you don't need to use an ASCII string as the placeholder; you can
use any 64-bit number.

If, for example, your binary file is 1 MB, there is one chance in 2.2
trillion of the same number being duplicated somewhere else.
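The arithmetic behind that kind of figure, as a small C sketch (`one_in_n` is a hypothetical name). It assumes the image bytes behave like uniformly random data, which real firmware does not, so it only gives an order of magnitude. Taking 1 MB as 2^20 bytes, 2^64 divided by the number of bit offsets (2^23) is 2^41, about 2.2 trillion; a byte-aligned search only has 2^20 candidate offsets, which gives 2^44, about 17.6 trillion.

```c
#include <assert.h>

/* Rough "1 in N" odds that one specific 64-bit pattern also appears by
   chance at one of `positions` candidate offsets, ASSUMING uniformly
   random image contents: N = 2^64 / positions. */
double one_in_n(double positions)
{
    const double two_pow_64 = 18446744073709551616.0;   /* 2^64, exact in a double */

    assert(positions > 0.0);
    return two_pow_64 / positions;
}
```

one_in_n(8388608.0) gives 2^41 (bit offsets) and one_in_n(1048576.0) gives 2^44 (byte offsets). Either way, checking the match count while patching is a cheaper guarantee than the odds.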
Post by pozz
Post by David Brown
Oh, and in the source code, don't forget to make the string "volatile".
Why?
To prevent the compiler from optimizing the code and "obfuscating" your
string. I don't think it is very likely, but it is not impossible,
especially if you use a very aggressive optimization level.
--
I flex my muscles and I am in the void.
David Brown
2024-01-16 17:50:26 UTC
Post by dalai lamah
To prevent the compiler from optimizing the code and "obfuscating" your
string. I don't think it is very likely, but it is not impossible,
especially if you use a very aggressive optimization level.
Actually, this sort of thing really does happen in practice. In one of
my current projects, I have some data that is filled in by
post-processing the binary file, and I had to use volatile accesses to
read the data or the compiler would optimise based on its knowledge of
the contents it saw at compile time. This is not just theoretical.

(To be fair, it is a bit more likely if - like in my case - the source
file uses null characters rather than a pseudo-random string of characters.)
Herbert Kleebauer
2024-01-16 14:07:52 UTC
Generate two binaries with different substrings and then do
a binary file compare to find the position.
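A C sketch of that comparison, assuming both builds are converted to .bin and loaded into memory (`diff_runs` is a hypothetical name). It helps to build with two placeholders that differ in every byte, for example "AAAAAAAA" and "BBBBBBBB", so each run of differing bytes starts exactly at a string. You would expect one run per string; any extra run (an embedded checksum, say) deserves a closer look.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two equally long images (built with different placeholders)
   and record the offset where each run of differing bytes starts.
   Stores at most max_out offsets in out[] and returns the total
   number of runs found. */
size_t diff_runs(const unsigned char *a, const unsigned char *b, size_t len,
                 size_t *out, size_t max_out)
{
    size_t runs = 0;
    int in_run = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (!in_run) {            /* first differing byte of a new run */
                if (runs < max_out)
                    out[runs] = i;
                runs++;
            }
            in_run = 1;
        } else {
            in_run = 0;
        }
    }
    return runs;
}
```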
pozz
2024-01-16 14:42:54 UTC
Thank you for this.
Grant Edwards
2024-01-16 15:24:01 UTC
Post by pozz
I'm wondering how to detect the exact positions (addresses) of serial
numbers to fix.
Assuming there's a symbol associated with the address, the link map
will tell you what the address is.

--
Grant
David Brown
2024-01-16 15:38:33 UTC
Post by Grant Edwards
Assuming there's a symbol associated with the address, the link map
will tell you what the address is.
Giving the symbol external linkage (removing the "static") would help with that!
Grant Edwards
2024-01-16 18:32:39 UTC
Post by David Brown
Giving the symbol external linkage (removing the "static") would help with that!
IIRC, if you're using gcc/binutils, there are ways to get even static
symbols to show up in the link map (e.g. -fdata-sections), but making
the symbol global is simplest.

--
Grant
pozz
2024-01-16 16:01:15 UTC
Post by Grant Edwards
Assuming there's a symbol associated with the address, the link map
will tell you what the address is.
The map file is easy for a human to read, but I think it's better to use
a tool (readelf or objdump) that reads the ELF file directly.

Though I haven't yet managed to put together a command line for this task.
David Brown
2024-01-16 15:39:43 UTC
Post by pozz
Could you suggest a better approach?
Another - perhaps more reliable - method would be to put the string in
its own section with __attribute__((section("serial_number"))), and then
have a linker file entry to fix it at a specific known address.
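A sketch of that idea with GCC, assuming the section name "serial_number" from the post; the linker-script line in the comment, its address and the FLASH region name are made-up examples to adapt to the real memory map, and `serial_string`/`serial_char` are hypothetical names. The `used` attribute and `KEEP()` are there so that neither the compiler nor `--gc-sections` throws the object away when nothing references it. It is worth checking the map file afterwards to confirm the section really ended up in Flash.

```c
#include <assert.h>

/* The serial-number string in its own input section "serial_number".
   A GNU ld script can then pin that section at a fixed address, e.g.
   inside the existing SECTIONS { ... } block (address and region name
   are hypothetical):

       .serial_number 0x0800F000 : { KEEP(*(serial_number)) } > FLASH

   "used" stops gcc from discarding the object if the code never reads it. */
__attribute__((section("serial_number"), used))
volatile const char serial_string[9] = "01020304";

/* Example reader: each call really reads the (possibly patched) byte. */
char serial_char(unsigned i)
{
    assert(i < sizeof serial_string);
    return serial_string[i];
}
```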
Tauno Voipio
2024-01-17 18:19:46 UTC
Post by David Brown
Another - perhaps more reliable - method would be to put the string in
its own section with __attribute__((section("serial_number"))), and then
have a linker file entry to fix it at a specific known address.
My vote for this.

If there are many strings, they could be set into
a const volatile struct which then is located into
a known place.
--
-TV
Stefan Reuther
2024-01-16 16:44:22 UTC
Post by pozz
The build system is gcc, so I could search for s1 in the elf file. Do
you know of a tool that returns the address of a symbol in the elf or
map file?
Last time I needed that, I hacked it up myself; at least back in 32-bit
times, ELF was not that hard (but I had to do that anyway to convert ELF
into something the controller could boot).
Post by pozz
Could you suggest a better approach?
Define your memory allocations explicitly. Instead of building a binary
and hacking the strings, place the strings at a fixed address and
regenerate the ELF or .hex file containing them from scratch. Whether
you then give the fixed addresses a name using linker magic, or just
cast pointers, is a matter of taste.
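A rough C sketch of generating a small Intel HEX file from scratch for a fixed-address block, instead of editing the main image. The record layout (':', byte count, 16-bit address, type, data, checksum; type 04 carries the upper 16 address bits, type 01 marks end of file) is standard Intel HEX. The function names, the assumption that the block does not cross a 64 KiB boundary, and the idea that the production programmer accepts this second hex file alongside the main one are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format one Intel HEX record into out (NUL-terminated, no newline).
   out needs room for 12 + 2 * count characters. The checksum is the
   two's complement of the low byte of the sum of all preceding bytes. */
void ihex_record(char *out, uint8_t type, uint16_t addr,
                 const uint8_t *data, uint8_t count)
{
    unsigned sum = count + (addr >> 8) + (addr & 0xFFu) + type;

    out += sprintf(out, ":%02X%04X%02X", (unsigned)count, (unsigned)addr, (unsigned)type);
    for (unsigned i = 0; i < count; i++) {
        sum += data[i];
        out += sprintf(out, "%02X", (unsigned)data[i]);
    }
    sprintf(out, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
}

/* Write a complete hex file holding only `count` bytes at 32-bit `addr`:
   Extended Linear Address (04), one data record (00), End Of File (01). */
void ihex_write_block(FILE *f, uint32_t addr, const uint8_t *data, uint8_t count)
{
    char line[12 + 2 * 255];
    const uint8_t upper[2] = { (uint8_t)(addr >> 24), (uint8_t)(addr >> 16) };

    assert((addr & 0xFFFFu) + count <= 0x10000u);   /* must not cross 64 KiB */
    ihex_record(line, 0x04, 0x0000, upper, 2);
    fprintf(f, "%s\n", line);
    ihex_record(line, 0x00, (uint16_t)(addr & 0xFFFFu), data, count);
    fprintf(f, "%s\n", line);
    ihex_record(line, 0x01, 0x0000, NULL, 0);
    fprintf(f, "%s\n", line);
}
```

For the 8 characters "01020304" at 0x0800 the data record comes out as :08080000303130323033303466.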


Stefan
Grant Edwards
2024-01-16 18:39:10 UTC
Post by Stefan Reuther
Post by pozz
The build system is gcc, so I could search for s1 in the elf file. Do
you know of a tool that returns the address of a symbol in the elf or
map file?
Last time I needed that, I hacked it up myself; at least back in 32-bit
times, ELF was not that hard (but I had to do that anyway to convert ELF
into something the controller could boot).
I think scanelf from pax-utils will do it.

https://github.com/gentoo/pax-utils
Michael Schwingen
2024-01-16 19:30:55 UTC
Post by Stefan Reuther
Post by pozz
The build system is gcc, so I could search for s1 in the elf file. Do
you know of a tool that returns the address of a symbol in the elf or
map file?
Last time I needed that, I hacked it up myself; at least back in 32-bit
times, ELF was not that hard (but I had to do that anyway to convert ELF
into something the controller could boot).
libelf should help.

The requirements sound similar to "we need to patch the checksum in the
vector table so that an LPC MCU will boot":

https://github.com/imi415/lpchecksum

It should be easy to modify that to patch serial numbers.
Post by Stefan Reuther
Define your memory allocations explicitly. Instead of building a binary
and hacking the strings, place the strings at a fixed address and
regenerate the ELF or .hex file containing them from scratch. Whether
you then give the fixed addresses a name using linker magic, or just
cast pointers, is a matter of taste.
Yes. Placing the string in a special section via the linker script will
make it easier for the patch tool to locate the string.

cu
Michael
--
Some people have no respect of age unless it is bottled.
Hans-Bernhard Bröker
2024-01-16 18:35:56 UTC
Permalink
Post by pozz
I'm wondering how to detect the exact positions (addresses) of serial
numbers to fix.
You do not.

Instead, you set up linker scripts, linker options and/or add
__attribute(()) to the variables' definitions to _place_ them at a
predetermined, fixed, known-useful location.

And do yourself one favour: have only _one_ instance of that number in
your code. Use concatenation or similar to output it where needed.
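A sketch of what "one instance" can look like when the number is only known after the build: compile-time string concatenation cannot include a value patched later, so here the full strings are assembled at run time from constant prefixes and the single serial. Note the trade-off with the original goal: the prefixes stay in Flash, but the assembled string needs a RAM buffer, so it is best built on demand into a short-lived buffer. `serial_hex` and `make_path` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The single copy of the serial number: 8 hex characters, no NUL,
   patched after the build. volatile const so the compiler cannot
   use the value it saw at build time. */
volatile const char serial_hex[8] = { '0', '1', '0', '2', '0', '3', '0', '4' };

/* Build "<prefix><serial>" into buf when it is needed. Returns the
   length of the result, or -1 if buf is too small. */
int make_path(char *buf, size_t buf_len, const char *prefix)
{
    char sn[9];

    assert(buf != NULL && prefix != NULL);
    for (size_t i = 0; i < 8; i++)
        sn[i] = serial_hex[i];        /* byte-wise copy out of the volatile object */
    sn[8] = '\0';

    int n = snprintf(buf, buf_len, "%s%s", prefix, sn);
    return (n < 0 || (size_t)n >= buf_len) ? -1 : n;
}
```

For example, make_path(buf, sizeof buf, "/my/very/long/string/of/") rebuilds the 32-character s1 from the original post.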

Then you can use tools like srecord or GNU binutils to stamp your desired
number into that fixed location in the hex file. Professional-grade
chip flashing tools for production environments can usually do that by
themselves, so you don't even have to edit your "official" files.

Details will obviously vary by tool chain.
pozz
2024-01-17 07:45:28 UTC
Post by Hans-Bernhard Bröker
Post by pozz
I'm wondering how to detect the exact positions (addresses) of serial
numbers to fix.
You do not.
Instead, you set up linker scripts, linker options and/or add
__attribute(()) to the variables' definitions to _place_ them at a
predetermined, fixed, known-useful location.
Do you mean choosing the exact address of *each* string yourself?
And where would you put them: at the beginning, in the middle or at the
end of the Flash? You need to calculate the address of the next string
from the address *and length* of the previous string. It seems to me a
tedious and error-prone job that the linker could easily do.
Post by Hans-Bernhard Bröker
And do yourself one favour: have only _one_ instance of that number in
your code.  Use concatenation or similar to output it where needed.
Then you can use tools like srecord or GNU binutils to stamp your desired
number into that fixed location in the hex file.  Professional-grade
chip flashing tools for production environments can usually do that by
themselves, so you don't even have to edit your "official" files.
Details will obviously vary by tool chain.
Patching the .hex or .bin file by replacing 8 bytes starting from a known
address is simple. I would write a Python script or use one of the
srecord[1] tools.

[1] https://srecord.sourceforge.net/
pozz
2024-01-17 08:07:52 UTC
The command to patch 8 bytes in the address range 0x800-0x808 with the
string "01020304" would be:

srec_cat original.hex -I -E 0x800 0x808 -GEN 0x0800 0x0808 -REP_S "01020304" -O patched.hex -I

-I is for Intel hex format (input and output)
-E is to exclude the bytes to patch from the original hex
-GEN is to generate new bytes in a given range
-REP_S is the constant string to repeat in that range

In my case I don't really need to repeat the string in the range,
because the length of the string is exactly the length of the address range.
David Brown
2024-01-17 10:27:01 UTC
Post by pozz
Do you mean choosing the exact address of *each* string yourself?
And where would you put them: at the beginning, in the middle or at the
end of the Flash? You need to calculate the address of the next string
from the address *and length* of the previous string. It seems to me a
tedious and error-prone job that the linker could easily do.
How many strings do you need here?

While it is possible to do all this by patching odd places in your
file, using specific locations is often a better choice. Since you
haven't already said "Thanks for the advice - I tried it that way, it
worked, and I'm happy" in response to any post, I would say that now is
the time to take fixed location solutions seriously.

The way I always handle this is to define a struct type of fixed size,
containing all the information that might be added post-build. That can
be version information, serial numbers, length of the image (very useful
if you tag a CRC check on the end of the image), etc., - whatever you
want to add. Strings have fixed maximum sizes and space.

Make a dedicated section, and in the source code have a default instance
of the type in that section, with default values. (This is especially
handy when running from a debugger, as your elf file will not have
post-build values.) Empty strings should be all null characters.
Remember to declare it "volatile const". Your linker file specifies
that this section goes at a specific known fixed address (perhaps just
after interrupt vectors, or whatever is appropriate for your
microcontroller).

Now your post-build scripts have a simple fixed address to patch the
binaries.
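A sketch of such a block. Every field name and size, the section name ".post_build_info" and the marker value are invented for illustration. The static_asserts pin the layout so that a change to the struct breaks the build rather than the post-build script; the linker script places the section at the chosen fixed address in the same way as any other dedicated section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical post-build information block: one instance, fixed size. */
struct post_build_info {
    uint32_t magic;             /* marker so tools can recognise the block */
    uint32_t serial_number;
    uint32_t image_length;      /* e.g. for a CRC appended after the image */
    char     serial_hex[8];     /* exactly 8 characters, no NUL */
    char     product_name[32];  /* empty string = all NUL characters */
};

/* The post-build script depends on this layout; fail the build if it moves. */
static_assert(sizeof(struct post_build_info) == 52, "post_build_info layout changed");
static_assert(offsetof(struct post_build_info, serial_hex) == 12, "post_build_info layout changed");

/* Default instance in its own section; the linker script pins that section
   at the chosen fixed address. "used" keeps it even if nothing reads it. */
__attribute__((section(".post_build_info"), used))
volatile const struct post_build_info post_build_info = {
    .magic         = 0x50424931u,   /* arbitrary example value */
    .serial_number = 0,
    .image_length  = 0,
    .serial_hex    = { '0', '0', '0', '0', '0', '0', '0', '0' },
    .product_name  = "",            /* remaining bytes are zero-filled */
};
```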
Post by pozz
Patching the .hex or .bin file by replacing 8 bytes starting from a known
address is simple. I would write a Python script or use one of the
srecord[1] tools.
[1] https://srecord.sourceforge.net/
Don't bother with hex or srec files. Use binary files - it makes things
easier.
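Once the offset is known, patching the .bin itself takes only a few lines. A C sketch (the thread suggests Python, which would be equally short); `patch_file` is a hypothetical name. Opening with "r+b" overwrites bytes in place without creating or resizing the file, and the bounds check refuses a patch that would run past the end of the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Overwrite len bytes at byte `offset` of an existing binary file in place.
   Returns 0 on success, -1 on any error, including a patch that would run
   past the end of the file ("r+b" never creates or extends the file here). */
int patch_file(const char *path, long offset, const void *data, size_t len)
{
    int rc = -1;
    FILE *f;

    assert(path != NULL && data != NULL);
    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && offset >= 0 && (size_t)offset <= (size_t)size
            && len <= (size_t)size - (size_t)offset
            && fseek(f, offset, SEEK_SET) == 0
            && fwrite(data, 1, len, f) == len)
            rc = 0;
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```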
pozz
2024-01-17 11:54:28 UTC
Post by David Brown
How many strings do you need here?
There are 10 strings.
Post by David Brown
While it is possible to do all this by patching odd places in your
file, using specific locations is often a better choice.  Since you
haven't already said "Thanks for the advice - I tried it that way, it
worked, and I'm happy" in response to any post, I would say that now is
the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used
successfully. Just out of curiosity, and to learn, I'm exploring all
of them.

Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
Post by David Brown
Now your post-build scripts have a simple fixed address to patch the
binaries.
How should the post-build script know the exact address of a given
field in the struct?

volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
  uint32_t image_size;
  char s1[32];
  char s2[64];
  char s3[13];
} post_build_data __attribute(...);

I know the fixed address of the symbol post_build_data (the only object
in my custom section), but now I have to calculate the offset of the
field s1 in the struct. This calculation is error-prone.

In my opinion, it's much simpler to use a production script that
retrieves the address of a given symbol directly from the ELF, without
any error-prone manual calculation.

From another post of mine:

readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' -e 's/  */ /g' | cut -d " " -f 2
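A note on turning that address into a position in a .bin: objcopy -O binary starts the file at the load address of the lowest section it copies, normally the start of Flash, so the file offset is the symbol's address minus that base (for constant data in Flash the load and run addresses are the same). A C sketch of the conversion, where the function name and the base address 0x08000000 used in the example are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a symbol address as printed by readelf (hex, e.g. "08000800",
   optionally followed by a newline) into a byte offset in a .bin whose
   first byte is at base_addr. Returns -1 if the text is not a hex number
   or if sym_len bytes at that address do not fit inside the file. */
long addr_to_bin_offset(const char *addr_hex, uint32_t base_addr,
                        size_t sym_len, size_t bin_len)
{
    char *end;
    unsigned long addr;

    assert(addr_hex != NULL);
    addr = strtoul(addr_hex, &end, 16);
    if (end == addr_hex || (*end != '\0' && *end != '\n') || addr < base_addr)
        return -1;

    unsigned long off = addr - base_addr;
    if (off > bin_len || sym_len > bin_len - off)
        return -1;
    return (long)off;
}
```

With a hypothetical base of 0x08000000, addr_to_bin_offset("08000800", 0x08000000u, 8, 65536) gives 0x800.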
Post by David Brown
Don't bother with hex or srec files.  Use binary files - it makes things
easier.
Yes, of course; patching a hex file or a binary file isn't the complex
task here.
David Brown
2024-01-17 12:54:29 UTC
Post by pozz
There are 10 strings.
I thought you were storing serial numbers? But okay, if you need 10
strings you need 10 strings. The number is just a detail. (But if the
number were 200 strings for supporting different languages, you might do
things differently.)
Post by pozz
There are many suggested solutions and I think all of them can be used
with success. Just for the sake of curiosity and study, I'm exploring all
of them.
Fair enough.
Post by pozz
Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
You do so because it makes life much easier. It is the same reason you
write your patching script in Python, rather than C.
Post by pozz
Post by David Brown
The way I always handle this is to define a struct type of fixed size,
containing all the information that might be added post-build.  That
can be version information, serial numbers, length of the image (very
useful if you tack a CRC check onto the end of the image), etc. -
whatever you want to add.  Strings have fixed maximum sizes and space.
Make a dedicated section, and in the source code have a default
instance of the type in that section, with default values.  (This is
especially handy when running from a debugger, as your elf file will
not have post-build values.)  Empty strings should be all null
characters. Remember to declare it "volatile const".  Your linker file
specifies that this section goes at a specific known fixed address
(perhaps just after interrupt vectors, or whatever is appropriate for
your microcontroller).
Now your post-build scripts have a simple fixed address to patch the
binaries.
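A minimal sketch of that dedicated-section approach, assuming GCC or clang on an ELF target. The section name `.post_build`, the field sizes, and the address and region in the linker-script comment are hypothetical placeholders; the real values depend on your part and your existing linker script.

```c
#include <stdint.h>

/* Hypothetical section name. The linker script must put it at a fixed
 * address, e.g. in GNU ld syntax (address and region are placeholders):
 *
 *   .post_build 0x08000200 : { KEEP(*(.post_build)) } > FLASH
 */
#if defined(__ELF__)
#define POST_BUILD_SECTION __attribute__((section(".post_build"), used))
#else
#define POST_BUILD_SECTION /* non-ELF host: no section placement */
#endif

/* Fixed-size record for everything patched after the build. Only
 * byte-sized members, so the compiler inserts no padding. */
struct post_build_data {
    uint8_t serial_number[4];  /* written by the production script */
    uint8_t image_size[4];     /* written by the post-build step */
    char    s1[40];            /* NUL-padded; 32 chars plus spare room */
    char    s2[32];            /* NUL-padded */
};

/* Default instance, so an unpatched ELF still runs under a debugger.
 * "volatile" stops the compiler from folding these defaults into the
 * code that reads them. */
volatile const struct post_build_data post_build_data POST_BUILD_SECTION = {
    .serial_number = { 0, 0, 0, 0 },
    .image_size    = { 0, 0, 0, 0 },
    .s1            = "/my/very/long/string/of/01020304",
    .s2            = "/another/string/01020304",
};
```

The `used` attribute keeps the object from being discarded even if nothing in the code references it yet; `KEEP` does the same job at link time when `--gc-sections` is in use.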
How should the post-build script know the exact address of a certain
field in the struct?
You figure it out /once/ - using one of many possible methods. Counting
with static asserts to check, or looking at the binary after putting
canaries in the sample data.

If you think that you might change the struct often, you can use
separate variables and put them all in the same section, then look at
the map file. In practice you rarely need to do something like that.
Post by pozz
volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
  uint32_t image_size;
  char s1[32];
  char s2[64];
  char s3[13];
} post_build_data __attribute(...);
I know the fixed address of the symbol post_build_data (the only object
in my custom section), but now I have to calculate the offset of the
field s1 in the struct. This calculation is error-prone.
Static assertions are your friend here.
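A sketch of what such static assertions might look like, using C11 `static_assert` and `offsetof`. Unlike the struct quoted above, the multi-byte fields here are stored as byte arrays (an assumption, made so the asserted offsets hold on any ABI); the numbers are exactly what a post-build script would hard-code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Same fields as the quoted struct, but multi-byte values are byte arrays
 * so the layout cannot depend on the compiler's padding rules. */
struct post_build_data {
    uint8_t serial_number[4];
    uint8_t mac_address[8];
    uint8_t image_size[4];
    char    s1[32];
    char    s2[64];
    char    s3[13];
};

/* The offsets the post-build script relies on. If anyone reorders or
 * resizes a field, the build fails here instead of in production. */
static_assert(offsetof(struct post_build_data, serial_number) == 0,   "serial_number moved");
static_assert(offsetof(struct post_build_data, mac_address)   == 4,   "mac_address moved");
static_assert(offsetof(struct post_build_data, image_size)    == 12,  "image_size moved");
static_assert(offsetof(struct post_build_data, s1)            == 16,  "s1 moved");
static_assert(offsetof(struct post_build_data, s2)            == 48,  "s2 moved");
static_assert(offsetof(struct post_build_data, s3)            == 112, "s3 moved");
static_assert(sizeof(struct post_build_data)                  == 125, "size changed");
```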
Post by pozz
In my opinion, it's much simpler to use a production script that
retrieves, without any error or manual calculation, the address of a
certain symbol directly from the elf.
I doubt it is simpler. But of course it is possible, and what is
simpler for me is not necessarily the same as simpler for you.
Post by pozz
readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' | sed -e 's/  */ /g' | cut -d " " -f 2
You are using a Python script to do the patching. Use pyelftools and do
this all in the one Python script. That way, future you who has to
maintain this system will not build a time machine to go back and
strangle the past you that thought this monster made sense. These kinds
of pipes can seem elegant, but they are write-only and a maintainer's
nightmare.
Stefan Reuther
2024-01-17 16:39:37 UTC
Permalink
Post by pozz
Post by David Brown
While it is possible to do all this using patching of odd places in
your file, using specific locations is often a better choice.  Since
you haven't already said "Thanks for the advice - I tried it that way,
it worked, and I'm happy" in response to any post, I would say that
now is the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used
with success. Just for the sake of curiosity and study, I'm exploring all
of them.
Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
It's not you vs. the linker. You co-operate. You need to tell the linker
about your chip anyway ("code is from 0x1000 to 0xc000, data is from
0xc000 to 0xd000"). So you can as well tell it "version stamp is from
0xcc00 to 0xd000, data only before 0xcc00".

If you have your identification information in a fixed place, you can,
for example, more easily analyze field returns. It's easy if your field
service has to change something, and it's easy to do software updates
that preserve the identification information. You don't need to figure
out which software build is running on the chip and what the address of
the structure happens to be in that one.
Post by pozz
Post by David Brown
Now your post-build scripts have a simple fixed address to patch the
binaries.
How should the post-build script know the exact address of a certain
field in the struct?
By defining the struct in a compatible way. For example....
Post by pozz
volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
...this is a bad idea, because in most (but probably not all) chips,
uint64_t after uint32_t means there's 32 bits of padding, so if you need
serial-before-mac, you should at least make the padding explicit. There
also might be endian problems.

Using only char/uint8_t fields gives you a very high chance of identical
structure layout everywhere (`uint8_t mac_address[8]`).
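With byte-array fields, the endian problem can be sidestepped too by filling and reading them with explicit shifts. A minimal sketch; big-endian storage and the helper names `put_be32`/`get_be32` are arbitrary choices for illustration, not anything from the thread.

```c
#include <stdint.h>

/* Store a 32-bit value into a byte array in a fixed byte order (big-endian
 * here, an arbitrary choice), so the post-build script and the firmware
 * agree regardless of the CPU's native endianness. */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Firmware side: read it back with shifts rather than by casting the
 * pointer, which also avoids any alignment requirement. */
static uint32_t get_be32(const volatile uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```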


Stefan
David Brown
2024-01-17 17:28:06 UTC
Permalink
Post by Stefan Reuther
Post by pozz
Post by David Brown
While it is possible to do all this using patching of odd places in
your file, using specific locations is often a better choice.  Since
you haven't already said "Thanks for the advice - I tried it that way,
it worked, and I'm happy" in response to any post, I would say that
now is the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used
with success. Just for the sake of curiosity and study, I'm exploring all
of them.
Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
It's not you vs. the linker. You co-operate. You need to tell the linker
about your chip anyway ("code is from 0x1000 to 0xc000, data is from
0xc000 to 0xd000"). So you can as well tell it "version stamp is from
0xcc00 to 0xd000, data only before 0xcc00".
If you have your identification information in a fixed place, you can,
for example, more easily analyze field returns. It's easy if your field
service has to change something, and it's easy to do software updates
that preserve the identification information. You don't need to figure
out which software build is running on the chip and what the address of
the structure happens to be in that one.
Post by pozz
Post by David Brown
Now your post-build scripts have a simple fixed address to patch the
binaries.
How should the post-build script know the exact address of a certain
field in the struct?
By defining the struct in a compatible way. For example....
Post by pozz
volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
...this is a bad idea, because in most (but probably not all) chips,
uint64_t after uint32_t means there's 32 bits of padding, so if you need
serial-before-mac, you should at least make the padding explicit. There
also might be endian problems.
Using only char/uint8_t fields gives you a very high chance of identical
structure layout everywhere (`uint8_t mac_address[8]`).
You can also keep things safe by making sure every field has its
"natural" alignment, at least up to 8 bytes (I have never heard of a
platform that needs more than 8-byte alignment for anything). So two
uint32_t's followed by a uint64_t is fine.

#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Wpadded"
volatile const struct ...
#pragma GCC diagnostic pop

is a useful check (if you are using gcc or clang).

And a static assert on the size of the struct is another important
safe-guard.
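Putting those two safeguards together, a sketch for GCC or clang: the natural-alignment layout described above (two `uint32_t` followed by a `uint64_t`), wrapped in the `-Wpadded` pragma, plus a size check. The field list and the expected size of 112 bytes are assumptions for this example. The native-width integers still leave byte order to the CPU, so the patch tool must write them in the target's endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Turn any compiler-inserted padding into a hard error, for this struct
 * only (GCC and clang both accept these pragmas). */
#pragma GCC diagnostic push
#pragma GCC diagnostic error "-Wpadded"
struct post_build_data {
    uint32_t serial_number;  /* offset 0 */
    uint32_t image_size;     /* offset 4 */
    uint64_t mac_address;    /* offset 8: naturally aligned, no padding */
    char     s1[32];         /* offset 16 */
    char     s2[64];         /* offset 48 */
};
#pragma GCC diagnostic pop

/* Catch accidental growth or shrinkage of the record. */
static_assert(sizeof(struct post_build_data) == 112,
              "post_build_data size changed");
```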
Grant Edwards
2024-01-17 19:00:39 UTC
Permalink
Post by Stefan Reuther
Post by pozz
Post by David Brown
While it is possible to do all this using patching of odd places in
your file, using specific locations is often a better choice.  Since
you haven't already said "Thanks for the advice - I tried it that way,
it worked, and I'm happy" in response to any post, I would say that
now is the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used
with success. Just for the sake of curiosity and study, I'm exploring all
of them.
Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
It's not you vs. the linker. You co-operate. You need to tell the linker
about your chip anyway ("code is from 0x1000 to 0xc000, data is from
0xc000 to 0xd000"). So you can as well tell it "version stamp is from
0xcc00 to 0xd000, data only before 0xcc00".
If you have your identification information in a fixed place, you can,
for example, more easily analyze field returns. It's easy if your field
service has to change something, and it's easy to do software updates
that preserve the identification information. You don't need to figure
out which software build is running on the chip and what the address of
the structure happens to be in that one.
Post by pozz
Post by David Brown
Now your post-build scripts have a simple fixed address to patch the
binaries.
How should the post-build script know the exact address of a certain
field in the struct?
By defining the struct in a compatible way. For example....
Post by pozz
volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
...this is a bad idea, because in most (but probably not all) chips,
uint64_t after uint32_t means there's 32 bits of padding, so if you need
serial-before-mac, you should at least make the padding explicit.
Yes, definitely that. Or make the packing explicit. And add compile
time checks to verify the offsets of fields within the structure and
fail if they're not what is expected. That has saved me many times.
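A sketch of the "explicit packing" alternative using the GCC/clang `packed` attribute, with compile-time offset checks. It keeps the serial-before-MAC order with no padding at all; the trade-off is that the compiler must generate code that copes with the misaligned `uint64_t`, which may be slower on some cores, and taking pointers to packed members should be avoided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/clang-specific: 'packed' removes all padding, so mac_address starts
 * right after the 4-byte serial_number. */
struct __attribute__((packed)) post_build_data {
    uint32_t serial_number;
    uint64_t mac_address;
    uint32_t image_size;
};

/* Compile-time checks of every offset the post-build script relies on. */
static_assert(offsetof(struct post_build_data, serial_number) == 0,  "bad offset");
static_assert(offsetof(struct post_build_data, mac_address)   == 4,  "bad offset");
static_assert(offsetof(struct post_build_data, image_size)    == 12, "bad offset");
static_assert(sizeof(struct post_build_data)                  == 16, "bad size");
```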
pozz
2024-01-19 08:10:44 UTC
Permalink
Post by Stefan Reuther
Post by pozz
Post by David Brown
While it is possible to do all this using patching of odd places in
your file, using specific locations is often a better choice.  Since
you haven't already said "Thanks for the advice - I tried it that way,
it worked, and I'm happy" in response to any post, I would say that
now is the time to take fixed location solutions seriously.
There are many suggested solutions and I think all of them can be used
with success. Just for the sake of curiosity and study, I'm exploring all
of them.
Honestly, I don't *like* solutions where you need to choose a fixed
location yourself. Why should you do a job that can be done by the
linker?
It's not you vs. the linker. You co-operate. You need to tell the linker
about your chip anyway ("code is from 0x1000 to 0xc000, data is from
0xc000 to 0xd000"). So you can as well tell it "version stamp is from
0xcc00 to 0xd000, data only before 0xcc00".
If you have your identification information in a fixed place, you can,
for example, more easily analyze field returns. It's easy if your field
service has to change something, and it's easy to do software updates
that preserve the identification information. You don't need to figure
out which software build is running on the chip and what the address of
the structure happens to be in that one.
Good point, firmware upgrade. I wasn't thinking about it.

If the addresses of the post-build strings weren't fixed across all
versions, the software that manages the upgrade would be more complex.
Post by Stefan Reuther
Post by pozz
Post by David Brown
Now your post-build scripts have a simple fixed address to patch the
binaries.
How should the post-build script know the exact address of a certain
field in the struct?
By defining the struct in a compatible way. For example....
Post by pozz
volatile const struct post_build_data {
  uint32_t serial_number;
  uint64_t mac_address;
...this is a bad idea, because in most (but probably not all) chips,
uint64_t after uint32_t means there's 32 bits of padding, so if you need
serial-before-mac, you should at least make the padding explicit. There
also might be endian problems.
Using only char/uint8_t fields gives you a very high chance of identical
structure layout everywhere (`uint8_t mac_address[8]`).
Stefan
Hans-Bernhard Bröker
2024-01-18 17:49:24 UTC
Permalink
Post by pozz
Post by Hans-Bernhard Bröker
Post by pozz
I'm wondering how to detect the exact positions (addresses) of serial
numbers to fix.
You do not.
Instead, you set up linker scripts, linker options and/or add
__attribute(()) to the variables' definitions to _place_ them at a
predetermined, fixed, known-useful location.
Do you mean choosing the exact address of *each* string yourself?
Of each string that needs this kind of post-build treatment? Oh yes,
absolutely. The key insight is that these are not just ordinary strings
like any other: they're post-build configuration data.
Post by pozz
And where would you put them, at the beginning, in the middle or at the
end of the Flash?
That discussion is what hides behind the term "known-useful", above.
Typically such elements end up near pre-existing memory region
boundaries, with some additional space reserved near them for future
expansion.
Post by pozz
You need to calculate the address of the next string
from the address *and length* of the previous string.
You didn't seriously plan on having the length of this kind of string
actually changing willy-nilly, did you? These actually have to be
fixed-size arrays, i.e.

volatile const char foo[D_LENGTH];

or equivalent.
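One consequence of fixed-size fields is that a patched string may fill the whole array with no terminating NUL. A minimal sketch of reading such a field safely; `D_LENGTH` = 32 and the helper name `read_fixed_field` are hypothetical. A plain loop is used instead of `strnlen` because the source array is `volatile`.

```c
#include <stddef.h>

#define D_LENGTH 32  /* hypothetical field size */

/* Copy a fixed-size, NUL-padded field into an ordinary C string.
 * A patched field may use all D_LENGTH bytes with no terminator, so never
 * call strlen()/strcpy() on it directly. dst must hold D_LENGTH + 1 bytes.
 * Returns the length of the copied string. */
static size_t read_fixed_field(char dst[D_LENGTH + 1],
                               const volatile char src[D_LENGTH])
{
    size_t n = 0;
    while (n < D_LENGTH && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}
```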
pozz
2024-01-17 08:57:57 UTC
Permalink
Post by pozz
In one project I have many quasi-fixed strings that I'd like to keep in
non volatile memory (Flash) to avoid losing precious RAM space.
static const char s1[] = "/my/very/long/string/of/01020304";
static const char s2[] = "/another/string/01020304";
...
Substring "01020304" is a serial number that changes during production
with specific device. It has the same length in bytes (it's a simple hex
representation of a 32-bit integer).
Of course it's too difficult and slow to rebuild the firmware during
production passing to the compiler the real serial number. I think a
better solution is to patch the .hex file generated by the compiler.
I'm wondering how to detect the exact positions (addresses) of serial
numbers to fix.
The build system is gcc, so I could search for s1 in the elf file. Do
you know of a tool that returns the address of a symbol in the elf or
map file?
Could you suggest a better approach?
With this command

readelf -s output.elf | grep string_to_patch

the output would be:

543: 00019c44 58 OBJECT LOCAL DEFAULT 1 lwt_message

In order to retrieve only the address:

readelf -s output.elf | grep string_to_patch | sed -e 's/^ *//' | sed -e 's/  */ /g' | cut -d " " -f 2

The first sed removes the leading spaces, the second sed squeezes each
run of spaces into one, and cut extracts the second field (the symbol's
Value column, i.e. its address).
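If you do go the symbol-lookup route, the whitespace splitting done by the sed/cut chain can be done directly in C with `sscanf`. A minimal sketch that parses a single `readelf -s` line; the helper name `readelf_symbol_value` is hypothetical, and it assumes the usual eight-column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name). It also compares the name exactly, whereas `grep` matches any symbol that merely contains the search string.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of `readelf -s` output, e.g.
 *   "   543: 00019c44    58 OBJECT  LOCAL  DEFAULT    1 lwt_message"
 * If the Name column equals 'symbol', store the Value column in *addr and
 * return 1. Return 0 for any other line (headers, other symbols).
 * sscanf splits on whitespace, so no sed/cut is needed. */
static int readelf_symbol_value(const char *line, const char *symbol,
                                unsigned long *addr)
{
    unsigned long value;
    char name[256];

    /* Num:  Value  Size  Type  Bind  Vis  Ndx  Name */
    if (sscanf(line, " %*u: %lx %*s %*s %*s %*s %*s %255s",
               &value, name) != 2)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

In a complete tool you would feed it lines read with `fgets` from `popen("readelf -s output.elf", "r")` (POSIX) and stop at the first match.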
Don Y
2024-01-17 17:03:40 UTC
Permalink
In one project I have many quasi-fixed strings that I'd like to keep in non
volatile memory (Flash) to avoid losing precious RAM space.
static const char s1[] = "/my/very/long/string/of/01020304";
static const char s2[] = "/another/string/01020304";
...
Substring "01020304" is a serial number that changes during production with
specific device. It has the same length in bytes (it's a simple hex
representation of a 32-bit integer).
Of course it's too difficult and slow to rebuild the firmware during production
passing to the compiler the real serial number. I think a better solution is to
patch the .hex file generated by the compiler.
I'm wondering how to detect the exact positions (addresses) of serial numbers
to fix.
Don't "detect" (i.e., "find"); rather, *place* it/them in a specific location
that your code already knows about. How else would you force vector tables
to reside at specific locations, jump tables, etc.?

You will also code this "hole" into any checksum routine that your
code executes at POST and cover it with a check of its own (that you
will have to ensure is satisfied by whatever tool you use to "patch"
the binary image).
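A sketch of the "hole" idea, assuming CRC-32 (the common reflected 0xEDB88320 variant) as the checksum; Don does not say which check he uses. The image CRC skips the patchable region, which then needs its own check value written by the patching tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as in zlib). Chainable:
 * crc32_update(crc32_update(0, a, na), b, nb) equals the CRC of a
 * followed by b. Slow but tiny, which suits a POST check. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC over the whole image except the patchable hole
 * [hole_off, hole_off + hole_len). The caller must ensure the hole lies
 * inside the image. Patching the hole never changes this value. */
static uint32_t image_crc_without_hole(const uint8_t *img, size_t len,
                                       size_t hole_off, size_t hole_len)
{
    uint32_t crc = crc32_update(0, img, hole_off);
    return crc32_update(crc, img + hole_off + hole_len,
                        len - hole_off - hole_len);
}
```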
The build system is gcc, so I could search for s1 in the elf file. Do you know
of a tool that returns the address of a symbol in the elf or map file?
Could you suggest a better approach?
When faced with *small* memory regions (e.g., 100 bytes) of a particular
resource (e.g., NVRAM), I prefer a tagged format that allows the available
space to be dynamically traded among uses -- much like cramming a variable
number of parameters in a BOOTP packet.

The downside is that you have to parse the region to extract any specific
parameter. But, it eliminates the need to define static structures
that might change from instance to instance:

STRING1, "/my/very/long/string/of/01020304",
STRING8, "Some Guy's Really long name or address",
STRING9, "/another/string/01020304",
etc.

Note that the tag can be designed to act as the delimiter of a field.
E.g., if all tags (STRING1, STRING8, etc.) have values outside the valid
range of the data being stored (e.g., > 0x7F for ASCII), then the
parser can tell that a string terminates when any value > 0x7F is
encountered. (You know that the first value in the region is a tag.)
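A minimal sketch of that delimiter scheme, in which every tag has its high bit set. The tag values (`TAG_STRING1` and so on) and the helper name `find_tagged_string` are hypothetical, and the field data is assumed to be plain 7-bit ASCII.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values: all >= 0x80, so they can never be confused
 * with 7-bit ASCII data and double as field delimiters. */
enum { TAG_STRING1 = 0x81, TAG_STRING8 = 0x88, TAG_STRING9 = 0x89 };

/* Scan a tagged region (which must start with a tag) for 'tag'. On success
 * set *str/*len to the ASCII bytes following it, up to the next byte
 * >= 0x80 or the end of the region, and return 1; otherwise return 0. */
static int find_tagged_string(const uint8_t *region, size_t size, uint8_t tag,
                              const uint8_t **str, size_t *len)
{
    size_t i = 0;
    while (i < size) {
        uint8_t t = region[i++];        /* this byte is always a tag */
        size_t start = i;
        while (i < size && region[i] < 0x80)
            i++;                        /* skip over this field's data */
        if (t == tag) {
            *str = region + start;
            *len = i - start;
            return 1;
        }
    }
    return 0;
}
```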

A more versatile approach is to have each tag invoke its own parse
algorithm:

CITY, "Cañon City\0", ZIPCODE, (long) 81212, AREACODE, ...

Note that 'ñ' is outside the ASCII code points but the CITY parse
routine could rely on some other mechanism ('\0') to detect the end
of that field; similarly, ZIPCODE can expect a 4-byte integer to
immediately follow it; AREACODE can expect three BCD digits, etc.

One can insist that tags appear in some fixed order (like TIFF
files) so encountering anything that violates that rule where a
tag is expected can act as a terminator for the field. Or, you
can add a tag that is ENDOFDATA, etc.
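A sketch of the per-tag variant with an explicit end tag. The tag values, the little-endian 4-byte ZIPCODE encoding, the 32-byte city buffer and the `struct settings` destination are all assumptions made for illustration; AREACODE and the other tags are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags; each one implies its own field format. */
enum { TAG_END = 0x00, TAG_CITY = 0x01, TAG_ZIPCODE = 0x02 };

struct settings {
    char     city[32];
    uint32_t zipcode;
};

/* Walk the region until TAG_END. CITY is a NUL-terminated string of any
 * bytes (so a UTF-8 'ñ' is fine); ZIPCODE is a 4-byte little-endian
 * integer. Returns 0 on success, -1 on a malformed or unknown field. */
static int parse_settings(const uint8_t *p, size_t size, struct settings *out)
{
    size_t i = 0;
    while (i < size) {
        uint8_t tag = p[i++];
        if (tag == TAG_END)
            return 0;
        if (tag == TAG_CITY) {
            const uint8_t *nul = memchr(p + i, '\0', size - i);
            if (nul == NULL || (size_t)(nul - (p + i)) >= sizeof out->city)
                return -1;  /* unterminated or too long */
            memcpy(out->city, p + i, (size_t)(nul - (p + i)) + 1);
            i = (size_t)(nul - p) + 1;
        } else if (tag == TAG_ZIPCODE) {
            if (size - i < 4)
                return -1;  /* truncated integer */
            out->zipcode = (uint32_t)p[i] | ((uint32_t)p[i + 1] << 8) |
                           ((uint32_t)p[i + 2] << 16) |
                           ((uint32_t)p[i + 3] << 24);
            i += 4;
        } else {
            return -1;      /* unknown tag: stop rather than guess */
        }
    }
    return 0;  /* end of region reached without TAG_END: treat as done */
}
```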

This makes the task of replacing any *individual* datum a bit
harder as the fields aren't rigidly defined -- just the start
of the region and its TOTAL length.

OTOH, it's a win when the design evolves to require yet another
parameter without altering the space available to that COLLECTION
of parameters (like a network protocol trying to cram more functionality
into a single packet)

Protecting the integrity of this section can be accomplished with
an error *correcting* code instead of just an error DETECTING
checksum. E.g., I often create nonvolatile instances of the
state of a pseudo-random number generator (because you don't
want a user to be able to "reinitialize" it just by cycling
power) with its own ECC -- as *it* is often far more valuable than
any other "settings" in the device.
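Don does not say which error-correcting code he uses. As a deliberately simple stand-in (a swapped-in technique, not his), here is triple redundancy with a bitwise majority vote, which corrects any bit that is wrong in only one of the three copies. A Hamming or Reed-Solomon code would cost far less space; the helper name `majority_vote` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplest error-correcting scheme: keep three copies of the block and
 * take a bitwise majority vote. Each output bit is correct as long as at
 * most one of the three copies is wrong at that bit position. */
static void majority_vote(uint8_t *out, const uint8_t *a, const uint8_t *b,
                          const uint8_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)((a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]));
}
```

On boot the firmware would vote, then rewrite all three copies with the result so that a single corrupted copy does not linger until a second fault makes it uncorrectable.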
Paul Rubin
2024-01-18 05:42:12 UTC
Permalink
Post by pozz
The build system is gcc, so I could search for s1 in the elf file. Do
you know of a tool that returns the address of a symbol in the elf or
map file?
The nlist command line tool is for that, and iirc there are some library
functions that do something similar.