How To Put A 32 Bit Pattern In A Register With Only 16 Bit Addresses: Assembly Language

When working with peripherals, we need to be able to read and write to the device's internal registers. How we achieve this in C depends on whether we're working with memory-mapped IO or port-mapped IO. Port-mapped IO typically requires compiler/language extensions, whereas memory-mapped IO can be accommodated with the standard C syntax.

Embedded "Hello, World!"

We all know the embedded equivalent of the "Hello, World!" program is flashing the LED, so true to form I'm going to use that as an example.

The examples are based on an STM32F407 chip using the GNU Arm Embedded Toolchain.

The STM32F4 uses a port-based GPIO (General Purpose Input Output) model, where each port can manage 16 physical pins. The LEDs are mapped to external pins 55-58, which map internally onto GPIO Port D pins 8-11.

Flashing the LEDs

Flashing the LEDs is fairly straightforward; at the port level there are only two registers we are interested in.

Mode Register – this defines, on a pin-by-pin basis, what its function is, e.g. we want this pin to behave as an output pin.
Output Data Register – writing a '1' to the appropriate pin will generate voltage and writing a '0' will ground the pin.

Mode Register (MODER)

Each port pin has four modes of operation, thus requiring two configuration bits per pin (pin 0 is configured using mode bits 0-1, pin 1 uses mode bits 2-3, and so on):

00 Input
01 Output
10 Alternate function (details configured via other registers)
11 Analogue

So, for example, to configure pin 8 for output, we must write the value 01 into bits 16 and 17 in the MODER register (that is, bit 16 => 1, bit 17 => 0).
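As a minimal sketch of the bit arithmetic just described, the helper below computes a new MODER value for any pin. The function name `gpio_moder_with_mode` and the `GPIO_MODE_*` constants are illustrative names, not taken from any vendor header, and the function works on a plain value so that it can be checked on a host machine.

```c
#include <stdint.h>

/* Hypothetical names for the four 2-bit mode encodings listed above. */
enum {
  GPIO_MODE_INPUT     = 0x0, /* 00 */
  GPIO_MODE_OUTPUT    = 0x1, /* 01 */
  GPIO_MODE_ALTERNATE = 0x2, /* 10 */
  GPIO_MODE_ANALOGUE  = 0x3  /* 11 */
};

/* Return `moder` with the two mode bits for `pin` (0-15) replaced by `mode`. */
uint32_t gpio_moder_with_mode(uint32_t moder, unsigned pin, uint32_t mode)
{
  const unsigned shift = pin * 2u;      /* pin 8 -> bits 16 and 17 */
  moder &= ~(UINT32_C(0x3) << shift);   /* clear both mode bits    */
  moder |= (mode & 0x3u) << shift;      /* write the new encoding  */
  return moder;
}
```

For pin 8, `gpio_moder_with_mode(moder, 8, GPIO_MODE_OUTPUT)` sets bit 16 and clears bit 17, leaving every other pin's configuration untouched.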

Output Data Register (ODR)

In the Output Data Register (ODR) each bit represents an I/O pin on the port. The bit number matches the pin number.

If a pin is set to output (in the MODER register) then writing a 1 into the appropriate bit will drive the I/O pin high. Writing 0 into the appropriate bit will drive the I/O pin low.

There are 16 IO pins, but the register is 32 bits wide. Reserved bits are read as '0'.
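The ODR behaviour can be sketched as a small helper. This is an illustration rather than production code: the name `gpio_odr_write_pin` is made up, and the register is passed in as a pointer so the function can be exercised against an ordinary variable on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drive one pin (0-15) high or low through an ODR-style register.
 * On the target, `odr` would point at the port's real ODR; on a host
 * it can point at any uint32_t standing in for the register. */
void gpio_odr_write_pin(volatile uint32_t *odr, unsigned pin, bool high)
{
  const uint32_t mask = UINT32_C(1) << pin; /* bit number == pin number */
  if (high) {
    *odr |= mask;   /* writing 1 drives the pin high */
  } else {
    *odr &= ~mask;  /* writing 0 drives the pin low  */
  }
}
```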

Port D Addresses

The absolute addresses for the MODER and ODR of Port D are:

MODER – 0x40020C00
ODR – 0x40020C14

Pointer access to registers

Typically, when we access registers in C based on memory-mapped IO, we use a pointer notation to 'trick' the compiler into generating the correct load/store operations at the absolute address needed.
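Stripped to its essentials, the pointer 'trick' is a cast from an integer address to a pointer-to-volatile followed by a dereference. The two helpers below are a sketch of that idea; the names `reg_read32` and `reg_write32` are hypothetical and do not come from any library.

```c
#include <stdint.h>

/* Read a 32-bit register at an absolute address. The cast to a
 * pointer-to-volatile obliges the compiler to perform a real load. */
uint32_t reg_read32(uintptr_t address)
{
  return *(volatile uint32_t *)address;
}

/* Write a 32-bit value to a register at an absolute address. */
void reg_write32(uintptr_t address, uint32_t value)
{
  *(volatile uint32_t *)address = value;
}
```

On the target the address would be a constant such as 0x40020C14; on a host the same functions can be tried by passing the address of an ordinary `uint32_t` variable cast to `uintptr_t`.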

So for Port D we might see something along the lines of (I'll keep the code brief and use magic numbers for simplicity):

          #include <stdint.h>

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void)
          {
            uint32_t moder = *portd_moder;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            *portd_moder = moder;

            while(1) {
              *portd_odr |= (1 << 8);   // led-on
              sleep(500);
              *portd_odr &= ~(1 << 8);  // led-off
              sleep(500);
            }
          }

Alternatively, we may see the registers defined using the pre-processor, e.g.

          #include <stdint.h>

          #define PORTD_MODER   (*((volatile uint32_t*) 0x40020C00))
          #define PORTD_ODR     (*((volatile uint32_t*) 0x40020C14))

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void)
          {
            uint32_t moder = PORTD_MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            PORTD_MODER = moder;

            while(1) {
              PORTD_ODR |= (1 << 8);  // led-on
              sleep(500);
              PORTD_ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

There is a misconception among many C programmers that the pointer model is less efficient than the #define model. With C99 and modern compilers this is not the case; they will generate identical code (C99 allows the compiler to optimise away const objects).

Enabling Port D

We are missing one final step: each peripheral on the STM32F407 is clock gated. The clock signal does not reach the peripheral until we tell it to do so by setting a bit in a specific register. By default, clock signals never reach peripherals that are not in use, thus saving power.

To enable the clock to reach GPIO port D, the GPIODEN (GPIO D Enable) bit (bit 3) of the AHB1ENR (AMBA High-performance Bus 1 Enable) register in the RCC (Reset and Clock Control) peripheral needs setting.

          #include <stdint.h>

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          volatile uint32_t* const rcc_ahb1enr   = (uint32_t*) 0x40023830;

          extern void sleep(uint32_t ms); // use systick to busy-wait

          int main(void)
          {
            *rcc_ahb1enr |= (1 << 3);     // enable PortD's clock

            uint32_t moder = *portd_moder;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            *portd_moder = moder;

            while(1) {
              *portd_odr |= (1 << 8);     // led-on
              sleep(500);
              *portd_odr &= ~(1 << 8);    // led-off
              sleep(500);
            }
          }
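The clock-enable step can be wrapped in a helper, sketched here under one stated assumption: that the GPIO enable bits in AHB1ENR follow port order (port A at bit 0 through to port D at bit 3). The text above only confirms bit 3 for port D, so the reference manual should be checked before relying on this for other ports. The name `rcc_enable_gpio_clock` is hypothetical.

```c
#include <stdint.h>

/* Assumption: enable bits follow port order, 'A' = bit 0 ... 'D' = bit 3.
 * `ahb1enr` is passed in so the sketch can run against a plain variable. */
void rcc_enable_gpio_clock(volatile uint32_t *ahb1enr, char port)
{
  const unsigned bit = (unsigned)(port - 'A');
  *ahb1enr |= UINT32_C(1) << bit; /* |= leaves the other enable bits as they were */
}
```

Using `|=` rather than `=` matters here, because the register holds the enable bits for many peripherals and a plain assignment would switch the others off.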

Using structs

The code so far works just fine, but has a number of shortcomings.

First, to support multiple IO ports we would have to define a set of pointers for each set of registers for each port, e.g.:

          volatile uint32_t* const porta_moder   = (uint32_t*) 0x40020000;
          volatile uint32_t* const porta_odr     = (uint32_t*) 0x40020014;

          volatile uint32_t* const portb_moder   = (uint32_t*) 0x40020400;
          volatile uint32_t* const portb_odr     = (uint32_t*) 0x40020414;

          volatile uint32_t* const portc_moder   = (uint32_t*) 0x40020800;
          volatile uint32_t* const portc_odr     = (uint32_t*) 0x40020014;

          volatile uint32_t* const portd_moder   = (uint32_t*) 0x40020C00;
          volatile uint32_t* const portd_odr     = (uint32_t*) 0x40020C14;

          volatile uint32_t* const porte_moder   = (uint32_t*) 0x40021000;
          volatile uint32_t* const porte_odr     = (uint32_t*) 0x40021014;

Because each port actually has 10 different registers we may want to access, this involves a lot of repetition. Where there is repetition, bugs that are simple to make but difficult to track down can creep in (did you spot the deliberate fault?).

In addition, and more significantly, we can see that the port's ODR is always offset 0x14 bytes from the MODER. The MODER is always at offset 0x00 from the port address (thus the address of the MODER is also the port's base address).
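The regularity of these addresses can be made explicit in code. The following is a sketch only: the constant and function names are invented for illustration, and the 0x400 spacing between ports is inferred from the port A to port E addresses listed above.

```c
#include <stdint.h>

#define GPIO_PORTA_BASE   UINT32_C(0x40020000) /* port A base address           */
#define GPIO_PORT_STRIDE  UINT32_C(0x400)      /* spacing between port bases    */
#define GPIO_MODER_OFFSET UINT32_C(0x00)       /* MODER sits at the base        */
#define GPIO_ODR_OFFSET   UINT32_C(0x14)       /* ODR is 0x14 bytes further on  */

/* Base address of a port given its letter, 'A' to 'E'. */
uint32_t gpio_port_base(char port)
{
  return GPIO_PORTA_BASE + GPIO_PORT_STRIDE * (uint32_t)(port - 'A');
}

/* Address of a port's MODER: the base plus offset 0x00. */
uint32_t gpio_moder_address(char port)
{
  return gpio_port_base(port) + GPIO_MODER_OFFSET;
}

/* Address of a port's ODR: the base plus offset 0x14. */
uint32_t gpio_odr_address(char port)
{
  return gpio_port_base(port) + GPIO_ODR_OFFSET;
}
```

With these definitions, `gpio_odr_address('D')` evaluates to 0x40020C14, the same value as the hand-written constant, but derived from one base address rather than typed out per register.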

In Software Engineering terms we'd view this separate declaration of related pointers as a lack of cohesion in the code. One of our goals is to strive for high cohesion, thus grouping things together that should naturally be together (as change affects them all).

struct Overlay

The full register layout for the STM32F4 GPIO port is shown below:

By using a struct to define the relative memory offsets, we can get the compiler to generate all the correct address accesses relative to the base address.

          #include <stdint.h>

          typedef struct {
            uint32_t MODER;   // mode register,                     offset: 0x00
            uint32_t OTYPER;  // output type register,              offset: 0x04
            uint32_t OSPEEDR; // output speed register,             offset: 0x08
            uint32_t PUPDR;   // pull-up/pull-down register,        offset: 0x0C
            uint32_t IDR;     // input data register,               offset: 0x10
            uint32_t ODR;     // output data register,              offset: 0x14
            uint32_t BSRR;    // bit set/reset register,            offset: 0x18
            uint32_t LCKR;    // configuration lock register,       offset: 0x1C
            uint32_t AFRL;    // GPIO alternate function registers, offset: 0x20
            uint32_t AFRH;    // GPIO alternate function registers, offset: 0x24
          } GPIO_t;

Now we define the pointer as before, but this time using the struct type rather than a uint32_t:

          volatile GPIO_t*   const portd       = (GPIO_t*)0x40020C00;

Finally, we can use it as before, but this time use struct-pointer dereferencing to access the individual registers:

          int main(void)
          {
            *rcc_ahb1enr |= (1 << 3); // enable PortD's clock

            uint32_t moder = portd->MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            portd->MODER = moder;

            while (1) {
              portd->ODR |= (1 << 8);  // led-on
              sleep(500);
              portd->ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

Now when we access the ODR via the statement:

          portd->ODR |= (1 << 8);     // led-on

the compiler can calculate the offset (0x14) of the ODR member relative to the base address held in the pointer (0x40020C00).
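The offset the compiler uses can be inspected with the standard `offsetof` macro. The sketch below repeats the struct so that it is self-contained; `gpio_member_address` is a hypothetical helper that simply mirrors the base-plus-offset sum.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint32_t MODER;
  uint32_t OTYPER;
  uint32_t OSPEEDR;
  uint32_t PUPDR;
  uint32_t IDR;
  uint32_t ODR;
  uint32_t BSRR;
  uint32_t LCKR;
  uint32_t AFRL;
  uint32_t AFRH;
} GPIO_t;

/* Address accessed for a member located `member_offset` bytes into a
 * struct overlaid at `base`. */
uintptr_t gpio_member_address(uintptr_t base, size_t member_offset)
{
  return base + member_offset;
}
```

Provided the compiler inserts no padding, `offsetof(GPIO_t, ODR)` is 0x14, so `gpio_member_address(0x40020C00, offsetof(GPIO_t, ODR))` yields 0x40020C14.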

This means that we only need one pointer per port rather than 10, e.g.

          volatile GPIO_t* const   porta       = (GPIO_t*)0x40020000;
          volatile GPIO_t* const   portb       = (GPIO_t*)0x40020400;
          volatile GPIO_t* const   portc       = (GPIO_t*)0x40020800;
          volatile GPIO_t* const   portd       = (GPIO_t*)0x40020C00;
          volatile GPIO_t* const   porte       = (GPIO_t*)0x40021000;
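A further consequence of having one pointer per port is that a single routine can serve every port. The functions below are a sketch of that idea, with hypothetical names; they take the port as a parameter, which also lets them be tried on a host against an ordinary `GPIO_t` variable.

```c
#include <stdint.h>

typedef struct {
  uint32_t MODER;
  uint32_t OTYPER;
  uint32_t OSPEEDR;
  uint32_t PUPDR;
  uint32_t IDR;
  uint32_t ODR;
  uint32_t BSRR;
  uint32_t LCKR;
  uint32_t AFRL;
  uint32_t AFRH;
} GPIO_t;

/* Configure `pin` (0-15) of any port as an output (mode bits = 01). */
void gpio_make_output(volatile GPIO_t *port, unsigned pin)
{
  uint32_t moder = port->MODER;
  moder &= ~(UINT32_C(0x3) << (pin * 2u));
  moder |= UINT32_C(0x1) << (pin * 2u);
  port->MODER = moder;
}

/* Invert the output level of `pin` (0-15) on any port. */
void gpio_toggle(volatile GPIO_t *port, unsigned pin)
{
  port->ODR ^= UINT32_C(1) << pin;
}
```

On the target, a call such as `gpio_make_output(portd, 8)` would work unchanged for `porta` to `porte`.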

Alternatively, we could define the port pointers with #defines:

          #define PORTA       ((volatile GPIO_t*) 0x40020000)
          #define PORTB       ((volatile GPIO_t*) 0x40020400)
          #define PORTC       ((volatile GPIO_t*) 0x40020800)
          #define PORTD       ((volatile GPIO_t*) 0x40020C00)
          #define PORTE       ((volatile GPIO_t*) 0x40021000)

Note that in the #defines the leading '*' dereference has been dropped, so access to the register is coded thus:

          PORTD->ODR |= (1 << 8);     // led-on

If we left the dereference in:

          #define PORTD       (*((volatile GPIO_t*) 0x40020C00))

the code would be:

          PORTD.ODR |= (1 << 8);  // led-on

It's a matter of style; the generated instructions are the same.
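Both macro styles can be tried side by side on a host. In this sketch a static variable, `simulated_portd`, stands in for the memory-mapped port, and the macro names `PORTD_PTR` and `PORTD_REF` are invented to keep the two styles apart; on the target each macro would use the address 0x40020C00 instead.

```c
#include <stdint.h>

typedef struct {
  uint32_t MODER;
  uint32_t OTYPER;
  uint32_t OSPEEDR;
  uint32_t PUPDR;
  uint32_t IDR;
  uint32_t ODR;
  uint32_t BSRR;
  uint32_t LCKR;
  uint32_t AFRL;
  uint32_t AFRH;
} GPIO_t;

/* Host-side stand-in for the real peripheral (an assumption of this sketch). */
static GPIO_t simulated_portd;

#define PORTD_PTR ((volatile GPIO_t *)&simulated_portd)    /* used with -> */
#define PORTD_REF (*((volatile GPIO_t *)&simulated_portd)) /* used with .  */

void led_on_via_pointer(void)  { PORTD_PTR->ODR |= (1u << 8); }
void led_off_via_object(void)  { PORTD_REF.ODR &= ~(1u << 8); }

/* Lets a caller observe the simulated register. */
uint32_t simulated_odr(void)   { return simulated_portd.ODR; }
```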

Code Comparison

So how does the struct code compare to our original pointer code (compiled with optimisation flag -Og):

Original code

          $ arm-none-eabi-objdump -d -S main.o
          ...
            *portd_odr |= (1 << 8);       // led-on
            1a:   4c0b            ldr     r4, [pc, #44]   ; (48 <main+0x48>)
            1c:   6823            ldr     r3, [r4, #0]
            1e:   f443 7380       orr.w   r3, r3, #256    ; 0x100
            22:   6023            str     r3, [r4, #0]
          ...

The assembler code does the following:

Load the value 0x40020C14 into r4
Read the contents of 0x40020C14 [r4 + 0] as a 32-bit value into r3
Or 0x100 with the contents of r3 (set bit 8)
Store r3 as a 32-bit value at address 0x40020C14
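The same sequence can be written out in C, one statement per instruction. This is only an illustration of what the single `|=` statement expands to; the function name is made up, and the register is passed as a pointer so the code can run on a host.

```c
#include <stdint.h>

/* `*reg |= 0x100` spelled out as the separate steps seen in the
 * disassembly (the address load into r4 corresponds to `reg` itself). */
void set_bit8_stepwise(volatile uint32_t *reg)
{
  uint32_t value = *reg; /* ldr   r3, [r4, #0]   : read the register */
  value |= 0x100u;       /* orr.w r3, r3, #256   : set bit 8         */
  *reg = value;          /* str   r3, [r4, #0]   : write it back     */
}
```

Seen this way it is clear that the update is a read-modify-write: the read and the write are distinct instructions.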

Comparing this to the struct access:

          $ arm-none-eabi-objdump -d -S main.o
          ...
            portd->ODR |= (1 << 8);       // led-on
            1a:   4c0a            ldr     r4, [pc, #40]   ; (44 <main+0x44>)
            1c:   6963            ldr     r3, [r4, #20]
            1e:   f443 7380       orr.w   r3, r3, #256    ; 0x100
            22:   6163            str     r3, [r4, #20]
          ...

So how does this differ? Only in the use of an offset load:

Load the value 0x40020C00 into r4
Read the contents of 0x40020C14 [r4 + 20] as a 32-bit value into r3
Or the value 0x100 with the contents of r3
Store r3 as a 32-bit value at address 0x40020C14 – [r4 + 0x14]

This code demonstrates that, from a size and performance perspective, there is no difference between the two approaches (at least for the Arm).

Note: An Arm load (ldr) instruction takes 2 cycles, with or without a secondary offset.

Caveats

Before we rush off and refactor legacy code to use structs, there are a couple of factors we are relying on, which may vary from compiler to compiler.

First, what can we be sure of?

The offset of the first struct member is always 0x0 from the object's address (in C++ this is only guaranteed for standard-layout types, but it is usually the case).
The compiler cannot reorder the members, so OTYPER will always come at a higher address in memory than MODER and at a lower address than OSPEEDR.

However, we cannot guarantee that the compiler will not introduce padding between members, as the standard states:

There may be unnamed padding within a structure object, but not at its beginning.

So we cannot guarantee that the address of OTYPER is equal to the address of MODER + 4 bytes.

That said, in practical terms, with modern compilers, it is unlikely to be a problem (for this code). Padding tends to occur when a data member would otherwise cross its natural boundary (i.e. a 32-bit type is not word aligned), e.g.

          typedef struct {
            int  a;
            char b;
            int  c;
          } Padding_t;

would likely return a result of 12 from sizeof(Padding_t), because 3 padding bytes are added after char b to align the int c definition.
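The padding can be measured rather than guessed at. The two functions below are a sketch using the standard `offsetof` macro and `sizeof`; their names are hypothetical, and the values they return are implementation-defined.

```c
#include <stddef.h>

typedef struct {
  int  a;
  char b;
  int  c;
} Padding_t;

/* Bytes of padding the compiler placed between `b` and `c`. */
size_t padding_after_b(void)
{
  return offsetof(Padding_t, c) - (offsetof(Padding_t, b) + sizeof(char));
}

/* Total padding: the struct's size minus the sum of its members' sizes. */
size_t padding_total(void)
{
  return sizeof(Padding_t) - (sizeof(int) + sizeof(char) + sizeof(int));
}
```

On an ABI where int is 4 bytes and 4-byte aligned, both functions would be expected to return 3, which accounts for the size of 12 mentioned above.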

Mitigating the risk

The obvious, and most straightforward, approach is to ensure you have a unit test that checks the size of the generated structure, e.g.

          void test_GPIO_t_struct_size(void)
          {
              TEST_ASSERT_EQUAL(40, sizeof(GPIO_t));
          }

Alternatively, one of the compelling reasons to use C11 is the introduction of static_assert, e.g.

          #include <assert.h> // provides the static_assert macro in C11

          int main(void)
          {
            static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
          }

This is a compile-time check; if padding was present, then the following compiler error is generated:

          src/main.c: In function 'main':
          src/main.c:87:3: error: static assertion failed: "padding in GPIO_t present"
             static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
             ^
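A complementary compile-time check, sketched here on the assumption that a C11 compiler is available, is to assert individual member offsets with `offsetof`. Unlike a size check alone, this also catches members declared in the wrong order. Only a few members are checked to keep the example short.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

typedef struct {
  uint32_t MODER;
  uint32_t OTYPER;
  uint32_t OSPEEDR;
  uint32_t PUPDR;
  uint32_t IDR;
  uint32_t ODR;
  uint32_t BSRR;
  uint32_t LCKR;
  uint32_t AFRL;
  uint32_t AFRH;
} GPIO_t;

/* File-scope checks: compilation fails if the layout is not as expected. */
static_assert(offsetof(GPIO_t, MODER) == 0x00, "MODER must be at offset 0x00");
static_assert(offsetof(GPIO_t, IDR)   == 0x10, "IDR must be at offset 0x10");
static_assert(offsetof(GPIO_t, ODR)   == 0x14, "ODR must be at offset 0x14");
static_assert(offsetof(GPIO_t, AFRH)  == 0x24, "AFRH must be at offset 0x24");
static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
```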

If you're not using C11 (I've yet to come across an embedded C project using it) then a final approach is to try and ensure no padding is present by requesting that the compiler 'pack' the struct to the most optimal memory model.

This is always a compiler-specific request, which may be done through #pragmas. However, GCC uses its own 'attribute' approach instead of pragmas.

Defining the structure with the attribute 'packed' will normally remove any potential padding, e.g.

          typedef struct {
            uint32_t MODER;   // mode register,                  offset: 0x00
            uint32_t OTYPER;  // output type register,           offset: 0x04
            uint32_t OSPEEDR; // output speed register,          offset: 0x08
            uint32_t PUPDR;   // pull-up/pull-down register,     offset: 0x0C
            uint32_t IDR;     // input data register,            offset: 0x10
            uint32_t ODR;     // output data register,           offset: 0x14
            uint32_t BSRR;    // bit set/reset register,         offset: 0x18
            uint32_t LCKR;    // configuration lock register,    offset: 0x1C
            uint32_t AFRL;    // alternate function registers,   offset: 0x20
            uint32_t AFRH;    // alternate function registers,   offset: 0x24
          } __attribute__((packed)) GPIO_t;

          typedef struct {
            int  a;
            char b;
            int  c;
          } __attribute__((packed)) Padding_t;

          int main(void)
          {
            static_assert(sizeof(GPIO_t) == 40, "padding in GPIO_t present");
            static_assert(sizeof(Padding_t) == 9, "padding in Padding_t present");
          }

Unaligned access can cause a whole host of problems and performance issues, so be extremely careful using packing.
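To see where the alignment hazard comes from, consider the sketch below, which relies on the GCC/Clang-specific `packed` attribute. `Packed_t` and `packed_read_c` are hypothetical names, and fixed-width `int32_t` is used in place of `int` so that the offsets are predictable. With packing, member `c` lands at offset 5, so its address is not 4-byte aligned relative to the start of the struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
  int32_t a;
  char    b;
  int32_t c; /* at offset 5 when packed, so not 4-byte aligned */
} __attribute__((packed)) Packed_t; /* GCC/Clang extension */

/* Copy `c` out byte-wise rather than forming an int32_t pointer to it,
 * because such a pointer could be misaligned. */
int32_t packed_read_c(const Packed_t *p)
{
  int32_t value;
  memcpy(&value, (const unsigned char *)p + offsetof(Packed_t, c), sizeof value);
  return value;
}
```

Reading `p->c` directly through the packed type is also handled by the compiler; the risk arises when the address of a packed member is handed to code that expects an aligned `int32_t *`.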

Vendor Supplied Headers

On most modern microcontrollers you are likely to find headers provided with register definitions already supplied. Many years ago Arm introduced the Cortex Microcontroller Software Interface Standard (CMSIS). As part of the standard it is expected that, between Arm and the vendor, register definitions will be supplied.

For instance, ST supply a series of headers for their STM32 family of microcontrollers. Searching out the ST-provided file stm32f407xx.h, you will find definitions for all peripherals included in the 407 variant.

On line 544 of this header file (based on version V2.1.0) you will find the following definition:

          typedef struct {
            __IO uint32_t MODER;    /*!< GPIO port mode register,               Address offset: 0x00      */
            __IO uint32_t OTYPER;   /*!< GPIO port output type register,        Address offset: 0x04      */
            __IO uint32_t OSPEEDR;  /*!< GPIO port output speed register,       Address offset: 0x08      */
            __IO uint32_t PUPDR;    /*!< GPIO port pull-up/pull-down register,  Address offset: 0x0C      */
            __IO uint32_t IDR;      /*!< GPIO port input data register,         Address offset: 0x10      */
            __IO uint32_t ODR;      /*!< GPIO port output data register,        Address offset: 0x14      */
            __IO uint16_t BSRRL;    /*!< GPIO port bit set/reset low register,  Address offset: 0x18      */
            __IO uint16_t BSRRH;    /*!< GPIO port bit set/reset high register, Address offset: 0x1A      */
            __IO uint32_t LCKR;     /*!< GPIO port configuration lock register, Address offset: 0x1C      */
            __IO uint32_t AFR[2];   /*!< GPIO alternate function registers,     Address offset: 0x20-0x24 */
          } GPIO_TypeDef;

This is a slightly different interpretation of the register layout from before, notably:

The BSRR has been split into two 16-bit registers (BSRRL and BSRRH).
The AFR has been combined into an array of two elements (rather than a High and Low).

There could be a risk of padding between BSRRL and BSRRH, but it is unlikely and does not occur here.
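Whether padding creeps in around the two 16-bit halves can be checked at compile time. The struct below is an illustrative copy of the vendor layout under a made-up name, `VendorGpio_t`, with `volatile` written out in place of the `__IO` macro; the expected offsets are those given in the vendor's comments, and the checks assume a C11 compiler.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h>

typedef struct {
  volatile uint32_t MODER;
  volatile uint32_t OTYPER;
  volatile uint32_t OSPEEDR;
  volatile uint32_t PUPDR;
  volatile uint32_t IDR;
  volatile uint32_t ODR;
  volatile uint16_t BSRRL;  /* low half,  expected at 0x18 */
  volatile uint16_t BSRRH;  /* high half, expected at 0x1A */
  volatile uint32_t LCKR;
  volatile uint32_t AFR[2];
} VendorGpio_t;

static_assert(offsetof(VendorGpio_t, BSRRL) == 0x18, "BSRRL offset");
static_assert(offsetof(VendorGpio_t, BSRRH) == 0x1A, "padding after BSRRL");
static_assert(offsetof(VendorGpio_t, LCKR)  == 0x1C, "padding after BSRRH");
static_assert(offsetof(VendorGpio_t, AFR)   == 0x20, "AFR offset");
static_assert(sizeof(VendorGpio_t) == 40, "unexpected struct size");
```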

The __IO macro simply maps onto volatile. There is a macro for __I (volatile const) to define 'read only' access, and there is a __O macro (volatile) to indicate 'write only' access, although this can't be enforced in C.
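The effect of these qualifiers can be sketched with stand-in macros. `REG_RW` and `REG_RO` are hypothetical names used here instead of `__IO` and `__I`, since identifiers beginning with a double underscore are reserved, and `DemoPort_t` is a cut-down layout for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define REG_RW volatile        /* read/write register, like __IO */
#define REG_RO volatile const  /* read-only register, like __I   */

typedef struct {
  REG_RW uint32_t MODER;
  REG_RO uint32_t IDR;  /* an assignment to this member will not compile */
  REG_RW uint32_t ODR;
} DemoPort_t;

/* Report whether input `pin` (0-15) currently reads high. */
bool demo_pin_is_high(const DemoPort_t *port, unsigned pin)
{
  return (port->IDR & (UINT32_C(1) << pin)) != 0u;
}
```

Marking the input register as const turns an accidental write such as `port->IDR = 0;` into a compile-time error; no equivalent qualifier exists to reject reads of a write-only register.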

Further down in the file (line 1130):

          #define GPIOD               ((GPIO_TypeDef *) GPIOD_BASE)

Again, another slight difference in the code is the choice to put the volatile qualifier in the struct rather than at the pointer definition.

The RCC struct definition is on line 615 with the #define on line 1137.

The CMSIS code to drive the LED is:

          #include "stm32f407xx.h"
          #include "timer.h"

          int main(void)
          {
            RCC->AHB1ENR |= (1 << 3);   // enable PortD's clock, keep other enables

            uint32_t moder = GPIOD->MODER;
            moder |= (1 << 16);
            moder &= ~(1 << 17);
            GPIOD->MODER = moder;

            while (1) {
              GPIOD->ODR |= (1 << 8);  // led-on
              sleep(500);
              GPIOD->ODR &= ~(1 << 8); // led-off
              sleep(500);
            }
          }

In summary

A program can be decomposed into modules in several ways, one of which is chosen during the design process (assuming design happens!). The choice of decomposition has a critical effect on the architecture, and thus on the quality attributes of the final system, such as maintainability, reliability, modifiability, and testability.

Cohesion is one of the most important concepts in software decomposition. High cohesion is central to good design principles and patterns, guiding separation of concerns and maintainability.

Using a struct-based model for device access improves cohesion through good abstraction models, making code easier to understand and maintain.

In the next article I shall start to compare the relative merits and consequences of using the #define model versus the pointer model.

Co-Founder and Director of Feabhas since 1995.
Niall has been designing and programming embedded systems for over 30 years. He has worked in different sectors, including aerospace, telecoms, government and banking.
His current interests lie in IoT Security and Agile for Embedded Systems.

Niall Cooling
