OTA update does not work

tsvetan.filevtsvetan.filev Sofia, Bulgaria
edited December 2016 in Mongoose Networking Library

Hi.

I tested OTA with c_hello example and got Wrong platform response.

Could you give it a try ?

Comments

  • Hi Tsvetan,

    What platform do you use?

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    It happens on cc3200. It seems ESP works fine.

  • Confirmed. We'll write here once we fix it; thanks for the reporting.

  • Fixed in the master branch of https://github.com/cesanta/mongoose-iot , please try it.

    Thanks again for reporting.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria
    edited December 2016

    Hi.

    OTA now works fine,but I noticed two things.

    1) I get this message:

    make[1]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
    make[1]: Leaving directory '/opt/mbedtls/library'
    

    during compilation, but I can compile successfully.
    What does it mean ?

    2) When I try to upload a FW for ESP for e.g. I get a "Wrong platform" response which is OK.
    But then when I try to upload a correct FW for cc3200 I get this message again and then the device reboots. Could you have a look as well ?

    Regards.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    Hi again.

    OTA works fine consider 2) as invalid.

    Thanks.

  • rojerrojer Dublin, Ireland
    edited January 13

    (1) is benign, but i fixed it anyway (6aa74)

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria
    edited January 13

    Hi.

    With the latest mongoose I get an error when trying to perform OTA several times.
    The first time it works fine but the second time I get "Failed to create file" as a HTTP result and the console log is:

    ...
    
    context_save_unprocessed Added 0 bytes to cached data
    handle_update_post   updater_process res: 0
    handle_update_post   MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048
    context_update       Added 2048, size: 2048
    updater_process_int  Skipping 2048 bytes, 191640 total
    context_remove_data  Consumed 2048, 0 left
    context_save_unprocessed Added 0 bytes to cached data
    handle_update_post   updater_process res: 0
    handle_update_post   MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048
    context_update       Added 2048, size: 2048
    updater_process_int  Skipping 2048 bytes, 193688 total
    context_remove_data  Consumed 2048, 0 left
    context_save_unprocessed Added 0 bytes to cached data
    handle_update_post   updater_process res: 0
    handle_update_post   MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048
    context_update       Added 2048, size: 2048
    updater_process_int  Skipping 172 bytes, 193860 total
    context_remove_data  Consumed 172, 1876 left
    updater_set_status   Update status 5 -> 6
    updater_process_int  Has descriptor : 0
    updater_set_status   Update status 6 -> 3
    context_save_unprocessed Added 1876 bytes to cached data
    parse_zip_file_header Filename len = 26 bytes, extras len = 0 bytes
    parse_zip_file_header Compression method=0
    parse_zip_file_header File name: dinrelay-2017011309/fs.img
    parse_zip_file_header File name to use: fs.img
    parse_zip_file_header File size: 98368
    parse_zip_file_header General flag=0
    parse_zip_file_header CRC32: 0x0d118abc
    context_remove_data  Consumed 56, 1820 left
    prepare_to_write     Storing fs.img 98368 -> spiffs.img.1.0 98368 (97dba5afa6a1bce2f1925e5fee9e5bcd5d1376c5)
    mgos_upd_file_begin  err = -64
    updater_set_status   Update status 3 -> 9
    updater_finish       Update finished: -1 Failed to create file, mem free 28504
    context_remove_data  Consumed 1820, 0 left
    handle_update_post   updater_process res: -1
    handle_update_post   MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048
    handle_update_post   MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048
    ...
    
    handle_update_post   MG_EV_HTTP_MULTIPART_REQUEST_END: 20036928 0
    

    Do you have any idea what is wrong ?

    With the web config example I don't have this issue.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria
    edited January 13

    P.S. I tested with another project of ours and OTA works fine. So there must be something with the size of the FW.

    The working FW zip contents:

    -rw-r--r-- 1 ceco users    781 Jan 13 13:14 cert.der
    -rw-r--r-- 1 ceco users  98368 Jan 13 13:14 fs.img
    -rw------- 1 ceco users   2105 Jan 13 13:14 manifest.json
    -rw-r--r-- 1 ceco users  10868 Jan 13 13:14 mg-boot.bin
    -rw-r--r-- 1 ceco users    512 Jan 13 13:14 mg-boot.cfg.0
    -rw-r--r-- 1 ceco users    512 Jan 13 13:14 mg-boot.cfg.1
    -rw-r--r-- 1 ceco users  28236 Jan 13 13:14 ota_1.0.1.6-2.7.0.0.ucf
    -rw-r--r-- 1 ceco users    256 Jan 13 13:14 ota_1.0.1.6-2.7.0.0.ucf.sign
    -rw-r--r-- 1 ceco users 166753 Jan 13 13:14 smart-sensor.bin
    

    The problematic FW zip contents:

    -rw-r--r-- 1 ceco users    781 Jan 13 13:21 cert.der
    -rw-r--r-- 1 ceco users 193860 Jan 13 13:21 dinrelay.bin
    -rw-r--r-- 1 ceco users  98368 Jan 13 13:21 fs.img
    -rw-r--r-- 1 ceco users    846 Jan 13 13:21 hwinfo_struct_defaults.json
    -rw------- 1 ceco users   2284 Jan 13 13:21 manifest.json
    -rw-r--r-- 1 ceco users  10868 Jan 13 13:21 mg-boot.bin
    -rw-r--r-- 1 ceco users    512 Jan 13 13:21 mg-boot.cfg.0
    -rw-r--r-- 1 ceco users    512 Jan 13 13:21 mg-boot.cfg.1
    -rw-r--r-- 1 ceco users  28236 Jan 13 13:21 ota_1.0.1.6-2.7.0.0.ucf
    -rw-r--r-- 1 ceco users    256 Jan 13 13:21 ota_1.0.1.6-2.7.0.0.ucf.sign
    
  • rojerrojer Dublin, Ireland

    hm. no, the size should not be an issue. can you show the beginning of the log for the failing update? in the fragment you posted it already says "skipping", so it decided to skip the file, but before that it should say why.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    Here is the beginning. It happens on the second try. The first try is OK.


    mongoose_ev_handler 20036760 HTTP connection from 192.168.33.2:41612 updater_context_create Starting update (timeout 300) handle_update_post MG_EV_HTTP_PART_BEGIN: 20036928 upgrade_file dinrelay-cc3200-last.zip handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 332 context_update Added 332, size: 332 updater_set_status Update status 0 -> 1 parse_zip_file_header Filename len = 33 bytes, extras len = 0 bytes parse_zip_file_header Compression method=0 parse_zip_file_header File name: dinrelay-2017011607/manifest.json parse_zip_file_header File name to use: manifest.json parse_zip_file_header File size: 2284 parse_zip_file_header General flag=0 parse_zip_file_header CRC32: 0xc085f3a4 context_remove_data Consumed 63, 269 left updater_set_status Update status 1 -> 2 context_save_unprocessed Added 269 bytes to cached data handle_update_post updater_process res: 0 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2317 parse_manifest FW: dinrelay cc3200 2017011607 20170116-071749/181-bump-to-mongoose-6954955@d430502b+ -> 2017011607 20170116-071749/181-bump-to-mongoose-6954955@d430502b+ context_remove_data Consumed 2284, 33 left updater_set_status Update status 2 -> 3 parse_zip_file_header Filename len = 28 bytes, extras len = 0 bytes handle_update_post updater_process res: 0 mongoose_poll New heap free LWM: 28528 mongoose_poll New heap free LWM: 27496 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2081 parse_zip_file_header Filename len = 28 bytes, extras len = 0 bytes parse_zip_file_header Compression method=0 parse_zip_file_header File name: dinrelay-2017011607/cert.der parse_zip_file_header File name to use: cert.der parse_zip_file_header File size: 781 parse_zip_file_header General flag=0 parse_zip_file_header CRC32: 0xf25200ef context_remove_data Consumed 58, 2023 left updater_set_status Update status 3 -> 5 updater_process_int Skipping 781 bytes, 781 total context_remove_data Consumed 781, 1242 left updater_set_status Update status 5 -> 6 updater_process_int Has descriptor : 0 updater_set_status Update status 6 -> 3 parse_zip_file_header Filename len = 32 bytes, extras len = 0 bytes parse_zip_file_header Compression method=0 parse_zip_file_header File name: dinrelay-2017011607/dinrelay.bin parse_zip_file_header File name to use: dinrelay.bin parse_zip_file_header File size: 193860 parse_zip_file_header General flag=0 parse_zip_file_header CRC32: 0x8fc14c48 context_remove_data Consumed 62, 1180 left prepare_to_write Digest matched for dinrelay.bin.1 193860 (c2551427b039c06c9a9483c7c1d1a3adb9cf531f) updater_set_status Update status 3 -> 5 updater_process_int Skipping 1180 bytes, 1180 total context_remove_data Consumed 1180, 0 left context_save_unprocessed Added 0 bytes to cached data handle_update_post updater_process res: 0 mongoose_poll New heap free LWM: 27392 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2048 updater_process_int Skipping 2048 bytes, 3228 total context_remove_data Consumed 2048, 0 left context_save_unprocessed Added 0 bytes to cached data handle_update_post updater_process res: 0 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2048 updater_process_int Skipping 2048 bytes, 5276 total context_remove_data Consumed 2048, 0 left context_save_unprocessed Added 0 bytes to cached data handle_update_post updater_process res: 0 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2048 updater_process_int Skipping 2048 bytes, 7324 total context_remove_data Consumed 2048, 0 left context_save_unprocessed Added 0 bytes to cached data handle_update_post updater_process res: 0 handle_update_post MG_EV_HTTP_PART_DATA: 20036928 upgrade_file dinrelay-cc3200-last.zip 2048 context_update Added 2048, size: 2048 updater_process_int Skipping 2048 bytes, 9372 total context_remove_data Consumed 2048, 0 left ...
  • rojerrojer Dublin, Ireland

    sorry for the delay, Tsvetan.

    ok, so "skipping" here is fine - file checksum matched what's already there, so we are not actually writing it. looks like code hasn't changed. that's fine.

    the real problem here is failure to create filesystem container file:

    prepare_to_write Storing fs.img 98368 -> spiffs.img.1.0 98368 (97dba5afa6a1bce2f1925e5fee9e5bcd5d1376c5)
    mgos_upd_file_begin err = -64
    updater_set_status Update status 3 -> 9
    updater_finish Update finished: -1 Failed to create file

    and -64 is the dreaded "SL_FS_FILE_HAS_NOT_BEEN_CLOSE_CORRECTLY" (damn, i hate it so much).

    there are two questions here: how did we get into a state where container was not closed properly and second, why we were not able to overwrite it anyway, which we should.

    i thought that opening a file for truncate, which is what we're doing here, would ignore this. but looks like i was wrong.

    can you please post full console log, from the first flashing till this failure, including all reboots?

  • rojerrojer Dublin, Ireland

    tried update many times and couldn't repro this. definitely need full logs.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    @rojer

    I tested with d1e48ad commit and OTA works (it seems this has been fixed).
    Thanks for your support.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    @rojer

    Hi again.
    I got the same issue.
    Here is a full log of initial boot, first successful and second not successful OTA:
    https://gist.github.com/anonymous/a7b7074ac42c43c6e10146334dac348e

  • rojerrojer Dublin, Ireland
    edited January 27

    ok, so the issue here is that second update goes to the same slot that is currently running. current update was not committed or reverted. this explains why we cannot write to the fs container - it's the one currently mounted!

    we should disallow further updates in this state, i'll do that, but that is a separate issue. the real bug here is that update is not finalized (commit/revert).
    why it happens is a mystery.
    the decision to commit is based on a flag - if it's the first boot after update and init passed successfully, then we commit. if init fails, we revert.
    i can see clearly here that first boot flag is on:

    2017-01-27 15:38:50,045 | cc3200_init          Boot cfg 1: 0xfffffffffffffffd, 0x3, dinrelay.bin.1 @ 0x20000000, spiffs.img.1
    

    0x3 ithe value of the flags word, bit 0 is the BOOT_F_FIRST_BOOT flag. also, i see filesystem merge happening, so this check was positive.
    yet, by the time we get to here, the flag is off, because mgos_upd_boot_finish is invoked with 0 as the second argument:

    2017-01-27 15:38:54,176 | mgos_upd_boot_finish 1 0
    

    how that is possible, i don't know. we never touch the flag value field. perhaps something stomps the g_boot_cfg global variable?
    can you log the contents of g_boot_cfg just before mgos_boot_finish? basically, copy this statement.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    @rojer

    I just tested with a dev board and there is no problem. It seems this is a HW issue as I also had problems saving the config file.
    I will let you know if I get to this issue again.

  • rojerrojer Dublin, Ireland

    this looks like memory corruption to me: it seems that g_boot_cfg.flags and possibly other stuff too was stomped during initialization. could be hardware too, of course, but more like intermittent corruption bug of some sort.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    Yes I got it again.
    Here is first OTA:

    2017-01-27 17:23:00,148 | main_task            Boot cfg before mgos_upd_boot_finish 0: 0x20025891fffffffe, 0x0,  @ 0x20000000, spiffs.img.0
    

    Second OTA:

    2017-01-27 17:24:26,109 | main_task            Boot cfg before mgos_upd_boot_finish 1: 0x20025891fffffffd, 0x0,  @ 0x20000000, spiffs.img.1
    
  • rojerrojer Dublin, Ireland

    yeah, it's definitely stomped all over. that id there - looks like an address was written to its second part (0x20025891). presumably flags field was destroyed in the same incident as well. it happens during init, so it's pretty isolated. i suggest you monitor this location by inserting printfs at different stages and narrow down the code that stomps it.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria
    edited February 1

    @rojer

    I identified that the problem is in this call:

    mgos_gpio_set_int_handler(59, MGOS_GPIO_INT_EDGE_NEG, _interrupt_handler, NULL);
    

    The reason is that this function does this:

      struct mgos_gpio_state *s = (struct mgos_gpio_state *) &s_state[pin];
      s->cb = cb;
    

    but static struct mgos_gpio_state s_state[MGOS_NUM_GPIO] has only 32 elements for cc3200.

    It seems that you changed something in the way pins work and interrupts are set as this code used to work in past versions of mongoose.

    Can you advise how to refactor it ?

  • rojerrojer Dublin, Ireland

    oh yes, you are correct. thank you for tracking it down! oops. no need to do anything for you, fix is on the way.

  • rojerrojer Dublin, Ireland

    fixed by 595233 and 1c1b17.
    sorry about it, my bad.

  • tsvetan.filevtsvetan.filev Sofia, Bulgaria

    I gave it a try and the code now works (including OTA).
    Thanks.

Sign In or Register to comment.