Meshtastic – Issues I’ve Found

So, this is a list mostly for my own reference, but for our local mesh too. Some of these I’ve raised, some have been raised by others, some have been raised, fixed and then regressed, some I’ve been told are me being stupid, and in one case I was told “Meshtastic is not suitable for battery or solar use”.

I’ve described each issue as best I can and rated each one as either an “irritant” or a “showstopper”. I’ll probably submit these as I tidy up my findings. These are just my opinions though.

WHAT: ADC calibration in Meshtastic Web is just broken. Only allows whole numbers.
HOW: Using the web UI, try to change the ADC Multiplier Override ratio. If you add a decimal point it is deleted, meaning only whole numbers are allowed.
IMPACT: It is impossible to calibrate the ADC from the web UI. This is a very important step.
COMMENT: How has this been missed? Reports of this bug seem to go back many months. Misconfiguration of the ADC multiplier can result in flash corruption. This is basic UI stuff!

WHAT: ADC calibration in Android App. UI “fights” user input
HOW: In the Android app go to “Radio Configuration” and “Power”. Under “ADC Multiplier Override ratio”, delete the contents and attempt to enter a floating point number, e.g. 5.75. The UI will force 5.0 when 5 is pressed, the next digit will be ignored and a 0 added, giving a ratio of 50.0.
IMPACT: Irritating. It makes the job harder, and as no range checking is done (but the braindead input checking is), it will result in an ADC value being set that breaks the battery monitoring.
COMMENT: Again, how has this been missed? I can’t find any reports of this issue with a quick search, but it has been present in all versions of the app I’ve used.
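
To show what I mean by sane input handling, here is a minimal Python sketch (not the app's or the web client's actual code). It leaves the text alone while the user types and only validates the whole string when the value is committed. The function name and the 2.0 to 6.0 bounds are my assumptions for illustration.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical bounds for illustration only; check the real limits for your board.
MIN_RATIO = Decimal("2.0")
MAX_RATIO = Decimal("6.0")


def parse_adc_ratio(text):
    """Validate a committed ADC multiplier string.

    Returns the ratio as a Decimal, or None if the text is not a number
    or falls outside the allowed range. Nothing is rewritten per keystroke.
    """
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Reject NaN and infinities before any ordering comparison.
    if not value.is_finite():
        return None
    if value < MIN_RATIO or value > MAX_RATIO:
        return None
    return value
```

With this approach "5.75" is accepted as typed, while the 50.0 the Android UI produces would be rejected by the range check instead of being written to the node.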

WHAT: App/Web/CLI features are inconsistent
HOW: Feature parity between management methods is inconsistent. Actively managing a node requires two or more methods of access.
IMPACT: Irritating. It makes managing a mesh tiresome, and troubleshooting a node in the field can require multiple devices or accessing the web UI as well as using the app.
COMMENT: Many features are inconsistent across all four major platforms while developers continue to add “shiny” stuff. Consistency is important if this is to be considered a viable platform for the use cases it claims to be for.

WHAT: Bluetooth is disabled while WiFi is in use
HOW: Enable WiFi connectivity and attempt to access the node via Bluetooth. No connection will be established.
IMPACT: Irritant. This may be a hardware limitation BUT it doesn’t seem to be mentioned anywhere and is a frequent query. It’s related to the following issue too.
COMMENT: This may be a documentation thing if it is a hardware limitation. WiFi and Bluetooth are known to interfere with each other so it’s not surprising, but maybe a mention is needed.

WHAT: WiFi connectivity issues require a USB/Serial connection to fix and slow the node
HOW: Configure WiFi with the wrong credentials. The node becomes unavailable and remains that way. The UI on the screen is notably slowed. I’ve not been able to verify if the overall system is slowed. You’ll need a Serial connection, or to connect via USB, to fix it.
IMPACT: Major Irritant. As the UI has no scan facility it’s easy to get the connection details wrong. This results in a node that will need reconfiguring via a physical connection. This is especially a pain if the node is installed somewhere and has marginal signal.
COMMENT: A solution may be ‘n’ connection attempts and then a switch to Bluetooth. A scan type interface would lessen the chance of misconfiguration, along with the ability to provide an alternative network or even go into AP mode.
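
The ‘n’ attempts idea is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not firmware code: `choose_link`, `try_wifi` and the attempt limit of 5 are all names and values I have made up.

```python
MAX_WIFI_ATTEMPTS = 5  # hypothetical value for 'n'


def choose_link(try_wifi, max_attempts=MAX_WIFI_ATTEMPTS):
    """Try WiFi a bounded number of times, then fall back to Bluetooth.

    try_wifi is any callable returning True on a successful connection.
    Returns (link_name, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if try_wifi():
            return "wifi", attempt
    # WiFi never came up: re-enable Bluetooth so the node stays reachable.
    return "bluetooth", max_attempts
```

The point is that a node with bad credentials ends up reachable over Bluetooth instead of retrying forever and needing a cable.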

WHAT: No security, not even a basic username/password anywhere
HOW: No authentication is required to access a node.
IMPACT: Irritant/Critical security issue. A stolen or lost node with admin channels configured can be used to hijack other nodes on that admin network.
COMMENT: This is basic good practice, come on! The impact of this has been downplayed when mentioned, however it could allow one person’s whole “fleet”, or a communal admin group of nodes, to be hijacked. In the current climate, and again given the idea that these are usable as an off-grid, in-case-of-emergency platform, this is unacceptable.

WHAT: Low battery causes infinite sleep. Will not wake when power returns.
HOW: Allow the battery to run below the discharge level of about 2.7V. The device will enter deep sleep and stay sleeping. The code defaults to a sleep of 36 years before it wakes!?
IMPACT: Critical. Solar based nodes or nodes with intermittent power will not wake when power returns. The node will become “comatose” until the power source is no longer able to sustain the uC and a complete shutdown occurs. This could take weeks or even months depending on the power source. If power is restored, the device may charge without ever waking, which makes the situation unresolvable without intervention: the node will need to be reset or the user button pressed. We have solved this with an external MCU.
COMMENT: Who decided sleeping forever was a good idea? This setting will result in a solar/wind/intermittent power sourced node “bricking” itself. This is a bad, bad default to hide away. The documentation is unclear about what this setting does, how it is used and what its implications are. This can, and has, resulted in remote nodes becoming dead and needing manual intervention.
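
What I would expect instead is a bounded sleep with a periodic recheck. The Python sketch below only illustrates that decision; it is not how the firmware is written, and the one hour interval and the 3.6V resume threshold are values I picked for the example.

```python
WAKE_INTERVAL_S = 3600   # hypothetical: wake once an hour to recheck the battery
RESUME_VOLTAGE = 3.6     # hypothetical: resume well above the ~2.7V cutoff (hysteresis)


def next_action(battery_volts, resume_volts=RESUME_VOLTAGE, interval_s=WAKE_INTERVAL_S):
    """Decide what a node should do each time it wakes from a low-battery sleep.

    Returns ("boot", 0) once the battery has recovered, otherwise
    ("sleep", interval_s) so the node checks again later instead of sleeping forever.
    """
    if battery_volts >= resume_volts:
        return ("boot", 0)
    return ("sleep", interval_s)
```

Resuming at a higher voltage than the cutoff stops the node from bouncing in and out of sleep while a small panel slowly recharges the cell.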

WHAT: ADC/Battery calibration is/can be critical. Docs barely touch on this
HOW: If ADC calibration is not done, the battery may be exhausted before the node shuts down. This causes a brownout condition (see below) and poses the risk of a serious, destructive battery failure. To reproduce, use a variable power supply in place of a battery and reduce the voltage slowly to below the point at which the node runs.
IMPACT: Critical. Potential safety risk. Some nodes may shut down at the prescribed cutoff points, others will lock up/fail/enter an unknown state. Conversely, a node may enter sleep before the battery is exhausted and go comatose, see above. There is a risk of a battery pack entering deep discharge, which can result in catastrophic failure during later charging.
COMMENT: This should not be a simple “you can do this”. If you plan on running on battery you *MUST* do this and verify it’s either correct or the unit shuts down early. An early sleep is irritating but not a safety issue, which running until the battery is dead is. Use of a DW04 based battery safety board is a must, especially with LiPo cells. This is something that devs and integrators (users) need to be aware of.
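
For reference, the calibration arithmetic itself is short. The sketch below assumes the voltage the node reports scales linearly with the multiplier, so the correction is a simple ratio of a multimeter reading to the node's reading. The function name is mine.

```python
def corrected_multiplier(current_multiplier, measured_volts, reported_volts):
    """Scale the ADC multiplier so the reported voltage matches a multimeter reading.

    Assumes reported voltage is proportional to the multiplier.
    """
    if reported_volts <= 0:
        raise ValueError("reported voltage must be positive")
    return current_multiplier * measured_volts / reported_volts
```

For example, if the node reports 3.8V with a multiplier of 2.0 while a meter on the battery reads 4.0V, the corrected multiplier is 2.0 × 4.0 / 3.8, roughly 2.105. That is exactly the kind of value a whole-number-only input field cannot accept.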

WHAT: No apparent watchdog implementation
HOW: If the node hard locks, it stays locked. Glitch the power rails a few times and the node hard locks. Corrupted flash can also cause a hard lock.
IMPACT: Irritation, although the ESP watchdog *SHOULD* catch this condition. It doesn’t always seem to. It may be the case that the board starts “watchdogging” in a loop.
COMMENT: Mechanisms for dealing with this are present in hardware. Firmware *can* catch a watchdog reset and deal with it.

WHAT: No apparent use of brownout detection
HOW: If the sleep/power monitor function is not working correctly this *will* happen. Otherwise, disable power monitoring and reduce the power rail voltage until the unit either stops responding or starts a reboot loop.
IMPACT: Critical. The node can be rendered partially or completely inoperable even when power is restored. The most common results are loss of the node name, a corrupt node DB giving symptoms similar to a failed front end, being unable to configure the power module, and other issues up to an unresettable node throwing ESP exceptions.
COMMENT: Basic embedded design. The ESP chips provide a brownout detection mechanism but this does not seem to be used. Flash reads/writes should not be happening during a brownout condition, and it’s the job of the brownout detector to stop this from happening. Relying on ADC sensing alone is not good enough.


WHAT: No apparent detection of corrupt flash
HOW: See above (Brownout detection)
IMPACT: Critical. The node can be misconfigured or left in an indeterminate state, potentially with critical settings (ADC ratio, band, TX settings) incorrect. Nodes can sometimes behave unpredictably. The ONLY fix is a reflash with erase.
COMMENT: Validation of data read from EEPROM/Flash/SPIFFS/etc. needs to happen, with either individual fields or whole configuration blocks being checksummed or CRC’ed. If corruption is found, fail gracefully to safe defaults and trash the node DB.
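
As a rough illustration of the CRC idea, here is a Python sketch of a configuration block that carries its own CRC32 and falls back to defaults when the check fails. The two-field layout, the field names and the default values are invented for the example and have nothing to do with the real on-flash format.

```python
import struct
import zlib

# Hypothetical safe defaults and block layout, for illustration only.
DEFAULTS = {"adc_ratio": 2.0, "tx_power": 0}
_FMT = "<fi"  # little-endian: 32-bit float adc_ratio, 32-bit int tx_power


def pack_config(cfg):
    """Serialise the config and append a CRC32 of the body."""
    body = struct.pack(_FMT, cfg["adc_ratio"], cfg["tx_power"])
    return body + struct.pack("<I", zlib.crc32(body))


def load_config(blob):
    """Return the stored config, or a copy of DEFAULTS if the block is damaged."""
    size = struct.calcsize(_FMT)
    if len(blob) != size + 4:
        return dict(DEFAULTS)  # truncated or oversized block
    body = blob[:size]
    (stored_crc,) = struct.unpack("<I", blob[size:])
    if zlib.crc32(body) != stored_crc:
        return dict(DEFAULTS)  # corruption detected, fail safe
    adc_ratio, tx_power = struct.unpack(_FMT, body)
    return {"adc_ratio": adc_ratio, "tx_power": tx_power}
```

A block damaged by a brownout mid-write would then load as known-safe defaults rather than as a random ADC ratio or TX setting.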

