
Your inputs wanted on device health monitoring

NeerajK
JumpCloud Employee

Hi there!

We're exploring the development of a device health monitoring and alerting solution. This tool would help you keep track of your device's performance, posture, potential issues, and provide alerts when attention is needed.

We are super interested in hearing your insights and suggestions:

  1. What are your biggest pain points or challenges when it comes to monitoring your device's health?
  2. What features or capabilities would you find most useful in a monitoring and alerting tool?
  3. How and where would you prefer to receive alerts or notifications (e.g., email, messaging, ticketing solution)?
  4. What are the top metrics or components you'd want this tool to monitor (e.g., CPU, disk space, battery health, allowed/denied apps, processes, event logs, etc.)?
  5. Would you find value in auto-remediation capabilities? What types of issues would you feel comfortable letting the tool tackle automatically?

While we're picking your brains, we're also curious to hear your thoughts on receiving alerts beyond just device health monitoring. Would you be interested in getting notified about things like:

  • Potential security risks or policy violations related to identity and access?
  • Successful and failed login attempts across systems, SSO, web apps?

Would love to know if that unified approach resonates with you or if you'd prefer these capabilities to remain separate.

Appreciate any feedback/thoughts/comments you might have!

 

7 REPLIES

bwitzig_Zen
Novitiate III

- When diagnosing device health, it can be hard to determine the root cause of slowness, as there can be a lot of factors.

- CPU usage percentage is tricky, as a machine can feel "slow" even with seemingly low CPU usage (8%, for example, on a 16-thread CPU). Also, on low-core-count machines, Windows will use up "idle" available frequency to apply updates and repair/update indexes. Hybrid architectures becoming more common makes this even trickier to diagnose.

- For laptops, CPU usage % can also be tricky, as battery power saving/CPU throttling can cause inconsistent performance. Temperature can also be a key factor, but it's hard to measure, as "turbo boost" is intended to handle bursty loads, which cause the temperature to rise quickly.

- Memory usage (active/compressed) will hit roughly 80%, then the OS will migrate over to "swap", whereas a lot of people will assume there is still a lot of free memory. Apple offers memory pressure as a decent way to identify how burdened the system is.

- Network performance monitoring is also tricky: should we use ICMP to major services (O365? Google? etc.)? Do we have a way to check for bufferbloat? If so, how do we get this to work without impacting the client's actual network performance / data usage (do we move bufferbloat/speed tests to an "on demand" flow?)

- For admin notifications, their preferred chat channel or the JumpCloud admin portal would be useful, with the option of client notifications that include tips on what to do. Major or longer-term issues should be raised as a ticket (ongoing high memory/CPU usage, etc.).

- Disk usage is important, but notifications for S.M.A.R.T. alerts would also be amazing, as this usually isn't easily exposed to the client, or is exposed in a confusing way. As well, being over 80% usage on an SSD can impair the life of the drive.

- Setting up conditional access to have a "warning" section could also be useful. Admins usually don't have the time to comb through logs, so anything that helps pick out logs that actually need attention is useful.

- I would suggest integrating some diagnostic options for clients in the JumpCloud app. Currently it allows you to reset/sync your password, but it could do so much more.

- "Service" checks to allow us to make sure specific software is running (AV/endpoint tools, etc.)

- List of applications detected on the machine with the ability to (attempt to) remove remotely

- Ability to identify device age to indicate when assets "age out"

This combined with both per-machine and more global analytics would help us in making better hardware choices. 
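
To illustrate the point about low overall CPU percentages hiding a bottleneck: one fully busy thread on a 16-thread machine only shows up as about 6% overall. Here is a minimal sketch, assuming per-core percentages are already being collected somehow. The function names and the thresholds are made up for illustration and are not part of any JumpCloud feature.

```python
from statistics import mean


def summarize_cpu(per_core_pct):
    """Return the overall (averaged) and busiest-core usage percentages."""
    if not per_core_pct:
        raise ValueError("need at least one per-core sample")
    return {"overall": mean(per_core_pct), "busiest": max(per_core_pct)}


def is_single_thread_bound(per_core_pct, busy_threshold=90.0, overall_threshold=25.0):
    """Flag the 'feels slow but overall CPU looks low' case.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    summary = summarize_cpu(per_core_pct)
    return (
        summary["busiest"] >= busy_threshold
        and summary["overall"] <= overall_threshold
    )


# One saturated thread out of 16, the rest idle.
sample = [100.0] + [0.0] * 15
print(summarize_cpu(sample))
print(is_single_thread_bound(sample))
```

Reporting the busiest core alongside the average is one way an alert could catch this case. It would not help with the throttling and turbo boost effects mentioned above.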

Some great ideas there! Thanks

Joranna_Ng
Rising Star I

In addition to @NeerajK's questions, it would be great to hear if any of you are currently using a device health monitoring and alerting solution.
1. What are you using the solution for?
2. What's good and not so good about it?

Thanks!
Joranna

gus_feliciano
Novitiate I

- Metrics to understand memory and CPU consumption on macOS and Windows devices, for example, being able to evaluate which process or service is demanding the most from the machine.

- I think it would be interesting to have a dashboard to measure consumption, where the admin can filter by OS and hostname. It could show the average consumption of each machine to generate a report and identify possible performance problems.

- Disk full alerts would be really cool.

cyrus
Novitiate II
  • Notifications via email or (even better) Slack. The email option would be good for automated ticket creation so you don't have to support 100 different systems. Slack would be great for servers and such where you need to be notified right away.
  • Having conditions trigger events would be great. Things like "run this script if the disk starts filling up" would be invaluable.
  • Smart group for devices that meet conditions. For example, having a dynamic group of computers with less than 20% free disk would be cool. Or a group of computers that are not properly patched.
  • I'd like to see a condition where it's pinging a host and if it sees a lot of packet loss or a spike in response time it triggers a notification.
  • You mentioned battery health. Would be nice to see that on the highlights page of the device.
  • I'd love to have a dynamic group of computers that no longer have Applecare (or Applecare Enterprise) active. 

These ideas were a bit scattered. I hope they're helpful.
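
For anyone who wants to picture the disk and ping conditions above, here is a rough sketch of how they could be evaluated. Everything in it is an assumption for illustration: the function names, the 20% free-disk threshold (taken from the example above), and the loss/spike thresholds are hypothetical, and nothing here uses a JumpCloud API.

```python
import shutil
from statistics import median


def free_disk_percent(path="/"):
    """Percentage of free space on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def low_disk_group(free_pct_by_host, threshold_pct=20.0):
    """Hosts that would fall into a 'less than 20% free disk' dynamic group."""
    return sorted(
        host for host, pct in free_pct_by_host.items() if pct < threshold_pct
    )


def link_degraded(rtts_ms, max_loss=0.2, spike_factor=3.0):
    """Decide whether a series of ping results looks unhealthy.

    `rtts_ms` is a list of round-trip times in ms, with None for a lost
    packet. Degraded means either the loss ratio exceeds `max_loss`, or the
    latest reply is more than `spike_factor` times the median of the
    earlier replies.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping result")
    replies = [r for r in rtts_ms if r is not None]
    loss = 1 - len(replies) / len(rtts_ms)
    if loss > max_loss:
        return True
    if len(replies) < 2:
        return False
    baseline = median(replies[:-1])
    return replies[-1] > spike_factor * baseline


print(low_disk_group({"mac-01": 12.5, "win-02": 55.0, "win-03": 19.9}))
print(link_degraded([20, 22, 21, 95]))
```

The "run this script if the disk starts filling up" idea would then just be a remediation step triggered whenever a host enters the low-disk group.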

rjordan
Rising Star I

I'm with all of the above. Not sure how I missed this one.

We've had a few machines that are running slower, and it would have been nice to have some device monitoring and/or alerts for performance/CPU/etc., or at least an option in background tools similar to what LMI Central had.

I know JC can always knock it out of the park!

Also would love to have an option for alerts if a device is offline for x amount of time. This can help in certain cases for us and my clients.

NeerajK
JumpCloud Employee

@rjordan The device offline alert has been added to Early Access. See this post for details:

https://community.jumpcloud.com/t5/jumpcloud-product-news/ea-launch-of-device-monitoring-amp-alertin...

Alerts for performance (slowness, CPU, memory, etc.) are definitely high on the v2 roadmap.