We often come across bugs reported by customers that we cannot reproduce at our end, even though they are reproducible in the customer’s environment. Once you have made several attempts to reproduce the issue and eliminated the most obvious cases, the cause is likely to be something less obvious. The customer may be using a different configuration, the issue may be device-specific or related to the operating system, there may be a compatibility issue with the customer’s settings, or the customer may be using the feature the wrong way; there are many potential causes. Most of the time these are real bugs that we are unable to find at our end for one reason or another, and they are often critical for the customer. So how do we troubleshoot them? Below are a few points to keep in mind when fixing these types of bugs:
- Gather the story around the issue and not just the steps: First, it is very important to understand how the customer is using our product and in what environment. Request all the information you need: screenshots or video, logs, the bug report, the configuration, the OS and environment used, and anything else you think is relevant.
- Verify the information gathered and try to reproduce: Now that you have all the required details, verify whether the user is doing everything right and whether any setting or configuration is causing the issue. If neither is the case, get your environment as close to theirs as possible and try alternate approaches to reproduce the issue.
- Logs and screenshots will always help: In the logs, work out exactly what was going on at the time of the failure and check for any errors. Study the screenshots closely, because you might discover that “X piece has loaded correctly, but it shouldn’t have because it is dependent on Y”, and that might give you a hint. Investigate the bug report and the available logs, and use the design and the code to list the possible causes.
- Conduct a detailed code review of the suspected faulty code: Go through the code with the aim of fixing any theoretical bugs and adding code to monitor and log any future faults. This code walkthrough will help you come up with ideas about what could have gone wrong.
- Extensive logging: As you review the code, add any possible fixes and the relevant logging for further investigation, based on the type of issue. Then give this build with the additional logging to the customer and request the logs after they reproduce the issue.
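
As an illustration of the extra logging described above, here is a minimal Python sketch, assuming the product can write a diagnostic file that the customer sends back. The names `build_diagnostic_logger`, `traced`, and `diagnostic.log` are made up for this example and are not part of any particular product; only the standard `logging` module is used.

```python
import functools
import logging
import platform
import sys
from logging.handlers import RotatingFileHandler


def build_diagnostic_logger(name="diagnostics", handler=None, path="diagnostic.log"):
    """Create a DEBUG-level logger for a diagnostic build (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if handler is None:
        # Cap the file size so the log stays small enough to send back.
        handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=3)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # Record the environment once so every log file describes where it came from.
    logger.info(
        "environment python=%s os=%s release=%s",
        sys.version.split()[0], platform.system(), platform.release(),
    )
    return logger


def traced(logger):
    """Decorator that logs the arguments, result, and any exception of a call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.debug("enter %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # logger.exception also records the full traceback.
                logger.exception("failed %s", func.__qualname__)
                raise
            logger.debug("exit %s result=%r", func.__qualname__, result)
            return result
        return wrapper
    return decorator
```

Putting `@traced(logger)` on the functions you suspected during the code review records what they received and returned on the customer’s machine. Because arguments are logged as-is, it is safer to leave it off functions that handle passwords or other sensitive data.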
In most cases these steps should be enough to find the root cause and fix the issue. If not, try to get access to the customer’s device so that you can experiment and play around more freely.
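
Whether you are working from the details the customer sent or from their actual device, it can help to compare their environment with yours key by key rather than by eye. The sketch below is one way to do that in Python, assuming both environments can be captured as simple key/value dictionaries. The function names and the keys collected are illustrative choices, and a real product would add its own settings through `extra`.

```python
import locale
import platform
import sys

MISSING = "<missing>"  # marker for a key that exists on only one side


def snapshot_environment(extra=None):
    """Collect basic facts about the machine this runs on (hypothetical helper)."""
    snapshot = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "encoding": locale.getpreferredencoding(False),
    }
    snapshot.update(extra or {})  # product-specific settings, if any
    return snapshot


def diff_environments(ours, theirs):
    """Return {key: (our_value, their_value)} for every key that differs."""
    differences = {}
    for key in sorted(set(ours) | set(theirs)):
        our_value = ours.get(key, MISSING)
        their_value = theirs.get(key, MISSING)
        if our_value != their_value:
            differences[key] = (our_value, their_value)
    return differences
```

For example, `diff_environments({"locale": "en_US"}, {"locale": "tr_TR"})` returns `{"locale": ("en_US", "tr_TR")}`, which points straight at the setting worth changing on your side before the next reproduction attempt.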