2018 User Feedback and GOBii Responses to Comments


Created by Star Yanxin Gao, last modified by Liz on Nov 01, 2018

Similar to 2017, a comprehensive survey was sent to CG users in August 2018 to gauge users' feedback on and assessment of the GOBii project from 08/2017 to 08/2018. The 33 users surveyed represented PIs, steering teams, curators, MID breeders, developers, system administrators, and IT managers at CIMMYT, ICRISAT, and IRRI. The survey covered the eight categories below, averaging six questions per category:

  1. Core GDM/GOBii functionalities

  2. Deployment and System Administration Support

  3. Requirement gathering process

  4. Data loading

  5. Data extraction

  6. Communication and engagement

  7. GOBii-funded tool development

  8. Overall User Satisfaction



A simple 1-to-5 scale was used, with 1 = lowest and 5 = highest satisfaction.



Summary

We received a 60% response rate (20 of the 33 surveyed), with 7 responses from CIMMYT, 10 from ICRISAT, and 3 from IRRI. Of the respondents, 45% represented application teams or breeders; 35% developers, IT managers, or system admins; and 30% PIs or steering teams (some respondents fall into more than one group).







Similar to 2017, users' overall satisfaction with GOBii is very high for overall performance, release deployment, and engagement with the development, application, and PI/steering teams.

More specifically, GOBii communication and team engagement won the highest ratings across categories, while the weakest area was GOBii core system functionality. In particular, some users were concerned that GOBii does not yet have data delete or update functionality in place, and that it lacks systematic software approaches to check and validate loaded and extracted data.




Table 1. 2018 CG users' feedback, acceptance, and satisfaction ratings: 1 = lowest and 5 = highest (September 2018)

| Feedback category / topic | CIMMYT | ICRISAT | IRRI | Average |
|---|---|---|---|---|
| **Core GDM/GOBii functionalities** | 2.45 | 3.56 | 3.38 | 3.14 |
| 2.1 Data loading | 2.57 | 3.67 | 3.67 | 3.26 |
| 2.2 Data extracting | 2.57 | 3.88 | 4.00 | 3.39 |
| 2.3 Data updates | 1.67 | 3.56 | 3.00 | 2.83 |
| 2.4 Data deletion | 1.33 | 2.75 | 3.00 | 2.29 |
| 2.5 Data QC upon loading and extraction | 2.50 | 3.44 | 3.33 | 3.11 |
| 2.6 User authentication | 3.67 | 4.33 | 3.33 | 3.94 |
| 2.7 Data access control | 2.83 | 3.22 | 3.33 | 3.11 |
| **Deployment and Sys Admin Support** | 3.31 | 3.81 | 3.93 | 3.66 |
| 3.1 Release and deployment process | 3.00 | 4.00 | 3.50 | 3.53 |
| 3.2 Pre-release QC of new features | 2.83 | 3.71 | 4.00 | 3.40 |
| 3.3 Sys admin support for deployment | 3.75 | 4.00 | 4.00 | 3.92 |
| 3.4 Sys admin support for maintenance | 4.00 | 3.71 | 4.00 | 3.85 |
| 3.5 Sys admin engagement | 4.00 | 4.14 | 4.50 | 4.15 |
| 3.6 System documentation | 3.00 | 3.50 | 4.00 | 3.42 |
| 3.7 Ease of system maintenance | 3.00 | 3.50 | 3.50 | 3.33 |
| **Requirement process** | 3.21 | 3.71 | 3.80 | 3.57 |
| 4.1 Requirements gathering and clarification | 3.14 | 3.78 | 3.67 | 3.53 |
| 4.2 Requirement prioritization | 2.71 | 4.00 | 3.67 | 3.47 |
| 4.3 Requirement (GR) specification (clarity) | 3.50 | 3.89 | 4.00 | 3.81 |
| 4.4 Tracking and signing off | 3.60 | 3.33 | 4.00 | 3.53 |
| 4.5 Ease of submitting requirements (1 = difficult, 5 = easy) | 3.40 | 3.56 | 3.67 | 3.53 |
| **Data loading** | 3.42 | 3.54 | 4.10 | 3.59 |
| 5.1 Handling different types of data | 3.50 | 3.43 | 4.50 | 3.62 |
| 5.2 Issue reporting | 3.50 | 3.43 | 4.50 | 3.62 |
| 5.3 Data mapping validation | 3.67 | 3.43 | 4.00 | 3.58 |
| 5.4 Loading large datasets | 3.25 | 3.29 | 4.00 | 3.38 |
| 5.5 Loading error logs and email notification | 3.25 | 4.14 | 3.50 | 3.77 |
| **Data extraction** | 3.13 | 3.62 | 3.78 | 3.50 |
| 6.1 Extract features | 3.20 | 3.44 | 4.00 | 3.47 |
| 6.2 Data extract process | 3.40 | 3.33 | 4.00 | 3.47 |
| 6.3 Job status and extract output | 3.20 | 3.75 | 3.33 | 3.50 |
| 6.4 Data integrity | 2.60 | 3.56 | 3.67 | 3.29 |
| 6.5 Extraction issue reporting | 3.20 | 3.89 | 4.33 | 3.76 |
| 6.6 Extraction error logs and email notification | 3.20 | 3.78 | 3.33 | 3.53 |
| **Communication effectiveness** | 4.17 | 4.39 | 4.00 | 4.25 |
| 7.1 Online meetings | 4.20 | 4.22 | 4.33 | 4.24 |
| 7.2 Effectiveness to engage and communicate | 4.17 | 4.56 | 3.00 | 4.17 |
| 7.3 Face-to-face visits | 4.50 | 4.11 | 4.00 | 4.22 |
| 7.4 Workshops and training | 3.67 | 4.67 | 4.33 | 4.28 |
| 7.5 Webinars | 4.33 | 4.38 | 4.33 | 4.35 |
| **GOBii-funded tool development** | 3.59 | 3.74 | 4.13 | 3.77 |
| 8.1 QC-KDC functionalities | 3.40 | 3.56 | 3.67 | 3.53 |
| 8.2 GOBii-QC Integration | 2.80 | 3.63 | 3.67 | 3.38 |
| 8.3 F1 verification functionalities | 4.00 | 3.89 | 4.00 | 3.93 |
| 8.4 Line verification functionalities | 4.00 | 3.89 | 4.33 | 4.00 |
| 8.5 MABC functionalities | 4.25 | 3.78 | 4.33 | 4.00 |
| 8.6 GOBii-Flapjack Integration | 3.00 | 3.78 | 4.33 | 3.61 |
| 8.7 GS-Galaxy functionalities | 4.00 | 3.75 | 4.33 | 3.93 |
| 8.8 GOBii-GS-Galaxy Integration | 4.00 | 3.63 | 4.33 | 3.87 |
| **Overall User Satisfaction** | 3.63 | 4.17 | 4.33 | 4.05 |
| 9.1 Scope and clarity of road map | 3.00 | 3.89 | 4.33 | 3.71 |
| 9.2 App teams management | 4.25 | 3.89 | 4.33 | 4.06 |
| 9.3 Development team management | 4.00 | 4.22 | 4.67 | 4.24 |
| 9.4 Engagement with steering teams, PIs, and SABs | 3.50 | 4.22 | 4.33 | 4.06 |
| 9.5 Deployment and release | 3.50 | 4.44 | 4.00 | 4.13 |
| 9.6 Overall satisfaction | 3.60 | 4.33 | 4.33 | 4.12 |
| **Grand Total** | 3.31 | 3.81 | 3.92 | 3.67 |
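The Average column in Table 1 appears to be weighted by the number of respondents per center (7 from CIMMYT, 10 from ICRISAT, 3 from IRRI) rather than being a simple mean of the three center scores. Below is a minimal sketch of that check in Python, assuming the 7/10/3 split applies to every question; in practice some respondents skipped questions, so other rows may not reproduce exactly.

```python
# Respondent counts per center from the Summary section (assumed
# constant per question; actual per-question counts may vary).
counts = {"CIMMYT": 7, "ICRISAT": 10, "IRRI": 3}

# Center scores for "Core GDM/GOBii functionalities" from Table 1.
scores = {"CIMMYT": 2.45, "ICRISAT": 3.56, "IRRI": 3.38}

total_respondents = sum(counts.values())  # 20 of the 33 surveyed
response_rate = total_respondents / 33    # ~0.61, i.e. the ~60% reported

# Respondent-weighted mean versus an unweighted mean of center scores.
weighted_avg = sum(counts[c] * scores[c] for c in counts) / total_respondents
simple_avg = sum(scores.values()) / len(scores)

print(round(weighted_avg, 2))  # 3.14, matching the Average column
print(round(simple_avg, 2))    # 3.13, so a simple mean does not match
```

The same weighting reproduces most sub-rows to within rounding, which is consistent with averaging over individual respondents rather than over centers.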





Figure 1 shows the specific questions asked within each survey category, with the dotted yellow line showing the average rating and the yellow highlight showing the 95% confidence interval of the average. Responses are split by the institute where the survey was carried out.











Table 2. Comments and GOBii responses to users' comments. Each entry below gives the respondent ID and topic, the CG user comment, the GOBii team response, prioritized action items with due dates, and the responsible team members.

1. Core GDM/GOBii functionalities

Respondent 5 (Data integrity)
Comment: This year we encountered a few issues related to the display of extracted data, now even while loading datasets with more than 10K markers in version 1.4. Essential features for curators, such as delete/modify/hide datasets/information, are not ready yet.
Response: Focus on data integrity is our highest priority from now on. Our system is fast but complex, and we now need to focus on data integrity and fully pressure-test the system with high-volume datasets. We are bringing in a consultant to review and overhaul our QA/QC processes, and we take these issues very seriously.
Action items:
  • Review QA/QC procedures with consultant – Q3 2018
  • Implement recommended procedures – Q4 2018 through Q2 2019
  • Design large-volume, complex datasets and regression-test the system with them for each release – Q4 2018
  • Review the schema and how it impacts data integrity – Q4 2018
  • Design an automated data-checking tool capable of checking inputs versus outputs – Q4 2018
  • Improve data validation – 2019
  • Automate end-to-end regression tests – Q2 2019
  • VCF data checks – 2019
Responsible: Liz/Josh/Deb

Respondent 6 (Need CRUD)
Comment: For the curation tool (GOBii GDM) to be effective, it should have the basic functionalities: load, retrieve, update, and delete information.
Response: We will review how to implement full delete and update functionality by the end of this year and roll it out during 2019.
Responsible: Yaw, Kevin

Respondent 7 (Core minimum: Create, Read, Update, and Delete (CRUD))
Comment: As GOBii cannot do these 4 functions reliably, it is in effect not a production-ready system.
Response: We will focus on CRUD as the core system in the next year.
Action items:
  • Design CRUD implementation, including schema review – Q4 2018
Responsible: Yaw

Respondent 7
Comment: The system must be able to load large volumes of data quickly, do at least minimal QC, and be able to deliver the data for analysis.
Response: We are loading large volumes of data much faster than other open-source solutions, and much faster than the systems we have worked with in industry.

Respondent 13 (Data integrity in data loading and data extraction)
Comment: GOBii is making great progress on building tools for manual data loading and data extraction. But it is hard to say that we are completely satisfied when there have been errors in the data loading and data extraction processes.
Responsible: Yaw

Respondent 13 (Software QC to improve new releases)
Comment: But the team has been quick to respond to these, and we know that much stronger software QC processes are now in place and continue to improve the quality and confidence of each new release.

Respondent 13 (Flex query extraction)
Comment: For data extraction, version 1.4 offers many basic extraction functionalities, but we are very excited to see the new features that will come with flex query in the future.
Response: Flex query is in the post-v1.5 release.
Responsible: Kevin, Phil

Respondent 13 (Hard delete: TimeScope)
Comment: We also look forward to having basic tools for deleting data in the coming year.
Response: Hard delete will be released in the TimeScope tool in v1.5. We will train system admins/super users.
Action items:
  • Update TimeScope user and release documentation
  • Schedule user training for appropriate use of TimeScope
Responsible: Deb, Roy

Respondent 13 (Soft delete to restrict data access)
Comment: For data access control, there is currently no way to restrict access to certain data(sets) within the system, and that could be problematic in certain situations.
Response: We agree. We will create access control, are starting to implement delete functionality with the TimeScope tool, and will incorporate this into the CRUD roll-out through 2019.
Action items:
  • Design data access control by end of 2018
  • Roll out functionality through 2019
Responsible: Yaw/Kevin

Respondent 13 (Flag data status)
Comment: Also, it might be good to be able to associate some kinds of flags, e.g. related to data status, that could be applied during the data extraction or utilization process.
Response: We need more information on what this would look like.

2. Deployment and Sys Admin Support

Respondent 6 (Deployment slowed down by lengthy back-up)
Comment: Some install parameters are not well documented. We should think of ways to bypass the lengthy back-up process during deployment and updates.
Response: We are improving the back-up and restore process by implementing incremental back-up.
Due: Q4 2018
Responsible: Roy, Kevin

Respondent 7
Comment: The system should be highly configurable so that it can be deployed in a wide range of enterprise IT contexts, e.g. different authentication schemes, email services, in the cloud or on premises, etc.
Response: We agree and will work to simplify, automate, and document configuration management.
Due: Design improvements Q4 2018; roll out Q1 and Q2 2019
Responsible: Roy

Respondent 7
Comment: Deployment should be amenable to being scripted/automated, which means, e.g., all configurations should use tokens that can be set with a script.
Responsible: Roy

Respondent 7
Comment: To ensure stable deployment, changes must be managed and documented, and their impact on the release process well documented.
Response: Documentation will be improved.
Due: Continuous
Responsible: Roy

Respondent 13 (Bug-fix tracking and communication management)
Comment: We have had some challenges with deployments in the past year, but I think that the process is always improving. Also, in general, we know that the QC process has gotten much better, with great testing and great reporting on bug fixes during each release (with test files, etc.). But there may still be some room for improvement, especially as the number of variations that need to be tested for any new feature increases. Also, it has been a bit frustrating in the small number of cases where we thought that something was fixed and then later learned that it was not.
Responsible: Star, Josh

3. Requirement process

Respondent 7 (Ensure interdependencies among requirements are understood in prioritization)
Comment: The fundamental issue in requirement gathering is the lack of dependency analysis. Users will tend to ask for the analytical functions of any system, as this is where the business value is, but these all require the basic CRUD functions. If this is not achieved, the module cannot work within practical breeding. No normal user will think up front of asking for, e.g., a delete or update function in a database, but that does not mean it is not a requirement for the system. Requirement prioritization must consider both user requirements and technical IT requirements, and the interdependencies must be understood in the prioritization process.
Responsible: Yaw, Kevin, Josh

Respondent 13
Comment: The requirements gathering and prioritization process is still a little unclear. But I am not sure how it can be refined when there are many different participants with their own priorities. And I know that it must be challenging as the requirements and priorities expressed by groups change over time.
Responsible: Liz, Yaw

4. Data loading

Respondent 13 (Track issues and requirements; communication management)
Comment: It is generally easy to report issues, but it is not always clear when they are being worked on, especially if they go in by email. We are grateful that the standards for submitting error reports are still relatively loose, but we want to continue to work together to make sure that we are providing the minimal standardized information needed to test the error without taking too much time (especially when it is not known whether the error was already reported, etc.).
Response:
  • We have greatly improved issue and requirement tracking traceability.
  • Visibility will be improved with a cloud deployment, and then we will train users.
Action items:
  • Cloud deployment of our test systems – Q4 2018
  • Train users on the process – Q1 2019
Responsible: Roy and Deb

Respondent 13 (Informative error logs for troubleshooting/diagnosis)
Comment: The process for defining new requirements is less clear to us. The error logs provided with failed data loads generally do not provide information that allows us to diagnose the specific problem that caused the failure.
Response: We are continuously working to improve error messages and will continue to do so.
Responsible: Josh

5. Data extraction

Respondent 5 (Need data validation tools for loading and extraction)
Comment: A few issues related to extracted data accuracy were reported to GOBii and required some tests to fix the problems and validate them. I understand that this is part of the database development process, but there are no existing tools for curators to validate data once they are loaded and extracted.

Respondent 5 (Need a systematic data validation tool to handle large datasets)
Comment: It is difficult, especially when datasets are large and we use random data validation approaches, unless we are lucky enough to identify issues by chance.

Respondent 6 (Prove extract accuracy for patch fixing)
Comment: There are some issues with extract accuracy. I understand that there are already efforts to "patch" the affected datasets, but we need to prove that this works.
Responsible: Deb, Josh

Respondent 13
Comment: The current extraction features are not very extensive, but the flex query tool that will be available in the next version seems to add much more functionality.
Response: Flex query will be released post v1.5.
Responsible: Kevin, Phil

Respondent 13 (Usability: navigation to folders is not user-friendly)
Comment: The solution for navigating to folders to get data works pretty well for curators, but may not seem very user-friendly to other users.
Response: We agree and have extensively reviewed web-based file browsers that provide live links to files and have authentication. We have identified ownCloud and are incorporating it into our emails and into the Marker Toolbox.
Due: Q4 2018
Responsible: Yaw, Josh

Respondent 13 (Informative error logs)
Comment: The error logs generally do not provide enough information to help us figure out why an extract might have failed. We have also had some concerns, especially before v1.4, about the integrity of the extracted data. But the data extraction is very fast!

Respondent 14
Comment: I have not been using GOBii yet and have not interacted with the core functions. I only know second-hand of some issues with data import and extract.

6. Communication effectiveness

Respondent 13 (Training and workshops)
Comment: I put a lower score for the workshops and training mostly because I think that they have generally been premature. It likely does not make sense to provide any GOBii "training" or GS pipeline "training" to users outside the first 3 CG centers until there is a system or a stable completed pipeline that people can use. The minutes from the online meetings are generally very helpful. And targeted meetings in person are also usually quite helpful.
Response: Our project required us to reach out to additional CG centers and NARS at this stage, but we agree that we have found this premature before the database is available to them. We do find that the tools we have developed are well received by wider partners. However, our resources are limited, and in 2019 we will focus our training on our CG center partners and related NARS.
Action items: Focus our efforts through 2019 on our existing CG center customers.
Responsible: Star

7. GOBii-funded tool development

Respondent 7
Comment: In the case of GOBii, the team started working on analytics functions based on user requests well before the foundational functions were finalized, and the result is that we now have disparate partial functions that do not make up a working system.

Respondent 13 (QC tools with KDCompute)
Comment: KDC provides some good statistics, but it is unclear how to use these in a functional production workflow.
Response: We agree that having KDCompute in place is only the first step in the process, and we have probably not promoted the long-term view of the process sufficiently. We will next provide a mechanism to filter the data and reload it to GOBii. The non-QCed dataset can then be deleted, or eventually will be able to be hidden from the user using "soft delete".
Action items:
  • Implement the ability to filter data – Q1 2019
  • Ability to delete non-QCed datasets – Q4 2018
  • Review functionality for soft delete – Q4 2018, and prioritize for 2019-2020

Respondent 13 (QC tools with KDCompute)
Comment: It may also be confusing to disentangle the current data that should be used for "genotypic data" QC versus "germplasm QC", e.g. the F1 verification information.
Response: All tests in KDCompute are meant to QC genotyping data. Both data quality and genetic quality can be useful for this purpose.

Respondent 14 (Batch processing MABC using a Galaxy workflow)
Comment: Following the recent training, I can see how Galaxy can be helpful for streamlining the MABC batch processes. I am willing to try this, but would need to schedule an online session to go over the steps once I have datasets ready.

Respondent 14 (Passing genealogies from breeding management systems)
Comment: I think the F1 and line verification basics are in place, but I would like to see how the system would work for evaluating hundreds of F1s against their parents, or for assessing all new lines for parent verification. I think some of the requirements for this sort of workflow would require information from the enterprise breeding system, which I do not know is there (i.e. Parent 1, Parent 2 metadata).
Response: We are re-evaluating how the F1 pedigree verification tests are functioning for high-volume "real-life" datasets. We are seeing that our tools do not handle some use cases well. These will be redesigned in Flapjack and KDCompute to accommodate those use cases.
Due: Through 2019
Responsible: Carlos

Respondent 14
Comment: I don't know how the genealogies are tracked in EBS/BMS and how that information could be used efficiently in an F1 or line validation tool.
Response: As above, we are re-evaluating the F1 pedigree verification tests for high-volume "real-life" datasets and will redesign them in Flapjack and KDCompute.
Due: As appropriate
Responsible: Carlos

Respondent 15
Comment: No one-stop-shop workflow yet.

Respondent 19
Comment: Flapjack has the potential to be the most routinely used and most sought-after tool, with immediate impact on usage by the breeding community, especially for QC and MAS work/early-generation screening in the future. GS-Galaxy needs more build-up and use cases; this could be next in line for impact.
Response: We have incorporated some breeding management fields into GOBii that are needed for marker data analysis and to accommodate not having breeding management systems up and running at CG centers. As we integrate with breeding management systems, we will transition to pulling that information from the correct authoritative system.

8. Overall User Satisfaction

Scope and prioritization

Respondent 6
Comment: Core curation functionality and accuracy of extracts should take precedence over these.

Respondent 7
Comment: Given that only limited time remains, ALL possible resources must be re-prioritized to address these basic functions, as all other functions depend on them. This most likely implies reducing, e.g., analytics and training work in order to secure basic functionality of high quality. GOBii needs to recognize that the core genotypic data module is not finalized; considering how little time is left, the focus should be on creating a core genotypic database with complete CRUD and some QC functions. Tools are nice to have, but not much use if the underlying genotyping pipeline is not efficient. Given how little time is left of GOBii, it is critical to analyze what the minimal functionality of a core genotypic database module is and focus on making that work.
Response: Agreed; as above, we are focusing on the core system.

Respondent 7 (Integration with breeding management systems)
Comment: For GOBii to be used widely, it should only require SampleIDs. All additional germplasm data should be removed from the core module and be optional; it should come from the germplasm system.
Response: Our schema was developed to compensate for the lack of breeding management systems in most institutes, and this was before we fully understood sample-tracking use cases. But now, we agree, we need to be able to accommodate both the existence and non-existence of breeding systems. We are revising the schema to accommodate both scenarios easily and will roll out a migration plan in the next 2 months.
Action items:
  • Implementation plan to accommodate having only SampleIDs – Q4 2018
  • Likely implement by Q2 2019
Responsible: Kevin

Respondent 13 (Integration with breeding management)
Comment: Prioritize and ensure joint development efforts, e.g. to ensure that a breeding system can generate a query for information to GOBii and then get information back into a fieldbook.
Response: We are focusing on the sample-tracking use case for integration. If we can coordinate and standardize sample tracking and APIs, then these queries will be straightforward. We will continue to engage through BrAPI and drive coordination of data entries across GOBii institutes and HTPG projects.
Action items:
  • Education on use cases from breeding management systems back to breeders – Q4 2018
  • Align all relevant BrAPI calls to the GOBii schema and implement them – Q4 2018
  • Continue to engage HTPG projects through HTPG meetings
  • Schema changes to accommodate use cases and APIs – Q1-Q2 2019
  • Simplified loading – Q1 2019
  • Be able to update germplasm information after loading samples (see CRUD functionality)

Respondent 13 (Tool development)
Comment: How do we determine how much effort should be invested in trying to create a tool that can support smaller-scale programs, versus creating a tool that has almost no front end and needs to plug into other breeding data systems, etc., to be functional?

Respondent 19
Comment: Need a very clear and quick timeline for production-scale implementation for routine use.

Regarding the clarity of the overall scope and road map, users are somewhat unclear, as the figures below show quite diverse interests and priorities.

Figure 3. GOBii scope and prioritization (multiple selection)






