2018 User Feedback and GOBii Responses to Comments


Created by Star Yanxin Gao, last modified by Liz on Nov 01, 2018

Similar to 2017, a comprehensive survey was sent to CG users in August 2018 to gauge users' feedback on and assessment of the GOBii project from 08/2017 to 08/2018. The 33 users surveyed represented PIs, steering teams, curators, MID breeders, developers, system administrators, and IT managers at CIMMYT, ICRISAT, and IRRI. The survey covered the eight categories below, averaging six questions per category:

  1. Core GDM/GOBii functionalities

  2. Deployment and System Administration Support

  3. Requirement gathering process

  4. Data loading

  5. Data extraction

  6. Communication and engagement

  7. GOBii-funded tool development

  8. Overall User Satisfaction



A simple 1-to-5 scale was used, with 1 = lowest and 5 = highest satisfaction.



Summary

We received a 60% response rate (20 of the 33 surveyed), with 7 responses from CIMMYT, 10 from ICRISAT, and 3 from IRRI. Of the respondents, 45% represented application teams or breeders; 35% developers, IT managers, or system admins; and 30% PIs or steering teams (some respondents fall into more than one group).







Similar to 2017, users' overall satisfaction with GOBii is very high for overall performance, release deployment, and engagement with the development, application, and PI/steering teams.

More specifically, GOBii communication and team engagement won the highest ratings across categories, while the weakest area was GOBii core system functionality. In particular, some users were concerned that GOBii does not yet have data delete or update functionality in place, and that it lacks systematic software approaches to check and validate loaded and extracted data.




Table 1. 2018 CG users' feedback, acceptance, and satisfaction ratings: 1 = lowest and 5 = highest (September 2018)

| Feedback category / topic | CIMMYT | ICRISAT | IRRI | Average |
|---|---|---|---|---|
| **Core GDM/GOBii functionalities** | 2.45 | 3.56 | 3.38 | 3.14 |
| 2.1 Data loading | 2.57 | 3.67 | 3.67 | 3.26 |
| 2.2 Data extracting | 2.57 | 3.88 | 4.00 | 3.39 |
| 2.3 Data updates | 1.67 | 3.56 | 3.00 | 2.83 |
| 2.4 Data deletion | 1.33 | 2.75 | 3.00 | 2.29 |
| 2.5 Data QC upon loading and extraction | 2.50 | 3.44 | 3.33 | 3.11 |
| 2.6 User authentication | 3.67 | 4.33 | 3.33 | 3.94 |
| 2.7 Data access control | 2.83 | 3.22 | 3.33 | 3.11 |
| **Deployment and Sys Admin Support** | 3.31 | 3.81 | 3.93 | 3.66 |
| 3.1 Release and deployment process | 3.00 | 4.00 | 3.50 | 3.53 |
| 3.2 Pre-release QC of new features | 2.83 | 3.71 | 4.00 | 3.40 |
| 3.3 Sys admin support for deployment | 3.75 | 4.00 | 4.00 | 3.92 |
| 3.4 Sys admin support for maintenance | 4.00 | 3.71 | 4.00 | 3.85 |
| 3.5 Sys admin engagement | 4.00 | 4.14 | 4.50 | 4.15 |
| 3.6 System documentation | 3.00 | 3.50 | 4.00 | 3.42 |
| 3.7 Ease of system maintenance | 3.00 | 3.50 | 3.50 | 3.33 |
| **Requirement process** | 3.21 | 3.71 | 3.80 | 3.57 |
| 4.1 Requirements gathering and clarification | 3.14 | 3.78 | 3.67 | 3.53 |
| 4.2 Requirement prioritization | 2.71 | 4.00 | 3.67 | 3.47 |
| 4.3 Requirement (GR) specification (clarity) | 3.50 | 3.89 | 4.00 | 3.81 |
| 4.4 Tracking and signing off | 3.60 | 3.33 | 4.00 | 3.53 |
| 4.5 Ease of submitting requirements (1 = difficult, 5 = easy) | 3.40 | 3.56 | 3.67 | 3.53 |
| **Data loading** | 3.42 | 3.54 | 4.10 | 3.59 |
| 5.1 Handling different types of data | 3.50 | 3.43 | 4.50 | 3.62 |
| 5.2 Issue reporting | 3.50 | 3.43 | 4.50 | 3.62 |
| 5.3 Data mapping validation | 3.67 | 3.43 | 4.00 | 3.58 |
| 5.4 Loading large datasets | 3.25 | 3.29 | 4.00 | 3.38 |
| 5.5 Loading error logs and email notification | 3.25 | 4.14 | 3.50 | 3.77 |
| **Data extraction** | 3.13 | 3.62 | 3.78 | 3.50 |
| 6.1 Extract features | 3.20 | 3.44 | 4.00 | 3.47 |
| 6.2 Data extract process | 3.40 | 3.33 | 4.00 | 3.47 |
| 6.3 Job status and extract output | 3.20 | 3.75 | 3.33 | 3.50 |
| 6.4 Data integrity | 2.60 | 3.56 | 3.67 | 3.29 |
| 6.5 Extraction issue reporting | 3.20 | 3.89 | 4.33 | 3.76 |
| 6.6 Extraction error logs and email notification | 3.20 | 3.78 | 3.33 | 3.53 |
| **Communication effectiveness** | 4.17 | 4.39 | 4.00 | 4.25 |
| 7.1 Online meetings | 4.20 | 4.22 | 4.33 | 4.24 |
| 7.2 Effectiveness to engage and communicate | 4.17 | 4.56 | 3.00 | 4.17 |
| 7.3 Face-to-face visits | 4.50 | 4.11 | 4.00 | 4.22 |
| 7.4 Workshops and training | 3.67 | 4.67 | 4.33 | 4.28 |
| 7.5 Webinars | 4.33 | 4.38 | 4.33 | 4.35 |
| **GOBii-funded tool development** | 3.59 | 3.74 | 4.13 | 3.77 |
| 8.1 QC-KDC functionalities | 3.40 | 3.56 | 3.67 | 3.53 |
| 8.2 GOBii-QC Integration | 2.80 | 3.63 | 3.67 | 3.38 |
| 8.3 F1 verification functionalities | 4.00 | 3.89 | 4.00 | 3.93 |
| 8.4 Line verification functionalities | 4.00 | 3.89 | 4.33 | 4.00 |
| 8.5 MABC functionalities | 4.25 | 3.78 | 4.33 | 4.00 |
| 8.6 GOBii-Flapjack Integration | 3.00 | 3.78 | 4.33 | 3.61 |
| 8.7 GS-Galaxy functionalities | 4.00 | 3.75 | 4.33 | 3.93 |
| 8.8 GOBii-GS-Galaxy Integration | 4.00 | 3.63 | 4.33 | 3.87 |
| **Overall User Satisfaction** | 3.63 | 4.17 | 4.33 | 4.05 |
| 9.1 Scope and clarity of road map | 3.00 | 3.89 | 4.33 | 3.71 |
| 9.2 App teams management | 4.25 | 3.89 | 4.33 | 4.06 |
| 9.3 Development team management | 4.00 | 4.22 | 4.67 | 4.24 |
| 9.4 Engagement with steering teams, PIs, and SABs | 3.50 | 4.22 | 4.33 | 4.06 |
| 9.5 Deployment and release | 3.50 | 4.44 | 4.00 | 4.13 |
| 9.6 Overall satisfaction | 3.60 | 4.33 | 4.33 | 4.12 |
| **Grand Total** | 3.31 | 3.81 | 3.92 | 3.67 |
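The Average column in Table 1 appears to be weighted by the number of respondents per center (7 from CIMMYT, 10 from ICRISAT, 3 from IRRI) rather than being a simple mean of the three center scores. Below is a minimal sketch of that check in Python, assuming the 7/10/3 split applies to every question; in practice some respondents skipped questions, so other rows may not reproduce exactly.

```python
# Respondent counts per center from the Summary section (assumed
# constant per question; actual per-question counts may vary).
counts = {"CIMMYT": 7, "ICRISAT": 10, "IRRI": 3}

# Center scores for "Core GDM/GOBii functionalities" from Table 1.
scores = {"CIMMYT": 2.45, "ICRISAT": 3.56, "IRRI": 3.38}

total_respondents = sum(counts.values())  # 20 of the 33 surveyed
response_rate = total_respondents / 33    # ~0.61, i.e. the ~60% reported

# Respondent-weighted mean versus an unweighted mean of center scores.
weighted_avg = sum(counts[c] * scores[c] for c in counts) / total_respondents
simple_avg = sum(scores.values()) / len(scores)

print(round(weighted_avg, 2))  # 3.14, matching the Average column
print(round(simple_avg, 2))    # 3.13, so a simple mean does not match
```

The same weighting reproduces most sub-rows to within rounding, which is consistent with averaging over individual respondents rather than over centers.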





Figure 1 shows the specific questions asked within each survey category, with the dotted yellow line showing the average rating and the yellow highlight showing the 95% confidence interval of the average. Responses are split by the institute where the survey was carried out.











Table 2. Comments and GOBii responses to users' comments. Each entry below gives the respondent ID and topic, the CG user comment, the GOBii team response, prioritized action items with due dates, and the responsible team members.

1. Core GDM/GOBii functionalities

Respondent 5 (Data integrity)
Comment: This year we encountered a few issues related to the display of extracted data, now even while loading datasets with more than 10K markers in version 1.4. Essential features for curators, such as delete/modify/hide datasets/information, are not ready yet.
Response: Focus on data integrity is our highest priority from now on. Our system is fast but complex, and we now need to focus on data integrity and fully pressure-test the system with high-volume datasets. We are bringing in a consultant to review and overhaul our QA/QC processes, and we take these issues very seriously.
Action items:
  • Review QA/QC procedures with consultant – Q3 2018
  • Implement recommended procedures – Q4 2018 through Q2 2019
  • Design large-volume, complex datasets and regression-test the system with them for each release – Q4 2018
  • Review the schema and how it impacts data integrity – Q4 2018
  • Design an automated data-checking tool capable of checking inputs versus outputs – Q4 2018
  • Improve data validation – 2019
  • Automate end-to-end regression tests – Q2 2019
  • VCF data checks – 2019
Responsible: Liz/Josh/Deb

Respondent 6 (Need CRUD)
Comment: For the curation tool (GOBii GDM) to be effective, it should have the basic functionalities: load, retrieve, update, and delete information.
Response: We will review how to implement full delete and update functionality by the end of this year and roll it out during 2019.
Responsible: Yaw, Kevin

Respondent 7 (Core minimum: Create, Read, Update, and Delete (CRUD))
Comment: As GOBii cannot do these 4 functions reliably, it is in effect not a production-ready system.
Response: We will focus on CRUD as the core system in the next year.
Action items:
  • Design CRUD implementation, including schema review – Q4 2018
Responsible: Yaw

Respondent 7
Comment: The system must be able to load large volumes of data quickly, do at least minimal QC, and be able to deliver the data for analysis.
Response: We are loading large volumes of data much faster than other open-source solutions, and much faster than the systems we have worked with in industry.

Respondent 13 (Data integrity in data loading and data extraction)
Comment: GOBii is making great progress on building tools for manual data loading and data extraction. But it is hard to say that we are completely satisfied when there have been errors in the data loading and data extraction processes.
Responsible: Yaw

Respondent 13 (Software QC to improve new releases)
Comment: But the team has been quick to respond to these, and we know that much stronger software QC processes are now in place and continue to improve the quality and confidence of each new release.

Respondent 13 (Flex query extraction)
Comment: For data extraction, version 1.4 offers many basic extraction functionalities, but we are very excited to see the new features that will come with flex query in the future.
Response: Flex query is in the post-v1.5 release.
Responsible: Kevin, Phil

Respondent 13 (Hard delete: TimeScope)
Comment: We also look forward to having basic tools for deleting data in the coming year.
Response: Hard delete will be released in the TimeScope tool in v1.5. We will train system admins/super users.
Action items:
  • Update TimeScope user and release documentation
  • Schedule user training for appropriate use of TimeScope
Responsible: Deb, Roy

Respondent 13 (Soft delete to restrict data access)
Comment: For data access control, there is currently no way to restrict access to certain data(sets) within the system, and that could be problematic in certain situations.
Response: We agree. We will create access control, are starting to implement delete functionality with the TimeScope tool, and will incorporate this into the CRUD roll-out through 2019.
Action items:
  • Design data access control by end of 2018
  • Roll out functionality through 2019
Responsible: Yaw/Kevin

Respondent 13 (Flag data status)
Comment: Also, it might be good to be able to associate some kinds of flags, e.g. related to data status, that could be applied during the data extraction or utilization process.
Response: We need more information on what this would look like.

2. Deployment and Sys Admin Support

Respondent 6 (Deployment slowed down by lengthy back-up)
Comment: Some install parameters are not well documented. We should think of ways to bypass the lengthy back-up process during deployment and updates.
Response: We are improving the back-up and restore process by implementing incremental back-up.
Due: Q4 2018
Responsible: Roy, Kevin

Respondent 7
Comment: The system should be highly configurable so that it can be deployed in a wide range of enterprise IT contexts, e.g. different authentication schemes, email services, in the cloud or on premises, etc.
Response: We agree and will work to simplify, automate, and document configuration management.
Due: Design improvements Q4 2018; roll out Q1 and Q2 2019
Responsible: Roy

Respondent 7
Comment: Deployment should be amenable to being scripted/automated, which means, e.g., all configurations should use tokens that can be set with a script.
Responsible: Roy

Respondent 7
Comment: To ensure stable deployment, changes must be managed and documented, and their impact on the release process well documented.
Response: Documentation will be improved.
Due: Continuous
Responsible: Roy

Respondent 13 (Bug-fix tracking and communication management)
Comment: We have had some challenges with deployments in the past year, but I think that the process is always improving. Also, in general, we know that the QC process has gotten much better, with great testing and great reporting on bug fixes during each release (with test files, etc.). But there may still be some room for improvement, especially as the number of variations that need to be tested for any new feature increases. Also, it has been a bit frustrating in the small number of cases where we thought that something was fixed and then later learned that it was not.
Responsible: Star, Josh

3. Requirement process

Respondent 7 (Ensure interdependencies among requirements are understood in prioritization)
Comment: The fundamental issue in requirement gathering is the lack of dependency analysis. Users will tend to ask for the analytical functions of any system, as this is where the business value is, but these all require the basic CRUD functions. If this is not achieved, the module cannot work within practical breeding. No normal user will think up front of asking for, e.g., a delete or update function in a database, but that does not mean it is not a requirement for the system. Requirement prioritization must consider both user requirements and technical IT requirements, and the interdependencies must be understood in the prioritization process.
Responsible: Yaw, Kevin, Josh

Respondent 13
Comment: The requirements gathering and prioritization process is still a little unclear. But I am not sure how it can be refined when there are many different participants with their own priorities. And I know that it must be challenging as the requirements and priorities expressed by groups change over time.
Responsible: Liz, Yaw

4. Data loading

Respondent 13 (Track issues and requirements; communication management)
Comment: It is generally easy to report issues, but it is not always clear when they are being worked on, especially if they go in by email. We are grateful that the standards for submitting error reports are still relatively loose, but we want to continue to work together to make sure that we are providing the minimal standardized information needed to test the error without taking too much time (especially when it is not known whether the error was already reported, etc.).
Response:
  • We have greatly improved issue and requirement tracking traceability.
  • Visibility will be improved with a cloud deployment, and then we will train users.
Action items:
  • Cloud deployment of our test systems – Q4 2018
  • Train users on the process – Q1 2019
Responsible: Roy and Deb

Respondent 13 (Informative error logs for troubleshooting/diagnosis)
Comment: The process for defining new requirements is less clear to us. The error logs provided with failed data loads generally do not provide information that allows us to diagnose the specific problem that caused the failure.
Response: We are continuously working to improve error messages and will continue to do so.
Responsible: Josh

5. Data extraction

Respondent 5 (Need data validation tools for loading and extraction)
Comment: A few issues related to extracted data accuracy were reported to GOBii and required some tests to fix the problems and validate them. I understand that this is part of the database development process, but there are no existing tools for curators to validate data once they are loaded and extracted.

Respondent 5 (Need a systematic data validation tool to handle large datasets)
Comment: It is difficult, especially when datasets are large and we use random data validation approaches, unless we are lucky enough to identify issues by chance.

Respondent 6 (Prove extract accuracy for patch fixing)
Comment: There are some issues with extract accuracy. I understand that there are already efforts to "patch" the affected datasets, but we need to prove that this works.
Responsible: Deb, Josh

Respondent 13
Comment: The current extraction features are not very extensive, but the flex query tool that will be available in the next version seems to add much more functionality.
Response: Flex query will be released post v1.5.
Responsible: Kevin, Phil

Respondent 13 (Usability: navigation to folders is not user-friendly)
Comment: The solution for navigating to folders to get data works pretty well for curators, but may not seem very user-friendly to other users.
Response: We agree and have extensively reviewed web-based file browsers that provide live links to files and have authentication. We have identified ownCloud and are incorporating it into our emails and into the Marker Toolbox.
Due: Q4 2018
Responsible: Yaw, Josh

Respondent 13 (Informative error logs)
Comment: The error logs generally do not provide enough information to help us figure out why an extract might have failed. We have also had some concerns, especially before v1.4, about the integrity of the extracted data. But the data extraction is very fast!

Respondent 14
Comment: I have not been using GOBii yet and have not interacted with the core functions. I only know second-hand of some issues with data import and extract.

6. Communication effectiveness

Respondent 13 (Training and workshops)
Comment: I put a lower score for the workshops and training mostly because I think that they have generally been premature. It likely does not make sense to provide any GOBii "training" or GS pipeline "training" to users outside the first 3 CG centers until there is a system or a stable completed pipeline that people can use. The minutes from the online meetings are generally very helpful. And targeted meetings in person are also usually quite helpful.
Response: Our project required us to reach out to additional CG centers and NARS at this stage, but we agree that we have found this premature before the database is available to them. We do find that the tools we have developed are well received by wider partners. However, our resources are limited, and in 2019 we will focus our training on our CG center partners and related NARS.
Action items: Focus our efforts through 2019 on our existing CG center customers.
Responsible: Star

7. GOBii-funded tool development

Respondent 7
Comment: In the case of GOBii, the team started working on analytics functions based on user requests well before the foundational functions were finalized, and the result is that we now have disparate partial functions that do not make up a working system.

Respondent 13 (QC tools with KDCompute)
Comment: KDC provides some good statistics, but it is unclear how to use these in a functional production workflow.
Response: We agree that having KDCompute in place is only the first step in the process, and we have probably not promoted the long-term view of the process sufficiently. We will next provide a mechanism to filter the data and reload it to GOBii. The non-QCed dataset can then be deleted, or eventually will be able to be hidden from the user using "soft delete".
Action items:
  • Implement the ability to filter data – Q1 2019
  • Ability to delete non-QCed datasets – Q4 2018
  • Review functionality for soft delete – Q4 2018, and prioritize for 2019-2020

Respondent 13 (QC tools with KDCompute)
Comment: It may also be confusing to disentangle the current data that should be used for "genotypic data" QC versus "germplasm QC", e.g. the F1 verification information.
Response: All tests in KDCompute are meant to QC genotyping data. Both data quality and genetic quality can be useful for this purpose.

Respondent 14 (Batch processing MABC using a Galaxy workflow)
Comment: Following the recent training, I can see how Galaxy can be helpful for streamlining the MABC batch processes. I am willing to try this, but would need to schedule an online session to go over the steps once I have datasets ready.

Respondent 14 (Passing genealogies from breeding management systems)
Comment: I think the F1 and line verification basics are in place, but I would like to see how the system would work for evaluating hundreds of F1s against their parents, or for assessing all new lines for parent verification. I think some of the requirements for this sort of workflow would require information from the enterprise breeding system, which I do not know is there (i.e. Parent 1, Parent 2 metadata).
Response: We are re-evaluating how the F1 pedigree verification tests are functioning for high-volume "real-life" datasets. We are seeing that our tools do not handle some use cases well. These will be redesigned in Flapjack and KDCompute to accommodate those use cases.
Due: Through 2019
Responsible: Carlos

Respondent 14
Comment: I don't know how the genealogies are tracked in EBS/BMS and how that information could be used efficiently in an F1 or line validation tool.
Response: As above, we are re-evaluating the F1 pedigree verification tests for high-volume "real-life" datasets and will redesign them in Flapjack and KDCompute.
Due: As appropriate
Responsible: Carlos

Respondent 15
Comment: No one-stop-shop workflow yet.

Respondent 19
Comment: Flapjack has the potential to be the most routinely used and most sought-after tool, with immediate impact on usage by the breeding community, especially for QC and MAS work/early-generation screening in the future. GS-Galaxy needs more build-up and use cases; this could be next in line for impact.
Response: We have incorporated some breeding management fields into GOBii that are needed for marker data analysis and to accommodate not having breeding management systems up and running at CG centers. As we integrate with breeding management systems, we will transition to pulling that information from the correct authoritative system.

8. Overall User Satisfaction

Scope and prioritization

Respondent 6
Comment: Core curation functionality and accuracy of extracts should take precedence over these.

Respondent 7
Comment: Given that only limited time remains, ALL possible resources must be re-prioritized to address these basic functions, as all other functions depend on them. This most likely implies reducing, e.g., analytics and training work in order to secure basic functionality of high quality. GOBii needs to recognize that the core genotypic data module is not finalized; considering how little time is left, the focus should be on creating a core genotypic database with complete CRUD and some QC functions. Tools are nice to have, but not much use if the underlying genotyping pipeline is not efficient. Given how little time is left of GOBii, it is critical to analyze what the minimal functionality of a core genotypic database module is and focus on making that work.
Response: Agreed; as above, we are focusing on the core system.

Respondent 7 (Integration with breeding management systems)
Comment: For GOBii to be used widely, it should only require SampleIDs. All additional germplasm data should be removed from the core module and be optional; it should come from the germplasm system.
Response: Our schema was developed to compensate for the lack of breeding management systems in most institutes, and this was before we fully understood sample-tracking use cases. But now, we agree, we need to be able to accommodate both the existence and non-existence of breeding systems. We are revising the schema to accommodate both scenarios easily and will roll out a migration plan in the next 2 months.
Action items:
  • Implementation plan to accommodate having only SampleIDs – Q4 2018
  • Likely implement by Q2 2019
Responsible: Kevin

Respondent 13 (Integration with breeding management)
Comment: Prioritize and ensure joint development efforts, e.g. to ensure that a breeding system can generate a query for information to GOBii and then get information back into a fieldbook.
Response: We are focusing on the sample-tracking use case for integration. If we can coordinate and standardize sample tracking and APIs, then these queries will be straightforward. We will continue to engage through BrAPI and drive coordination of data entries across GOBii institutes and HTPG projects.
Action items:
  • Education on use cases from breeding management systems back to breeders – Q4 2018
  • Align all relevant BrAPI calls to the GOBii schema and implement them – Q4 2018
  • Continue to engage HTPG projects through HTPG meetings
  • Schema changes to accommodate use cases and APIs – Q1-Q2 2019
  • Simplified loading – Q1 2019
  • Be able to update germplasm information after loading samples (see CRUD functionality)

Respondent 13 (Tool development)
Comment: How do we determine how much effort should be invested in trying to create a tool that can support smaller-scale programs, versus creating a tool that has almost no front end and needs to plug into other breeding data systems, etc., to be functional?

Respondent 19
Comment: Need a very clear and quick timeline for production-scale implementation for routine use.

Regarding the clarity of the overall scope and road map, users are somewhat unclear, as the figures below show quite diverse interests and priorities.

Figure 3. GOBii scope and prioritization (multiple selection)






