Opened 9 years ago

Closed 7 years ago

#236 closed Enhancement (Fixed)


Reported by: Lois Haggard Owned by: Testing
Priority: -Low Milestone: 2.3 - Testing
Component: IBIS-Q Version: 2.3
Severity: Moderate Effort Keywords:


POPULATE_MISSING_VALUES first came up in working with EPHT maps, where a value, even a zero value, was desired for every geography (e.g., every county). In 2.0, you could add the <POPULATE_MISSING_VALUES> element to your configuration. But it was a bit unusual looking, all the populated elements would appear on the results page at the bottom of the data list table, with no other cells in that row populated.

Garth has done some work on it for version 2.2 that sorts the missing values into the data list table. The new xml element is: <ADD_MISSING_QUERY_RESULT_RECORDS>. Garth's solution starts with the complete set of dimension values, initializing the values equal to zero, and updating them with any values that come from IBIS-Q.

But one thing this solution does not do is capture the population data values or compute confidence intervals (yes, it is possible to calculate a C.I for a zero value). To do that, we need to involve the back-end IBIS-Q application.

Here is a code snippet from a .DEF file I wrote to work with the NM EPHT mapper. It works, but it supplies ALL NM counties. So it will provide zero results even for counties the user filtered out, regardless of whether they had records in the numerator dataset (not good!). So it is really only useful for maps and configurations that do not allow the user any geographic selection.

 * If cross1 or cross2 is equal to fipscode then the statement   ;
 * will resolve to be 'fipscode=fipscode'. When that logical     ;
 * argument is true, as it is in that example, the 'frame'       ;
 * dataset will be created.                                      ;
	%macro geoframe;
	%if (%cross1%=fipscode) or (%cross2%=fipscode) %then %do;
		data frame;	
			retain count 0;
				fipscode=1, 3, 5, 6, 7 to 27 by 2, 28, 29 to 61 by 2, 99, .;
	proc print data=frame noobs; title1 '======='; title2 'frame'; run;

	proc sort data=frame; by fipscode; run;
	proc sort data=tmp; by fipscode; run;
	data new;
		merge frame tmp;  *the order that these datasets are listed is important;
		by fipscode;
		drop _TYPE_ _FREQ_;

	data tmp;
		set new;

	%mend geoframe;
	proc print data=tmp noobs; title2 'tmp'; run;

I think we would all like to have a robust solution that always works, and we probably want to use it with every query.

The proposed requirements are:

  • For any cross1 or cross2 dimension, IBIS should display all dimension values in the results table (except for those flagged with the <NOT_ASSOCIATED_FLAG/> element and any dimension values explicitly filtered out by the user).
  • If a given numerator dataset has no data records for the dimension values specified above, IBIS should display a zero count in the results table(s).
  • All other aspects of the results table should look the same, displaying any population and confidence interval values calculated in the .def file.

A potential solution would use methods similar to those used by Garth and Lois in their partial solutions, which can be summarized as follows:

  • IBIS-Q will create a "data frame" or blank dataset that includes all dimension values of cross1 and cross2. If the user filtered the dataset to selected values of cross1 and cross2, IBIS-Q will use all user-defined values of cross1 and cross2.
  • Because IBIS-Q has no knowledge of the complete set of dimension values, IBIS-View will need to pass all needed dimension values to IBIS-Q. This will require moving away from the current "GET" request because the URL will become too long.
  • The data frame will initialize all cross1 and cross2 dimension values with a measure value of zero.
  • IBIS-Q will then consult the numerator dataset (tmp) as it currently does. It will then update the data frame with any new values from tmp that are discovered.
  • The remaining steps for rates and age-adjusted rates can follow as before. The zero values will produce crude and age-adjusted rates of 0.
  • For rate measures, the population data values will be merged onto the tmp dataset and will be available to appear in the results table.

We will need to discuss this potential solution with ZW and scope it out.

Attachments (1)

LettertoZWFeb2014.docx (21.4 KB) - added by Lois Haggard 8 years ago.
Memo to ZW describing the needed modifications.

Download all attachments as: .zip

Change History (6)

Changed 8 years ago by Lois Haggard

Attachment: LettertoZWFeb2014.docx added

Memo to ZW describing the needed modifications.

comment:1 Changed 8 years ago by Tong Zheng

ZW has finished his part. He needs URL to test if his enhancement work.

comment:2 Changed 8 years ago by Garth Braithwaite

Milestone: Unassigned2.3 - Testing
Owner: changed from Zwiwei Liu to Testing
Priority: Unknown-Low
Severity: UnknownModerate Effort
Version: Unknown2.3

comment:3 Changed 8 years ago by Garth Braithwaite


  • When using the data frame ZW needs all values to be sent that are to build the data frame. So if you want the total row then that "." value needs to be sent. However, the only mechanism that is used and documented is the NOT_ASSOCIATED_FLAG. If you turn this off then ALL NOT_ASSOCIATED_FLAG values are sent to be made into the data frame so you'll always get back values like 99 "Unknown" etc. If you don't include those values then you won't ever get back the total row or a Unknown etc.
Last edited 8 years ago by Garth Braithwaite (previous) (diff)

comment:4 Changed 8 years ago by Tong Zheng

Now we call sas dataset "data_frame_%cross1%%cross2%" in .def to merge with the main dataset in order to get all 0 observations shown on the result. Nothing needs be done on view part.

Utah infant mortality module on dev. site has implemented it. All display-by dimension values are listed on result table even with 0 observations.

comment:5 Changed 7 years ago by Lois Haggard

Resolution: Fixed
Status: newclosed
Note: See TracTickets for help on using tickets.