Assignment 4 Raw Data Preparation for Healthcare Analytics Instructions
Step-By-Step Assignment Instructions
This assignment asks you to solve the following analytical problem, which could benefit from risk adjustment. Submit your work as a PDF report of about 6 double-spaced pages.
The problem is identified as follows:
· Some providers at Acme Healthcare may be engaging in fraud with respect to documentation and billing. How can they be identified after controlling for patient-level risk factors?
Step 1 – Summary of Analytical Problem Requiring Risk Adjustment (Done)
· Provider profiling for fraud analysis
Within your report to Acme Healthcare administrators, address the following questions in about one double-spaced page:
· Why did you choose the topic?
· How can the problem benefit from an analytical solution?
· Why is risk adjustment helpful or necessary?
· What general conceptual steps will be required to perform risk adjustment?
Step 2 – Using Groupers to Prepare Analytic Datasets
To prepare for your risk adjustment analysis, consider how you will group diagnoses, procedures, and drugs into more manageable categories.
For this first part of the project you will review data files that contain grouper logic for the following systems:
· Healthcare Cost and Utilization Project (HCUP). (2016). Clinical Classifications Software (CCS) for ICD-9-CM.
· U.S. National Library of Medicine, National Institutes of Health. (2014). Unified Medical Language System.
· University of California – San Diego. (Undated). Chronic Illness and Disability Payment System.
· Berenson-Eggers Type of Service (BETOS) Codes
For each file, address the following question:
· How can you aggregate many codes into a smaller number of analytical categories?
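As one way to picture the aggregation question above, the following is a minimal Python sketch of a lookup-based grouper. It is not the logic of any of the listed systems: the `CODE_TO_CATEGORY` table, the category labels, and the function names are all hypothetical, and a real analysis would load the mapping from the grouper file itself.

```python
from collections import Counter

# Hypothetical lookup table (an assumption for illustration only).
# Keys are ICD-9-CM diagnosis codes with the decimal point removed;
# values are illustrative labels, not official grouper category names.
CODE_TO_CATEGORY = {
    "25000": "Diabetes",
    "25001": "Diabetes",
    "4011": "Hypertension",
    "4019": "Hypertension",
}


def normalize(code):
    """Strip the decimal point and whitespace so '250.00' matches '25000'."""
    return code.replace(".", "").strip().upper()


def group_codes(codes, lookup, default="Unmapped"):
    """Map each raw code to its category; unknown codes get the default label."""
    return [lookup.get(normalize(code), default) for code in codes]


def category_counts(codes, lookup):
    """Count how many raw codes fall into each category."""
    return Counter(group_codes(codes, lookup))
```

Keeping an explicit "Unmapped" label, rather than silently dropping unknown codes, makes it easier to see how much of the data the chosen grouper fails to cover.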
Step 3 – Describe the Analytical Plan
Using the SEMMA methodology, describe how you will use the data sets provided to solve your analytical problem. Provide an answer to each of the following questions in your description:
· Sample: Will you include all rows of the data?
· Explore: What descriptive analyses might you perform to learn about the data? How might this help you select fields to include in the final analysis?
· Modify: Although you will cover this step in detail in Step 4, briefly describe what data transformations might be required and why they are necessary.
· Model: Drawing on your knowledge of data science and statistics, briefly describe some possible methods for risk adjustment. Be clear in your discussion of datasets (e.g., rows, tables), and use concrete definitions of terms related to predictive modeling (e.g., structured vs. unstructured).
· Assess: Describe how you will assess your model and output.
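For the Model and Assess questions, one possible approach (among several, and not one these instructions prescribe) is indirect standardization, where each provider's observed billing is compared with the billing expected for its patient mix. The Python sketch below assumes hypothetical field names (`provider`, `risk_group`, `billed`) and treats the overall average per risk group as the expected value.

```python
from collections import defaultdict


def observed_to_expected(claims):
    """Return an observed/expected billing ratio for each provider.

    `claims` is a list of dicts with the hypothetical keys
    'provider', 'risk_group', and 'billed'.
    """
    # Step 1: average billed amount per risk group across all providers.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for claim in claims:
        totals[claim["risk_group"]] += claim["billed"]
        counts[claim["risk_group"]] += 1
    expected_by_group = {g: totals[g] / counts[g] for g in totals}

    # Step 2: sum observed and expected amounts for each provider.
    observed = defaultdict(float)
    expected = defaultdict(float)
    for claim in claims:
        observed[claim["provider"]] += claim["billed"]
        expected[claim["provider"]] += expected_by_group[claim["risk_group"]]

    # Step 3: ratio above 1 means billing exceeds what the patient mix predicts.
    return {p: observed[p] / expected[p] for p in observed if expected[p] > 0}
```

A ratio well above 1 is a flag for further review, not proof of fraud; a regression model with patient-level covariates would be another reasonable choice.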
Step 4 – Creating an Analytical File
Based on the lessons about how to perform risk adjustment, the objective for this part of the project is to describe what types of data transformations and processing are required to prepare the data for the risk adjustment analysis.
Address all 12 of the following questions in your response (please note: some questions can be answered in one to two sentences, but for others, you may need to expand your answer to three to five sentences):
Concepts, Fields, Groupers
· What concepts are required in the analysis?
· Which fields from the datasets will you select for each concept?
· Continuing from the earlier section about groupers, which grouper categories will you use?
· Which tables have multiple rows per patient?
· When you join data from the various tables, will your output include duplicates?
· In looking at the data dictionary and the data tables, do you see any need for mapping to more standard codes?
· Is there evidence that the data might vary through time, or by different regions/states?
· Would you consider conditional programming logic to recode data values?
· What type of aggregation of data might be helpful?
· Would it be helpful to select specific rows (filter)?
· Is there a need to transpose any fields?
· Are there temporal aspects of data related to dates that could cause problems?
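Several of the questions above (duplicates, filtering, aggregation, transposing, and date handling) can be illustrated together. The following Python sketch is only an illustration under stated assumptions: the input is a hypothetical long table with one row per diagnosis and the made-up fields `patient_id`, `category`, and `service_date`, and the output is one row per patient.

```python
from datetime import date


def build_analytic_rows(rows, start, end):
    """Turn long diagnosis rows into one wide row per patient.

    `rows` is a list of dicts with the hypothetical keys
    'patient_id', 'category', and 'service_date' (a datetime.date).
    """
    # Filter: keep only rows whose service date falls in the study window.
    in_window = [r for r in rows if start <= r["service_date"] <= end]

    # Deduplicate: identical (patient, category, date) rows count once.
    unique = {(r["patient_id"], r["category"], r["service_date"]) for r in in_window}

    # Transpose: every category becomes a 0/1 flag column.
    categories = sorted({category for _, category, _ in unique})
    patients = {}
    for patient_id, category, _ in sorted(unique):
        record = patients.setdefault(
            patient_id,
            {"patient_id": patient_id, "n_rows": 0, **{f"has_{c}": 0 for c in categories}},
        )
        record["n_rows"] += 1           # Aggregate: count of distinct rows
        record[f"has_{category}"] = 1   # Flag the category as present
    return [patients[p] for p in sorted(patients)]
```

Note that a patient whose rows all fall outside the date window disappears from the output, which is one of the temporal problems worth discussing in your answer.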
Step 5 – Appendix: Data Dictionary and Output Interpretation
One of the most important parts of an analytical project is documentation of the source data, so that data science teams can produce reliable information. In addition, once the analytics are complete, the data science teams should explain how they transformed the data and created their models.
You need to include the following in your appendix:
Improve the rudimentary data dictionary that you worked on in the lessons for Module 4. For this assignment, create a sample data dictionary that has at least 5 fields (additional points available for more than 5).
· Include fields that were not included in the original example provided by the instructor
· Consider including derived fields that you might create for the analysis. For example, if you create a new variable that combines two variables, it would be important to describe how this was done. It is also important to describe fields created by groupers.
Based on your analytical and modeling plan, summarize what types of output might be created for the risk adjustment analytics.