[Federal Register: April 7, 2005 (Volume 70, Number 66)]
[Notices]
[Page 17765-17817]
From the Federal Register Online via GPO Access [wais.access.gpo.gov]
[DOCID:fr07ap05-133]
[FRL-7895-2]
AGENCY: Environmental Protection Agency.
ACTION: Notice of availability of final document.
This Notice announces the availability of the final document, Guidelines for Carcinogen Risk Assessment (EPA/630/P-03/001F), hereafter referred to as the Guidelines. These Guidelines were developed as part of an Agency-wide guidelines development program by a Technical Panel of the U.S. EPA's Risk Assessment Forum, which was composed of scientists from throughout the Agency. Selected drafts were peer reviewed internally by the U.S. EPA's Science Advisory Board, and by experts from universities, environmental groups, industry and other governmental agencies. The Guidelines were also subjected to several public comment periods. Issuance of these final Guidelines fulfills EPA's obligations under section 112(o)(7) of the Clean Air Act.
The Guidelines are available for use by EPA risk assessors as of March 29, 2005.
This Notice contains the full Guidelines document. The Guidelines also are available electronically through the EPA Web site at http://www.epa.gov/cancerguidelines. A limited number of paper and CD-ROM copies will be available from the EPA's National Service Center for Environmental Publications (NSCEP), P.O. Box 42419, Cincinnati, OH 45242; telephone: (800) 490-9198 or (513) 489-8190; facsimile: (513) 489-8695. Please provide your name, mailing address and the title and number of the requested EPA publication (EPA/630/P-03/001F). Additionally, copies of the Guidelines will be available for inspection at EPA headquarters and regional libraries, through the U.S. Government Depository Library program.
Dr. William P. Wood, Risk Assessment Forum, National Center for Environmental Assessment (8601D), U.S. Environmental Protection Agency, Washington DC 20460, telephone: (202) 564-3361; facsimile: (202) 565-0062; or e-mail: risk.forum@epamail.epa.gov.
In the 1983 Risk Assessment in the Federal Government: Managing the Process, the National Academy of Sciences recommended that Federal regulatory agencies establish ``inference guidelines'' to promote consistency and technical quality in risk assessment, and to ensure that the risk assessment process is maintained as a scientific effort separate from risk management. A task force within EPA accepted that recommendation and requested that EPA scientists begin to develop such guidelines. In 1984, EPA scientists began work on risk assessment guidelines for carcinogenicity, mutagenicity, suspect developmental toxicants, chemical mixtures and exposure assessment. Following extensive scientific and public review, these five guidelines were issued on September 24, 1986 (51 FR 33992-34054). Since 1986, additional risk assessment guidelines have been developed, revised and supplemented.
EPA continues to revisit the guidelines as experience and scientific consensus evolve. In 1996, the Agency published proposed revisions to EPA's 1986 cancer guidelines for public comment. Since the 1996 proposal, the document has undergone extensive public comment and scientific peer review, including three reviews by EPA's Science Advisory Board (SAB) in February 1997, January 1999 and July 1999. The July 1999 review panel was supplemented by the EPA Children's Health Protection Advisory Committee. Public comments were received concurrent with each of these reviews. In 2001 (66 FR 59593, November 29, 2001) an additional public comment period was held requesting new information gained through the use of the July 1999 draft final revised guidelines on issues including, but not limited to, the nature and use of default assumptions; definition and application of hazard descriptors; identification of carcinogenic mode(s) of action and, in particular, consideration of relevancy for children (e.g., the potential for differential life stage susceptibility); and guidance on the use of the margin of exposure analysis. The notice also announced that the July 1999 draft final revised guidelines would serve as EPA's interim guidance to EPA risk assessors preparing cancer risk assessments, until the issuance of final guidelines. In May 2003 EPA made available for public comment a revised draft of the guidelines, and in February 2005 the guidelines underwent interagency review. The final Guidelines issued today are based, in part, upon the recommendations derived from public comments, workshops and recommendations of the SAB.
CAA section 112(o)(7) provides ``[t]he Administrator shall consider, but need not adopt, the recommendations contained in the report of the National Academy of Sciences prepared pursuant to this subsection and the views of the Science Advisory Board, with respect to such report. Prior to the promulgation of any standard under [CAA section 112(f)], and after notice and opportunity for comment, the Administrator shall publish revised Guidelines for Carcinogenic Risk Assessment or a detailed explanation of the reasons that any recommendations contained in the report of the National Academy of Sciences will not be implemented.''
In response to CAA section 112(o)(7), the 1994 National Research Council (NRC) report, and continuing developments in the science of cancer risk assessment, EPA began the process of revising its Guidelines for Carcinogen Risk Assessment. Revisions to the Guidelines were intended to make greater use of the increasing scientific understanding of the mechanisms that underlie the carcinogenic process. Several drafts of revisions to the Guidelines have been subject to extensive public comment and scientific peer review, including three reviews by EPA's SAB, as discussed above. EPA considered the 1994 recommendations of the NRC on the Guidelines. EPA's approach to those NRC recommendations is reflected in the Guidelines. Draft EPA responses to the NRC recommendations were presented in the preamble to the 1996 draft of these revised Guidelines (61 FR 18003, April 23, 1996). By issuing the final Guidelines which address the recommendations of the NRC, EPA has fulfilled its responsibilities under CAA section 112(o)(7).
The Guidelines are intended to make greater use of the increasing scientific understanding of the mechanisms that underlie the carcinogenic process. The final guidelines include discussions of all four steps of the risk assessment process and provide guidance to risk assessors on these steps. In applying these principles to the development of these Guidelines, the following key issues were highlighted: use of default options, the consideration of mode of action, understanding of biological changes, fuller characterization of carcinogenic potential, and consideration of differences in susceptibility.
Use of default options--Default options are approaches that EPA can apply in risk assessments when scientific information about the effects of an agent on human health is unavailable, limited, or of insufficient quality. Under the final Guidelines, EPA's approach begins with a critical analysis of available information, and then invokes defaults if needed to address uncertainty or the absence of critical information.
Consideration of mode of action--Cancer refers to a group of diseases involving abnormal, malignant tissue growth. Research has revealed that the development of cancer involves a complex series of steps and that carcinogens may operate in a number of different ways. The final Guidelines emphasize the value of understanding the biological changes and how these changes might lead to the development of cancer. They also discuss ways to evaluate and use such information, including information about an agent's postulated mode of action, or the series of steps and processes that lead to cancer formation. Mode-of-action data, when available and of sufficient quality, may be used to draw conclusions about the potency of a chemical, its potential effects at low doses, whether findings in animals are relevant to humans, and which populations or lifestages may be particularly susceptible.
Fuller characterization of carcinogenic potential--In the final Guidelines, an agent's human carcinogenic potential is described in a weight-of-evidence narrative. The narrative summarizes the full range of available evidence and describes any conditions associated with conclusions about an agent's hazard potential. For example, the narrative may explain that a chemical appears to be carcinogenic by some routes of exposure but not by others (e.g., by inhalation but not ingestion). Similarly, a hazard may be attributed to exposures during sensitive life-stages of development but not at other times. The narrative also summarizes uncertainties and key default options that have been invoked. To provide additional clarity and consistency in weight-of-evidence narratives, the Guidelines present a set of weight-of-evidence descriptors that accompany the narratives. The Guidelines emphasize that risk managers should consider the full range of information in the narratives and not focus exclusively on the descriptors. As in the case of the narratives, descriptors may apply only to certain routes of exposure, dose ranges and durations of exposure.
Consideration of differences in susceptibility--The Guidelines explicitly recognize that variation may exist among people in their susceptibility to carcinogens. Some subpopulations may experience increased susceptibility to carcinogens throughout their life, such as people who have inherited predisposition to certain cancer types or reduced capacity to repair genetic damage. Also, during certain lifestages the entire population may experience heightened susceptibility to carcinogens. In particular, EPA notes that childhood may be a lifestage of greater susceptibility for a number of reasons: rapid growth and development that occurs prenatally and after birth, differences related to an immature metabolic system, and differences in diet and behavior patterns that may increase exposure.
The final Guidelines explicitly call for consideration of possible sensitive subpopulations and/or lifestages (such as childhood). Therefore, concurrent with release of the final Guidelines, EPA published a separate guidance, entitled Supplemental Guidance for Assessing Susceptibility from Early-Life Exposure to Carcinogens (EPA/630/R-03/003F), hereafter referred to as the Supplemental Guidance, describing possible approaches that could be used to assess risks resulting from early life exposure to potential carcinogens. The Supplemental Guidance is separate from the Guidelines so that it may be more easily updated in a timely manner given the expected rapid evolution of scientific understanding about the effects of early-life exposures. Availability of the Supplemental Guidance is announced in a separate notice, also published in today's Federal Register.
These Guidelines set forth principles and procedures to guide EPA scientists in the conduct of cancer risk assessments and to inform Agency decision makers and the public about these procedures. Policies in this document are intended as internal guidance for EPA; risk assessors and risk managers at EPA are therefore the primary audience. These Guidelines also provide basic information to the public about EPA's risk assessment methods. In particular, the Guidelines emphasize that risk assessments should be conducted on a case-by-case basis, giving full consideration to all relevant scientific information. This approach means that Agency experts study scientific information on each agent under review and use the most scientifically appropriate interpretation to assess risk. The Guidelines also stress that this information be fully presented in Agency risk assessment documents, and that Agency scientists identify the strengths and weaknesses of each assessment by describing uncertainties, assumptions and limitations, as well as the scientific basis and rationale for each assessment. The Guidelines are formulated in part to bridge gaps in risk assessment methodology and data. By identifying these gaps and the importance of the missing information to the risk assessment process, EPA wishes to encourage research and analysis that will lead to new risk assessment methods and data.
The Guidelines are guidance only. They do not establish any substantive ``rules'' under the Administrative Procedure Act or any other law and have no binding effect on EPA or any regulated entity, but instead will represent a non-binding statement of policy. EPA believes that the Guidelines represent a sound and up-to-date approach to cancer risk assessment and enhance the application of the best available science in EPA's risk assessments. However, EPA cancer risk assessments may be conducted differently than envisioned in the Guidelines for many reasons, including (but not limited to) new information, new scientific understanding or new science policy judgment. The science of risk assessment continues to develop rapidly, and specific components of the Guidelines may become outdated or may otherwise require modification in individual settings. Use of the Guidelines in future risk assessments will be based on decisions by EPA that approaches from the Guidelines are suitable and appropriate in the context of those particular risk assessments. These judgments will be tested through peer review, and risk assessments will be modified to use different approaches if appropriate.
Even though the Guidelines are not binding rules, EPA is issuing them in a manner consistent with the procedures in the Administrative Procedure Act that are generally applicable to rulemaking, including providing opportunity for public comment. EPA considered and responded to all significant public comments as it prepared the Guidelines and will send a copy of the final Guidelines to Congress. EPA certifies that the Guidelines will not have a significant impact on a substantial number of small entities, because the Guidelines are for the benefit of EPA and impose no requirements or costs on small entities.
Beginning today, the Guidelines and Supplemental Guidance serve as EPA's recommendation to Agency risk assessors preparing cancer risk assessments. As EPA prepares cancer assessments under the Integrated Risk Information System (IRIS) program, as well as in other EPA programs, the Agency intends to begin to use the Guidelines and Supplemental Guidance. EPA also intends to consider the Guidelines and Supplemental Guidance along with other selection factors when EPA selects agents for reassessment in annual IRIS agendas (see for example, 70 FR 10616, March 4, 2005).
Dated: March 29, 2005.
Stephen L. Johnson,
Acting Administrator.
These guidelines revise and replace the U.S. Environmental Protection Agency's (EPA's, or the Agency's) Guidelines for Carcinogen Risk Assessment, published in 51 FR 33992, September 24, 1986 (U.S. EPA, 1986a) and the 1999 interim final guidelines (U.S. EPA, 1999a; see U.S. EPA 2001b). They provide EPA staff with guidance for developing and using risk assessments. They also provide basic information to the public about the Agency's risk assessment methods.
These cancer guidelines are used with other risk assessment guidelines, such as the Guidelines for Mutagenicity Risk Assessment (U.S. EPA, 1986b) and the Guidelines for Exposure Assessment (U.S. EPA, 1992a). Consideration of other Agency guidance documents is also important in assessing cancer risks where procedures for evaluating specific target organ effects have been developed (e.g., assessment of thyroid follicular cell tumors, U.S. EPA, 1998a). All of EPA's guidelines should be consulted when conducting a risk assessment in order to ensure that information from studies on carcinogenesis and other health effects is considered together in the overall characterization of risk. This is particularly true in the case in which a precursor effect for a tumor is also a precursor or endpoint of other health effects or when there is a concern for a particular susceptible life-stage for which the Agency has developed guidance, for example, Guidelines for Developmental Toxicity Risk Assessment (U.S. EPA, 1991a). The developmental guidelines discuss hazards to children that may result from exposures during preconception and prenatal or postnatal development to sexual maturity. Similar guidelines exist for reproductive toxicant risk assessments (U.S. EPA, 1996a) and for neurotoxicity risk assessment (U.S. EPA, 1998b). The overall characterization of risk is conducted within the context of broader policies and guidance such as Executive Order 13045, ``Protection of Children From Environmental Health Risks and Safety Risks'' (Executive Order 13045, 1997) which is the primary directive to Federal agencies and departments to identify and assess environmental health risks and safety risks that may disproportionately affect children.
The cancer guidelines encourage both consistency in the procedures that support scientific components of Agency decision making and flexibility to allow incorporation of innovations and contemporaneous scientific concepts. In balancing these goals, the Agency relies on established scientific peer review processes (U.S. EPA, 2000a; OMB 2004). The cancer guidelines incorporate basic principles and science policies based on evaluation of the currently available information. The Agency intends to revise these cancer guidelines when substantial changes are necessary. As more information about carcinogenesis develops, the need may arise to make appropriate changes in risk assessment guidance. In the interim, the Agency intends to issue special reports, after appropriate peer review, to supplement and update guidance on single topics (e.g., U.S. EPA, 1991b). One such guidance document, Supplemental Guidance for Assessing Susceptibility from Early-Life Exposure to Carcinogens (``Supplemental Guidance''), was developed in conjunction with these cancer guidelines (U.S. EPA, 2005). Because both the methodology and the data in the Supplemental Guidance (see Section 1.3.6) are expected to evolve more rapidly than the issues addressed in these cancer guidelines, the two were developed as separate documents. The Supplemental Guidance, however, as well as any other relevant (including subsequent) guidance documents, should be considered along with these cancer guidelines as risk assessments for carcinogens are generated. The use of supplemental guidance, such as the Supplemental Guidance for Assessing Susceptibility from Early-Life Exposure to Carcinogens, has the advantage of allowing the Supplemental Guidance to be modified as more data become available. Thus, the consideration of new, peer-reviewed scientific understanding and data in an assessment can always be consistent with the purposes of these cancer guidelines.
These cancer guidelines are intended as guidance only. They do not establish any substantive ``rules'' under the Administrative Procedure Act or any other law and have no binding effect on EPA or any regulated entity, but instead represent a non-binding statement of policy. EPA believes that the cancer guidelines represent a sound and up-to-date approach to cancer risk assessment, and the cancer guidelines enhance the application of the best available science in EPA's risk assessments. However, EPA cancer risk assessments may be conducted differently than envisioned in the cancer guidelines for many reasons, including (but not limited to) new information, new scientific understanding, or new science policy judgment. The science of risk assessment continues to develop rapidly, and specific components of the cancer guidelines may become outdated or may otherwise require modification in individual settings. Use of the cancer guidelines in future risk assessments will be based on decisions by EPA that the approaches are suitable and appropriate in the context of those particular risk assessments. These judgments will be tested through peer review, and risk assessments will be modified to use different approaches if appropriate.
Publications by the Office of Science and Technology Policy (OSTP, 1985) and the National Research Council (NRC) (NRC, 1983, 1994) provide information and general principles about risk assessment. Risk assessment uses available scientific information on the properties of an agent \1\ and its effects in biological systems to provide an evaluation of the potential for harm as a consequence of environmental exposure. The 1983 and 1994 NRC documents organize risk assessment information into four areas: Hazard identification, dose-response assessment, exposure assessment, and risk characterization. This structure appears in these cancer guidelines, with additional emphasis placed on characterization of evidence and conclusions in each area of the assessment. In particular, the cancer guidelines adopt the approach of the NRC's 1994 report in adding a dimension of characterization to the hazard identification step: an evaluation of the conditions under which its expression is anticipated. Risk assessment questions addressed in these cancer guidelines are as follows.
\1\ The term ``agent'' refers generally to any chemical substance, mixture, or physical or biological entity being assessed, unless otherwise noted (see Section 1.2.2 for a note on radiation).
For hazard--Can the identified agent present a carcinogenic hazard to humans and, if so, under what circumstances?
For dose response--At what levels of exposure might effects occur?
For exposure--What are the conditions of human exposure?
For risk--What is the character of the risk? How well do data support conclusions about the nature and extent of the risk from various exposures?
The risk characterization process first summarizes findings on hazard, dose response, and exposure characterizations and then develops an integrative analysis of the whole risk case. It ends in the writing of a technical risk characterization. Other documents, such as summaries for the risk managers and the public, reflecting the key points of the risk characterization are usually written. A summary for managers is a presentation for those who may or may not be familiar with the scientific details of cancer assessment. It also provides information for other interested readers. The initial steps in the risk characterization process are to make building blocks in the form of characterizations of the assessments of hazard, dose response, and exposure. The individual assessments and characterizations are then integrated to arrive at risk estimates for exposure scenarios of interest. As part of the characterization process, explicit evaluations are made of the hazard and risk potential for susceptible lifestages, including children (U.S. EPA, 1995, 2000b).
The 1994 NRC document also explicitly called attention to the role of the risk assessment process in identifying scientific uncertainties that, if addressed, could serve to reduce uncertainty in future iterations of the risk assessment. NRC recommended that when the Agency ``reports estimates of risk to decision-makers and the public, it should present not only point estimates of risk, but also the sources and magnitudes of uncertainty associated with these estimates'' (p. 15). Thus, the identified uncertainties serve as a feedback loop to the research community and decisionmakers, specifying areas and types of information that would be particularly useful.
There are several reasons for individually characterizing the hazard, dose response, and exposure assessments. One is that they are often done by different people than those who do the integrative analyses. The second is that there is very often a lapse of time between the conduct of hazard and dose-response analyses and the conduct of exposure assessment and integrative analysis. Thus, it is important to capture characterizations of assessments as the assessments are done to avoid the need to go back and reconstruct them. Finally, frequently a single hazard assessment is used by several programs for several different exposure scenarios. There may be one or several documents involved. ``Integrative analysis'' is a generic term, and many documents that have other titles may contain integrative analyses. In the following sections, the elements of these characterizations are discussed.
The cancer guidelines apply within the framework of policies provided by applicable EPA statutes and do not alter such policies.
The cancer guidelines cover the assessment of available data. They do not imply that one kind of data or another is prerequisite for regulatory action concerning any agent. It is important that, when evaluating and considering the use of any data, EPA analysts incorporate the basic standards of quality, as defined by the EPA Information Quality Guidelines (U.S. EPA, 2002a, see Appendix B) and other Agency guidance on data quality such as the EPA Quality Manual for Environmental Programs (U.S. EPA, 2000e), as well as OMB Guidelines for Ensuring and Maximizing the Quality, Utility, and Integrity of Information Disseminated by Federal Agencies (OMB, 2002). It is very important that all analyses consider the basic standards of quality, including objectivity, utility, and integrity. A summary of the factors and considerations generally used by the Agency when evaluating and considering the use of scientific and technical information is contained in EPA's A Summary of General Assessment Factors for Evaluating the Quality of Scientific and Technical Information (U.S. EPA, 2003).
Risk management applies directives in statutes, which may require consideration of potential risk or solely hazard or exposure potential, along with social, economic, technical, and other factors in decision making. Risk assessments may be used to support decisions, but in order to maintain their integrity as decision-making tools, they are not influenced by consideration of the social or economic consequences of regulatory action.
The assessment of risk from radiation sources is informed by the continuing examination of human data by the National Academy of Sciences/NRC in its series of numbered reports: ``Biological Effects of Ionizing Radiation.'' Although some of the general principles of these cancer guidelines may also apply to radiation risk assessments, some of the details of their risk assessment procedures may not, as they are most focused on other kinds of agents. Therefore, these cancer guidelines are not intended to provide the primary source of, or guidance for, the Agency's evaluation of the carcinogenic risks of radiation.
Not every EPA assessment has the same scope or depth, a factor recognized by the National Academy of Sciences (NRC, 1996). For example, EPA's Information Quality Guidelines (U.S. EPA, 2002a, see Appendix B) discuss influential information that ``will have or does have a clear and substantial impact * * * on important public policies or private sector decisions * * * that should adhere to a rigorous standard of quality.'' It is often difficult to know a priori how the results of a risk assessment are likely to be used by the Agency. Some risk assessments may be used by Agency economists and policy analysts, and the necessary information for such analyses, as discussed in detail later in this document, should be included when practicable (U.S. EPA, 2002a). On the other hand, Agency staff often conduct screening-level assessments for priority setting or separate assessments of hazard or exposure for ranking purposes or to decide whether to invest resources in collecting data for a full assessment. Moreover, a given assessment of hazard and dose response may be used with more than one exposure assessment that may be conducted separately and at different times as the need arises in studying environmental problems related to various exposure media. The cancer guidelines apply to these various situations in appropriate detail, given the scope and depth of the particular assessment. For example, a screening assessment may be based almost entirely on structure-activity relationships (SARs) and default options, when other data are not readily available. When more data and resources are readily available, assessments can use a critical analysis of all of the available data as the starting point of the risk assessment. Under these conditions, default options would only be used to address uncertainties or the absence of critical data. 
Default options are inferences based on general scientific knowledge of the phenomena in question and are also matters of policy concerning the appropriate way to bridge uncertainties that concern potential risk to human health.
These cancer guidelines do not suggest that all of the kinds of data covered here will need to be available or used for either assessment or decision making. The level of detail of an assessment is a matter of Agency management discretion regarding applicable decision-making needs. The Agency generally presumes that key cancer information (e.g., assessments contained in the Agency's Integrated Risk Information System) is ``influential information'' as defined by the EPA Information Quality Guidelines and ``highly influential'' as defined by OMB's Information Quality Bulletin for Peer Review (OMB 2004).
As an increasing understanding of carcinogenesis is becoming available, these cancer guidelines adopt a view of default options that is consistent with EPA's mission to protect human health while adhering to the tenets of sound science. Rather than viewing default options as the starting point from which departures may be justified by new scientific information, these cancer guidelines view a critical analysis of all of the available information that is relevant to assessing the carcinogenic risk as the starting point from which a default option may be invoked if needed to address uncertainty or the absence of critical information. Preference is given to using information that has been peer reviewed, e.g., reported in peer-reviewed scientific journals. The primary goal of EPA actions is protection of human health; accordingly, as an Agency policy, risk assessment procedures, including default options that are used in the absence of scientific data to the contrary, should be health protective (U.S. EPA, 1999b).
Use of health protective risk assessment procedures as described in these cancer guidelines means that estimates, while uncertain, are more likely to overstate than understate hazard and/or risk. NRC (1994) reaffirmed the use of default options as ``a reasonable way to cope with uncertainty about the choice of appropriate models or theory'' (p. 104). NRC saw the need to treat uncertainty in a predictable way that is ``scientifically defensible, consistent with the agency's statutory mission, and responsive to the needs of decision-makers'' (p. 86). The extent of health protection provided to the public ultimately depends upon what risk managers decide is the appropriate course of regulatory action. When risk assessments are performed using only one set of procedures, it may be difficult for risk managers to determine how much health protectiveness is built into a particular hazard determination or risk characterization. When there are alternative procedures having significant biological support, the Agency encourages assessments to be performed using these alternative procedures, if feasible, in order to shed light on the uncertainties in the assessment, recognizing that the Agency may decide to give greater weight to one set of procedures than another in a specific assessment or management decision.
Encouraging risk assessors to be receptive to new scientific information, NRC discussed the need for departures from default options when a ``sufficient showing'' is made. It called on EPA to articulate clearly its criteria for a departure so that decisions to depart from default options would be ``scientifically credible and receive public acceptance'' (p. 91). It was concerned that ad hoc departures would undercut the scientific credibility of a risk assessment. NRC envisioned that principles for choosing and departing from default options would balance several objectives, including ``protecting the public health, ensuring scientific validity, minimizing serious errors in estimating risks, maximizing incentives for research, creating an orderly and predictable process, and fostering openness and trustworthiness'' (p. 81).
Appendices N-1 and N-2 of NRC (1994) discussed two competing standards for choosing default options articulated by members of the committee. One suggested approach would evaluate a departure in terms of whether ``it is scientifically plausible'' and whether it ``tends to protect public health in the face of scientific uncertainty'' (p. 601). An alternative approach ``emphasizes scientific plausibility with regard to the use of alternative models'' (p. 631). Reaching no consensus on a single approach, NRC recognized that developing criteria for departures is an EPA policy matter.
The basis for invoking a default option depends on the circumstances. Generally, if a gap in basic understanding exists or if agent-specific information is missing, a default option may be used. If agent-specific information is present but critical analysis reveals inadequacies, a default option may also be used. If critical analysis of agent-specific information is consistent with one or more biologically based models as well as with the default option, the alternative models and the default option are both carried through the assessment and characterized for the risk manager. In this case, the default model not only fits the data, but also serves as a benchmark for comparison with other analyses. This case also highlights the importance of extensive experimentation to support a conclusion about mode of action, including addressing the issue of whether alternative modes of action are also plausible. Section 2.4 provides a framework for critical analysis of mode of action information to address the extent to which the available information supports the hypothesized mode of action, whether alternative modes of action are also plausible, and whether there is confidence that the same inferences can be extended to populations and lifestages that are not represented among the experimental data.
Generally, cancer risk decisions strive to be ``scientifically defensible, consistent with the agency's statutory mission, and responsive to the needs of decision-makers'' (NRC, 1994). Scientific defensibility would be evaluated through use of EPA's Science Advisory Board, EPA's Office of Pesticide Programs' Scientific Advisory Panel, or other independent expert peer review panels to determine whether a consensus among scientific experts exists. Consistency with the Agency's statutory mission would consider whether the risk assessment overall supports EPA's mission to protect human health and safeguard the natural environment. Responsiveness to the needs of decisionmakers would take into account pragmatic considerations such as the nature of the decision; the required depth of analysis; the utility, time, and cost of generating new scientific data; and the time, personnel, and resources allotted to the risk assessment.
With a multitude of types of data, analyses, and risk assessments, as well as the diversity of needs of decisionmakers, it is neither possible nor desirable to specify step-by-step criteria for decisions to invoke a default option. A discussion of major default options appears in the Appendix. Screening-level assessments may more readily use default parameters, even worst-case assumptions, that would not be appropriate in a full-scale assessment. On the other hand, significant risk management decisions will often benefit from a more comprehensive assessment, including alternative risk models having significant biological support. To the extent practicable, such assessments should provide central estimates of potential risks in conjunction with lower and upper bounds (e.g., confidence limits) and a clear statement of the uncertainty associated with these estimates.
In the absence of sufficient data or understanding to develop a robust, biologically based model, an appropriate policy choice is to have a single preferred curve-fitting model for each type of data set. Many different curve-fitting models have been developed, and those that fit the observed data reasonably well may lead to several-fold differences in estimated risk at the lower end of the observed range. In addition, goodness-of-fit to the experimental observations is not by itself an effective means of discriminating among models that adequately fit the data (OSTP, 1985). To provide some measure of consistency across different carcinogen assessments, EPA uses a standard curve-fitting procedure for tumor incidence data. Assessments that include a different approach should provide an adequate justification and compare their results with those from the standard procedure. Application of models to data should be conducted in an open and transparent manner.
The use of mode of action \2\ in the assessment of potential carcinogens is a main focus of these cancer guidelines. This area of emphasis arose because of the significant scientific advances that have developed concerning the causes of cancer induction. Elucidation of a mode of action for a particular cancer response in animals or humans is a data-rich determination. Significant information should be developed to ensure that a scientifically justifiable mode of action underlies the process leading to cancer at a given site. In the absence of sufficient, scientifically justifiable mode of action information, EPA generally takes public health-protective, default positions regarding the interpretation of toxicologic and epidemiologic data: animal tumor findings are judged to be relevant to humans, and cancer risks are assumed to conform with low dose linearity.
\2\ The term ``mode of action'' is defined as a sequence of key events and processes, starting with interaction of an agent with a cell, proceeding through operational and anatomical changes, and resulting in cancer formation. A ``key event'' is an empirically observable precursor step that is itself a necessary element of the mode of action or is a biologically based marker for such an element. Mode of action is contrasted with ``mechanism of action,'' which implies a more detailed understanding and description of events, often at the molecular level, than is meant by mode of action. The toxicokinetic processes that lead to formation or distribution of the active agent to the target tissue are considered in estimating dose but are not part of the mode of action as the term is used here. There are many examples of possible modes of carcinogenic action, such as mutagenicity, mitogenesis, inhibition of cell death, cytotoxicity with reparative cell proliferation, and immune suppression.
Understanding of mode of action can be a key to identifying processes that may cause chemical exposures to differentially affect a particular population segment or lifestage. Some modes of action are anticipated to be mutagenic and are assessed with a linear approach. This is the mode of action of radiation and several other agents that are known carcinogens. Other modes of action may be modeled with either linear or nonlinear \3\ approaches after a rigorous analysis of available data under the guidance provided in the framework for mode of action analysis (see Section 2.4.3).
\3\ The term ``nonlinear'' is used here in a narrower sense than its usual meaning in the field of mathematical modeling. In these cancer guidelines, the term ``nonlinear'' refers to threshold models (which show no response over a range of low doses that include zero) and some nonthreshold models (e.g., a quadratic model, which shows some response at all doses above zero). In these cancer guidelines, a nonlinear model is one whose slope is zero at (and perhaps above) a dose of zero. A low-dose-linear model is one whose slope is greater than zero at a dose of zero. A low-dose-linear model approximates a straight line only at very low doses; at higher doses near the observed data, a low-dose-linear model can display curvature. The term ``low-dose-linear'' is often abbreviated ``linear,'' although a low-dose-linear model is not linear at all doses. Use of nonlinear approaches does not imply a biological threshold dose below which the response is zero. Estimating thresholds can be problematic; for example, a response that is not statistically significant can be consistent with a small risk that falls below an experiment's power of detection.
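The distinction drawn in this footnote can be illustrated numerically. The following is a minimal sketch, not a model prescribed by these cancer guidelines: the two functional forms (a one-hit form and a purely quadratic form) and the parameter values are hypothetical and chosen only to show a slope greater than zero versus a slope of zero at a dose of zero.

```python
import math

def one_hit(dose, q1=0.1):
    """Hypothetical low-dose-linear model: P(d) = 1 - exp(-q1 * d)."""
    # expm1 keeps precision for very small doses
    return -math.expm1(-q1 * dose)

def quadratic(dose, q2=0.1):
    """Hypothetical nonthreshold quadratic model: P(d) = 1 - exp(-q2 * d**2)."""
    return -math.expm1(-q2 * dose ** 2)

def slope_at_zero(model, step=1e-6):
    """Forward-difference estimate of the slope of `model` at a dose of zero."""
    return (model(step) - model(0.0)) / step

linear_slope = slope_at_zero(one_hit)        # close to q1, greater than zero
quadratic_slope = slope_at_zero(quadratic)   # close to zero

print(f"one-hit slope at zero dose:   {linear_slope:.6f}")
print(f"quadratic slope at zero dose: {quadratic_slope:.6f}")
print(f"quadratic response at dose 0.5: {quadratic(0.5):.4f}")
```

Under these assumptions the quadratic form is ``nonlinear'' in the sense used here (slope of zero at zero dose) even though it shows some response at every dose above zero, whereas the one-hit form is low-dose-linear.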
The cancer guidelines emphasize the importance of weighing all of the evidence in reaching conclusions about the human carcinogenic potential of agents. This is accomplished in a single integrative step after assessing all of the individual lines of evidence, which is in contrast to the step-wise approach in the 1986 cancer guidelines. Evidence considered includes tumor findings, or lack thereof, in humans and laboratory animals; an agent's chemical and physical properties; its structure-activity relationships (SARs) as compared with other carcinogenic agents; and studies addressing potential carcinogenic processes and mode(s) of action, either in vivo or in vitro. Data from epidemiologic studies are generally preferred for characterizing human cancer hazard and risk. However, all of the information discussed above could provide valuable insights into the possible mode(s) of action and likelihood of human cancer hazard and risk. The cancer guidelines recognize the growing sophistication of research methods, particularly in their ability to reveal the modes of action of carcinogenic agents at cellular and subcellular levels as well as toxicokinetic processes.
Weighing of the evidence includes addressing not only the likelihood of human carcinogenic effects of the agent but also the conditions under which such effects may be expressed, to the extent that these are revealed in the toxicological and other biologically important features of the agent.
The weight of evidence narrative to characterize hazard summarizes the results of the hazard assessment and provides a conclusion with regard to human carcinogenic potential. The narrative explains the kinds of evidence available and how they fit together in drawing conclusions, and it points out significant issues/strengths/limitations of the data and conclusions. Because the narrative also summarizes the mode of action information, it sets the stage for the discussion of the rationale underlying a recommended approach to dose-response assessment.
In order to provide some measure of clarity and consistency in an otherwise free-form, narrative characterization, standard descriptors are used as part of the hazard narrative to express the conclusion regarding the weight of evidence for carcinogenic hazard potential. There are five recommended standard hazard descriptors: ``Carcinogenic to Humans,'' ``Likely to Be Carcinogenic to Humans,'' ``Suggestive Evidence of Carcinogenic Potential,'' ``Inadequate Information to Assess Carcinogenic Potential,'' and ``Not Likely to Be Carcinogenic to Humans.'' Each standard descriptor may be applicable to a wide variety of data sets and weights of evidence and is presented only in the context of a weight of evidence narrative. Furthermore, as described in Section 2.5 of these cancer guidelines, more than one conclusion may be reached for an agent.
Dose-response assessment evaluates potential risks to humans at particular exposure levels. The approach to dose-response assessment for a particular agent is based on the conclusion reached as to its potential mode(s) of action for each tumor type. Because an agent may induce multiple tumor types, the dose-response assessment includes an analysis of all tumor types, followed by an overall synthesis that includes a characterization of the risk estimates across tumor types, the strength of the mode of action information of each tumor type, and the anticipated relevance of each tumor type to humans, including susceptible populations and lifestages (e.g., childhood).
Dose-response assessment for each tumor type is performed in two steps: assessment of observed data to derive a point of departure (POD),\4\ followed by extrapolation to lower exposures to the extent that is necessary. Data from epidemiologic studies, of sufficient quality, are generally preferred for estimating risks. When animal studies are the basis of the analysis, the estimation of a human-equivalent dose should utilize toxicokinetic data to inform cross-species dose scaling if appropriate and if adequate data are available. Otherwise, default procedures should be applied. For oral dose, based on current science, an appropriate default option is to scale daily applied doses experienced for a lifetime in proportion to body weight raised to the 3/4 power (U.S. EPA, 1992b). For inhalation dose, based on current science, an appropriate default methodology estimates respiratory deposition of particles and gases and estimates internal doses of gases with different absorption characteristics. When toxicokinetic modeling (see Section 3.1.2) is used without toxicodynamic modeling (see Section 3.2.2), the dose-response assessment develops and supports an approach for addressing toxicodynamic equivalence, perhaps by retaining some of the cross-species scaling factor (see Section 3.1.3). Guidance is also provided for adjustment of dose from adults to children (see Section 4.3.1).
\4\ A ``point of departure'' (POD) marks the beginning of extrapolation to lower doses. The POD is an estimated dose (usually expressed in human-equivalent terms) near the lower end of the observed range, without significant extrapolation to lower doses.
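The default oral scaling described above can be sketched as a short calculation. If the total daily dose (mg/day) scales in proportion to body weight raised to the 3/4 power, then the dose per unit body weight (mg/kg-day) scales with body weight raised to the -1/4 power. The function name, the body weights (0.35 kg for a rat, 70 kg for a human), and the dose are illustrative assumptions, not values taken from these cancer guidelines.

```python
def human_equivalent_dose(animal_dose_mg_per_kg_day, animal_bw_kg, human_bw_kg=70.0):
    """Convert an animal oral dose (mg/kg-day) to a human-equivalent dose.

    Total dose (mg/day) is assumed proportional to BW**0.75, so the dose
    per kg of body weight is proportional to BW**-0.25.
    """
    if animal_bw_kg <= 0 or human_bw_kg <= 0:
        raise ValueError("body weights must be positive")
    return animal_dose_mg_per_kg_day * (animal_bw_kg / human_bw_kg) ** 0.25

# Hypothetical example: 10 mg/kg-day in a 0.35 kg rat (assumed values)
rat_dose = 10.0
rat_bw = 0.35
hed = human_equivalent_dose(rat_dose, rat_bw)
print(f"human-equivalent dose: {hed:.2f} mg/kg-day")
```

With these assumed inputs the human-equivalent dose is roughly one quarter of the animal dose on a mg/kg-day basis. This sketch applies only when toxicokinetic data are not used to inform the cross-species scaling.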
Response data on effects of the agent on carcinogenic processes are analyzed (nontumor data) in addition to data on tumor incidence. If appropriate, the analyses of data on tumor incidence and on precursor effects may be used in combination. To the extent the relationship between precursor effects and tumor incidence is known, precursor data may be used to estimate a dose-response function below the observable tumor data. Study of the dose-response function for effects believed to be part of the carcinogenic process influenced by the agent may also assist in evaluating the relationship of exposure and response in the range of observation and at exposure levels below the range of observation.
The first step of dose-response assessment is evaluation within the range of observation. Approaches to analysis of the range of observation of epidemiologic studies are determined by the type of study and how dose and response are measured in the study. In the absence of adequate human data for dose-response analysis, animal data are generally used. If there are sufficient quantitative data and adequate understanding of the carcinogenic process, a biologically based model may be developed to relate dose and response data on an agent-specific basis. Otherwise, as a default procedure, a standard model can be used to curve-fit the data.
The POD for extrapolating the relationship to environmental exposure levels of interest, when the latter are outside the range of observed data, is generally the lower 95% confidence limit on the lowest dose level that can be supported for modeling by the data. SAB (1997) suggested that, ``it may be appropriate to emphasize lower statistical bounds in screening analyses and in activities designed to develop an appropriate human exposure value, since such activities require accounting for various types of uncertainties and a lower bound on the central estimate is a scientifically-based approach accounting for the uncertainty in the true value of the ED
The second step of dose-response assessment is extrapolation to lower dose levels, if needed. This extrapolation is based on extension of a biologically based model if supported by substantial data (see Section 3.3.2). Otherwise, default approaches can be applied that are consistent with current understanding of mode(s) of action of the agent, including approaches that assume linearity or nonlinearity of the dose-response relationship, or both. A default approach for linearity extends a straight line from the POD to zero dose/zero response (see Section 3.3.3). The linear approach is used when: (1) There is an absence of sufficient information on modes of action or (2) the mode of action information indicates that the dose-response curve at low dose is or is expected to be linear. Where alternative approaches have significant biological support, and no scientific consensus favors a single approach, an assessment may present results using alternative approaches. A nonlinear approach can be used to develop a reference dose or a reference concentration (see Section 3.3.4).
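The default linear approach described above, a straight line from the POD to zero dose/zero response, can be sketched as follows. The function names and the numerical values (a POD of 2.0 mg/kg-day associated with an extra risk of 0.10) are hypothetical and serve only to illustrate the arithmetic.

```python
def slope_from_pod(pod_dose, pod_extra_risk):
    """Slope of the straight line joining (0, 0) and (pod_dose, pod_extra_risk)."""
    if pod_dose <= 0:
        raise ValueError("POD dose must be positive")
    if not 0 < pod_extra_risk < 1:
        raise ValueError("extra risk at the POD must lie between 0 and 1")
    return pod_extra_risk / pod_dose

def linear_low_dose_risk(dose, pod_dose, pod_extra_risk):
    """Extra risk at `dose` under linear extrapolation below the POD."""
    if not 0 <= dose <= pod_dose:
        raise ValueError("linear extrapolation applies between zero and the POD")
    return slope_from_pod(pod_dose, pod_extra_risk) * dose

# Hypothetical POD: 2.0 mg/kg-day at 10% extra risk (assumed values)
slope = slope_from_pod(2.0, 0.10)
risk = linear_low_dose_risk(0.001, 2.0, 0.10)
print(f"slope: {slope:.3f} per mg/kg-day")
print(f"extra risk at 0.001 mg/kg-day: {risk:.1e}")
```

The sketch deliberately refuses doses above the POD, since the straight line is an extrapolation device for doses below the observed range, not a model of the observed data.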
An important use of mode of action information is to identify susceptible populations and lifestages. It is rare to have epidemiologic studies or animal bioassays conducted in susceptible individuals. This information need can be filled by identifying the key events of the mode of action and then identifying risk factors, such as differences due to genetic polymorphisms, disease, altered organ function, lifestyle, and lifestage, that can augment these key events. To do this, the information about the key precursor events is reviewed to identify particular populations or lifestages that can be particularly susceptible to their occurrence (see Section 2.4.3.4). Any information suggesting quantitative differences between populations or lifestages is flagged for consideration in the dose-response assessment (see Section 3.5 and U.S. EPA 2002b).
NRC (1994) recommended that ``EPA should assess risks to infants and children whenever it appears that their risks might be greater than those of adults.'' Executive Order 13045 (1997) requires that ``each Federal Agency shall make it a high priority to identify and assess environmental health and safety risks that may disproportionately affect children, and shall ensure that their policies, programs, and standards address disproportionate risks that result from environmental health risks or safety risks.'' In assessing risks to children, EPA considers both effects manifest during childhood and early-life exposures that can contribute to effects at any time later in life.
These cancer guidelines view childhood as a sequence of lifestages rather than viewing children as a subpopulation, the distinction being that a subpopulation refers to a portion of the population, whereas a lifestage is inclusive of the entire population. Exposures that are of concern extend from conception through adolescence and also include pre-conception exposures of both parents. These cancer guidelines use the term ``childhood'' in this more inclusive sense.
Rarely are there studies that directly evaluate risks following early-life exposure. Epidemiologic studies of early-life exposure to environmental agents are seldom available. Standard animal bioassays generally begin dosing after the animals are several weeks old, when many organ systems are mature. This could lead to an understatement of risk, because an accepted concept in the science of carcinogenesis is that young animals are usually more susceptible to the carcinogenic activity of a chemical than are mature animals (McConnell, 1992).
At this time, there is some evidence of higher cancer risks following early-life exposure. For radiation carcinogenesis, data indicate that risks for several forms of cancer are highest following childhood exposure (NRC, 1990; Miller, 1995; U.S. EPA, 1999c). These human results are supported by the few animal bioassays that include perinatal (prenatal or early postnatal) exposure. Perinatal exposure to some agents can induce higher incidences of the tumors seen in standard bioassays; some examples include vinyl chloride (Maltoni et al., 1981), diethylnitrosamine (Peto et al., 1984), benzidine, DDT, dieldrin, and safrole (Vesselinovitch et al., 1979). Moreover, perinatal exposure to some agents, including vinyl chloride (Maltoni et al., 1981) and saccharin (Cohen, 1995; Whysner and Williams, 1996), can induce different tumors that are not seen in standard bioassays. Surveys comparing perinatal carcinogenesis bioassays with standard bioassays for a limited number of chemicals (McConnell, 1992; U.S. EPA, 1996b) have concluded that
These empirical results are consistent with current understanding of the biological processes involved in carcinogenesis, which leads to a reasonable expectation that children can be more susceptible to many carcinogenic agents (Anderson et al., 2000; Birnbaum and Fenton, 2003; Ginsberg, 2003; Miller et al., 2002; Scheuplein et al., 2002). Some aspects potentially leading to childhood susceptibility are listed below.
Differences in the capacity to metabolize and clear chemicals can result in larger or smaller internal doses of the active agent(s).
More frequent cell division during development can result in enhanced expression of mutations due to the reduced time available for repair of DNA lesions (Slikker et al., 2004).
Some embryonic cells, such as brain cells, lack key DNA repair enzymes.
Some components of the immune system are not fully functional during development (Holladay and Smialowicz, 2000; Holsapple et al., 2003).
Hormonal systems operate at different levels during different lifestages.
Induction of developmental abnormalities can result in a predisposition to carcinogenic effects later in life (Anderson et al., 2000; Birnbaum and Fenton, 2003; Fenton and Davis, 2002).
To evaluate risks from early-life exposure, these cancer guidelines emphasize the role of toxicokinetic information to estimate levels of the active agent in children and toxicodynamic information to identify whether any key events of the mode of action are of increased concern early in life. Developmental toxicity studies can provide information on critical periods of exposure for particular targets of toxicity. An approach to assessing risks from early-life exposure is presented in Figure 1-1.
In the hazard assessment, when there are mode of action data, the assessment considers whether these data have special relevance during childhood, considering the various aspects of development listed above. Examples of such data include toxicokinetics that predict a sufficiently large internal dose in children or a mode of action where a key precursor event is more likely to occur during childhood.
There is no recommended default to settle the question of whether tumors arising through a mode of action are relevant during childhood; an adequate understanding of the mode of action implies that there are sufficient data (on either the specific agent or the general mode of action) to form a confident conclusion about relevance during childhood (see Section 2.4.3.4).
In the dose-response assessment, the potential for susceptibility during childhood warrants explicit consideration in each assessment. These cancer guidelines encourage developing separate risk estimates for children according to a tiered approach that considers what pertinent data are available (see Section 3.5). Childhood may be a susceptible period; moreover, exposures during childhood generally are not equivalent to exposures at other times and may be treated differently from exposures occurring later in life (see Section 3.5). In addition, adjustment of unit risk estimates may be warranted when used to estimate risks from childhood exposure (see Section 4.4).
At this time, several limitations preclude a full assessment of children's risk. There are no generally used testing protocols to identify potential environmental causes of cancers that are unique to children, including several forms of childhood cancer and cancers that develop from parental exposures, and cases where developmental exposure may alter susceptibility to carcinogen exposure in the adult (Birnbaum and Fenton, 2003). Dose-response assessment is limited by an inability to observe how developmental exposure can modify incidence and latency and an inability to estimate the ultimate tumor response resulting from induced susceptibility to later carcinogen exposures.
To partially address the limitations identified above, EPA developed, in conjunction with these cancer guidelines, the Supplemental Guidance for Assessing Susceptibility from Early-Life Exposure to Carcinogens (``Supplemental Guidance'').
The Supplemental Guidance addresses a number of issues pertaining to cancer risks associated with early-life exposures generally, but provides specific guidance on procedures for adjusting cancer potency estimates only for carcinogens acting through a mutagenic mode of action. This Supplemental Guidance recommends, for such chemicals when no chemical-specific data exist, a default approach using estimates from chronic studies (i.e., cancer slope factors) with appropriate modifications to address the potential for differential risk of early-lifestage exposure.
The Agency considered both the advantages and disadvantages of extending the recommended age-dependent adjustment factors for carcinogenic potency to carcinogenic agents for which the mode of action remains unknown. EPA decided to recommend these factors only for carcinogens acting through a mutagenic mode of action based on a combination of analysis of available data and long-standing science policy positions which govern the Agency's overall approach to carcinogen risk assessment. In general, the Agency prefers to rely on analyses of data, rather than general defaults. When data are available for a sensitive lifestage, they would be used directly to evaluate risks for that chemical and that lifestage on a case-by-case basis. In the case of nonmutagenic carcinogens, when the mode of action is unknown, the data were judged by EPA to be too limited and the modes of action too diverse to use this as a category for which a general default adjustment factor approach can be applied. In this situation, a linear low-dose extrapolation methodology (without further adjustment) is recommended. It is the Agency's long-standing science policy position that use of the linear low-dose extrapolation approach provides adequate public health conservatism in the absence of chemical-specific data indicating differential early-life sensitivity or when the mode of action is not mutagenic.
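The way an age-dependent adjustment modifies a cancer slope factor can be sketched as a weighted sum over age groups. The factor values used below (10 for birth to < 2 years, 3 for 2 to < 16 years, and 1 thereafter) and the 70-year lifetime are assumptions drawn from the Supplemental Guidance rather than from this section, and the slope factor and dose are purely hypothetical; the Supplemental Guidance itself should be consulted before any use.

```python
# ASSUMPTION: age groups, adjustment factors, and the 70-year lifetime are
# taken from the Supplemental Guidance, not stated in this section.
LIFETIME_YEARS = 70.0
AGE_GROUPS = [
    # (label, start age in years, end age in years, adjustment factor)
    ("birth to < 2 years", 0.0, 2.0, 10.0),
    ("2 to < 16 years", 2.0, 16.0, 3.0),
    ("16 to 70 years", 16.0, 70.0, 1.0),
]

def adjusted_lifetime_risk(slope_factor, dose_by_group):
    """Sum adjustment-weighted risk contributions over the age groups.

    slope_factor: risk per mg/kg-day from a chronic study (hypothetical here).
    dose_by_group: mapping of age-group label to average dose in mg/kg-day.
    """
    risk = 0.0
    for label, start, end, factor in AGE_GROUPS:
        fraction_of_lifetime = (end - start) / LIFETIME_YEARS
        risk += slope_factor * factor * dose_by_group[label] * fraction_of_lifetime
    return risk

# Hypothetical inputs: same dose at every age
slope_factor = 0.05
constant_dose = {label: 0.001 for label, _, _, _ in AGE_GROUPS}
unadjusted = slope_factor * 0.001
adjusted = adjusted_lifetime_risk(slope_factor, constant_dose)
ratio = adjusted / unadjusted
print(f"adjusted / unadjusted lifetime risk: {ratio:.2f}")
```

For a constant lifetime exposure under these assumptions, the adjustment raises the estimated lifetime risk by a factor of (10 x 2 + 3 x 14 + 1 x 54) / 70, or about 1.66. The sketch applies only to carcinogens acting through a mutagenic mode of action for which no chemical-specific early-life data exist.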
The Agency expects to produce additional supplemental guidance for other modes of action, as data from new research and toxicity testing indicate it is warranted. EPA intends to focus its research, and work collaboratively with its federal partners, to improve understanding of the implications of early life exposure to carcinogens. Development of guidance for estrogenic agents and chemicals acting through other processes resulting in endocrine disruption and subsequent carcinogenesis, for example, might be a reasonable priority in light of the human experience with diethylstilbestrol and the existing early life animal studies. It is worth noting that each mode of action for endocrine disruption will probably require separate analysis.
As the Agency examines additional carcinogenic agents, the age groupings may differ from those recommended for assessing cancer risks from early-life exposure to chemicals with a mutagenic mode of action. Puberty and its associated biological changes, for example, involve many biological processes that could lead to changes in sensitivity to the effects of some carcinogens, depending on their mode of action. The Agency is interested in identifying lifestages that may be particularly sensitive or refractory for carcinogenesis, and believes that the mode of action framework described in these cancer guidelines is an appropriate mechanism for elucidating these lifestages. For each additional mode of action evaluated, the various age groupings determined to be at differential risk may differ from those proposed in the Supplemental Guidance. For example, the age groupings selected for the age-dependent adjustments for carcinogens acting through a mutagenic mode of action were initially selected based on the available data, i.e., for the laboratory animal age range representative of birth to < 2 years in humans. More limited data and information on human biology were used to determine a science-informed policy regarding 2 to < 16 years.
Data were not available to refine the latter age group. If more data become available regarding carcinogens with a mutagenic mode of action, consideration may be given to further refinement of these age groups.
The cancer guidelines emphasize the importance of a clear and useful characterization narrative that summarizes the analyses of hazard, dose-response, and exposure assessment. These characterizations summarize the assessments to explain the extent and weight of evidence, major points of interpretation and rationale for their selection, strengths and weaknesses of the evidence and the analysis, and discuss alternative conclusions and uncertainties that deserve serious consideration (U.S. EPA, 2000b). They serve as starting materials for the overall risk characterization process that completes the risk assessment.
The purpose of hazard assessment is to review and evaluate data pertinent to two questions: (1) Whether an agent may pose a carcinogenic hazard to human beings, and (2) under what circumstances an identified hazard may be expressed (NRC, 1994). Hazard assessment involves analyses of a variety of data that may range from observations of tumor responses to analysis of structure-activity relationships (SARs). The purpose of the assessment is not simply to assemble these separate evaluations; its purpose is to construct a total analysis examining what the biological data reveal as a whole about carcinogenic effects and mode of action of the agent, and their implications for human hazard and dose-response evaluation. Conclusions are drawn from weight-of-evidence evaluations based on the combined strength and coherence of inferences appropriately drawn from all of the available information. To the extent that data permit, hazard assessment addresses the question of mode of action of an agent as both an initial step in identifying human hazard potential and as a component in considering appropriate approaches to dose-response assessment.
The topics in this chapter include analysis of tumor data, both human and animal, and analysis of other key information about properties and effects that relate to carcinogenic potential. The chapter addresses how information can be used to evaluate potential modes of action. It also provides guidance on performing a weight of evidence evaluation.
Presentation of the results of hazard assessment should be informed by Agency guidance as discussed in Section 2.6. The results are presented in a technical hazard characterization that serves as a support to later risk characterization. It includes:
Another presentation feature is the use of a weight of evidence narrative that includes both a conclusion about the weight of evidence of carcinogenic potential and a summary of the data on which the conclusion rests. This narrative is a brief summary that in toto replaces the alphanumerical classification system used in EPA's 1986 cancer guidelines (U.S. EPA, 1986a).
Evidence of carcinogenicity comes from finding tumor increases in humans or laboratory animals exposed to a given agent or from finding tumors following exposure to structural analogues to the compound under review. The significance of observed or anticipated tumor effects is evaluated in reference to all the other key data on the agent. This section contains guidance for analyzing human and animal studies to decide whether there is an association between exposure to an agent or a structural analogue and occurrence of tumors.
Note that the term ``tumor'' in these cancer guidelines is defined as malignant neoplasms or a combination of malignant and corresponding benign neoplasms. Observation of only benign neoplasia may or may not have significance for evaluation under these cancer guidelines. Benign tumors that are not observed to progress to malignancy are assessed on a case-by-case basis. There is a range of possibilities for their overall significance.
They may deserve attention because they are serious health problems even though they are not malignant; for instance, benign tumors may be a health risk because of their effect on the function of a target tissue such as the brain. They may be significant indicators of the need for further testing of an agent if they are observed in a short-term test protocol, or such an observation may add to the overall weight of evidence if the same agent causes malignancies in a long-term study. Knowledge of the mode of action associated with a benign tumor response may aid in the interpretation of other tumor responses associated with the same agent. In other cases, observation of a benign tumor response alone may have no significant health hazard implications when other sources of evidence show no suggestion of carcinogenicity. Human data may come from epidemiologic studies or case reports. (Clinical human studies, which involve intentional exposures to substances, may provide toxicokinetic data, but generally not data on carcinogenicity.) The most common sources of human data for cancer risk assessment are epidemiologic investigations. Epidemiology is the study of the distribution of disease in human populations and the factors that may influence that distribution. The goals of cancer epidemiology are to identify the distribution of cancer risk and to determine the extent to which the risk can be attributed causally to specific exposures to exogenous or endogenous factors (see Centers for Disease Control and Prevention [CDC, 2004]). Epidemiologic data are extremely valuable in risk assessment because they provide direct evidence on whether a substance is likely to produce cancer in humans, thereby avoiding issues such as species-to-species inference, extrapolation to exposures relevant to people, and effects of concomitant exposures due to lifestyles. Thus, epidemiologic studies typically evaluate agents under more relevant conditions. 
When human data of high quality and adequate statistical power are available, they are generally preferable over animal data and should be given greater weight in hazard characterization and dose-response assessment, although both can be used. Null results from epidemiologic studies alone generally do not prove the absence of carcinogenic effects because such results can arise either from an agent being truly not carcinogenic or from other factors such as: inadequate statistical power, inadequate study design, imprecise estimates, or confounding factors. Moreover, null results from a well-designed and well-conducted epidemiologic study that contains usable exposure data can help to define upper limits for the estimated dose of concern for human exposure in cases where the overall weight of the evidence indicates that the agent is potentially carcinogenic in humans. Furthermore, data from a well designed and well conducted epidemiologic study that does not show positive results, in conjunction with compelling mechanistic information, can lend support to a conclusion that animal responses may not be predictive of a human cancer hazard. Epidemiology can also complement experimental evidence in corroborating or clarifying the carcinogenic potential of the agent in question. For example, epidemiologic studies that show elevated cancer risk for tumor sites corresponding to those at which laboratory animals experience increased tumor incidence can strengthen the weight of evidence of human carcinogenicity. Furthermore, biochemical or molecular epidemiology may help improve understanding of the mechanisms of human carcinogenesis. All studies that are considered to be of acceptable quality, whether yielding positive or null results, or even suggesting protective carcinogenic effects, should be considered in assessing the totality of the human evidence. 
Conclusions about the overall evidence for carcinogenicity from available studies in humans should be summarized along with a discussion of uncertainties and gaps in knowledge. Conclusions regarding the strength of the evidence for positive or negative associations observed, as well as evidence supporting judgments of causality, should be clearly described. In assessing the human data within the overall weight of evidence, determination about the strength of the epidemiologic evidence should clearly identify the degree to which the observed associations may be explained by other factors, including bias or confounding. Characteristics that are generally desirable in epidemiologic studies include: (1) clear articulation of study objectives or hypothesis; (2) proper selection and characterization of comparison groups (exposed and unexposed groups or case and control groups); (3) adequate characterization of exposure; (4) sufficient length of follow-up for disease occurrence; (5) valid ascertainment of the causes of cancer morbidity and mortality; (6) proper consideration of bias and confounding factors; (7) adequate sample size to detect an effect; (8) clear, well-documented, and appropriate methodology for data collection and analysis; (9) adequate response rate and methodology for handling missing data; and (10) complete and clear documentation of results. No single criterion determines the overall adequacy of a study. Practical and resource constraints may limit the ability to address all of these characteristics in a study. The risk assessor is encouraged to consider how the limitations of the available studies might influence the conclusions. While positive biases may be due, for example, to a healthy worker effect, it is also important to consider negative biases, for example, workers who may leave the workforce due to illness caused either by high exposures to the agent or to effects of confounders such as smoking. 
The following discussions highlight the major factors included in an analysis of epidemiologic studies. The major types of cancer epidemiologic study designs used for examining environmental causes of cancer are analytical studies and descriptive studies. Each study type has well-known strengths and weaknesses that affect interpretation of results, as summarized below (Lilienfeld and Lilienfeld, 1979; Mausner and Kramer, 1985; Kelsey et al., 1996; Rothman and Greenland, 1998). Analytical epidemiologic studies, which include case-control and cohort designs, are generally relied on for identifying a causal association between human exposure and adverse health effects. In case-control studies, groups of individuals with (cases) and without (controls) a particular disease are identified and compared to determine differences in exposure. In cohort studies, groups of ``exposed'' and ``nonexposed'' individuals are identified and studied over time to determine differences in disease occurrence. Cohort studies can be performed either prospectively or retrospectively from historical records. The type of study chosen may depend on the hypothesis to be evaluated. For example, case-control studies may be more appropriate for rare cancers while cohort studies may be more appropriate for more commonly occurring cancers. On the other hand, descriptive epidemiologic studies examine symptom or disease rates among populations in relation to personal characteristics such as age, gender, race, and temporal or environmental conditions. Descriptive studies are most frequently used to generate hypotheses about exposure factors, but subsequent analytical designs are necessary to infer causality. For example, cross-sectional designs might be used to compare the prevalence of cancer between areas near and far from a Superfund site. 
However, in studies where exposure and disease information applies only to the current conditions, it is not possible to infer that the exposure actually caused the disease. Therefore, these studies are used to identify patterns or trends in disease occurrence over time or in different geographical locations, but typical limitations in the characterization of populations in these studies make it difficult to infer the causal agent or degree of exposure. Case reports describe a particular effect in an individual or group of individuals who were exposed to a substance. These reports are often anecdotal or highly selective in nature and generally are of limited use for hazard assessment. Specifically, cancer causality can rarely be inferred from case reports alone. Investigative follow-up may or may not accompany such reports. For cancer, the most common types of case series are associated with occupational and childhood exposures. Case reports can be particularly valuable for identifying unique features, such as an association with an uncommon tumor (e.g., inhalation of vinyl chloride and hepatic angiosarcoma in workers or ingestion of diethylstilbestrol by mothers and clear-cell carcinoma of the vagina in offspring). For epidemiologic data to be useful in determining whether there is an association between health effects and exposure to an agent, there should be adequate characterization of exposure information. In general, greater weight should be given to studies with more precise and specific exposure estimates. Questions to address about exposure are: What can one reliably conclude about the exposure parameters including (but not limited to) the level, duration, route, and frequency of exposure of individuals in one population as compared with another? How sensitive are study results to uncertainties in these parameters? Actual exposure measurements are not available for many retrospective studies. 
Therefore, surrogates are often used to reconstruct exposure parameters. These may involve attributing exposures to job classifications in a workplace or to broader occupational or geographic groupings. Use of surrogates carries a potential for misclassification, i.e., individuals may be placed in an incorrect exposure group. Misclassification generally leads to reduced ability of a study to detect differences between study and referent populations. When either current or historical monitoring data are available, the exposure evaluation includes consideration of the error bounds of the monitoring and analytic methods and whether the data are from routine or accidental exposures. The potential for misclassification and for measurement errors is amenable to both qualitative and quantitative analysis. These are essential analyses for judging a study's results, because exposure estimation is the most critical part of a retrospective study. Biological markers potentially offer excellent measures of exposure (Hulka and Margolin, 1992; Peto and Darby, 1994). In some cases, molecular or cellular effects (e.g., DNA or protein adducts, mutation, chromosomal aberrations, levels of thyroid-stimulating hormone) can be measured in blood, body fluids, cells, and tissues to serve as biomarkers of exposure in humans and animals (Callemen et al., 1978; Birner et al., 1990). As such, they can act as an internal surrogate measure of chemical dose, representing, as appropriate, either recent exposure (e.g., serum concentration) or accumulated exposure over some period (e.g., hemoglobin adducts). Validated markers of exposure such as alkylated hemoglobin from exposure to ethylene oxide (Van Sittert et al., 1985) or urinary arsenic (Enterline et al., 1987) can improve estimates of dose over the relevant time periods for the markers. 
Markers closely identified with effects promise to greatly increase the ability of studies to distinguish real effects from bias at low levels of relative risk between populations (Taylor et al., 1994; Biggs et al., 1993) and to resolve problems of confounding risk factors. However, when using molecular or cellular effects as biomarkers of exposure, since many of these changes are often not specific to just one type of exposure, it is important to be aware that changes may be due to exposures unrelated to the exposure of interest and attention must be paid to controlling for potential confounders. Biochemical or molecular epidemiologic studies may use biological markers of effect as indicators of disease or its precursors. The application of techniques for measuring cellular and molecular alterations due to exposure to specific environmental agents may allow conclusions to be drawn about the mechanisms of carcinogenesis (see section 2.4 for more information on this topic). Control for potential confounding factors is an important consideration in the evaluation of the design and in the analysis of observational epidemiologic studies. A confounder is a variable that is related to both the health outcome of concern (cancer) and exposure. Common examples include age, socioeconomic status, smoking habits, and diet. For instance, if older people are more likely to be exposed to a given contaminant as well as more likely to have cancer because of their age, age is considered a confounder. Adjustment for potentially confounding factors (from a statistical as contrasted with an epidemiologic point of view) can occur either in the design of the study (e.g., individual or group matching on critical factors) or in the statistical analysis of the results (stratification or direct or indirect adjustment). Direct adjustment in the statistical analysis may not be possible owing to the presentation of the data or because needed information was not collected during the study. 
In this case, indirect comparisons may be possible. For example, in the absence of data on smoking status among individuals in the study population, an examination of the possible contribution of cigarette smoking to increased lung cancer risk may be based on information from other sources, such as the American Cancer Society's longitudinal studies (Hammond, 1966; Garfinkel and Silverberg, 1991). The effectiveness of adjustments contributes to the ability to draw inferences from a study. Different studies involving exposure to an agent may have different confounding factors. If consistent increases in cancer risk are observed across a collection of studies with different confounding factors, the inference that the agent under investigation was the etiologic factor is strengthened. There may also be instances where the agent of interest is a risk factor in conjunction with another agent. For instance, interaction as well as effect-measure modification are sometimes construed to be confounding, but they are different from confounding. Interaction is described as a situation in which two or more risk factors modify the effect of each other with regard to the occurrence of a given effect. This phenomenon is sometimes described as effect-measure modification or heterogeneity of effect (Szklo and Nieto, 2000). Effect-measure modification refers to variation in the magnitude of a measure of exposure effect across levels of another variable (Rothman and Greenland, 1998). The variable across which the effect measure varies is called an effect modifier (e.g., hepatitis virus B and aflatoxin in hepatic cancer). Interaction, on the other hand, means that the effect of the exposure on the outcome differs, depending on the presence of another variable (the effect modifier). When the effect of the exposure of interest is accentuated by another variable, it is said to be synergistic interaction. 
Synergistic interaction can be additive (e.g., hepatitis virus B and aflatoxin in hepatic cancer) or multiplicative (e.g., asbestos and smoking in lung cancer). If the effect of exposure is diminished or eliminated by another variable, it is said to be antagonistic interaction (e.g., intake of vitamin E and lower occurrence of lung cancer). The analysis should apply appropriate statistical methods to ascertain whether the observed association between exposure and effects would be expected by chance. A description of the method or methods used should include the reasons for their selection. Statistical analyses of the bias, confounding, and interaction are part of addressing the significance of an association and the power of a study to detect an effect. The analysis augments examination of the results for the whole population with exploration of the results for groups with comparatively greater exposure or time since first exposure. This may support identifying an association or establishing a dose-response trend. When studies show no association, such exploration may apply to determining an upper limit on potential human risk for consideration alongside results of animal tumor effects studies. The power of a study--the likelihood of observing an effect if one exists--increases with sample size, i.e., the number of subjects studied from a population. (For example, a quadrupling of a background rate in the 1 per 10,000 range would require more subjects who have experienced greater or longer exposure or lengthier follow-up, than a doubling of a background rate in the 1 per 100 range.) If the size of the effect is expected to be very small at low doses, higher doses or longer durations of exposure may be needed to have an appreciable likelihood of observing an effect with a given sample size. 
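The parenthetical comparison of a quadrupled rare outcome with a doubled common outcome can be made concrete with a rough sample-size calculation. The following is a minimal sketch, not a method prescribed by these cancer guidelines: it assumes two equal-sized groups, treats each rate as a cumulative incidence over the follow-up period, and uses the usual normal-approximation formula for comparing two proportions. The function name, the two-sided 5% significance level, and the 80% power target are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def subjects_per_group(p_unexposed, p_exposed, alpha=0.05, power=0.80):
    """Approximate subjects needed in each of two equal-sized groups.

    Normal-approximation formula for a two-sided comparison of two
    proportions. The alpha and power defaults are illustrative
    assumptions, not values taken from the guidelines.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_unexposed + p_exposed) / 2
    # Spread of the difference under the null (pooled) and the alternative.
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(
        p_unexposed * (1 - p_unexposed) + p_exposed * (1 - p_exposed)
    )
    return ceil(((null_term + alt_term) / (p_exposed - p_unexposed)) ** 2)


# Quadrupling of a background rate of 1 per 10,000.
n_rare = subjects_per_group(1 / 10_000, 4 / 10_000)
# Doubling of a background rate of 1 per 100.
n_common = subjects_per_group(1 / 100, 2 / 100)

print(f"rare outcome, fourfold increase: about {n_rare:,} per group")
print(f"common outcome, twofold increase: about {n_common:,} per group")
```

Under these assumptions the rare-outcome study needs many times more subjects per group than the common-outcome study, even though its relative increase is larger. This is the direction of the comparison drawn in the text.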
Because of the often long latency period in cancer development, the likelihood of observing an effect also depends on whether adequate time has elapsed since exposure began for effects to occur. Since the design of the study and the choice of analysis, as well as the desired level of certainty in the results and the magnitude of response in an unexposed population, also affect the likelihood of observing an effect, it is important to carefully interpret the absence of an observed effect. A unique feature that can be ascribed to the effects of a particular agent (such as a tumor type that is seen only rarely in the absence of the agent) can increase sensitivity by permitting separation of bias and confounding factors from real effects. Similarly, a biomarker particular to the agent can permit these distinctions. Statistical re-analyses of data, particularly an examination of different exposure indices, can give insight into potential exposure-response relationships. These are all factors to explore in statistical analysis of the data. When comparing cases and controls or exposed and non-exposed populations, it would be preferable for the two populations to differ only in exposure to the agent in question. Because this is seldom the case, it is important to identify sources of sampling and other potential biases inherent in a study design or data collection methods. Bias is a systematic error. In epidemiologic studies, bias can occur in the selection of cases and controls or exposed and non-exposed populations, as well as the follow-up of the groups, or the classification of disease or exposure. 
The size of the risks observed can be affected by noncomparability between populations of factors such as general health, diet, lifestyle, or geographic location; differences in the way case and control individuals recall past events; differences in data collection that result in unequal ascertainment of health effects in the populations; and unequal follow-up of individuals (Rothman and Greenland, 1998). Other factors worth consideration can be inherent in the available cohorts, e.g., use of occupational studies (the healthy worker effect), absence of one sex, or limitations in sample size for one or more ethnicities. The mere presence of biases does not invalidate a study, but should be reflected in the judgment of its strengths or weaknesses. Acceptance of studies for assessment depends on identifying their sources of bias and the possible effects on study results. Meta-analysis is a means of integrating the results of multiple studies of similar health effects and risk factors. This technique is particularly useful when various studies yield varying degrees of risk or even conflicting associations (negative and positive). It is intended to introduce consistency and comprehensiveness into what otherwise might be a more subjective review of the literature. The value of such an analysis is dependent upon a systematic review of the literature that uses transparent criteria of inclusion and exclusion. In interpreting such analyses, it is important to consider the effects of differences in study quality, as well as the effect of publication bias. Meta-analysis may not be advantageous in some circumstances. 
These include when the relationship between exposure and disease is obvious from the individual studies; when there are only a few studies of the key health outcomes; when there is insufficient information from available studies related to disease, risk estimate, or exposure classification to ensure comparability; or when there are substantial confounding or other biases that cannot be adjusted for in the analysis (Blair et al., 1995; Greenland, 1987; Peto, 1992). Determining whether an observed association (risk) is causal rather than spurious involves consideration of a number of factors. Sir Bradford Hill (Hill, 1965) developed a set of guidelines for evaluating epidemiologic associations that can be used in conjunction with discussions of causality such as those in the 2004 Surgeon General's report on smoking (CDC, 2004) and in other documents (e.g., Rothman and Greenland, 1998; IPCS, 1999). The critical assessment of epidemiologic evidence is conceptually based upon consideration of salient aspects of the evidence of associations so as to reach fundamental judgments as to the likely causal significance of the observed associations. In so doing, it is appropriate to draw from those aspects initially presented in Hill's classic monograph (Hill, 1965) and widely used by the scientific community in conducting such evidence-based reviews. A number of these aspects are judged to be particularly salient in evaluating the body of evidence available in this review, including the aspects described by Hill as strength, experiment, consistency, plausibility, and coherence. Other aspects identified by Hill, including temporality and biological gradient, are also relevant and considered here (e.g., in characterizing lag structures and concentration-response relationships), but are more directly addressed in the design and analyses of the individual epidemiologic studies included in this assessment. 
As discussed below, these salient aspects are interrelated and considered throughout the evaluation of the epidemiologic evidence generally reflected in the integrative synthesis of the mode of action framework. The general evaluation of the strength of the epidemiological evidence reflects consideration not only of the magnitude of reported effects estimates and their statistical significance, but also of the precision of the effects estimates and the robustness of the effects associations. Consideration of the robustness of the associations takes into account a number of factors, including in particular the impact of alternative models and model specifications and potential confounding factors, as well as issues related to the consequences of measurement error. Consideration of the consistency of the effects associations involves looking across the results of studies conducted by different investigators in different places and times. Particular weight may be given, consistent with Hill's views, to the presence of ``similar results reached in quite different ways, e.g., prospectively and retrospectively'' (Hill, 1965). Looking beyond the epidemiological evidence, evaluation of the biological plausibility of the associations observed in epidemiologic studies reflects consideration of both exposure-related factors and toxicological evidence relevant to identification of potential modes of action (MOAs). Similarly, consideration of the coherence of health effects associations reported in the epidemiologic literature reflects broad consideration of information pertaining to the nature of the biological markers evaluated in toxicologic and epidemiologic studies. In identifying these aspects as being particularly salient in this assessment, it is also important to recognize that no one aspect is either necessary or sufficient for drawing inferences of causality. 
As Hill (1965) emphasized:
None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question--is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?
While these aspects frame considerations weighed in assessing the epidemiologic evidence, they do not lend themselves to being considered in terms of simple formulas or hard-and-fast rules of evidence leading to answers about causality (Hill, 1965). One, for example, cannot simply count up the numbers of studies reporting statistically significant results or statistically non-significant results for carcinogenesis and related MOAs and reach credible conclusions about the relative strength of the evidence and the likelihood of causality. Rather, these important considerations are taken into account throughout the assessment with a goal of producing an objective appraisal of the evidence (informed by peer and public comment and advice), which includes the weighing of alternative views on controversial issues. Thus, although these guidelines have become known as ``causal criteria,'' it is important to note that they cannot be used as a strictly quantitative checklist. Rather, these ``criteria'' should be used to determine the strength of the evidence for concluding causality. In particular, the absence of one or more of the ``criteria'' does not automatically exclude a study from consideration (e.g., see discussion in CDC, 2004). The list below has been adapted from Hill's guidelines as an aid in judging causality.
Various whole-animal test systems are currently used or are under development for evaluating potential carcinogenicity. 
Cancer studies involving chronic exposure for most of the lifespan of an animal are generally accepted for evaluation of tumor effects (Tomatis et al., 1989; Rall, 1991; Allen et al., 1988; but see Ames and Gold, 1990). Other studies of special design are useful for observing formation of preneoplastic lesions or tumors or investigating specific modes of action. Their applicability is determined on a case-by-case basis. The objective of long-term carcinogenesis bioassays is to determine the potential carcinogenic hazard and dose-response relationships of the test agent. Carcinogenicity rodent studies are designed to examine the production of tumors as well as preneoplastic lesions and other indications of chronic toxicity that may provide evidence of treatment-related effects and insights into the way the test agent produces tumors. Current standardized carcinogenicity studies in rodents test at least 50 animals per sex per dose group in each of three treatment groups and in a concurrent control group, usually for 18 to 24 months, depending on the rodent species tested (OECD, 1981; U.S. EPA, 1998c). The high dose in long-term studies is generally selected to provide the maximum ability to detect treatment-related carcinogenic effects while not compromising the outcome of the study through excessive toxicity or inducing inappropriate toxicokinetics (e.g., overwhelming absorption or detoxification mechanisms). The purpose of two or more lower doses is to provide some information on the shape of the dose-response curve. Similar protocols have been and continue to be used by many laboratories worldwide. All available studies of tumor effects in whole animals should be considered, at least preliminarily. The analysis should discard studies judged to be wholly inadequate in protocol, conduct, or results. 
Criteria for the technical adequacy of animal carcinogenicity studies have been published and should be used as guidance to judge the acceptability of individual studies (e.g., NTP, 1984; OSTP, 1985; Chhabra et al., 1990). As these criteria, in whole or in part, may be updated by the National Toxicology Program (NTP) and others, the analyst should consult the appropriate sources to determine both the current standards and those that were contemporaneous with the study. Care should be taken to include studies that provide some evidence bearing on carcinogenicity or that help interpret effects noted in other studies, even if these studies have some limitations of protocol or conduct. Such limited, but not wholly inadequate, studies can contribute as their deficiencies permit. The findings of long-term rodent bioassays should be interpreted in conjunction with results of prechronic studies along with toxicokinetic studies and other pertinent information, if available. Evaluation of tumor effects takes into consideration both biological and statistical significance of the findings (Haseman, 1984, 1985, 1990, 1995). The following sections highlight the major issues in the evaluation of long-term carcinogenicity studies. Among the many criteria for technical adequacy of animal carcinogenicity studies is the appropriateness of dose selection. The selection of doses for chronic bioassays is based on scientific judgments and sound toxicologic principles. Dose selection should be made on the basis of relevant toxicologic information from prechronic, toxicokinetic, and mechanistic studies. A scientific rationale for dose selection should be clearly articulated (e.g., NTP, 1984; ILSI, 1997). How well the dose selection is made is evaluated after the completion of the bioassay. Interpretation of carcinogenicity study results is profoundly affected by study exposure conditions, especially by inappropriate dose selection. 
This is particularly important in studies that do not show positive results for carcinogenicity, because failure to use a sufficiently high dose reduces the sensitivity of the studies. A lack of tumorigenic responses at exposure levels that cause significant impairment of animal survival may also not be acceptable. In addition, overt toxicity or qualitatively altered toxicokinetics due to excessively high doses may result in tumor effects that are secondary to the toxicity rather than directly attributable to the agent. With regard to the appropriateness of the high dose, an adequate high dose would generally be one that produces some toxic effects without unduly affecting mortality from effects other than cancer or producing significant adverse effects on the nutrition and health of the test animals (OECD, 1981; NRC, 1993a). If the test agent does not appear to cause any specific target organ toxicity or perturbation of physiological function, an adequate high dose can be specified in terms of a percentage reduction of body weight gain over the lifespan of the animals. The high dose would generally be considered inadequate if neither toxicity nor change in weight gain is observed. On the other hand, significant increases in mortality from effects other than cancer generally indicate that an adequate high dose has been exceeded. Other signs of treatment-related toxicity associated with an excessive high dose may include: (a) significant reduction of body weight gain (e.g., greater than 10%), (b) significant increases in abnormal behavioral and clinical signs, (c) significant changes in hematology or clinical chemistry, (d) saturation of absorption and detoxification mechanisms, or (e) marked changes in organ weight, morphology, and histopathology. 
It should be noted that practical upper limits have been established to avoid the use of excessively high doses in long-term carcinogenicity studies of environmental chemicals (e.g., 5% of the test substance in the feed for dietary studies or 1 g/kg body weight for oral gavage studies [OECD, 1981]). For dietary studies, weight gain reductions should be evaluated as to whether there is a palatability problem or an issue with food efficiency; certainly, the latter is a toxic manifestation. In the case of inhalation studies with respirable particles, evidence of impairment of normal clearance of particles from the lung should be considered along with other signs of toxicity to the respiratory airways to determine whether the high exposure concentration has been appropriately selected (U.S. EPA, 2001a). For dermal studies, evidence of skin irritation may indicate that an adequate high dose has been reached (U.S. EPA, 1989). In order to obtain the most relevant information from a long-term carcinogenicity study, it is important to maximize exposure conditions to the test material. At the same time, caution is appropriate in using excessive high-dose levels that would confound the interpretation of study results to humans. The middle and lowest doses should be selected to characterize the shape of the dose-response curve as much as possible. It is important that the doses be adequately spaced so that the study can provide relevant dose-response data for assessing human hazard and risk. If the testing of potential carcinogenicity is being combined with an evaluation of noncancer chronic toxicity, the study should be designed to include one dose in addition to the control(s) that is not expected to elicit adverse effects. There are several possible outcomes regarding the study interpretation of the significance and relevance of tumorigenic effects associated with exposure or dose levels below, at, or above an adequate high dose. 
The general guidance is given here; for each case, the information at hand should be evaluated and a rationale should be given for the position taken.
Adequately high dose. If an adequately high dose has been used, tumor effects are judged positive or negative depending on the presence or absence of significant tumor incidence increases, respectively.
Excessively high dose. If toxicity or mortality is excessive at the high dose, interpretation depends on whether or not tumors are found.
--Studies that show tumor effects only at excessive doses may be compromised and may or may not carry weight, depending on the interpretation in the context of other study results and other lines of evidence. Results of such studies, however, are generally not considered suitable for dose-response extrapolation if it is determined that the mode(s) of action underlying the tumorigenic responses at high doses is not operative at lower doses.
--Studies that show tumors at lower doses, even though the high dose is excessive and may be discounted, should be evaluated on their own merits.
--If a study does not show an increase in tumor incidence at a toxic high dose and appropriately spaced lower doses are used without such toxicity or tumors, the study is generally judged as negative for carcinogenicity.
Inadequately high dose. Studies of inadequate sensitivity where an adequately high dose has not been reached may be used to bound the dose range where carcinogenic effects might be expected.
The main aim of statistical evaluation is to determine whether exposure to the test agent is associated with an increase of tumor development. Statistical analysis of a long-term study should be performed for each tumor type separately. The incidences of benign and malignant lesions of the same cell type, usually within a single tissue or organ, are considered separately but may be combined when scientifically defensible (McConnell et al., 1986).
Trend tests and pairwise comparison tests are the recommended tests for determining whether chance, rather than a treatment-related effect, is a plausible explanation for an apparent increase in tumor incidence. A trend test such as the Cochran-Armitage test (Snedecor and Cochran, 1967) asks whether the results in all dose groups together increase as dose increases. A pairwise comparison test such as the Fisher exact test (Fisher, 1950) asks whether an incidence in one dose group is increased over that of the control group. By convention, for both tests a statistically significant comparison is one for which the probability (p) that the increased incidence is due to chance is less than 0.05. Significance in either kind of test is sufficient to reject the hypothesis that chance accounts for the result.
A statistically significant response may or may not be biologically significant and vice versa. The selection of a significance level is a policy choice based on a trade-off between the risks of false positives and false negatives. A result with a significance level of greater or less than 5% (the most common significance level) is examined to see if the result confirms other scientific information. When the assessment departs from a simple 5% level, this should be highlighted in the risk characterization. A two-tailed test or a one-tailed test can be used. In either case a rationale is provided.
Statistical power can affect the likelihood that a statistically significant result could reasonably be expected. This is especially important in studies or dose groups with small sample sizes or low dose rates. Reporting the statistical power can be useful for comparing and reconciling positive and negative results from different studies. Considerations of multiple comparisons should also be taken into account.
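The two kinds of tests named above can be sketched with the Python standard library. This is a minimal illustration, not a validated statistical package: the function names and the tumor counts in the usage lines are hypothetical, both p-values are one-sided (for an increase), and the Cochran-Armitage statistic uses the usual normal approximation with the administered doses as scores. Survival-adjusted analyses are not addressed.

```python
from math import comb, erfc, sqrt


def fisher_exact_one_sided(tumors_control, n_control, tumors_dosed, n_dosed):
    """One-sided Fisher exact p-value for an increase in the dosed group.

    Sums hypergeometric probabilities of seeing at least `tumors_dosed`
    tumor-bearing animals in the dosed group, with all margins held fixed.
    """
    total_tumors = tumors_control + tumors_dosed
    denominator = comb(n_control + n_dosed, total_tumors)
    numerator = 0
    for k in range(tumors_dosed, min(total_tumors, n_dosed) + 1):
        # math.comb returns 0 when the second argument exceeds the first.
        numerator += comb(n_dosed, k) * comb(n_control, total_tumors - k)
    return numerator / denominator


def cochran_armitage_trend(doses, tumors, group_sizes):
    """Return (z, one-sided p) for an increasing trend in tumor incidence."""
    n_total = sum(group_sizes)
    p_bar = sum(tumors) / n_total  # pooled incidence under the null hypothesis
    # Dose-weighted sum of observed minus expected tumor counts.
    t_stat = sum(d * (x - n * p_bar) for d, x, n in zip(doses, tumors, group_sizes))
    sum_nd = sum(n * d for d, n in zip(doses, group_sizes))
    sum_nd2 = sum(n * d * d for d, n in zip(doses, group_sizes))
    variance = p_bar * (1.0 - p_bar) * (sum_nd2 - sum_nd ** 2 / n_total)
    if variance <= 0.0:
        raise ValueError("trend statistic is undefined for these data")
    z = t_stat / sqrt(variance)
    p_value = 0.5 * erfc(z / sqrt(2.0))  # upper tail of the standard normal
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 animals per group, control plus two dose levels.
    print(fisher_exact_one_sided(0, 50, 5, 50))
    print(cochran_armitage_trend([0, 1, 2], [0, 2, 5], [50, 50, 50]))
```

Either p-value would then be compared with the chosen significance level, and a two-tailed version could be substituted where the rationale calls for it.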
Haseman (1983) analyzed typical animal bioassays that tested both sexes of two species and concluded that, because of multiple comparisons, a single tumor increase for a species-sex-site combination that is statistically significant at the 1% level for common tumors or 5% for rare tumors corresponds to a 7-8% significance level for the study as a whole. Therefore, animal bioassays presenting only one significant result that falls short of the 1% level for a common tumor should be treated with caution. The standard for determining statistical significance of tumor incidence comes from a comparison of tumors in dosed animals with those in concurrent control animals. Additional insights about both statistical and biological significance can come from an examination of historical control data (Tarone, 1982; Haseman, 1995). Historical control data can add to the analysis, particularly by enabling identification of uncommon tumor types or high spontaneous incidence of a tumor in a given animal strain. Identification of common or uncommon situations prompts further thought about the meaning of the response in the current study in context with other observations in animal studies and with other evidence about the carcinogenic potential of the agent. These other sources of information may reinforce or weaken the significance given to the response in the hazard assessment. Caution should be exercised in simply looking at the ranges of historical responses, because the range ignores differences in survival of animals among studies and is related to the number of studies in the database. In analyzing results for uncommon tumors in a treated group that are not statistically significant in comparison with concurrent controls, the analyst may be informed by the experience of historical controls to conclude that the result is in fact unlikely to be due to chance. However, caution should be used in interpreting results. 
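One simple way to bring historical control experience to bear on an uncommon tumor is a binomial tail calculation, sketched below. This is an assumption-laden illustration rather than a method prescribed by the Guidelines: the function name and the numbers in the usage lines are hypothetical, and the calculation treats the historical rate as a known constant, so it ignores study-to-study variability and the survival differences cautioned about above.

```python
from math import comb


def binomial_upper_tail(k_observed, n_animals, historical_rate):
    """P(X >= k_observed) for X ~ Binomial(n_animals, historical_rate).

    Interpreted here as the chance of seeing at least `k_observed`
    tumor-bearing animals in a group of `n_animals` if the true incidence
    equaled the (assumed fixed) historical control rate.
    """
    if not 0.0 <= historical_rate <= 1.0:
        raise ValueError("historical_rate must be between 0 and 1")
    if not 0 <= k_observed <= n_animals:
        raise ValueError("k_observed must be between 0 and n_animals")
    return sum(
        comb(n_animals, k)
        * historical_rate ** k
        * (1.0 - historical_rate) ** (n_animals - k)
        for k in range(k_observed, n_animals + 1)
    )


if __name__ == "__main__":
    # Hypothetical case: 2 tumors among 50 dosed animals, with an assumed
    # historical control incidence of 0.1% for this tumor type.
    print(binomial_upper_tail(2, 50, 0.001))
```

A small tail probability suggests the observation would be unusual against the historical background, but it carries weight only to the extent that the historical data are comparable to the study under review.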
In analyzing results for common tumors, a different set of considerations comes into play. Generally speaking, statistically significant increases in tumors should not be discounted simply because incidence rates in the treated groups are within the range of historical controls or because incidence rates in the concurrent controls are somewhat lower than average. Random assignment of animals to groups and proper statistical procedures provide assurance that statistically significant results are unlikely to be due to chance alone. However, caution should be used in interpreting results that are barely statistically significant or in which incidence rates in concurrent controls are unusually low in comparison with historical controls. In cases where there may be reason to discount the biological relevance to humans of increases in common animal tumors, such considerations should be weighed on their own merits and clearly distinguished from statistical concerns. When historical control data are used, the discussion should address several issues that affect comparability of historical and concurrent control data, such as genetic drift in the laboratory strains, differences in pathology examination at different times and in different laboratories (e.g., in criteria for evaluating lesions; variations in the techniques for the preparation or reading of tissue samples among laboratories), and comparability of animals from different suppliers. The most relevant historical data come from the same laboratory and the same supplier and are gathered within 2 or 3 years one way or the other of the study under review; other data should be used only with extreme caution. In general, observation of tumors under different circumstances lends support to the significance of the
1.3.7. Emphasis on Characterization
2. Hazard Assessment
2.1. Overview of Hazard Assessment and Characterization
2.1.1. Analyses of Data
2.1.2. Presentation of Results
2.2. Analysis of Tumor Data
2.2.1. Human Data
2.2.1.1. Assessment of Evidence of Carcinogenicity From Human Data
2.2.1.2. Types of Studies
2.2.1.3. Exposure Issues
2.2.1.4. Biological Markers
2.2.1.5. Confounding Factors
2.2.1.6. Statistical Considerations
2.2.1.6.1. Likelihood of Observing an Effect
2.2.1.6.2. Sampling and Other Bias Issues
2.2.1.6.3. Combining Statistical Evidence Across Studies
2.2.1.7. Evidence for Causality
2.2.2. Animal Data
2.2.2.1. Long-Term Carcinogenicity Studies
2.2.2.1.1. Dosing Issues
2.2.2.1.2. Statistical Considerations
2.2.2.1.3. Concurrent and Historical Controls
2.2.2.1.4. Assessment of Evidence of Carcinogenicity From Long-term Animal Studies