Troubleshooting¶
This page covers common errors, warnings, and unexpected behaviors, organized by pipeline stage.
Input Catalog Errors¶
ValueError: input_catalog must have RA/DEC columns¶
Cause: The CSV file has no column named RA or DEC (case-insensitive search).
Fix: Rename your coordinate columns to RA and DEC. The pipeline accepts any case (e.g. ra, Ra, RA) but always warns if the exact names are not uppercase.
RuntimeError: Band mismatch between input catalog and image list¶
Cause: Your catalog contains FLUX_* columns whose band suffixes don't exactly match the FILTER keyword values in your FITS images. For example, the catalog has FLUX_m400 but the image has FILTER = M400 (different case or extra whitespace).
Fix:
- Check that `FLUX_` suffixes match FITS `FILTER` values exactly (case-sensitive after stripping whitespace).
- If you don't need flux priors, remove all `FLUX_*` columns from the catalog. The pipeline will initialize fluxes from aperture photometry instead.
ValueError: Duplicate keys in patch results¶
Cause: Multiple sources in the input catalog share the same merge key. If ID exists, duplicate ID values cause this. If ID is absent, duplicate (RA, DEC) pairs cause this.
Fix: Ensure ID values are unique. If using coordinate-based merging, ensure no two sources have identical RA and DEC.
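Duplicates can be located before running the pipeline. The following is a minimal sketch, assuming the catalog rows have been read as dictionaries (for example with `csv.DictReader`) and that the coordinate columns are already named `RA` and `DEC`. The function name `find_duplicate_keys` is hypothetical and not part of the pipeline.

```python
from collections import Counter


def find_duplicate_keys(rows):
    """Return merge keys that occur more than once, mapped to their counts.

    Assumed rule (taken from the description above): use ID when the column
    exists, otherwise fall back to the (RA, DEC) pair.
    """
    rows = list(rows)
    if not rows:
        return {}
    if "ID" in rows[0]:
        # Keep IDs as strings so that "001" and "1" stay distinct.
        keys = [row["ID"] for row in rows]
    else:
        # Compare coordinates numerically so "1.0" and "1.00" collide.
        keys = [(float(row["RA"]), float(row["DEC"])) for row in rows]
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}
```

To apply it to a file, pass `csv.DictReader(open("input_catalog.csv", newline=""))` as `rows`; the path is a placeholder.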
Warning: input_catalog uses ra/dec columns¶
Cause: The pipeline found coordinate columns but they are not exactly named RA and DEC (e.g. they are lowercase ra/dec).
Fix: Rename to uppercase RA/DEC. The pipeline works either way, but the warning indicates a potential naming convention issue.
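The rename can be scripted. This is a small sketch, not pipeline code: `normalize_coord_columns` is a hypothetical helper that uppercases any column whose name equals `RA` or `DEC` when case is ignored, and leaves every other column untouched.

```python
def normalize_coord_columns(fieldnames):
    """Return the column names with RA/DEC coordinate columns uppercased."""
    renamed = []
    for name in fieldnames:
        if name.upper() in ("RA", "DEC"):
            renamed.append(name.upper())
        else:
            renamed.append(name)
    return renamed
```

With pandas, the result can be assigned back with `df.columns = normalize_coord_columns(df.columns)` before saving the catalog.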
FITS Image Errors¶
RuntimeError: Missing FILTER in <file>¶
Fix: Add a FILTER keyword to the FITS header with the band name string.
RuntimeError: Empty FILTER in <file>¶
Fix: Set the FILTER header value to a non-empty band name.
RuntimeError: Missing ZP_AUTO in <file>¶
Fix: Run photometric calibration on the image and write the ZP_AUTO keyword.
RuntimeError: Bad/Missing SKYSIG in <file>¶
Fix: Compute the sky background noise (1-sigma, in ADU) and write it as SKYSIG. The value must be positive and finite.
RuntimeError: Bad/Missing EGAIN (e-/ADU) in <file>¶
Fix: Write the effective gain in electrons per ADU as EGAIN. This is typically a detector/instrument property.
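The header requirements above can be checked before a run. The sketch below works on any mapping with a `.get` method, which includes a plain `dict` and the object returned by `astropy.io.fits.getheader`. The function name `check_header` is hypothetical, and treating `EGAIN` as "must be positive and finite" is an assumption modelled on the stated `SKYSIG` rule.

```python
import math


def check_header(header):
    """Return a list of problems found in a FITS-header-like mapping."""
    problems = []

    filt = header.get("FILTER")
    if filt is None:
        problems.append("Missing FILTER")
    elif not str(filt).strip():
        problems.append("Empty FILTER")

    if header.get("ZP_AUTO") is None:
        problems.append("Missing ZP_AUTO")

    # SKYSIG must be positive and finite; EGAIN is assumed to follow the same rule.
    for key in ("SKYSIG", "EGAIN"):
        value = header.get(key)
        try:
            number = float(value)
            ok = math.isfinite(number) and number > 0
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append(f"Bad/Missing {key}")
    return problems
```

An empty list means the header passes these checks; the messages only mirror the error names used on this page and are not the pipeline's exact output.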
RuntimeError: Image shape mismatch¶
Cause: Not all images have the same pixel dimensions.
Fix: Ensure all images are resampled/registered to the same pixel grid. All images for a given tile should share identical (NAXIS2, NAXIS1).
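To find which file breaks the rule, the shapes can be compared directly. This is a sketch under the assumption that the first image acts as the reference; `find_shape_mismatches` is a hypothetical helper, and the shapes are expected as `(NAXIS2, NAXIS1)` tuples gathered beforehand.

```python
def find_shape_mismatches(shapes):
    """Return the files whose shape differs from the first entry.

    shapes: dict mapping filename -> (NAXIS2, NAXIS1), in load order.
    """
    items = list(shapes.items())
    if not items:
        return {}
    _, reference = items[0]
    return {
        name: tuple(shape)
        for name, shape in items[1:]
        if tuple(shape) != tuple(reference)
    }
```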
RuntimeError: WCS mismatch with reference¶
Cause: The WCS of one image does not agree with the first image within the configured tolerances.
Fix:
- Re-register images to a common WCS.
- Or relax the tolerance values in `checks.wcs_tolerance.*` (not recommended unless you understand the implications).
- Or disable the check: `checks.require_wcs_alignment: false` (not recommended).
Warning: SATURATE missing in <file>; using inf¶
Cause: The FITS header lacks a SATURATE keyword.
Fix: This is often harmless — it means no saturation mask will be built for this image. If you need saturation filtering, add the SATURATE keyword to the header.
Load Stage Performance¶
Long pauses in load_inputs¶
Expected heavy steps:
- Per-frame FITS prep: Loading large images, computing masks and noise arrays. Scales with number of bands.
- White-stack build: Inverse-variance weighted coaddition. Memory-intensive.
- Diagnostic plot rendering: Large PNG files for white-stack and overlay plots.
Use the timing summary to locate bottlenecks:
load_inputs timing [s]: prep=X.XX white=X.XX crop=X.XX sat=X.XX overlay=X.XX total=X.XX
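The summary line is easy to parse when comparing several runs. The sketch below assumes the `key=value` layout shown above; the helper names are hypothetical and the numbers in the usage note are made up for illustration.

```python
import re


def parse_timing(line):
    """Parse the key=value pairs of a load_inputs timing line into floats."""
    return {key: float(val) for key, val in re.findall(r"(\w+)=([0-9.]+)", line)}


def slowest_stage(timings):
    """Return the stage with the largest time, ignoring the 'total' entry."""
    stages = {k: v for k, v in timings.items() if k != "total"}
    return max(stages, key=stages.get)
```

For a line such as `load_inputs timing [s]: prep=12.50 white=40.25 crop=1.00 sat=0.50 overlay=8.75 total=63.00`, `slowest_stage(parse_timing(line))` returns `"white"`.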
Mitigation strategies:
- Increase `performance.frame_prep_workers` (but watch memory).
- Increase `performance.white_stack_workers`.
- Reduce diagnostic plot resolution: `crop.display_downsample`, `crop.post_crop_display_downsample`, `overlay.downsample_full`.
- Disable selected plots if needed: `crop.plot_pre_crop`, `crop.plot_post_crop`, `overlay.enabled`, `overlay.zoom_enabled`.
ePSF Issues¶
Few or no ePSF outputs¶
Check:
- `epsf.skip_empty_epsf_patches: true` may be intentionally pruning empty cells. This is expected in sparse fields.
- Source density after crop and saturation flagging may be very low.
- ePSF selection thresholds may be too aggressive:
  - `epsf.thresh_sigma` (detection threshold): lowering it finds fainter candidates.
  - `epsf.minarea` (minimum detection area): lowering it accepts smaller detections.
  - `epsf.gaia_snr_min` (SNR requirement): lowering it accepts fainter candidates.
  - `epsf.q_range` (roundness filter): widening it accepts less round sources.
ePSF quality warnings (low star count)¶
If the log shows low-star ePSF (nstars_used=N < min_epsf_nstars_for_use=M):
- The ePSF was built but from too few stars to be reliable.
- The pipeline will use the fallback PSF model for affected bands in that cell.
- Reduce `patch_run.min_epsf_nstars_for_use` if you are willing to accept lower-quality ePSFs, or increase `epsf.max_stars` and relax selection criteria.
GaiaXP catalog issues¶
If the log shows GaiaXP catalog not found:
- Check the path in `inputs.gaiaxp_synphot_csv`.
- If you don't have a GaiaXP catalog for this field, set `epsf.use_gaiaxp: false` or `epsf.psfstar_mode: "sep"`.
If GaiaXP is loaded but no stars are selected:
- Check that the `mag_{band}` column names match the FITS `FILTER` values.
- Check that `epsf.gaia_mag_min` / `epsf.gaia_mag_max` bracket the expected magnitude range.
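Both checks can be run against the GaiaXP CSV directly. This is a sketch that assumes the rows are dictionaries (for example from `csv.DictReader`) and that the magnitude window is inclusive at both ends; `gaiaxp_band_report` is a hypothetical helper, not a pipeline function.

```python
def gaiaxp_band_report(rows, bands, mag_min, mag_max):
    """For each band, count rows whose mag_{band} lies in [mag_min, mag_max].

    A value of None means the mag_{band} column is missing altogether.
    """
    rows = list(rows)
    report = {}
    for band in bands:
        column = f"mag_{band}"
        if not rows or column not in rows[0]:
            report[band] = None
            continue
        count = 0
        for row in rows:
            try:
                mag = float(row[column])
            except (TypeError, ValueError):
                continue  # empty or non-numeric entry
            if mag_min <= mag <= mag_max:  # NaN fails both comparisons
                count += 1
        report[band] = count
    return report
```

A `None` entry points to a column-name mismatch, while a count of zero suggests the magnitude window excludes every star in that band.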
Patch Run Issues¶
Many empty patch runs¶
Enable pruning:
```yaml
patch_inputs:
  skip_empty_patch: true
epsf:
  skip_empty_epsf_patches: true
```
This prevents writing and running patches that contain no sources.
Optimizer convergence warnings¶
If many patches show opt_hit_max_iters: true:
- Increase `patch_run.n_opt_iters` (at the cost of runtime).
- Relax `patch_run.dlnp_stop` (accept a larger dlnp change as "converged").
- Check if the PSF model is appropriate: a poor PSF can prevent convergence.
Fitted positions snap to nearby brighter sources¶
Cause: When a much brighter source is nearby, the Tractor optimizer may move the fitted position toward the brighter source because it finds a higher likelihood there. The default per-iteration step limit (maxstep = 1 pixel) slows the drift but does not prevent it over many iterations. When the position hits the pos_max_shift_pix box bound, the optimizer may report convergence (opt_converged = True, dlnp = 0) even though the fit is visually poor — this happens because the constrained line search cannot find an acceptable step, not because the model is a good fit.
Mitigations (applied in combination):
- Position bounds. Set `patch_run.pos_max_shift_pix` to limit the total position displacement:

  ```yaml
  patch_run:
    pos_max_shift_pix: 5.0  # allow at most 5 pixels of position shift
  ```

  A value of 3--10 pixels is typically appropriate, depending on the pixel scale and the astrometric quality of the input catalog.

- Bright-source masking. Enable `bright_mask` (on by default) to detect and mask bright unmatched sources on the white stack before fitting. This prevents the contaminant's flux from influencing the optimizer at all. See Configuration, `bright_mask`.

- Three-stage fitting. Enable `patch_run.enable_staged_fit: true` (on by default). By fitting fluxes/shapes first with positions frozen, then fitting positions with fluxes/shapes frozen, the optimizer is less likely to drift toward a contaminant because the initial flux estimate stabilizes the fit.

- Boundary-stall flags. After fitting, check `flag_bound_stalled` and `flag_bound_stalled_which_parameter` to identify sources whose parameters are stuck at bounds. Sources with `pos.x@upper` or `pos.y@lower` in the stall flag are likely affected by this issue.

- Stronger position priors. Populate the `POS_ERR` column in the input catalog with smaller values (e.g. 0.5--1.0 pixels) for sources with reliable astrometry. This adds a Gaussian pull toward the catalog position, making large drift more costly.
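To get a feel for how strongly `POS_ERR` resists drift, the cost of a shift can be computed by hand. The sketch below assumes an isotropic Gaussian prior, so a radial shift `d` adds `0.5 * (d / POS_ERR)**2` to the negative log-probability. That functional form is an illustrative assumption, not a statement about the pipeline's exact implementation.

```python
def position_prior_penalty(shift_pix, pos_err_pix):
    """Increase in -ln(p) for a radial shift under an isotropic Gaussian prior."""
    return 0.5 * (shift_pix / pos_err_pix) ** 2


# Cost of a 3-pixel drift for a few POS_ERR values (in pixels).
for pos_err in (2.0, 1.0, 0.5):
    penalty = position_prior_penalty(3.0, pos_err)
    print(f"POS_ERR={pos_err:.1f} pix -> penalty {penalty:.2f}")
```

Halving `POS_ERR` quadruples the penalty, which is why small values discourage snapping but can also block corrections of genuine astrometric offsets.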
Boundary-stalled fits (flag_bound_stalled)¶
Cause: The optimizer converged (or stalled) with one or more bounded parameters at or very near their hard limits. This can happen for position (pos.x, pos.y when pos_max_shift_pix is set), galaxy shape (shape.logre, shape.ee1, shape.ee2), or Sersic index (sersicindex).
Diagnosis:
```python
import pandas as pd

df = pd.read_csv("output_catalog.csv")
stalled = df[df["flag_bound_stalled"] == True]
print(f"Stalled sources: {len(stalled)} / {len(df)}")
print(stalled["flag_bound_stalled_which_parameter"].value_counts().head(10))
```
Common stall patterns and fixes:
| Pattern | Likely cause | Fix |
|---|---|---|
| `pos.x@upper` or `pos.y@lower` | Position drifted toward a bright contaminant or off-catalog source | Enable `bright_mask`; tighten `pos_max_shift_pix`; strengthen `POS_ERR` |
| `shape.logre@lower` | Galaxy is more compact than the minimum allowed radius | Check if the source should be typed as `STAR` instead |
| `shape.logre@upper` | Galaxy is very extended; model may be absorbing background or neighbors | Inspect cutout; consider masking neighbors |
| `sersicindex@upper` or `sersicindex@lower` | Sersic profile hitting physical limits (0.29--6.3) | Inspect cutout; the source may need a different model type |
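The table can be turned into a lookup for triaging many stalled sources at once. This is a sketch with hypothetical names (`STALL_HINTS`, `stall_hints`). It assumes the stall string is a comma-separated list of `name@side` tokens and that any `pos.*` bound falls under the position row of the table.

```python
# Hints copied from the table above; the grouping keys are an assumption.
STALL_HINTS = {
    "pos": "Enable bright_mask; tighten pos_max_shift_pix; strengthen POS_ERR",
    "shape.logre@lower": "Check if the source should be typed as STAR instead",
    "shape.logre@upper": "Inspect cutout; consider masking neighbors",
    "sersicindex": "Inspect cutout; the source may need a different model type",
}


def stall_hints(which_parameter):
    """Map each 'name@side' token to a hint from the table, or None if unlisted."""
    results = []
    for token in str(which_parameter).split(","):
        token = token.strip()
        if not token:
            continue
        name = token.split("@", 1)[0]
        if name.startswith("pos."):
            key = "pos"
        elif name == "sersicindex":
            key = "sersicindex"
        else:
            key = token  # e.g. "shape.logre@lower"
        results.append((token, STALL_HINTS.get(key)))
    return results
```

Tokens without a matching row, such as `shape.ee1@upper`, come back with `None` so they can be inspected manually.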
Bright-source mask tuning¶
If the bright-source mask is too aggressive (masking legitimate sources):
- Increase `bright_mask.thresh_sigma` (e.g. from 5.0 to 10.0) to detect only brighter contaminants.
- Decrease `bright_mask.ellipse_scale` (e.g. from 3.0 to 1.5) to shrink mask ellipses.
- Decrease `bright_mask.dilate_pix` (e.g. from 1 to 0) to reduce padding.
If the mask is not aggressive enough (contaminants still affect fits):
- Decrease `bright_mask.thresh_sigma` (e.g. from 5.0 to 3.0).
- Increase `bright_mask.ellipse_scale` (e.g. from 3.0 to 5.0).
Inspect bright_mask/white_bright_mask_overlay.png to visually verify which sources are being masked (red X) vs matched to the catalog (lime circles).
Tuning bright_mask.match_radius_pix¶
The match_radius_pix parameter controls how SEP detections are matched to active catalog sources. Detections within this radius of a catalog source are considered matched and are not masked. Setting this value correctly is important:
If match_radius_pix is too large:
Many SEP detections near catalog sources will be matched and left unmasked -- including bright sources that are not in the input catalog but happen to fall within the matching radius of a catalog source. If such an unmasked bright source is much brighter than the nearby catalog source, the optimizer may snap the catalog source's position toward the bright contaminant. When this happens, the pipeline has several safety mechanisms:
- Bright-source masking (this feature) is the first line of defense, but it only works when the bright source is not matched to a catalog entry. The best approach is to include the bright source in the input catalog so the Tractor can model it directly (this is what halo source inclusion does for sources within the ROI).
- Three-stage fitting (`patch_run.enable_staged_fit`) stabilizes the optimization by fitting fluxes first with frozen positions, reducing the tendency for positions to drift toward bright neighbors.
- Position priors (`POS_ERR` / `zp.gaia_pos_err_pix`) add a Gaussian pull toward the catalog position. A smaller `POS_ERR` (e.g. 0.1 pixels) makes large position shifts costly, discouraging position snapping. However, setting this too small may prevent the optimizer from correcting genuine astrometric offsets.
- Boundary-stall detection (`flag_bound_stalled`) flags sources whose position hit the `pos_max_shift_pix` bound, making them easy to identify in the output catalog.
If match_radius_pix is too small:
Sources near the catalog position -- including the catalog source's own SEP detection -- may fail to match, causing the mask to cover the catalog source's own pixels. This results in the catalog source being fitted with partially or fully masked data, producing poor or unconstrained fits. The affected_by_bright_mask flag in the output catalog identifies sources whose center falls inside a mask region. If you see many sources with this flag, match_radius_pix may be too small relative to the astrometric discrepancy between the input catalog positions and the actual source positions on the image.
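One way to gauge that discrepancy is to measure how far each catalog position lies from its nearest detection. The sketch below is a heuristic, not the pipeline's matching logic: the helper names are hypothetical, positions are assumed to be in pixel coordinates already, and the brute-force search is only suitable for modest catalog sizes.

```python
import math


def nearest_offsets(catalog_xy, detection_xy):
    """Distance in pixels from each catalog position to its nearest detection."""
    return [
        min(math.hypot(cx - dx, cy - dy) for dx, dy in detection_xy)
        for cx, cy in catalog_xy
    ]


def suggest_match_radius(offsets, quantile=0.95):
    """Return the offset at the given quantile as a candidate match radius."""
    ordered = sorted(offsets)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]
```

The nearest detection may itself be a contaminant, so the suggested radius should be checked against the overlay plot before it is adopted.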
If the nearby non-catalog source is comparable in brightness to the catalog source (not overwhelmingly bright):
In this case, masking may not be the best approach: the Tractor is capable of deblending sources of similar brightness through simultaneous modeling. Set match_radius_pix large enough that both the catalog source and its neighbor are left unmasked, and let the Tractor's joint optimization separate their contributions. The halo source inclusion feature ensures that neighbors in the ROI are included in the model. As noted above, simply including the neighboring sources in the input catalog remains the most reliable approach.
Gaussian priors not taking effect¶
Cause: Error columns exist in the catalog but priors are not being applied for some or all sources.
Possible reasons:
- The error value is `NaN` or non-positive for that source. Only finite, positive errors activate priors.
- The parameter value itself was a fallback (not from the catalog). For example, if `Re` is `NaN` and falls back to `re_fallback_pix`, then `Re_ERR` is ignored for that source even if present.
- For `ELL_ERR` / `THETA_ERR`: both must be present and finite for ellipticity/PA priors to apply. Providing only one results in no prior on `ee1`/`ee2`.
- For position priors: the `POS_ERR` column must exist with values in pixels. For newly injected Gaia sources, check that `zp.gaia_pos_err_pix` is not `null` and that `POS_ERR` appears in the saved `ZP/input_catalog_*_with_Gaia.csv`. Matched original catalog rows are not auto-filled.
- For flux priors: the `FLUX_{band}_ERR` column name must match the band exactly (case-insensitive).
Diagnosis: Check the patch log for the line Gaussian priors applied: pos=N, flux=N, shape_logre=N, shape_ee=N. If all counts are zero, verify your error column names and values.
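The expected counts can be estimated from the catalog and compared with that log line. This is a rough sketch that applies the rules listed above to rows read as dictionaries. The helper names are hypothetical, and pairing `Re_ERR` with the `shape_logre` counter and `ELL_ERR`/`THETA_ERR` with `shape_ee` is an assumption about how the log labels map to columns.

```python
import math


def is_finite(value):
    """True if the value parses as a finite number."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def is_active_error(value):
    """True if the error value is finite and positive."""
    return is_finite(value) and float(value) > 0


def count_active_priors(rows):
    """Count rows whose error columns would activate each kind of prior."""
    counts = {"pos": 0, "shape_logre": 0, "shape_ee": 0}
    for row in rows:
        if is_active_error(row.get("POS_ERR")):
            counts["pos"] += 1
        # Re_ERR is ignored when Re itself is missing and a fallback is used.
        if is_finite(row.get("Re")) and is_active_error(row.get("Re_ERR")):
            counts["shape_logre"] += 1
        # Both ELL_ERR and THETA_ERR are required for an ellipticity prior.
        if is_active_error(row.get("ELL_ERR")) and is_active_error(row.get("THETA_ERR")):
            counts["shape_ee"] += 1
    return counts
```

A large gap between these counts and the logged ones suggests the error columns are being read under different names than expected.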
Subprocess crashes¶
Check <tag>.runner.log for the full subprocess output. Common causes:
- Out-of-memory (too many sources in a patch, or too-large cutout size).
- Missing ePSF files (check the `epsf_root` path).
- Tractor internal errors (rare; usually indicates a degenerate source configuration).
Overlay Plot Confusion¶
All sources labeled UNKNOWN¶
Cause: The TYPE column is missing from the input catalog, or all values are unrecognized.
Fix: Add a TYPE column with values from: STAR, EXP, DEV, SERSIC. Or accept the fallback: all sources will be fit as patch_run.gal_model.
Overlay TYPE grouping reference¶
| Category | Matched TYPE values |
|---|---|
| STAR (cyan) | STAR |
| GAL (magenta) | GAL, EXP, DEV, SERSIC |
| UNKNOWN (yellow) | Anything else, including empty, NaN, NULL |
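The grouping in the table can be reproduced to preview how a catalog will be colored. This is a sketch with a hypothetical function name; stripping whitespace and ignoring case are assumptions, since the page does not state how strictly `TYPE` values are compared.

```python
def overlay_category(type_value):
    """Return 'STAR', 'GAL' or 'UNKNOWN' following the grouping table."""
    if type_value is None:
        return "UNKNOWN"
    text = str(type_value).strip().upper()  # assumed normalization
    if text == "STAR":
        return "STAR"
    if text in {"GAL", "EXP", "DEV", "SERSIC"}:
        return "GAL"
    return "UNKNOWN"  # empty, NaN, NULL, or any other value
```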
Merge Issues¶
Rows with no fit values in the output¶
This can be normal. Sources that were flagged as excluded are preserved in the final catalog but not assigned to any patch, so their fit columns are empty. Inspect:
- `excluded_crop`: was the source outside the crop region?
- `excluded_saturation`: was the source near saturated pixels?
- `excluded_any`: was the source excluded for any reason?
- `excluded_reason`: human-readable reason: `"crop"`, `"saturation"`, `"crop+saturation"`, or `""`.
If excluded_any is false but fit columns are still empty, the source may have had NaN coordinates or projected outside image bounds.
ID column read as numeric (leading zeros lost)¶
Cause: When per-patch fit CSVs are read, pandas may infer an all-numeric ID column as int64, stripping leading zeros (e.g. 00101 → 101). This causes a merge key mismatch.
Status: This is handled automatically. The pipeline reads ID as string dtype in all CSV reads (both base and patch-fit catalogs), preserving the original format. IDs are format-free strings — 00019, 1858283, and star_alpha all work correctly.
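When reading pipeline CSVs in your own scripts, the same precaution applies. The short example below uses an in-memory CSV with made-up rows to show the difference between pandas' default type inference and an explicit string dtype for `ID`.

```python
import io

import pandas as pd

csv_text = "ID,RA,DEC\n00101,150.1,2.2\n00019,150.2,2.3\n"

# Default inference turns the all-numeric ID column into integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the IDs exactly as written.
as_string = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

print(inferred["ID"].tolist())
print(as_string["ID"].tolist())
```

Passing `dtype={"ID": str}` to every `pd.read_csv` call that touches a catalog keeps merge keys consistent with the pipeline's own output.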
download-sample logging differs from pipeline logging¶
This is expected:
- Pipeline commands (`run`, `run-epsf`, etc.) apply the YAML `logging.*` configuration. By default, logs are written to `{work_dir}/{command}.log` (e.g. `run.log`, `merge.log`).
- `download-sample` runs without `--config`, so it uses its own independent logging setup.
Log file location¶
With the default logging.file: "auto", each command writes its log to {work_dir}/{command}.log. To change this:
- Set `logging.file: "my_custom.log"` to write to a file resolved relative to `work_dir`.
- Set `logging.file: null` to disable file logging entirely (console only).
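The three cases can be summarized in a few lines. This sketch only restates the rules above; `resolve_log_path` is a hypothetical helper, not the pipeline's own function, and Python's `None` stands in for YAML `null`.

```python
from pathlib import Path


def resolve_log_path(work_dir, command, logging_file="auto"):
    """Return the log file path for a command, or None if file logging is off."""
    if logging_file is None:
        return None  # logging.file: null, console only
    if logging_file == "auto":
        return Path(work_dir) / f"{command}.log"
    return Path(work_dir) / logging_file  # resolved relative to work_dir
```

For example, `resolve_log_path("/data/work", "merge")` points at `/data/work/merge.log`.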
General Tips¶
Checking available bands¶
To see which bands/filters are in your images without running the full pipeline:
```bash
for f in /path/to/images/*.fits; do
    python -c "from astropy.io import fits; print('$f', fits.getheader('$f')['FILTER'])"
done
```
Verifying catalog-image band alignment¶
Check that your FLUX_* column suffixes match FITS FILTER values:
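```python
import pandas as pd
from astropy.io import fits

cat = pd.read_csv("input_catalog.csv")
# Band suffixes of FLUX_* columns, skipping FLUX_*_ERR error columns
cat_bands = {
    c[len("FLUX_"):] for c in cat.columns
    if c.startswith("FLUX_") and not c.endswith("_ERR")
}
# Compare with FITS FILTER values (after stripping whitespace)
for f in ["image1.fits", "image2.fits"]:
    band = str(fits.getheader(f)["FILTER"]).strip()
    print(f, band, "OK" if band in cat_bands else "not in catalog")
```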
Inspecting exclusion statistics¶
After a pipeline run:
```python
from collections import Counter

import pandas as pd

df = pd.read_csv("output_catalog.csv")
print("Total sources:", len(df))
print("Excluded (crop):", df["excluded_crop"].sum())
print("Excluded (saturation):", df["excluded_saturation"].sum())
print("Excluded (any):", df["excluded_any"].sum())
print("Fitted:", df["opt_converged"].notna().sum())
print("Converged:", df["opt_converged"].sum())
print("Hit max iters:", df["opt_hit_max_iters"].sum())
print("Bound-stalled:", df["flag_bound_stalled"].sum())
if "affected_by_bright_mask" in df.columns:
    print("Affected by bright mask:", df["affected_by_bright_mask"].sum())

# Show which parameters are most commonly stalled:
stalled = df.loc[df["flag_bound_stalled"] == True, "flag_bound_stalled_which_parameter"]
if len(stalled) > 0:
    print("Stall parameter counts:")
    # Each entry may list several comma-separated parameters; tally them.
    counts = Counter(p.strip() for s in stalled for p in str(s).split(","))
    for param, n in counts.most_common():
        print(f"  {param}: {n}")
```