
Troubleshooting

This page covers common errors, warnings, and unexpected behaviors, organized by pipeline stage.


Input Catalog Errors

ValueError: input_catalog must have RA/DEC columns

Cause: The CSV file has no column named RA or DEC (case-insensitive search).

Fix: Rename your coordinate columns to RA and DEC. The search is case-insensitive (ra, Ra and RA are all found), but the pipeline logs a warning whenever the names are not exactly uppercase.

RuntimeError: Band mismatch between input catalog and image list

Cause: Your catalog contains FLUX_* columns whose band suffixes don't exactly match the FILTER keyword values in your FITS images. For example, the catalog has FLUX_m400 but the image has FILTER = M400 (different case or extra whitespace).

Fix:

  • Check that FLUX_ suffixes match FITS FILTER values exactly (case-sensitive after stripping whitespace).
  • If you don't need flux priors, remove all FLUX_* columns from the catalog. The pipeline will initialize fluxes from aperture photometry instead.
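For the second option, here is a minimal pandas sketch that strips the flux columns. It assumes every flux-related column, including any FLUX_{band}_ERR error columns, shares the FLUX_ prefix. The file names in the usage comment are placeholders.

```python
import pandas as pd


def drop_flux_columns(cat: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the catalog without any FLUX_* columns.

    FLUX_<band>_ERR error columns share the prefix, so they are removed too.
    """
    flux_cols = [c for c in cat.columns if str(c).startswith("FLUX_")]
    return cat.drop(columns=flux_cols)


# Usage (placeholder file names):
# cat = pd.read_csv("input_catalog.csv", dtype={"ID": str})
# drop_flux_columns(cat).to_csv("input_catalog_noflux.csv", index=False)
```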

ValueError: Duplicate keys in patch results

Cause: Multiple sources in the input catalog share the same merge key. If ID exists, duplicate ID values cause this. If ID is absent, duplicate (RA, DEC) pairs cause this.

Fix: Ensure ID values are unique. If using coordinate-based merging, ensure no two sources have identical RA and DEC.
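To find the offending rows before a run, you can use a short pandas check. It mirrors the rule above: it uses ID if that column is present, and otherwise the exact (RA, DEC) pair. Reading ID as a string in the usage comment keeps leading zeros intact.

```python
import pandas as pd


def find_duplicate_keys(cat: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose merge key is shared with another row.

    Uses ID when that column exists, otherwise the (RA, DEC) pair.
    """
    key = ["ID"] if "ID" in cat.columns else ["RA", "DEC"]
    # keep=False flags all members of a duplicate group, not only the repeats
    dup_mask = cat.duplicated(subset=key, keep=False)
    return cat.loc[dup_mask].sort_values(key)


# Usage:
# cat = pd.read_csv("input_catalog.csv", dtype={"ID": str})
# print(find_duplicate_keys(cat))
```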

Warning: input_catalog uses ra/dec columns

Cause: The pipeline found coordinate columns but they are not exactly named RA and DEC (e.g. they are lowercase ra/dec).

Fix: Rename to uppercase RA/DEC. The pipeline works either way, but the warning indicates a potential naming convention issue.
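Here is a small pandas helper for the rename. It is a sketch that picks the first column whose name matches RA or DEC case-insensitively, which is assumed to be the column the pipeline would find.

```python
import pandas as pd


def uppercase_coord_columns(cat: pd.DataFrame) -> pd.DataFrame:
    """Rename the first case-insensitive match of 'ra' / 'dec' to RA / DEC."""
    renames = {}
    for target in ("RA", "DEC"):
        if target in cat.columns:
            continue  # already named exactly as the pipeline prefers
        matches = [c for c in cat.columns if str(c).lower() == target.lower()]
        if matches:
            renames[matches[0]] = target
    return cat.rename(columns=renames)
```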


FITS Image Errors

RuntimeError: Missing FILTER in <file>

Fix: Add a FILTER keyword to the FITS header with the band name string.

RuntimeError: Empty FILTER in <file>

Fix: Set the FILTER header value to a non-empty band name.

RuntimeError: Missing ZP_AUTO in <file>

Fix: Run photometric calibration on the image and write the ZP_AUTO keyword.

RuntimeError: Bad/Missing SKYSIG in <file>

Fix: Compute the sky background noise (1-sigma, in ADU) and write it as SKYSIG. The value must be positive and finite.
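A common way to estimate a 1-sigma sky noise is an iteratively sigma-clipped standard deviation of the pixel values, which rejects source pixels as outliers. The NumPy sketch below illustrates that idea. It is not necessarily how your calibration tools compute SKYSIG, and the 3-sigma threshold and iteration cap are illustrative choices.

```python
import numpy as np


def clipped_sky_sigma(data, nsigma=3.0, max_iter=10):
    """Estimate the 1-sigma sky noise in the units of `data` (e.g. ADU).

    Pixels more than nsigma standard deviations from the median are
    rejected iteratively, which removes most source flux from the estimate.
    """
    values = np.asarray(data, dtype=float).ravel()
    values = values[np.isfinite(values)]  # ignore NaN/inf pixels
    for _ in range(max_iter):
        med = np.median(values)
        std = np.std(values)
        keep = np.abs(values - med) <= nsigma * std
        if keep.all():
            break  # nothing left to clip
        values = values[keep]
    sigma = float(np.std(values))
    # Same requirement as the pipeline: positive and finite
    if not (np.isfinite(sigma) and sigma > 0):
        raise ValueError("SKYSIG must be positive and finite")
    return sigma
```

Clipping slightly underestimates the sigma of pure Gaussian noise, by roughly one to two percent at 3 sigma. astropy.stats.sigma_clipped_stats provides an equivalent, well-tested alternative. You can then write the value with astropy, for example fits.setval(path, "SKYSIG", value=sigma).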

RuntimeError: Bad/Missing EGAIN (e-/ADU) in <file>

Fix: Write the effective gain in electrons per ADU as EGAIN. This is typically a detector/instrument property.
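To catch all of these header problems in one pass, rather than one error at a time, you can run a check like the one below over the image list. It is a sketch: the rules (non-empty FILTER, present ZP_AUTO, positive finite SKYSIG and EGAIN, SATURATE as warning only) come from the messages on this page, and the pipeline's exact validation may differ. The function accepts any mapping, so an astropy Header from fits.getheader(path) can be passed directly.

```python
import math


def header_problems(header):
    """List the keyword problems described on this page for one header mapping."""
    problems = []
    filt = header.get("FILTER")
    if filt is None:
        problems.append("Missing FILTER")
    elif not str(filt).strip():
        problems.append("Empty FILTER")
    if header.get("ZP_AUTO") is None:
        problems.append("Missing ZP_AUTO")
    for key in ("SKYSIG", "EGAIN"):
        try:
            value = float(header.get(key))
        except (TypeError, ValueError):
            value = float("nan")  # missing or non-numeric
        if not (math.isfinite(value) and value > 0):
            problems.append(f"Bad/Missing {key}")
    if header.get("SATURATE") is None:
        problems.append("SATURATE missing (warning only)")
    return problems


# Usage (requires astropy):
# from astropy.io import fits
# for path in ["image1.fits", "image2.fits"]:
#     print(path, header_problems(fits.getheader(path)) or "OK")
```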

RuntimeError: Image shape mismatch

Cause: Not all images have the same pixel dimensions.

Fix: Ensure all images are resampled/registered to the same pixel grid. All images for a given tile should share identical (NAXIS2, NAXIS1).

RuntimeError: WCS mismatch with reference

Cause: The WCS of one image does not agree with the first image within the configured tolerances.

Fix:

  • Re-register images to a common WCS.
  • Or relax the tolerance values in checks.wcs_tolerance.* (not recommended unless you understand the implications).
  • Or disable the check: checks.require_wcs_alignment: false (not recommended).
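Before re-registering, it may help to see which images differ from the first one, and by how much. The sketch below compares pixel dimensions exactly and compares a few standard WCS keywords (CRVAL1/2, CRPIX1/2 and the CD matrix) against the first header within fixed tolerances. Both the keyword set and the tolerance values are assumptions for illustration. They are not the pipeline's checks.wcs_tolerance.* settings, and headers that use PC/CDELT instead of CD would need extra handling.

```python
import math

# Illustrative tolerances (assumptions, not pipeline defaults)
WCS_KEYS_TOL = {
    "CRVAL1": 1e-6, "CRVAL2": 1e-6,  # degrees
    "CRPIX1": 1e-3, "CRPIX2": 1e-3,  # pixels
    "CD1_1": 1e-9, "CD1_2": 1e-9, "CD2_1": 1e-9, "CD2_2": 1e-9,
}


def grid_mismatches(headers):
    """Compare each header with the first; return {index: [problem, ...]}."""
    ref = headers[0]
    ref_shape = (ref.get("NAXIS2"), ref.get("NAXIS1"))
    report = {}
    for i, hdr in enumerate(headers[1:], start=1):
        problems = []
        shape = (hdr.get("NAXIS2"), hdr.get("NAXIS1"))
        if shape != ref_shape:
            problems.append(f"shape {shape} != {ref_shape}")
        for key, tol in WCS_KEYS_TOL.items():
            a, b = ref.get(key), hdr.get(key)
            if a is None and b is None:
                continue  # keyword absent in both headers
            if a is None or b is None or not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
                problems.append(f"{key}: {a} vs {b}")
        if problems:
            report[i] = problems
    return report
```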

Warning: SATURATE missing in <file>; using inf

Cause: The FITS header lacks a SATURATE keyword.

Fix: This is often harmless — it means no saturation mask will be built for this image. If you need saturation filtering, add the SATURATE keyword to the header.


Load Stage Performance

Long pauses in load_inputs

Expected heavy steps:

  • Per-frame FITS prep: Loading large images, computing masks and noise arrays. Scales with the number of bands.
  • White-stack build: Inverse-variance weighted coaddition. Memory-intensive.
  • Diagnostic plot rendering: Large PNG files for white-stack and overlay plots.

Use the timing summary to locate bottlenecks:

load_inputs timing [s]: prep=X.XX white=X.XX crop=X.XX sat=X.XX overlay=X.XX total=X.XX
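With logs from many runs, you can extract the slowest step programmatically instead of reading each summary. This sketch assumes the summary line keeps exactly the key=value layout shown above.

```python
import re

TIMING_RE = re.compile(r"load_inputs timing \[s\]:\s*(.*)")


def parse_load_timing(line):
    """Return {step: seconds} from a load_inputs timing line, or None."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    return {k: float(v) for k, v in re.findall(r"(\w+)=([0-9.]+)", m.group(1))}


def slowest_step(timings):
    """Name of the most expensive step, ignoring the 'total' entry."""
    steps = {k: v for k, v in timings.items() if k != "total"}
    return max(steps, key=steps.get)
```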

Mitigation strategies:

  • Increase performance.frame_prep_workers (but watch memory).
  • Increase performance.white_stack_workers.
  • Reduce diagnostic plot resolution: crop.display_downsample, crop.post_crop_display_downsample, overlay.downsample_full.
  • Disable selected plots if needed: crop.plot_pre_crop, crop.plot_post_crop, overlay.enabled, overlay.zoom_enabled.

ePSF Issues

Few or no ePSF outputs

Check:

  • epsf.skip_empty_epsf_patches: true may be intentionally pruning empty cells. This is expected in sparse fields.
  • Source density after crop and saturation flagging may be very low.
  • ePSF selection thresholds may be too aggressive:
      • epsf.thresh_sigma (detection threshold): lowering it finds fainter candidates.
      • epsf.minarea (minimum detection area): lowering it accepts smaller detections.
      • epsf.gaia_snr_min (SNR requirement): lowering it accepts fainter candidates.
      • epsf.q_range (roundness filter): widening it accepts less round sources.

ePSF quality warnings (low star count)

If the log shows low-star ePSF (nstars_used=N < min_epsf_nstars_for_use=M):

  • The ePSF was built but from too few stars to be reliable.
  • The pipeline will use the fallback PSF model for affected bands in that cell.
  • Reduce patch_run.min_epsf_nstars_for_use if you are willing to accept lower-quality ePSFs, or increase epsf.max_stars and relax selection criteria.
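To judge whether lowering min_epsf_nstars_for_use would help, you can tally the star counts reported in these warnings. The regex assumes the exact wording quoted above. Because the band and cell are not part of that quoted text, this sketch collects only the counts.

```python
import re

LOW_STAR_RE = re.compile(
    r"low-star ePSF \(nstars_used=(\d+) < min_epsf_nstars_for_use=(\d+)\)"
)


def low_star_counts(lines):
    """Return (nstars_used, min_required) pairs found in an iterable of log lines."""
    found = []
    for line in lines:
        m = LOW_STAR_RE.search(line)
        if m:
            found.append((int(m.group(1)), int(m.group(2))))
    return found


# Usage: with open("run.log") as fh: print(low_star_counts(fh))
```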

GaiaXP catalog issues

If the log shows GaiaXP catalog not found:

  • Check the path in inputs.gaiaxp_synphot_csv.
  • If you don't have a GaiaXP catalog for this field, set epsf.use_gaiaxp: false or epsf.psfstar_mode: "sep".

If GaiaXP is loaded but no stars are selected:

  • Check that the mag_{band} column names match the FITS FILTER values.
  • Check that epsf.gaia_mag_min / epsf.gaia_mag_max bracket the expected magnitude range.
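You can run both checks directly on the synphot CSV. This sketch assumes columns named mag_{band} as described above. The band list and magnitude limits are placeholders: take them from your FILTER values and from epsf.gaia_mag_min / epsf.gaia_mag_max. The range test is inclusive here, which may differ slightly from the pipeline.

```python
import pandas as pd


def gaiaxp_band_summary(gaia, bands, mag_min, mag_max):
    """For each band, report 'missing column' or the number of stars
    with mag_<band> inside [mag_min, mag_max]."""
    summary = {}
    for band in bands:
        col = f"mag_{band}"
        if col not in gaia.columns:
            summary[band] = "missing column"
            continue
        mags = pd.to_numeric(gaia[col], errors="coerce")  # non-numeric -> NaN
        summary[band] = int(mags.between(mag_min, mag_max).sum())
    return summary


# Usage (placeholder path and limits):
# gaia = pd.read_csv("gaiaxp_synphot.csv")
# print(gaiaxp_band_summary(gaia, ["M400", "M500"], 13.0, 18.0))
```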

Patch Run Issues

Many empty patch runs

Enable pruning:

patch_inputs:
  skip_empty_patch: true

epsf:
  skip_empty_epsf_patches: true

This prevents writing and running patches that contain no sources.

Optimizer convergence warnings

If many patches show opt_hit_max_iters: true:

  • Increase patch_run.n_opt_iters (at the cost of runtime).
  • Relax patch_run.dlnp_stop (accept a larger dlnp change as "converged").
  • Check if the PSF model is appropriate — a poor PSF can prevent convergence.

Fitted positions snap to nearby brighter sources

Cause: When a much brighter source is nearby, the Tractor optimizer may move the fitted position toward the brighter source because it finds a higher likelihood there. The default per-iteration step limit (maxstep = 1 pixel) slows the drift but does not prevent it over many iterations. When the position hits the pos_max_shift_pix box bound, the optimizer may report convergence (opt_converged = True, dlnp = 0) even though the fit is visually poor — this happens because the constrained line search cannot find an acceptable step, not because the model is a good fit.

Mitigations (applied in combination):

  1. Position bounds. Set patch_run.pos_max_shift_pix to limit the total position displacement:

    patch_run:
      pos_max_shift_pix: 5.0  # allow at most 5 pixels of position shift

    A value of 3 to 10 pixels is typically appropriate, depending on the pixel scale and the astrometric quality of the input catalog.

  2. Bright-source masking. Enable bright_mask (on by default) to detect and mask bright unmatched sources on the white stack before fitting. This prevents the contaminant's flux from influencing the optimizer at all. See Configuration — bright_mask.

  3. Three-stage fitting. Enable patch_run.enable_staged_fit: true (on by default). By fitting fluxes/shapes first with positions frozen, then fitting positions with fluxes/shapes frozen, the optimizer is less likely to drift toward a contaminant because the initial flux estimate stabilizes the fit.

  4. Boundary-stall flags. After fitting, check flag_bound_stalled and flag_bound_stalled_which_parameter to identify sources whose parameters are stuck at bounds. Sources with pos.x@upper or pos.y@lower in the stall flag are likely affected by this issue.

  5. Stronger position priors. Populate the POS_ERR column in the input catalog with smaller values (e.g. 0.5 to 1.0 pixels) for sources with reliable astrometry. This adds a Gaussian pull toward the catalog position, making large drift more costly.
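You can populate POS_ERR with pandas before the run. The sketch takes a boolean mask of trusted sources. The column reliable_astrom in the usage comment is hypothetical; substitute whatever criterion fits your catalog. Rows left as NaN get no position prior, because only finite, positive errors activate priors. Existing finite POS_ERR values are deliberately kept.

```python
import numpy as np
import pandas as pd


def set_pos_err(cat, reliable, err_pix=0.5):
    """Return a copy of cat with POS_ERR (pixels) filled where `reliable` is True.

    Existing finite POS_ERR values are kept; other rows stay NaN (no prior).
    """
    out = cat.copy()
    if "POS_ERR" not in out.columns:
        out["POS_ERR"] = np.nan
    fill = reliable & ~np.isfinite(out["POS_ERR"].astype(float))
    out.loc[fill, "POS_ERR"] = err_pix
    return out


# Usage (reliable_astrom is a hypothetical boolean column):
# cat = set_pos_err(cat, cat["reliable_astrom"], err_pix=0.5)
```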

Boundary-stalled fits (flag_bound_stalled)

Cause: The optimizer converged (or stalled) with one or more bounded parameters at or very near their hard limits. This can happen for position (pos.x, pos.y when pos_max_shift_pix is set), galaxy shape (shape.logre, shape.ee1, shape.ee2), or Sersic index (sersicindex).

Diagnosis:

import pandas as pd
df = pd.read_csv("output_catalog.csv")
stalled = df[df["flag_bound_stalled"] == True]
print(f"Stalled sources: {len(stalled)} / {len(df)}")
print(stalled["flag_bound_stalled_which_parameter"].value_counts().head(10))

Common stall patterns and fixes:

| Pattern | Likely cause | Fix |
| --- | --- | --- |
| pos.x@upper or pos.y@lower | Position drifted toward a bright contaminant or off-catalog source | Enable bright_mask; tighten pos_max_shift_pix; strengthen POS_ERR |
| shape.logre@lower | Galaxy is more compact than the minimum allowed radius | Check if the source should be typed as STAR instead |
| shape.logre@upper | Galaxy is very extended; model may be absorbing background or neighbors | Inspect cutout; consider masking neighbors |
| sersicindex@upper or sersicindex@lower | Sersic profile hitting physical limits (0.29 to 6.3) | Inspect cutout; the source may need a different model type |

Bright-source mask tuning

If the bright-source mask is too aggressive (masking legitimate sources):

  • Increase bright_mask.thresh_sigma (e.g. from 5.0 to 10.0) to detect only brighter contaminants.
  • Decrease bright_mask.ellipse_scale (e.g. from 3.0 to 1.5) to shrink mask ellipses.
  • Decrease bright_mask.dilate_pix (e.g. from 1 to 0) to reduce padding.

If the mask is not aggressive enough (contaminants still affect fits):

  • Decrease bright_mask.thresh_sigma (e.g. from 5.0 to 3.0).
  • Increase bright_mask.ellipse_scale (e.g. from 3.0 to 5.0).

Inspect bright_mask/white_bright_mask_overlay.png to visually verify which sources are being masked (red X) vs matched to the catalog (lime circles).

Tuning bright_mask.match_radius_pix

The match_radius_pix parameter controls how SEP detections are matched to active catalog sources. Detections within this radius of a catalog source are considered matched and are not masked. Setting this value correctly is important:
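The rule described here amounts to a radius test between each detection and its nearest active catalog source. The NumPy sketch below illustrates that rule only; it is not the pipeline's implementation. It builds a brute-force distance matrix, which is adequate for modest source counts.

```python
import numpy as np


def split_detections(det_xy, cat_xy, match_radius_pix):
    """Split detections into matched (left unmasked) and unmatched (mask candidates).

    det_xy, cat_xy: (N, 2) and (M, 2) arrays of pixel (x, y) positions.
    Returns two boolean arrays over detections: matched, unmatched.
    """
    det_xy = np.asarray(det_xy, dtype=float).reshape(-1, 2)
    cat_xy = np.asarray(cat_xy, dtype=float).reshape(-1, 2)
    if len(cat_xy) == 0:
        matched = np.zeros(len(det_xy), dtype=bool)
    else:
        # Pairwise distances, shape (N, M)
        d = np.hypot(det_xy[:, None, 0] - cat_xy[None, :, 0],
                     det_xy[:, None, 1] - cat_xy[None, :, 1])
        matched = d.min(axis=1) <= match_radius_pix
    return matched, ~matched
```

With one catalog source at (10.5, 10), a detection 2.5 px away is a mask candidate at match_radius_pix = 2 but is left unmasked at 3. The following paragraphs discuss this trade-off.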

If match_radius_pix is too large:

Many SEP detections near catalog sources will be matched and left unmasked, including bright sources that are not in the input catalog but happen to fall within the matching radius of a catalog source. If such an unmasked source is much brighter than the nearby catalog source, the optimizer may snap the catalog source's position toward the bright contaminant. Several safeguards limit this:

  1. Bright-source masking (this feature) is the first line of defense, but it only works when the bright source is not matched to a catalog entry. The best approach is to include the bright source in the input catalog so the Tractor can model it directly (this is what halo source inclusion does for sources within the ROI).
  2. Three-stage fitting (patch_run.enable_staged_fit) stabilizes the optimization by fitting fluxes first with frozen positions, reducing the tendency for positions to drift toward bright neighbors.
  3. Position priors (POS_ERR / zp.gaia_pos_err_pix) add a Gaussian pull toward the catalog position. A smaller POS_ERR (e.g. 0.1 pixels) makes large position shifts costly, discouraging position snapping. However, setting this too small may prevent the optimizer from correcting genuine astrometric offsets.
  4. Boundary-stall detection (flag_bound_stalled) flags sources whose position hit the pos_max_shift_pix bound, making them easy to identify in the output catalog.

If match_radius_pix is too small:

Sources near the catalog position (including the catalog source's own SEP detection) may fail to match, so the mask ends up covering the catalog source's own pixels. The catalog source is then fitted with partially or fully masked data, producing poor or unconstrained fits. The affected_by_bright_mask flag in the output catalog identifies sources whose center falls inside a mask region. If many sources carry this flag, match_radius_pix may be too small relative to the astrometric discrepancy between the input catalog positions and the actual source positions on the image.

If the nearby non-catalog source is comparable in brightness to the catalog source (not overwhelmingly bright):

In this case, masking may not be the best approach: the Tractor can deblend sources of similar brightness through simultaneous modeling. Set match_radius_pix large enough that both the catalog source and its neighbor are left unmasked, and let the Tractor's joint optimization separate their contributions. The halo source inclusion feature ensures that neighbors in the ROI are included in the model. As noted above, though, the most reliable fix is still to include the neighboring sources in the input catalog.

Gaussian priors not taking effect

Cause: Error columns exist in the catalog but priors are not being applied for some or all sources.

Possible reasons:

  • The error value is NaN or non-positive for that source. Only finite, positive errors activate priors.
  • The parameter value itself was a fallback (not from the catalog). For example, if Re is NaN and falls back to re_fallback_pix, then Re_ERR is ignored for that source even if present.
  • For ELL_ERR/THETA_ERR: both must be present and finite for ellipticity/PA priors to apply. Providing only one results in no prior on ee1/ee2.
  • For position priors: the POS_ERR column must exist with values in pixels. For newly injected Gaia sources, check that zp.gaia_pos_err_pix is not null and that POS_ERR appears in the saved ZP/input_catalog_*_with_Gaia.csv. Matched original catalog rows are not auto-filled.
  • For flux priors: the FLUX_{band}_ERR column name must match the band name; the comparison is case-insensitive.
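To see, before a run, how many sources could receive a prior from each error column, count the finite, positive entries. This pandas sketch treats every column ending in _ERR as an error column, which is an assumption about your catalog naming. It does not reproduce the fallback rule: an error is also ignored when its parameter value fell back to a default.

```python
import numpy as np
import pandas as pd


def usable_error_counts(cat):
    """Count rows with a finite, positive value in each *_ERR column."""
    counts = {}
    for col in cat.columns:
        if str(col).upper().endswith("_ERR"):
            vals = pd.to_numeric(cat[col], errors="coerce")
            counts[col] = int((np.isfinite(vals) & (vals > 0)).sum())
    # Ellipticity/PA priors need both ELL_ERR and THETA_ERR usable on the same row
    if {"ELL_ERR", "THETA_ERR"} <= set(cat.columns):
        ell = pd.to_numeric(cat["ELL_ERR"], errors="coerce")
        theta = pd.to_numeric(cat["THETA_ERR"], errors="coerce")
        both = np.isfinite(ell) & (ell > 0) & np.isfinite(theta) & (theta > 0)
        counts["ELL_ERR+THETA_ERR"] = int(both.sum())
    return counts
```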

Diagnosis: Check the patch log for the line Gaussian priors applied: pos=N, flux=N, shape_logre=N, shape_ee=N. If all counts are zero, verify your error column names and values.
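When scanning many patch logs, you can parse that line instead of reading each one by eye. The pattern assumes the exact format quoted above.

```python
import re

PRIORS_RE = re.compile(
    r"Gaussian priors applied: pos=(\d+), flux=(\d+), shape_logre=(\d+), shape_ee=(\d+)"
)


def parse_prior_counts(line):
    """Return {'pos': n, 'flux': n, 'shape_logre': n, 'shape_ee': n}, or None."""
    m = PRIORS_RE.search(line)
    if m is None:
        return None
    keys = ("pos", "flux", "shape_logre", "shape_ee")
    return dict(zip(keys, map(int, m.groups())))
```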

Subprocess crashes

Check <tag>.runner.log for the full subprocess output. Common causes:

  • Out-of-memory (too many sources in a patch, or too-large cutout size).
  • Missing ePSF files (check the epsf_root path).
  • Tractor internal errors (rare; usually indicates a degenerate source configuration).

Overlay Plot Confusion

All sources labeled UNKNOWN

Cause: The TYPE column is missing from the input catalog, or all values are unrecognized.

Fix: Add a TYPE column with values from: STAR, EXP, DEV, SERSIC. Or accept the fallback: all sources will be fit as patch_run.gal_model.

Overlay TYPE grouping reference

| Category | Matched TYPE values |
| --- | --- |
| STAR (cyan) | STAR |
| GAL (magenta) | GAL, EXP, DEV, SERSIC |
| UNKNOWN (yellow) | Anything else, including empty, NaN, NULL |
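To predict how your catalog will be colored, you can reproduce the grouping in the table. This sketch assumes TYPE values are compared case-insensitively after stripping whitespace; the table does not state that detail.

```python
import math


def overlay_category(type_value):
    """Map a TYPE value to the overlay category STAR, GAL or UNKNOWN."""
    if type_value is None:
        return "UNKNOWN"
    if isinstance(type_value, float) and math.isnan(type_value):
        return "UNKNOWN"  # pandas reads empty cells as NaN
    t = str(type_value).strip().upper()
    if t == "STAR":
        return "STAR"
    if t in {"GAL", "EXP", "DEV", "SERSIC"}:
        return "GAL"
    return "UNKNOWN"  # includes "", "NULL" and any other value


# Usage: cat["TYPE"].map(overlay_category).value_counts()
```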

Merge Issues

Rows with no fit values in the output

This can be normal. Sources that were flagged as excluded are preserved in the final catalog but not assigned to any patch, so their fit columns are empty. Inspect:

  • excluded_crop — was the source outside the crop region?
  • excluded_saturation — was the source near saturated pixels?
  • excluded_any — was the source excluded for any reason?
  • excluded_reason — human-readable reason: "crop", "saturation", "crop+saturation", or "".

If excluded_any is false but fit columns are still empty, the source may have had NaN coordinates or projected outside image bounds.
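To list the rows in that last case, filter for sources that were not excluded and have no fit result. As a sketch, this treats a missing opt_converged value as the marker of an unfitted row. It returns the coordinates so you can spot NaN or out-of-bounds positions.

```python
import pandas as pd


def unexplained_unfitted(df):
    """Rows that were not excluded but still have no fit result."""
    not_excluded = ~df["excluded_any"].fillna(False).astype(bool)
    not_fitted = df["opt_converged"].isna()
    cols = [c for c in ("ID", "RA", "DEC") if c in df.columns]
    return df.loc[not_excluded & not_fitted, cols]


# Usage:
# df = pd.read_csv("output_catalog.csv", dtype={"ID": str})
# print(unexplained_unfitted(df))
```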

ID column read as numeric (leading zeros lost)

Cause: When per-patch fit CSVs are read, pandas may infer an all-numeric ID column as int64, stripping leading zeros (e.g. 00101101 becomes 101101). This causes a merge key mismatch.

Status: This is handled automatically. The pipeline reads ID as string dtype in all CSV reads (both base and patch-fit catalogs), preserving the original format. IDs are format-free strings — 00019, 1858283, and star_alpha all work correctly.
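If you read the output CSVs in your own scripts and merge on ID, do the same, or you will hit the same leading-zero problem. Here is a small demonstration with an in-memory CSV:

```python
import io

import pandas as pd

csv_text = "ID,RA,DEC\n00101101,150.1,2.2\n1858283,150.2,2.3\n"

# Default inference: ID becomes int64 and the leading zeros are lost
default = pd.read_csv(io.StringIO(csv_text))

# Forcing string dtype keeps IDs exactly as written
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

print(default["ID"].tolist())  # [101101, 1858283]
print(as_str["ID"].tolist())   # ['00101101', '1858283']
```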

download-sample logging differs from pipeline logging

This is expected:

  • Pipeline commands (run, run-epsf, etc.) apply the YAML logging.* configuration. By default, logs are written to {work_dir}/{command}.log (e.g. run.log, merge.log).
  • download-sample runs without --config, so it uses its own independent logging setup.

Log file location

With the default logging.file: "auto", each command writes its log to {work_dir}/{command}.log. To change this:

  • Set logging.file: "my_custom.log" — resolved relative to work_dir.
  • Set logging.file: null — disable file logging entirely (console only).

General Tips

Checking available bands

To see which bands/filters are in your images without running the full pipeline:

for f in /path/to/images/*.fits; do
  python -c "from astropy.io import fits; print('$f', fits.getheader('$f')['FILTER'])"
done

Verifying catalog-image band alignment

Check that your FLUX_* column suffixes match FITS FILTER values:

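import pandas as pd
from astropy.io import fits

cat = pd.read_csv("input_catalog.csv")
# Band suffixes from FLUX_<band> columns (FLUX_<band>_ERR error columns are skipped)
cat_bands = {c[len("FLUX_"):] for c in cat.columns
             if c.startswith("FLUX_") and not c.endswith("_ERR")}

# FILTER values from the images, with surrounding whitespace stripped
img_bands = {str(fits.getheader(f)["FILTER"]).strip()
             for f in ["image1.fits", "image2.fits"]}

# Bands present on only one side (comparison is case-sensitive)
print("Catalog only:", sorted(cat_bands - img_bands))
print("Images only:", sorted(img_bands - cat_bands))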

Inspecting exclusion statistics

After a pipeline run:

import pandas as pd

df = pd.read_csv("output_catalog.csv")
print("Total sources:", len(df))
print("Excluded (crop):", df["excluded_crop"].sum())
print("Excluded (saturation):", df["excluded_saturation"].sum())
print("Excluded (any):", df["excluded_any"].sum())
print("Fitted:", df["opt_converged"].notna().sum())
print("Converged:", df["opt_converged"].sum())
print("Hit max iters:", df["opt_hit_max_iters"].sum())
print("Bound-stalled:", df["flag_bound_stalled"].sum())
if "affected_by_bright_mask" in df.columns:
    print("Affected by bright mask:", df["affected_by_bright_mask"].sum())
# Count how often each parameter appears among stalled sources.
# An entry may list several comma-separated parameters, so split before counting.
stalled = df.loc[df["flag_bound_stalled"] == True, "flag_bound_stalled_which_parameter"]
counts = stalled.dropna().astype(str).str.split(",").explode().str.strip().value_counts()
if len(counts) > 0:
    print("Stall parameter counts:")
    print(counts.to_string())