KPF Data Tutorial

This tutorial demonstrates how to create and use RVData standard files for the Keck Planet Finder (KPF) instrument at all data levels:

Level 2 (L2): Extracted, wavelength-calibrated echelle spectra
Level 3 (L3): Stitched 1D spectrum on a common wavelength grid
Level 4 (L4): Radial velocity measurements

Prerequisites

Install the rvdata package:

pip install rv-data-standard

Setup and Data Download

First, we’ll import the necessary modules and download sample KPF data files.

[ ]:

import os
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from astropy.io import fits

# RVData imports
from rvdata.core.models.level2 import RV2
from rvdata.core.models.level3 import RV3
from rvdata.core.models.level4 import RV4

def download_file(url, filename):
    """Download a file if it doesn't already exist."""
    if not os.path.exists(filename):
        print(f"Downloading {filename}...")
        response = requests.get(url)
        response.raise_for_status()
        with open(filename, "wb") as f:
            f.write(response.content)
        print(f"Downloaded {filename}")
    else:
        print(f"{filename} already exists, skipping download.")

# KPF sample data URLs (hosted on project server)
# L0 (raw) and L1 (extracted) files are needed to create L2
# Native L2 files are needed to create L4
file_urls = {
    "l0": "http://grinnell.as.arizona.edu/~rvdata/kpf/KP.20250208.17485.59.fits",
    "l1": "http://grinnell.as.arizona.edu/~rvdata/kpf/KP.20250208.17485.59_L1.fits",
    "native_l2": "http://grinnell.as.arizona.edu/~rvdata/kpf/KP.20241022.41656.30_L2.fits",
}

# Download the files
l0_file = "KP.20250208.17485.59.fits"
l1_file = "KP.20250208.17485.59_L1.fits"
native_l2_file = "KP.20241022.41656.30_L2.fits"

download_file(file_urls["l0"], l0_file)
download_file(file_urls["l1"], l1_file)
download_file(file_urls["native_l2"], native_l2_file)

Level 2: Extracted Echelle Spectra

Level 2 data contains wavelength-calibrated, extracted echelle spectra organized by trace (fiber). Each trace contains flux, wavelength, variance, and blaze function arrays.

Creating L2 from Native KPF Files

To create an RVData-standard L2 file from native KPF data, you need:

L0 file: Raw data with headers and telemetry
L1 file: Extracted spectra from the KPF pipeline

[ ]:

# Create RVData-standard L2 from native KPF files
kpf_l2 = RV2.from_fits(l1_file, l0file=l0_file, instrument="KPF")

# Save to FITS file
l2_standard_file = "kpf_L2_standard.fits"
kpf_l2.to_fits(l2_standard_file)
print(f"Created {l2_standard_file}")

Using L2 Data

Reading the L2 File

You can read L2 files using either astropy’s fits.open() or the RVData RV2.from_fits() method.

[ ]:

# Open using astropy
l2 = fits.open(l2_standard_file)

# Examine the primary header - same keywords regardless of instrument!
hdr = l2[0].header
print(f"Telescope: {hdr['TELESCOP']}")
print(f"Instrument: {hdr['INSTRUME']}")
print(f"Object: {hdr['OBJECT']}")
print(f"Number of traces: {hdr['NUMTRACE']}")
print("\nTrace contents:")
for i in range(1, hdr['NUMTRACE'] + 1):
    print(f"  TRACE{i}: {hdr[f'TRACE{i}']}")

Examining L2 Extensions

The EXT_DESCRIPT extension lists all FITS extensions in the file.

[ ]:

# List all extensions
ext_descript = pd.DataFrame(l2['EXT_DESCRIPT'].data)
print(ext_descript[['Name', 'Description']].to_string())

Examining the Order Table

The ORDER_TABLE extension describes the wavelength coverage of each echelle order.

[ ]:

order_table = pd.DataFrame(l2['ORDER_TABLE'].data)
print(f"Number of orders: {len(order_table)}")
print(f"\nWavelength coverage: {order_table['WAVE_START'].min():.1f} - {order_table['WAVE_END'].max():.1f} Angstroms")
print("\nFirst 5 orders:")
print(order_table.head())

Plotting L2 Spectra

KPF has 5 traces:

TRACE1: Calibration fiber (etalon)
TRACE2, TRACE3, TRACE4: Science fibers
TRACE5: Sky fiber

Let’s plot one order from the science traces.

[ ]:

# Plot a single order from the three science traces
order = 30  # Choose an order to plot

fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)

for i, (ax, trace_num) in enumerate(zip(axes, [2, 3, 4])):
    wave = l2[f'TRACE{trace_num}_WAVE'].data[order]
    flux = l2[f'TRACE{trace_num}_FLUX'].data[order]
    blaze = l2[f'TRACE{trace_num}_BLAZE'].data[order]

    # Scale blaze for visualization
    blaze_scaled = blaze * (np.nanmax(flux) / np.nanmax(blaze))

    ax.plot(wave, flux, 'b-', lw=0.5, label='Flux')
    ax.plot(wave, blaze_scaled, 'orange', lw=1, label='Blaze (scaled)')
    ax.set_ylabel(f'TRACE{trace_num}\nCounts')
    ax.legend(loc='upper right')
    ax.set_ylim(0, np.nanmax(flux) * 1.1)

axes[-1].set_xlabel('Wavelength (Angstroms)')
fig.suptitle(f'KPF L2 Spectra - Order {order}', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

Level 3: Stitched 1D Spectrum

Level 3 data contains a stitched 1D spectrum on a common wavelength grid with constant velocity spacing. The stitching process:

Divides out the blaze function
Resamples each order onto a common wavelength grid
Combines overlapping regions using inverse-variance weighting

Creating L3 from L2

L3 is created from an RVData-standard L2 file.

[ ]:

# Create L3 from the standard L2 file
kpf_l3 = RV3.from_fits(l2_standard_file, instrument="KPF")

# Save to FITS file
l3_standard_file = "kpf_L3_standard.fits"
kpf_l3.to_fits(l3_standard_file)
print(f"Created {l3_standard_file}")

Using L3 Data

Reading the L3 File

[ ]:

# Open the L3 file
l3 = fits.open(l3_standard_file)

# List extensions
print("L3 Extensions:")
for hdu in l3:
    print(f"  {hdu.name}")

Understanding L3 Extensions

For KPF with multiple science fibers, the stitched spectra are stored in STITCHED_CORR_TRACE{n}_* extensions:

STITCHED_CORR_TRACE2_WAVE/FLUX/VAR: Science fiber 1
STITCHED_CORR_TRACE3_WAVE/FLUX/VAR: Science fiber 2
STITCHED_CORR_TRACE4_WAVE/FLUX/VAR: Science fiber 3

[ ]:

# Check which STITCHED extensions are present
stitched_exts = [hdu.name for hdu in l3 if 'STITCHED' in hdu.name]
print("Stitched spectrum extensions:")
for ext in stitched_exts:
    print(f"  {ext}")

Plotting the Stitched Spectrum

[ ]:

# Plot the stitched spectrum for one trace
# Check which trace extensions exist
trace_num = 2  # Try TRACE2 first
wave_ext = f'STITCHED_CORR_TRACE{trace_num}_WAVE'
flux_ext = f'STITCHED_CORR_TRACE{trace_num}_FLUX'

if wave_ext in [hdu.name for hdu in l3]:
    wave_l3 = l3[wave_ext].data
    flux_l3 = l3[flux_ext].data

    fig, ax = plt.subplots(figsize=(14, 5))
    ax.plot(wave_l3, flux_l3, 'b-', lw=0.3)
    ax.set_xlabel('Wavelength (Angstroms)', fontsize=12)
    ax.set_ylabel('Flux (normalized)', fontsize=12)
    ax.set_title(f'KPF L3 Stitched Spectrum - TRACE{trace_num}', fontsize=14, fontweight='bold')

    # Zoom inset
    ax.set_xlim(wave_l3[np.isfinite(wave_l3)].min(), wave_l3[np.isfinite(wave_l3)].max())
    plt.tight_layout()
    plt.show()

    print(f"\nWavelength range: {np.nanmin(wave_l3):.1f} - {np.nanmax(wave_l3):.1f} Angstroms")
    print(f"Number of pixels: {len(wave_l3)}")
else:
    print(f"Extension {wave_ext} not found. Available: {stitched_exts}")

Zoomed View of Spectral Features

[ ]:

# Zoom in on H-alpha region
if wave_ext in [hdu.name for hdu in l3]:
    fig, ax = plt.subplots(figsize=(12, 4))

    # H-alpha region
    mask = (wave_l3 > 6550) & (wave_l3 < 6580)
    ax.plot(wave_l3[mask], flux_l3[mask], 'b-', lw=0.5)
    ax.axvline(6562.8, color='red', ls='--', alpha=0.7, label='H-alpha (6562.8 A)')
    ax.set_xlabel('Wavelength (Angstroms)')
    ax.set_ylabel('Flux')
    ax.set_title('H-alpha Region', fontsize=12)
    ax.legend()
    plt.tight_layout()
    plt.show()

Level 4: Radial Velocity Measurements

Level 4 data contains radial velocity (RV) measurements derived from the spectra. These can include:

Per-order RVs
Combined RV with uncertainty
Activity indicators

Creating L4 from Native KPF L2

L4 is typically created from native pipeline outputs that contain RV measurements. For KPF, the native L2 file includes CCF-derived RVs.

[ ]:

# Create L4 from native KPF L2 file (which contains RV measurements)
kpf_l4 = RV4.from_fits(native_l2_file, instrument="KPF")

# Save to FITS file
l4_standard_file = "kpf_L4_standard.fits"
kpf_l4.to_fits(l4_standard_file)
print(f"Created {l4_standard_file}")

Using L4 Data

Reading the L4 File

[ ]:

# Open the L4 file
l4 = fits.open(l4_standard_file)

# Examine primary header for RV info
hdr4 = l4[0].header
print(f"Object: {hdr4['OBJECT']}")
print(f"Observation time (BJD): {hdr4.get('BJD_TDB', 'N/A')}")

# List extensions
print("\nL4 Extensions:")
for hdu in l4:
    print(f"  {hdu.name}")

Examining the RV1 Extension

The RV1 extension contains per-order radial velocity measurements with standardized column names.

[ ]:

# Examine the RV1 extension
rv1 = pd.DataFrame(l4['RV1'].data)
print("RV1 columns:")
print(rv1.columns.tolist())
print(f"\nNumber of orders: {len(rv1)}")
print("\nFirst 5 rows:")
print(rv1.head())

Per-Order RVs

The RV1 extension contains one row per echelle order. The RV column holds the combined RV measurement for each order, and ORDER_INDEX identifies which order each row corresponds to. Additional columns include BERV (barycentric earth radial velocity), WAVE_START/WAVE_END (wavelength range), and WEIGHT (RV weight).

[ ]:

# Plot per-order RVs from the RV1 extension
rv1 = pd.DataFrame(l4['RV1'].data)

fig, ax = plt.subplots(figsize=(10, 5))
valid = np.isfinite(rv1['RV'])
ax.scatter(rv1.loc[valid, 'ORDER_INDEX'], rv1.loc[valid, 'RV'], s=20, alpha=0.7)
median_rv = np.nanmedian(rv1['RV'])
ax.axhline(median_rv, color='red', ls='--', label=f'Median: {median_rv:.2f} m/s')
ax.set_xlabel('Order Index')
ax.set_ylabel('RV (m/s)')
ax.set_title('Per-Order Radial Velocities (RV1)')
ax.legend()
plt.tight_layout()
plt.show()

Summary

This tutorial demonstrated how to:

Create L2 from native KPF L0+L1 files using RV2.from_fits()
Use L2 data: access headers, examine extensions, plot spectra
Create L3 from standard L2 using RV3.from_fits()
Use L3 data: access stitched spectra, examine spectral features
Create L4 from native KPF L2 using RV4.from_fits()
Use L4 data: access RV measurements and per-order RVs

The standardized data format allows consistent access patterns across all EPRV instruments!

[ ]:

# Clean up - close FITS files
l2.close()
l3.close()
l4.close()