Summary Table Usage


Summary Tables

Use summary tables, which we expect will satisfy most users’ need. In the summary tables, unlike other tables,

  • Multiband data are available for each celestial object;

And,

  • Fluxes are in [erg cm-2 s-1 Hz-1]; Magnitudes are also provided;
  • Centroids are in (ra, dec);
  • Shapes (quadrupole moments) are in arcseconds;
    They are measured in the planes that contact the sky at the objects’ own positions.

These are not the case in other tables: Fluxes, for example, are not in a physical unit but are bare sums of values of pixels within objects’ areas.

There are two summary tables: meas and forced.

Contents in meas table were obtained from independent measurements of different bands. Forced table resulted from forced photometry.
The source files of meas table are “meas-###.fits”. The source files of forced tables are “forced_src-###.fits”.

We first explain the two in the following sections, and after that we show some use cases.

meas

Contents in meas table were obtained from independent measurements of different bands. Meas table has been split into two tables (meas and meas2). This is due to a technical limit of PostgreSQL. It forbids a table more than around 1600 columns. If you want aperture fluxes or their flags, you have to join the two tables. Thus:

SELECT
  object_id, ira, idec, imag_aperture10, gmag_aperture10
FROM
  pdr1_wide.meas LEFT JOIN pdr1_wide.meas2 USING (object_id)
WHERE
  -- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec.
  coneSearch(icoord, 33, -3.5, 60)
;

If you do not need any fields in meas2 table, you can omit meas2 from the FROM-clause, but you do not have to — Unnecessary tables joined will not affect response performance.

Note:

  • Meas table doesn’t have (ra,dec) fields, but it has fields of band-specific positions: (gra, gdec), (rra, rdec), (ira, idec), (zra, zdec), (yra, ydec), etc.

forced

Forced table resulted from forced photometry. It is easier to use forced table than meas table, for everything is in one table, unlike meas table:

SELECT
  object_id, ra, dec, imag_aperture10, gmag_aperture10
FROM
  pdr1_wide.forced
WHERE
  -- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec.
  coneSearch(coord, 33, -3.5, 60)
;

Note:

  • Forced table have (ra, dec) fields common to all bands since in forced photometry, objects’ positions were shared by measurements of different bands.

Basic usage

Example:

SELECT
  object_id, ra, dec, gmag_sinc, rmag_sinc, imag_sinc
FROM
  pdr1_udeep.forced
WHERE
  -- Equivalent to "tract = 8524", but tractSearch() is by far the faster.
  tractSearch(object_id, 8524)
  AND detect_is_primary
;
  • object_id is the unique ID number for each object. It is a 64-bit integer. Be careful not to ruin it by converting it to 32-bit integer or 32/64-bit float. For instance, opening a CSV file by Microsoft Excel will damage object_id. For advanced users: object_id is called simply “id” in source catalogs output by the analysis pipeline. In other tables, object_id field is named id without difference in its meanings and values.
  • Multiband fields are prefixed with filter names. For example, gmag_sinc is “Magnitude by sinc algorithm in g-band”
  • Fields not prefixed with filter names are common to all bands.

The summary tables have the following columns that are prepared specially for display, or in other words, for use in SELECT-clause. It is not recommended that they are used in search criteria.

ra, dec; gra, gdec; rra, rdec; ira, idec,…
These fields are computed from coord, gcoord and so on, on the fly.
tract, patch
These fields are computed from skymap_id on the fly. Patch is an integer value that can be decomposed into patch_x and patch_y by “patch = 100*patch_x + patch_y”.

The summary tables have the following columns that are prepared specially for search criteria. They are not intended to be selected in SELECT-clauses.

coord; gcoord, rcoord, icoord,…
These fields essentially contain the same information as (ra,dec), but are used only for search criteria. Thus:
WHERE boxSearch(coord, 33, 34, -3, -4).
This is equivalent to “WHERE (ra BETWEEN 33 AND 34) AND (dec BETWEEN -4 AND -3)”, but the former is significantly faster. “Coord” is in forced table, while “gcoord” etc. are in meas table.

There are special condition functions that accelerate queries, as shown below. It is recommended that at least one of them is included in WHERE-clauses. Otherwise, your queries will not return in the time limit.

boxSearch(coord, ra1, ra2, dec1, dec2) → boolean
This function searches for objects within the box
[ra1, ra2] × [dec1, dec2].
For the first argument coord, one of fields “coord”, “gcoord”, “rcoord”, etc., must be passed in. The other arguments are in degrees.
Example: boxSearch(coord, 33, 34, -3, -4)
conesearch(coord, ra, dec, radius) → boolean
This function searches for objects within the cone centered at (ra, dec) with its radius radius.
For the first argument coord, one of fields “coord”, “gcoord”, “rcoord”, etc., must be passed in. Ra and dec are in degrees. Radius is in arcseconds.
Example: coneSearch(icoord, 33, -3.5, 60)
tractSearch(object_id, tract) → boolean
This function searches for objects within the tract tract. For the first argument object_id, the field “object_id” must be passed in.
Example: tractSearch(object_id, 8524)
tractSearch(object_id, tractStart, tractEnd) → boolean
This function searches for objects within the tracts from tractStart through tractEnd inclusive. For the first argument object_id, the field “object_id” must be passed in.
Example: tractSearch(object_id, 8522, 8526)

Practical use cases

Get clean objects from COSMOS Ultra Deep Field

SELECT
  object_id, ra, dec,
  -- cmodel fluxes
  gcmodel_flux, gcmodel_flux_err, 
  rcmodel_flux, rcmodel_flux_err, 
  icmodel_flux, icmodel_flux_err, 
  zcmodel_flux, zcmodel_flux_err, 
  ycmodel_flux, ycmodel_flux_err, 
  -- PSF fluxes
  gflux_psf, gflux_psf_err,
  rflux_psf, rflux_psf_err,
  iflux_psf, iflux_psf_err,
  zflux_psf, zflux_psf_err,
  yflux_psf, yflux_psf_err,
  -- Galaxy or star?
  iclassification_extendedness,
  -- Absorptions
  a_g, a_r, a_i, a_z, a_y

FROM
 pdr1_udeep.forced

WHERE
  -- Search 2 degrees around (ra,dec) = (150, 2)
  coneSearch(coord, 150, 2, 2*3600)

  -- Primary object (see below) only
  AND detect_is_primary

  -- Forced-measurement was done with reference to i-band
  AND merge_measurement_i

  -- Centroid algorithm succeeded in all bands
  AND NOT gcentroid_sdss_flags
  AND NOT rcentroid_sdss_flags
  AND NOT icentroid_sdss_flags
  AND NOT zcentroid_sdss_flags
  AND NOT ycentroid_sdss_flags

  -- CModel flux algorithm succeeded in all bands
  AND NOT gcmodel_flux_flags
  AND NOT rcmodel_flux_flags
  AND NOT icmodel_flux_flags
  AND NOT zcmodel_flux_flags
  AND NOT ycmodel_flux_flags

  -- Not at the edges of images
  AND NOT gflags_pixel_edge
  AND NOT rflags_pixel_edge
  AND NOT iflags_pixel_edge
  AND NOT zflags_pixel_edge
  AND NOT yflags_pixel_edge

  -- Center 3x3 pixels are not interpolated
  AND NOT gflags_pixel_interpolated_center
  AND NOT rflags_pixel_interpolated_center
  AND NOT iflags_pixel_interpolated_center
  AND NOT zflags_pixel_interpolated_center
  AND NOT yflags_pixel_interpolated_center

  -- Center 3x3 pixels are not saturated
  AND NOT gflags_pixel_saturated_center
  AND NOT rflags_pixel_saturated_center
  AND NOT iflags_pixel_saturated_center
  AND NOT zflags_pixel_saturated_center
  AND NOT yflags_pixel_saturated_center

  -- Center 3x3 pixels are not contaminated by cosmic rays
  AND NOT gflags_pixel_cr_center
  AND NOT rflags_pixel_cr_center
  AND NOT iflags_pixel_cr_center
  AND NOT zflags_pixel_cr_center
  AND NOT yflags_pixel_cr_center

  -- Not on bad pixels of CCDs
  AND NOT gflags_pixel_bad
  AND NOT rflags_pixel_bad
  AND NOT iflags_pixel_bad
  AND NOT zflags_pixel_bad
  AND NOT yflags_pixel_bad

-- Get first 100 records only, for preview.
-- Remove this limit to get the whole result.
LIMIT 100
;

This is a straightforward extension from the basic usage above.

If objects are primary objects (as demanded in the WHERE-clause), they are:

  • Isolated objects or deblended children, and
  • Within inner patches and inner tracts.

Inner patches / inner tracts:
In mosaicking, the sky was split into regions called tracts, and for each tract its sky region was projected onto a plane. The plane is called a tract, too. The plane was further split into fragments of modest size called patches. A problem here is that the tracts an patches are defined with slight overlaps with each other. Objects in the overlapping regions appears twice or more in the catalog table. “Inner patches / inner tracts” are literally inner regions within patches and tracts, and defined such that they are adjacent to each other without overlaps. You can avoid object duplication by requiring the objects are within inner patches and inner tracts.

Flags indicates failure in general. To get “clean” catalogs, you have to check them to be FALSE. You should bear in mind that logic is tristate in SQL: a flag may be TRUE, FALSE, or UNKNOWN (also called NULL).
Thus:
“TRUE = UNKNOWN” → UNKNOWN;
“UNKNOWN and TRUE” → UNKNOWN;
but “UNKNOWN or TRUE” → TRUE.
Aside from these usual logical operators, there are six postpositional operators that must be used carefully: IS TRUE, IS FALSE, IS UNKNOWN, and their negation IS NOT TRUE etc. Values of these six will never be UNKNOWN but will always be TRUE/FALSE. For example, “UNKNOWN is not false” → TRUE while “UNKNOWN != false” → UNKNOWN. Beware of the difference or you may get an unintended result.

LIMIT 100” is required before you push ‘preview’ button in the Direct SQL Search Form. If the preview times out, press the ‘preview’ button again and again until it succeeds. In the process, data will gradually be loaded from hard disks onto memory, and finally the preview will succeed in time — probably for the example above, but do not assume this is the case in general.

Look up photo-z

SELECT
  -- From "forced"
  object_id, ra, dec, icmodel_flux,
  -- From "photoz_mizuki"
  photoz_mc
FROM
  pdr1_udeep.forced AS f
  LEFT JOIN pdr1_udeep.photoz_mizuki AS photoz USING (object_id)
WHERE
  coneSearch(coord, 150, 2, 2*3600)
  AND detect_is_primary
LIMIT 100;

To look up photo-z for each object, you can join one or more photo-z tables
to the summary tables.
Note that object_id in the summary tables corresponds to
id” in photoz tables, hence the join condition
object_id = photoz.id”.
On details of join, Google will lecture you — Search for “SQL join”.

Match your catalog with the summary tables

If your catalog is small (around 10,000 records), you can match it with the summary tables in this way:

WITH
  -- Describe the catalog to match
  my_catalog(my_id, my_ra, my_dec, my_mag) AS (VALUES
    -- Specify data types here
    (NULL::int8, NULL::float8, NULL::float8, NULL::float4),
    -- Then list records to match
    (37484559004079282, 34.9368103756283, -5.49575920878389, 25.8668),
    (37484559004079399, 34.9376518156381, -5.49356595402835, 25.4748),
    (37484559004079428, 34.9461145648192, -5.49299547939153, 25.2956),
    (37484559004079718, 34.9377573698323, -5.48685133433986, 25.4888),
      :
      : (10,000 records)
      :
    (37484705032992272, 34.8390549872623, -5.23730251854975, 24.7598)
  )
  ,
  -- Create the match catalog which contain minimal required information
  match AS (
    SELECT
      my_catalog.*,
      object_id
      -- No columns should be selected other than "object_id"
      -- To obtain other columns, describe them in the principal clause below
    FROM
      my_catalog JOIN pdr1_wide.meas ON (
        -- Positions within 1.0 arcseconds
        coneSearch(icoord, my_ra, my_dec, 1.0)
      )
      -- No WHERE-clause should exist here
  )
-- The principal clause:
--   Filter the above match catalog by conditions
--   and select columns other than object_id
SELECT
  match.*,
  ira, idec, imag_sinc
FROM
  match JOIN pdr1_wide.meas USING (object_id)
WHERE
  idetect_is_primary
;

This query looks complicated and is actually complex in a grammatical sense. It is composed of two subordinate clauses, my_catalog and match, and a principal clause. Thus:
WITH my_catalog ..., match ... SELECT ...;

First, you have to create your own catalog to match: my_catalog. You can use “VALUES (a, b, c), (d, e, f),…” construct for this purpose.

Then, you create a match catalog: “match”. Matching will be done without special tricks — for each record in my_catalog, the summary table is searched for matching records. It is desirable that this process is performed quickly, and so this clause (“match AS (…)”) should be as simple as possible. You should not select columns other than object_id from the summary tables, and you should not describe a WHERE-clause here.

Finally, select wanted columns and sift out clean records in the principal clause.

You should note that the resulting matches are not one-on-one. If there are two or more objects in one catalog within the search radius from an object in another, all the pairs are included in the match catalog.

Notes on invalid values

In the summary tables, invalid values are represented by 'NaN' (Not a Number).

For people unfamiliar to computer science, behaviors of NaN look strange at first glance, but PostgreSQL treats 'NaN' still more surprising manner. Read this section carefully even if you are familiar with NaN.

Common knowledge on NaN

Usually (but not necessarily in PostgreSQL), NaN implies “unknown”. Therefore,

  • “NaN = NaN” is false. (NaN is not equal to itself)
  • “NaN != NaN” is true.
  • “NaN > 1” is false, “NaN = 1” is false, and “NaN < 1” is also false.

We must emphasize again that the rules above are not necessarily the case in PostgreSQL, but once you get catalogs and you open them with C/C++ programs, python scripts, or any other tools, then NaN will behave like this.

Peculiarities of NaN in PostgreSQL

In PostgreSQL, 'NaN' is ordered:

  • “NaN = NaN” is true.
  • “NaN != NaN” is false.
  • “NaN > 1” is true, “NaN = 1” is false, and “NaN < 1” is false.

In fact, 'NaN' is the largest number of all, larger than 'Infinity'.

Let us consider a criterium rmag_kron - imag_kron > 1:

rmag_kron imag_kron rmag_kron - imag_kron rmag_kron - imag_kron > 1
NaN (other value) NaN True
(other values) NaN NaN True
NaN NaN NaN True

The criterium above will admit such objects that i-band magnitude is faint (NaN). It may not be your intention. Perhaps what you need is:
rmag_kron - imag_kron > 1 AND imag_kron != 'NaN'.

A tricky solution is to move imag_kron to the right hand side:

rmag_kron imag_kron rmag_kron > imag_kron + 1
NaN (other value) True
(other values) NaN False
NaN NaN False

In astronomy, as an object gets fainter, its magnitude gets larger. If the object gets even fainter, the magnitude will finally be NaN. Fortunately this fact is consistent with PostgreSQL’s treatment of NaN that NaN is larger than any other number.

Help() function

There is help() function in the database. You can get descriptions of tables, columns, and functions like:

    SELECT * FROM help('pdr1_udeep.%');
    SELECT * FROM help('pdr1_udeep.meas');
    SELECT * FROM help('pdr1_udeep.meas%.ishape_%');
    SELECT * FROM help('%search');

‘%’ is the wildcard. It is not ‘*’. Basically all you need is this one special character, but if you want details, google “postgresql similar to”

You can further exploit help() function to find tables that contain specific columns:

    SELECT * FROM help('pdr1_%.%.n921shape_sdss%');

Appendix

Independent measurements

The contents in meas table were the output of independent measurements.

Independent measurements were, in contrast to forced photometry, done independently in different bands. You should however note that the measurements were not totally independent. We explain it in the following paragraphs.

The measurement process began with peak detection in each band. It occurred independently.

Second, the peaks in all bands were collected to make a pan-band peak catalog. Peaks in different bands but sufficiently close in position were merged here. The flags merge_peak_g, merge_peak_r, etc. will tell you what bands a peak came from. A peak may come from two or more bands, in which case it was a merged peak.

Third, each band was measured with the use of the pan-band peak catalog. Though peak positions as initial values were common to all bands, measurements (deblend, centroid, flux, shape,…) were done independently of the other bands. Deblending did not create new children: the process only determined distribution of photon flux to existing peaks.

Independent measurements that we call here are this third stage of the measurement process. The outcome from the third stage are stored in meas table.

Forced photometry

The contents in forced table resulted from forced photometry. Let us explain its difference from independent measurements.

After the three stages above, the measurement proces still went on.

Fourth, the catalogs from the third stage were collected to make a reference catalog with which to perform forced photometry. This merge was different from the second stage in nature: for each object, the process chose the first record that had sufficient flux S/N, among the bands from which the peak had actually come from, in the order of a pre-defined priority. The reference catalog resulted from this procedure. The flags merge_measurement_g, merge_measurement_r, etc. will tell you what band a reference record came from.

Fifth, forced photometry was performed on each band with the reference catalog. In this stage, some parameters (centroid and ellipticities) in flux measurements were not free but fixed as the reference. Note centroids and shapes were actually re-measured, but these new values were not used in flux measurements.

The outcome from the fifth stage are stored in forced table.