Use summary tables, which we expect will satisfy most users’ need. In the summary tables, unlike other tables,
- Multiband data are available for each celestial object;
- Fluxes are in [erg cm-2 s-1 Hz-1]; Magnitudes are also provided;
- Centroids are in (ra, dec);
- Shapes (quadrupole moments) are in arcseconds;
They are measured in the planes that contact the sky at the objects’ own positions.
These are not the case in other tables: Fluxes, for example, are not in a physical unit but are bare sums of values of pixels within objects’ areas.
There are two summary tables: meas and forced.
Contents in meas table were obtained from independent measurements of different bands. Forced table resulted from forced photometry.
The source files of meas table are “meas-###.fits”. The source files of forced tables are “forced_src-###.fits”.
We first explain the two in the following sections, and after that we show some use cases.
Contents in meas table were obtained from independent measurements of different bands. Meas table has been split into two tables (meas and meas2). This is due to a technical limit of PostgreSQL. It forbids a table more than around 1600 columns. If you want aperture fluxes or their flags, you have to join the two tables. Thus:
SELECT object_id, ira, idec, imag_aperture10, gmag_aperture10 FROM pdr1_wide.meas LEFT JOIN pdr1_wide.meas2 USING (object_id) WHERE -- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec. coneSearch(icoord, 33, -3.5, 60) ;
If you do not need any fields in meas2 table, you can omit meas2 from the
FROM-clause, but you do not have to — Unnecessary tables joined will not affect response performance.
- Meas table doesn’t have (ra,dec) fields, but it has fields of band-specific positions: (gra, gdec), (rra, rdec), (ira, idec), (zra, zdec), (yra, ydec), etc.
SELECT object_id, ra, dec, imag_aperture10, gmag_aperture10 FROM pdr1_wide.forced WHERE -- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec. coneSearch(coord, 33, -3.5, 60) ;
- Forced table have (ra, dec) fields common to all bands since in forced photometry, objects’ positions were shared by measurements of different bands.
SELECT object_id, ra, dec, gmag_sinc, rmag_sinc, imag_sinc FROM pdr1_udeep.forced WHERE -- Equivalent to "tract = 8524", but tractSearch() is by far the faster. tractSearch(object_id, 8524) AND detect_is_primary ;
- object_id is the unique ID number for each object. It is a 64-bit integer. Be careful not to ruin it by converting it to 32-bit integer or 32/64-bit float. For instance, opening a CSV file by Microsoft Excel will damage object_id. For advanced users:
object_idis called simply “id” in source catalogs output by the analysis pipeline. In other tables,
object_idfield is named
idwithout difference in its meanings and values.
- Multiband fields are prefixed with filter names. For example,
gmag_sincis “Magnitude by sinc algorithm in g-band”
- Fields not prefixed with filter names are common to all bands.
The summary tables have the following columns that are prepared specially for display, or in other words, for use in SELECT-clause. It is not recommended that they are used in search criteria.
- ra, dec; gra, gdec; rra, rdec; ira, idec,…
- These fields are computed from
gcoordand so on, on the fly.
- tract, patch
- These fields are computed from
skymap_idon the fly. Patch is an integer value that can be decomposed into patch_x and patch_y by “patch = 100*patch_x + patch_y”.
The summary tables have the following columns that are prepared specially for search criteria. They are not intended to be selected in
- coord; gcoord, rcoord, icoord,…
- These fields essentially contain the same information as (ra,dec), but are used only for search criteria. Thus:
WHERE boxSearch(coord, 33, 34, -3, -4).
This is equivalent to “
WHERE (ra BETWEEN 33 AND 34) AND (dec BETWEEN -4 AND -3)”, but the former is significantly faster. “Coord” is in forced table, while “gcoord” etc. are in meas table.
There are special condition functions that accelerate queries, as shown below. It is recommended that at least one of them is included in WHERE-clauses. Otherwise, your queries will not return in the time limit.
- boxSearch(coord, ra1, ra2, dec1, dec2) → boolean
- This function searches for objects within the box
[ra1, ra2] × [dec1, dec2].
For the first argument coord, one of fields “coord”, “gcoord”, “rcoord”, etc., must be passed in. The other arguments are in degrees.
boxSearch(coord, 33, 34, -3, -4)
- conesearch(coord, ra, dec, radius) → boolean
- This function searches for objects within the cone centered at (ra, dec) with its radius radius.
For the first argument coord, one of fields “coord”, “gcoord”, “rcoord”, etc., must be passed in. Ra and dec are in degrees. Radius is in arcseconds.
coneSearch(icoord, 33, -3.5, 60)
- tractSearch(object_id, tract) → boolean
- This function searches for objects within the tract tract. For the first argument object_id, the field “object_id” must be passed in.
- tractSearch(object_id, tractStart, tractEnd) → boolean
- This function searches for objects within the tracts from tractStart through tractEnd inclusive. For the first argument object_id, the field “object_id” must be passed in.
tractSearch(object_id, 8522, 8526)
SELECT object_id, ra, dec, -- cmodel fluxes gcmodel_flux, gcmodel_flux_err, rcmodel_flux, rcmodel_flux_err, icmodel_flux, icmodel_flux_err, zcmodel_flux, zcmodel_flux_err, ycmodel_flux, ycmodel_flux_err, -- PSF fluxes gflux_psf, gflux_psf_err, rflux_psf, rflux_psf_err, iflux_psf, iflux_psf_err, zflux_psf, zflux_psf_err, yflux_psf, yflux_psf_err, -- Galaxy or star? iclassification_extendedness, -- Absorptions a_g, a_r, a_i, a_z, a_y FROM pdr1_udeep.forced WHERE -- Search 2 degrees around (ra,dec) = (150, 2) coneSearch(coord, 150, 2, 2*3600) -- Primary object (see below) only AND detect_is_primary -- Forced-measurement was done with reference to i-band AND merge_measurement_i -- Centroid algorithm succeeded in all bands AND NOT gcentroid_sdss_flags AND NOT rcentroid_sdss_flags AND NOT icentroid_sdss_flags AND NOT zcentroid_sdss_flags AND NOT ycentroid_sdss_flags -- CModel flux algorithm succeeded in all bands AND NOT gcmodel_flux_flags AND NOT rcmodel_flux_flags AND NOT icmodel_flux_flags AND NOT zcmodel_flux_flags AND NOT ycmodel_flux_flags -- Not at the edges of images AND NOT gflags_pixel_edge AND NOT rflags_pixel_edge AND NOT iflags_pixel_edge AND NOT zflags_pixel_edge AND NOT yflags_pixel_edge -- Center 3x3 pixels are not interpolated AND NOT gflags_pixel_interpolated_center AND NOT rflags_pixel_interpolated_center AND NOT iflags_pixel_interpolated_center AND NOT zflags_pixel_interpolated_center AND NOT yflags_pixel_interpolated_center -- Center 3x3 pixels are not saturated AND NOT gflags_pixel_saturated_center AND NOT rflags_pixel_saturated_center AND NOT iflags_pixel_saturated_center AND NOT zflags_pixel_saturated_center AND NOT yflags_pixel_saturated_center -- Center 3x3 pixels are not contaminated by cosmic rays AND NOT gflags_pixel_cr_center AND NOT rflags_pixel_cr_center AND NOT iflags_pixel_cr_center AND NOT zflags_pixel_cr_center AND NOT yflags_pixel_cr_center -- Not on bad pixels of CCDs AND NOT gflags_pixel_bad AND NOT rflags_pixel_bad AND NOT iflags_pixel_bad AND NOT zflags_pixel_bad AND NOT yflags_pixel_bad -- Get first 100 records only, for preview. -- Remove this limit to get the whole result. LIMIT 100 ;
This is a straightforward extension from the basic usage above.
If objects are primary objects (as demanded in the
WHERE-clause), they are:
- Isolated objects or deblended children, and
- Within inner patches and inner tracts.
Inner patches / inner tracts:
In mosaicking, the sky was split into regions called tracts, and for each tract its sky region was projected onto a plane. The plane is called a tract, too. The plane was further split into fragments of modest size called patches. A problem here is that the tracts an patches are defined with slight overlaps with each other. Objects in the overlapping regions appears twice or more in the catalog table. “Inner patches / inner tracts” are literally inner regions within patches and tracts, and defined such that they are adjacent to each other without overlaps. You can avoid object duplication by requiring the objects are within inner patches and inner tracts.
Flags indicates failure in general. To get “clean” catalogs, you have to check them to be
FALSE. You should bear in mind that logic is tristate in SQL: a flag may be
UNKNOWN (also called
“TRUE = UNKNOWN” → UNKNOWN;
“UNKNOWN and TRUE” → UNKNOWN;
“UNKNOWN or TRUE” → TRUE.
Aside from these usual logical operators, there are six postpositional operators that must be used carefully:
IS UNKNOWN, and their negation
IS NOT TRUE etc. Values of these six will never be
UNKNOWN but will always be
TRUE/FALSE. For example,
“UNKNOWN is not false” → TRUE while
“UNKNOWN != false” → UNKNOWN. Beware of the difference or you may get an unintended result.
“LIMIT 100” is required before you push ‘preview’ button in the Direct SQL Search Form. If the preview times out, press the ‘preview’ button again and again until it succeeds. In the process, data will gradually be loaded from hard disks onto memory, and finally the preview will succeed in time — probably for the example above, but do not assume this is the case in general.
SELECT -- From "forced" object_id, ra, dec, icmodel_flux, -- From "photoz_mizuki" photoz_mc FROM pdr1_udeep.forced AS f LEFT JOIN pdr1_udeep.photoz_mizuki AS photoz USING (object_id) WHERE coneSearch(coord, 150, 2, 2*3600) AND detect_is_primary LIMIT 100;
To look up photo-z for each object, you can join one or more photo-z tables
to the summary tables.
object_id in the summary tables corresponds to
id” in photoz tables, hence the join condition
object_id = photoz.id”.
On details of join, Google will lecture you — Search for “SQL join”.
If your catalog is small (around 10,000 records), you can match it with the summary tables in this way:
WITH -- Describe the catalog to match my_catalog(my_id, my_ra, my_dec, my_mag) AS (VALUES -- Specify data types here (NULL::int8, NULL::float8, NULL::float8, NULL::float4), -- Then list records to match (37484559004079282, 34.9368103756283, -5.49575920878389, 25.8668), (37484559004079399, 34.9376518156381, -5.49356595402835, 25.4748), (37484559004079428, 34.9461145648192, -5.49299547939153, 25.2956), (37484559004079718, 34.9377573698323, -5.48685133433986, 25.4888), : : (10,000 records) : (37484705032992272, 34.8390549872623, -5.23730251854975, 24.7598) ) , -- Create the match catalog which contain minimal required information match AS ( SELECT my_catalog.*, object_id -- No columns should be selected other than "object_id" -- To obtain other columns, describe them in the principal clause below FROM my_catalog JOIN pdr1_wide.meas ON ( -- Positions within 1.0 arcseconds coneSearch(icoord, my_ra, my_dec, 1.0) ) -- No WHERE-clause should exist here ) -- The principal clause: -- Filter the above match catalog by conditions -- and select columns other than object_id SELECT match.*, ira, idec, imag_sinc FROM match JOIN pdr1_wide.meas USING (object_id) WHERE idetect_is_primary ;
This query looks complicated and is actually complex in a grammatical sense. It is composed of two subordinate clauses,
match, and a principal clause. Thus:
WITH my_catalog ..., match ... SELECT ...;
First, you have to create your own catalog to match:
my_catalog. You can use “VALUES (a, b, c), (d, e, f),…” construct for this purpose.
Then, you create a match catalog: “
match”. Matching will be done without special tricks — for each record in
my_catalog, the summary table is searched for matching records. It is desirable that this process is performed quickly, and so this clause (“match AS (…)”) should be as simple as possible. You should not select columns other than object_id from the summary tables, and you should not describe a WHERE-clause here.
Finally, select wanted columns and sift out clean records in the principal clause.
You should note that the resulting matches are not one-on-one. If there are two or more objects in one catalog within the search radius from an object in another, all the pairs are included in the match catalog.
In the summary tables, invalid values are represented by
'NaN' (Not a Number).
For people unfamiliar to computer science, behaviors of NaN look strange at first glance, but PostgreSQL treats
'NaN' still more surprising manner. Read this section carefully even if you are familiar with NaN.
Usually (but not necessarily in PostgreSQL), NaN implies “unknown”. Therefore,
- “NaN = NaN” is false. (NaN is not equal to itself)
- “NaN != NaN” is true.
- “NaN > 1” is false, “NaN = 1” is false, and “NaN < 1” is also false.
We must emphasize again that the rules above are not necessarily the case in PostgreSQL, but once you get catalogs and you open them with C/C++ programs, python scripts, or any other tools, then NaN will behave like this.
'NaN' is ordered:
- “NaN = NaN” is true.
- “NaN != NaN” is false.
- “NaN > 1” is true, “NaN = 1” is false, and “NaN < 1” is false.
'NaN' is the largest number of all, larger than
Let us consider a criterium
rmag_kron - imag_kron > 1:
The criterium above will admit such objects that i-band magnitude is faint (NaN). It may not be your intention. Perhaps what you need is:
rmag_kron - imag_kron > 1 AND imag_kron != 'NaN'.
A tricky solution is to move
imag_kron to the right hand side:
In astronomy, as an object gets fainter, its magnitude gets larger. If the object gets even fainter, the magnitude will finally be NaN. Fortunately this fact is consistent with PostgreSQL’s treatment of NaN that NaN is larger than any other number.
help() function in the database. You can get descriptions of tables, columns, and functions like:
SELECT * FROM help('pdr1_udeep.%'); SELECT * FROM help('pdr1_udeep.meas'); SELECT * FROM help('pdr1_udeep.meas%.ishape_%'); SELECT * FROM help('%search');
‘%’ is the wildcard. It is not ‘*’. Basically all you need is this one special character, but if you want details, google “postgresql similar to”
You can further exploit
help() function to find tables that contain specific columns:
SELECT * FROM help('pdr1_%.%.n921shape_sdss%');
The contents in meas table were the output of independent measurements.
Independent measurements were, in contrast to forced photometry, done independently in different bands. You should however note that the measurements were not totally independent. We explain it in the following paragraphs.
The measurement process began with peak detection in each band. It occurred independently.
Second, the peaks in all bands were collected to make a pan-band peak catalog. Peaks in different bands but sufficiently close in position were merged here. The flags merge_peak_g, merge_peak_r, etc. will tell you what bands a peak came from. A peak may come from two or more bands, in which case it was a merged peak.
Third, each band was measured with the use of the pan-band peak catalog. Though peak positions as initial values were common to all bands, measurements (deblend, centroid, flux, shape,…) were done independently of the other bands. Deblending did not create new children: the process only determined distribution of photon flux to existing peaks.
Independent measurements that we call here are this third stage of the measurement process. The outcome from the third stage are stored in meas table.
The contents in forced table resulted from forced photometry. Let us explain its difference from independent measurements.
After the three stages above, the measurement proces still went on.
Fourth, the catalogs from the third stage were collected to make a reference catalog with which to perform forced photometry. This merge was different from the second stage in nature: for each object, the process chose the first record that had sufficient flux S/N, among the bands from which the peak had actually come from, in the order of a pre-defined priority. The reference catalog resulted from this procedure. The flags merge_measurement_g, merge_measurement_r, etc. will tell you what band a reference record came from.
Fifth, forced photometry was performed on each band with the reference catalog. In this stage, some parameters (centroid and ellipticities) in flux measurements were not free but fixed as the reference. Note centroids and shapes were actually re-measured, but these new values were not used in flux measurements.
The outcome from the fifth stage are stored in forced table.