Summary Tables
Use summary tables, which we expect will satisfy most users’ need. In the summary tables, unlike other tables,
- Multiband data are available for each celestial object;
And,
- Fluxes are in [nJy] (changed from [erg cm-2 s-1 Hz-1] of PDR1); Magnitudes are also provided;
- Centroids are in (ra, dec);
- Shapes (quadrupole moments) are in arcseconds;
They are measured in the planes that contact the sky at the objects’ own positions.
These are not the case in other tables: Fluxes, for example, are not in a physical unit but are bare sums of values of pixels within objects’ areas.
There are two summary tables: meas and forced.
Contents in meas table were obtained from independent measurements of different bands. Forced table resulted from forced photometry.
The source files of meas table are “meas-###.fits”. The source files of forced tables are “forced_src-###.fits”.
We first explain the two in the following sections, and after that we show some use cases.
meas
Contents in meas table were obtained from independent measurements of different bands. Meas table has been split into two tables (meas and meas2). This is due to a technical limit of PostgreSQL. It forbids a table more than around 1600 columns. If you want aperture fluxes or their flags, you have to join the two tables. Thus:
SELECT
object_id, i_ra, i_dec, i_apertureflux_10_mag, g_apertureflux_10_mag
FROM
pdr2_wide.meas LEFT JOIN pdr2_wide.meas3 USING (object_id)
WHERE
-- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec.
coneSearch(i_coord, 33, -3.5, 60)
;
If you do not need any fields in meas2 table, you can omit meas2 from the FROM
-clause, but you do not have to — Unnecessary tables joined will not affect response performance.
Note:
- Meas table doesn’t have (ra,dec) fields, but it has fields of band-specific positions: (gra, gdec), (rra, rdec), (ira, idec), (zra, zdec), (yra, ydec), etc.
forced
Forced table resulted from forced photometry. It is easier to use forced table than meas table, for everything is in one table, unlike meas table:
SELECT
object_id, ra, dec, i_apertureflux_10_mag, g_apertureflux_10_mag
FROM
pdr2_wide.forced LEFT JOIN pdr2_wide.forced3 USING (object_id)
WHERE
-- (ra, dec) = (33 deg, -3.5 deg), radius = 60 arcsec.
coneSearch(coord, 33, -3.5, 60)
;
Note:
- Forced table have (ra, dec) fields common to all bands since in forced photometry, objects’ positions were shared by measurements of different bands.
Basic usage
Example:
SELECT
object_id, ra, dec, g_cmodel_mag, r_cmodel_mag, i_cmodel_mag
FROM
pdr2_dud.forced
WHERE
-- Equivalent to "tract = 8524", but tractSearch() is by far the faster.
tractSearch(object_id, 8524)
AND isprimary
;
- object_id is the unique ID number for each object. It is a 64-bit integer. Be careful not to ruin it by converting it to 32-bit integer or 32/64-bit float. For instance, opening a CSV file by Microsoft Excel will damage object_id. For advanced users:
object_id
is called simply “id” in source catalogs output by the analysis pipeline. In other tables,object_id
field is namedid
without difference in its meanings and values. - Multiband fields are prefixed with filter names. For example,
g_cmodel_mag
is “Magnitude by cmodel algorithm in g-band” - Fields not prefixed with filter names are common to all bands.
The summary tables have the following columns that are prepared specially for display, or in other words, for use in SELECT-clause. It is not recommended that they are used in search criteria.
- ra, dec; g_ra, g_dec; r_ra, r_dec; i_ra, i_dec,…
- These fields are computed from
coord
,g_coord
and so on, on the fly. - tract, patch
- These fields are computed from
skymap_id
on the fly. Patch is an integer value that can be decomposed into patch_x and patch_y by “patch = 100*patch_x + patch_y”.
The summary tables have the following columns that are prepared specially for search criteria. They are not intended to be selected in SELECT
-clauses.
- coord; g_coord, r_coord, i_coord,…
- These fields essentially contain the same information as (ra,dec), but are used only for search criteria. Thus:
WHERE boxSearch(coord, 33, 34, -3, -4)
.
This is equivalent to “WHERE (ra BETWEEN 33 AND 34) AND (dec BETWEEN -4 AND -3)
”, but the former is significantly faster. “Coord” is in forced table, while “g_coord” etc. are in meas table.
There are special condition functions that accelerate queries, as shown below. It is recommended that at least one of them is included in WHERE-clauses. Otherwise, your queries will not return in the time limit.
- boxSearch(coord, ra1, ra2, dec1, dec2) → boolean
- This function searches for objects within the box
[ra1, ra2] × [dec1, dec2].
For the first argument coord, one of fields “coord”, “g_coord”, “r_coord”, etc., must be passed in. The other arguments are in degrees. - Example:
boxSearch(coord, 33, 34, -3, -4)
- conesearch(coord, ra, dec, radius) → boolean
- This function searches for objects within the cone centered at (ra, dec) with its radius radius.
For the first argument coord, one of fields “coord”, “g_coord”, “r_coord”, etc., must be passed in. RA and Dec are in degrees. Radius is in arcseconds. - Example:
coneSearch(i_coord, 33, -3.5, 60)
- tractSearch(object_id, tract) → boolean
- This function searches for objects within the tract tract. For the first argument object_id, the field “object_id” must be passed in.
Example:tractSearch(object_id, 8524)
- tractSearch(object_id, tractStart, tractEnd) → boolean
- This function searches for objects within the tracts from tractStart through tractEnd inclusive. For the first argument object_id, the field “object_id” must be passed in.
Example:tractSearch(object_id, 8522, 8526)
Practical use cases
Get clean objects from COSMOS Deep / Ultra Deep Field
SELECT object_id, ra, dec, -- cmodel fluxes g_cmodel_flux, g_cmodel_fluxsigma, r_cmodel_flux, r_cmodel_fluxsigma, i_cmodel_flux, i_cmodel_fluxsigma, z_cmodel_flux, z_cmodel_fluxsigma, y_cmodel_flux, y_cmodel_fluxsigma, -- PSF fluxes g_psfflux_flux, g_psfflux_fluxsigma, r_psfflux_flux, r_psfflux_fluxsigma, i_psfflux_flux, i_psfflux_fluxsigma, z_psfflux_flux, z_psfflux_fluxsigma, y_psfflux_flux, y_psfflux_fluxsigma, -- Galaxy or star? i_extendedness_value, -- Absorptions a_g, a_r, a_i, a_z, a_y FROM pdr2_dud.forced JOIN pdr2_dud.forced2 USING (object_id) WHERE -- Search 2 degrees around (ra,dec) = (150, 2) coneSearch(coord, 150, 2, 2*3600) -- Primary object (see below) only AND isprimary -- Forced-measurement was done with reference to i-band AND merge_measurement_i -- Centroid algorithm succeeded in all bands AND NOT g_sdsscentroid_flag AND NOT r_sdsscentroid_flag AND NOT i_sdsscentroid_flag AND NOT z_sdsscentroid_flag AND NOT y_sdsscentroid_flag -- CModel flux algorithm succeeded in all bands AND NOT g_cmodel_flag AND NOT r_cmodel_flag AND NOT i_cmodel_flag AND NOT z_cmodel_flag AND NOT y_cmodel_flag -- Not at the edges of images AND NOT g_pixelflags_edge AND NOT r_pixelflags_edge AND NOT i_pixelflags_edge AND NOT z_pixelflags_edge AND NOT y_pixelflags_edge -- Center 3x3 pixels are not interpolated AND NOT g_pixelflags_interpolatedcenter AND NOT r_pixelflags_interpolatedcenter AND NOT i_pixelflags_interpolatedcenter AND NOT z_pixelflags_interpolatedcenter AND NOT y_pixelflags_interpolatedcenter -- Center 3x3 pixels are not saturated AND NOT g_pixelflags_saturatedcenter AND NOT r_pixelflags_saturatedcenter AND NOT i_pixelflags_saturatedcenter AND NOT z_pixelflags_saturatedcenter AND NOT y_pixelflags_saturatedcenter -- Center 3x3 pixels are not contaminated by cosmic rays AND NOT g_pixelflags_crcenter AND NOT r_pixelflags_crcenter AND NOT i_pixelflags_crcenter AND NOT z_pixelflags_crcenter AND NOT y_pixelflags_crcenter -- Not on bad pixels of CCDs AND NOT g_pixelflags_bad AND NOT r_pixelflags_bad AND NOT i_pixelflags_bad AND NOT z_pixelflags_bad AND NOT y_pixelflags_bad -- Get first 100 records only, for preview. -- Remove this limit to get the whole result. LIMIT 100 ;
This is a straightforward extension from the basic usage above.
If objects are primary objects (as demanded in the WHERE
-clause), they are:
- Isolated objects or deblended children, and
- Within inner patches and inner tracts.
Inner patches / inner tracts:
In mosaicking, the sky was split into regions called tracts, and for each tract its sky region was projected onto a plane. The plane is called a tract, too. The plane was further split into fragments of modest size called patches. A problem here is that the tracts an patches are defined with slight overlaps with each other. Objects in the overlapping regions appears twice or more in the catalog table. “Inner patches / inner tracts” are literally inner regions within patches and tracts, and defined such that they are adjacent to each other without overlaps. You can avoid object duplication by requiring the objects are within inner patches and inner tracts.
Flags indicates failure in general. To get “clean” catalogs, you have to check them to be FALSE
. You should bear in mind that logic is tristate in SQL: a flag may be TRUE
, FALSE
, or UNKNOWN
(also called NULL
).
Thus:
“TRUE = UNKNOWN” → UNKNOWN
;
“UNKNOWN and TRUE” → UNKNOWN
;
but “UNKNOWN or TRUE” → TRUE
.
Aside from these usual logical operators, there are six postpositional operators that must be used carefully: IS TRUE
, IS FALSE
, IS UNKNOWN
, and their negation IS NOT TRUE
etc. Values of these six will never be UNKNOWN
but will always be TRUE/FALSE
. For example, “UNKNOWN is not false” → TRUE
while “UNKNOWN != false” → UNKNOWN
. Beware of the difference or you may get an unintended result.
“LIMIT 100” is required before you push ‘preview’ button in the Direct SQL Search Form. If the preview times out, press the ‘preview’ button again and again until it succeeds. In the process, data will gradually be loaded from hard disks onto memory, and finally the preview will succeed in time — probably for the example above, but do not assume this is the case in general.
Match your catalog with the summary tables
If your catalog is small (around 10,000 records), you can match it with the summary tables in this way:
WITH -- Describe the catalog to match my_catalog(my_id, my_ra, my_dec, my_mag) AS (VALUES -- Specify data types here (NULL::int8, NULL::float8, NULL::float8, NULL::float4), -- Then list records to match (37484559004079282, 34.9368103756283, -5.49575920878389, 25.8668), (37484559004079399, 34.9376518156381, -5.49356595402835, 25.4748), (37484559004079428, 34.9461145648192, -5.49299547939153, 25.2956), (37484559004079718, 34.9377573698323, -5.48685133433986, 25.4888), -- : -- : (10,000 records) -- : (37484705032992272, 34.8390549872623, -5.23730251854975, 24.7598) ) , -- Create the match catalog which contain minimal required information match AS ( SELECT my_catalog.*, object_id -- No columns should be selected other than "object_id" -- To obtain other columns, describe them in the principal clause below FROM my_catalog JOIN pdr2_wide.meas ON ( -- Positions within 1.0 arcseconds coneSearch(i_coord, my_ra, my_dec, 1.0) ) -- No WHERE-clause should exist here ) -- The principal clause: -- Filter the above match catalog by conditions -- and select columns other than object_id SELECT match.*, i_ra, i_dec, i_cmodel_mag FROM match JOIN pdr2_wide.meas USING (object_id) WHERE i_detect_isprimary ;
This query looks complicated and is actually complex in a grammatical sense. It is composed of two subordinate clauses, my_catalog
and match
, and a principal clause. Thus:
WITH my_catalog ..., match ... SELECT ...;
First, you have to create your own catalog to match: my_catalog
. You can use “VALUES (a, b, c), (d, e, f),…” construct for this purpose.
Then, you create a match catalog: “match
”. Matching will be done without special tricks — for each record in my_catalog
, the summary table is searched for matching records. It is desirable that this process is performed quickly, and so this clause (“match AS (…)”) should be as simple as possible. You should not select columns other than object_id from the summary tables, and you should not describe a WHERE-clause here.
Finally, select wanted columns and sift out clean records in the principal clause.
You should note that the resulting matches are not one-on-one. If there are two or more objects in one catalog within the search radius from an object in another, all the pairs are included in the match catalog.
Notes on invalid values
In the summary tables, invalid values are represented by 'NaN'
(Not a Number).
For people unfamiliar to computer science, behaviors of NaN look strange at first glance, but PostgreSQL treats 'NaN'
still more surprising manner. Read this section carefully even if you are familiar with NaN.
Common knowledge on NaN
Usually (but not necessarily in PostgreSQL), NaN implies “unknown”. Therefore,
- “NaN = NaN” is false. (NaN is not equal to itself)
- “NaN != NaN” is true.
- “NaN > 1” is false, “NaN = 1” is false, and “NaN < 1” is also false.
We must emphasize again that the rules above are not necessarily the case in PostgreSQL, but once you get catalogs and you open them with C/C++ programs, python scripts, or any other tools, then NaN will behave like this.
Peculiarities of NaN in PostgreSQL
In PostgreSQL, 'NaN'
is ordered:
- “NaN = NaN” is true.
- “NaN != NaN” is false.
- “NaN > 1” is true, “NaN = 1” is false, and “NaN < 1” is false.
In fact, 'NaN'
is the largest number of all, larger than 'Infinity'
.
Let us consider a criterium r_kronflux_mag - i_kronflux_mag > 1
:
r_kronflux_mag |
i_kronflux_mag |
r_kronflux_mag - i_kronflux_mag |
r_kronflux_mag - i_kronflux_mag > 1 |
---|---|---|---|
NaN | (other value) | NaN | True |
(other values) | NaN | NaN | True |
NaN | NaN | NaN | True |
The criterium above will admit such objects that i-band magnitude is faint (NaN). It may not be your intention. Perhaps what you need is:
r_kronflux_mag - i_kronflux_mag > 1 AND i_kronflux_mag != 'NaN'
.
A tricky solution is to move i_kronflux_mag
to the right hand side:
r_kronflux_mag |
i_kronflux_mag |
r_kronflux_mag > i_kronflux_mag + 1 |
---|---|---|
NaN | (other value) | True |
(other values) | NaN | False |
NaN | NaN | False |
In astronomy, as an object gets fainter, its magnitude gets larger. If the object gets even fainter, the magnitude will finally be NaN. Fortunately this fact is consistent with PostgreSQL’s treatment of NaN that NaN is larger than any other number.
Help() function
There is help()
function in the database. You can get descriptions of tables, columns, and functions like:
SELECT * FROM help('pdr2_dud.%'); SELECT * FROM help('pdr2_dud.meas'); SELECT * FROM help('pdr2_dud.meas%.i_%shape_%'); SELECT * FROM help('%search');
‘%’ is the wildcard. It is not ‘*’. Basically all you need is this one special character, but if you want details, google “postgresql similar to”
You can further exploit help()
function to find tables that contain specific columns:
SELECT * FROM help('pdr2_%.%.n921_sdssshape%');
Appendix
Independent measurements
The contents in meas table were the output of independent measurements.
Independent measurements were, in contrast to forced photometry, done independently in different bands. You should however note that the measurements were not totally independent. We explain it in the following paragraphs.
The measurement process began with peak detection in each band. It occurred independently.
Second, the peaks in all bands were collected to make a pan-band peak catalog. Peaks in different bands but sufficiently close in position were merged here. The flags merge_peak_g, merge_peak_r, etc. will tell you what bands a peak came from. A peak may come from two or more bands, in which case it was a merged peak.
Third, each band was measured with the use of the pan-band peak catalog. Though peak positions as initial values were common to all bands, measurements (deblend, centroid, flux, shape,…) were done independently of the other bands. Deblending did not create new children: the process only determined distribution of photon flux to existing peaks.
Independent measurements that we call here are this third stage of the measurement process. The outcome from the third stage are stored in meas table.
Forced photometry
The contents in forced table resulted from forced photometry. Let us explain its difference from independent measurements.
After the three stages above, the measurement proces still went on.
Fourth, the catalogs from the third stage were collected to make a reference catalog with which to perform forced photometry. This merge was different from the second stage in nature: for each object, the process chose the first record that had sufficient flux S/N, among the bands from which the peak had actually come from, in the order of a pre-defined priority. The reference catalog resulted from this procedure. The flags merge_measurement_g, merge_measurement_r, etc. will tell you what band a reference record came from.
Fifth, forced photometry was performed on each band with the reference catalog. In this stage, some parameters (centroid and ellipticities) in flux measurements were not free but fixed as the reference. Note centroids and shapes were actually re-measured, but these new values were not used in flux measurements.
The outcome from the fifth stage are stored in forced table.