The in situ images produced by Project Component 3 are linked to the genes identified by Project Component 2 using the following protocol.
For the purposes of this description:
- A clone id is the number part of an "IMAGE:#" string found as part of the GenBank description.
- An IC# is the unique identifier from Project Component 3 for an experiment that produces a series of in situ images.
- An identified gene is a GenBank accession number in the Project Component 2 database that is associated with one or more cell lines.
Protocol
Component 3
Project Component 3 submits data for each in situ experiment via the URL https://baygenomics.ucsf.edu/insituupdate.html. This web page contains three HTML forms, one for each of the three distinct submission parts. (Alternatively, data may be submitted programmatically, directly to the underlying CGI script at https://baygenomics.ucsf.edu/cgi-bin/insituupdate.py.)
Component 3 can also annotate individual images with text descriptions via the URL https://baygenomics.ucsf.edu/cgi-bin/Insitu.py.
Whether the forms or the CGI script are used, data must be submitted in the following order.
- Submit IC#, clone ID, and GenBank accession.
- Submit the experimental 3' and 5' sequences of clones, one sequence per submission. There are at most two sequences per IC#.
- Submit image files one at a time. These files have structured file names, which encode IC# and experimental conditions. The file names are of the form "ICid_dpc_days_magnificationX_suffix.tif" or "ICid_al_treatment_magnificationX_suffix.tif", where
id is the unique integer identifier for the set of images assigned by Project Component 3; days is 10 times the number of days post coitus when the image was taken; magnification is the magnification level of the camera; and treatment is the treatment applied for adult lung images. suffix is a single, lowercase letter added only as needed to differentiate an image from an otherwise identically named image file for the same IC#.
Component 2
- Component 2 has developed scripts for initial data processing, updating the Component 2 database, validating the clone sequence data, and linking the in situ images to current Component 2 gene identifications.
Initial Data Processing & Database Loading
- Component 2 processes the received data files in three distinct steps.
- The first of these validates the IC#, then updates the Component 2 database with the IC#, clone ID, and GenBank accession.
- The second updates the database with up to two experimental clone sequences per IC#. Once two sequences have been processed, subsequent sequences for that IC# will cause a processing error.
- The third validates, renames, and stores valid Component 3 image files and records their existence as IC#'s in the database. An image file received by Component 2 will replace an existing file of the same name.
- The second and third steps occur only for existing IC#'s.
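The file name check in the third step can be sketched as follows. This is only an illustration of the two documented file name forms, not the actual Component 2 script. The function name parse_image_name and the regular expression are hypothetical, and the sketch assumes that magnification and days are written as plain integers, that a treatment label contains no underscore, and that the underscore before the suffix is omitted when no suffix is present.

```python
import re
from typing import Optional

# Hypothetical pattern derived from the two documented forms:
#   ICid_dpc_days_magnificationX_suffix.tif
#   ICid_al_treatment_magnificationX_suffix.tif
# Assumptions: integer days and magnification, no underscore inside a
# treatment label, and an optional "_suffix" part.
IMAGE_NAME = re.compile(
    r"IC(?P<id>[1-9][0-9]*)_"
    r"(?:dpc_(?P<days>[0-9]+)|al_(?P<treatment>[^_]+))_"
    r"(?P<magnification>[0-9]+)X"
    r"(?:_(?P<suffix>[a-z]))?\.tif"
)


def parse_image_name(name: str) -> Optional[dict]:
    """Return the fields encoded in an image file name, or None if invalid."""
    match = IMAGE_NAME.fullmatch(name)
    if match is None:
        return None
    fields = match.groupdict()
    parsed = {
        "ic": "IC" + fields["id"],
        "magnification": int(fields["magnification"]),
        "suffix": fields["suffix"],  # None when no suffix was needed
    }
    if fields["days"] is not None:
        # The name stores 10 times the number of days post coitus.
        parsed["days_post_coitus"] = int(fields["days"]) / 10
    else:
        parsed["treatment"] = fields["treatment"]
    return parsed
```

For example, under these assumptions a name such as "IC5_dpc_105_20X.tif" would be read as IC5 at 10.5 days post coitus and 20X magnification, while "IC05_dpc_105_20X.tif" would be rejected because of the leading zero.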
Data Validation and Identification
- After initial processing, the clone ids are converted into nucleic acid sequences by using the CloneRanger interface at Invitrogen's web site to look up the clone id and corresponding GenBank accession numbers. Each clone sequence is BLASTed against the experimental 3' and 5' sequences for validation. Valid GenBank sequences are those returning at least one hit with an e-value better (that is, lower) than 1e-50.
- The valid clone sequences are BLASTed against the Component 2 identified genes database. If all clone sequences that return any BLAST hits match the same identified genes with e-values better than 1e-50, then the in situ image IC# is linked to those genes in the Component 2 database. This step is rerun regularly, e.g., once a month, to link in situ images to newly identified genes.
- The BayGenomics web interface uses the IC#-gene links to label genes (and therefore cell lines) that have in situ images associated with them. If the valid clone sequences for an image do not pass the above protocol, that image will remain unlinked to any gene and will not appear on the BayGenomics web site.
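The linking rule above can be sketched in a few lines. This is one possible reading of the rule, not the Component 2 implementation: sequences with no BLAST hits are ignored, and every remaining sequence must match exactly the same non-empty set of identified genes below the cutoff. The function name genes_to_link and the input layout (a mapping from a sequence label to its list of (gene accession, e-value) hits) are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

E_VALUE_CUTOFF = 1e-50  # threshold stated in the protocol


def genes_to_link(
    hits_by_sequence: Dict[str, Iterable[Tuple[str, float]]]
) -> FrozenSet[str]:
    """Return the genes an IC# should be linked to, or an empty set.

    hits_by_sequence maps each valid clone sequence (hypothetical label)
    to its BLAST hits as (gene accession, e-value) pairs.
    """
    gene_sets = []
    for hits in hits_by_sequence.values():
        hits = list(hits)
        if not hits:
            continue  # sequences returning no hits do not take part
        # Keep only hits better (lower) than the cutoff.
        gene_sets.append(frozenset(g for g, e in hits if e < E_VALUE_CUTOFF))
    if not gene_sets:
        return frozenset()
    first = gene_sets[0]
    # Link only if every sequence agrees on the same non-empty gene set.
    if first and all(genes == first for genes in gene_sets):
        return first
    return frozenset()
```

An empty result corresponds to an image set that stays unlinked; rerunning the same check later against an updated gene database is what allows previously unlinked IC#'s to pick up newly identified genes.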
Notes
- The Project Component 3 IC#'s must not contain spaces or any leading zeroes. For example, "IC5" is a valid IC# whereas "IC05" and "IC 5" are not.
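A minimal sketch of this format rule, assuming an IC# is the literal prefix "IC" followed by a positive integer (the helper name is_valid_ic is hypothetical, and treating "IC0" as invalid is an assumption, since the note does not address it):

```python
import re

# "IC" followed by digits with no leading zero and no spaces.
IC_PATTERN = re.compile(r"IC[1-9][0-9]*")


def is_valid_ic(ic: str) -> bool:
    """Check an IC# against the no-spaces, no-leading-zeroes rule."""
    return IC_PATTERN.fullmatch(ic) is not None
```

With this check, "IC5" is accepted, while "IC05" and "IC 5" are rejected, matching the examples in the note.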
Last updated 05 Nov 2003.