Commit 586aaf8

dataset-datasource (#87)
* use category= in links file
* fix categories
* datasetdatasource table
* datasetproperty table
* fix typos
* fix typos
* fix category
* Update organismTableQueries.xml fix 'Genomes'
* Update organismTableQueries.xml fix 'Genomes'
* use datasetdatasource in tuning mgr
1 parent 8777c4a commit 586aaf8

16 files changed, +151 -149 lines changed

Model/bin/jbrowseOrganismList

Lines changed: 3 additions & 3 deletions
@@ -29,9 +29,9 @@ and ts.taxon_id = o.taxon_id
 ";
 
 my $historySql = "select h.build_number, o.public_abbrev, h.genome_source, h.genome_version, h.annotation_source, h.annotation_version
-from apidbtuning.datasethistory h, apidbtuning.datasetnametaxon nt, apidb.organism o
-where h.dataset_presenter_id = nt.dataset_presenter_id
-and nt.name like '%primary_genome_RSRC'
+from apidbtuning.datasethistory h, apidbtuning.datasetdatasource dd, apidb.organism o
+where h.dataset_presenter_id = dd.dataset_presenter_id
+and dd.name like '%primary_genome_RSRC'
 and h.annotation_version is not null
 and o.taxon_id = nt.taxon_id";
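
In this hunk the unchanged last line of the query still joins on `nt.taxon_id`, although the `nt` alias is renamed to `dd` in the `from` clause. The following is a minimal sketch, not the project's code, that runs both forms against a throwaway SQLite database. It assumes the intended join is `o.taxon_id = dd.taxon_id`; the table layouts, sample rows, and the helper names `run_history_query` and `stale_alias_fails` are hypothetical, and the `apidbtuning.` / `apidb.` schema prefixes are dropped to keep the example self-contained.

```python
import sqlite3

# Hypothetical, simplified table layouts: only the columns the query touches.
SCHEMA = """
create table datasethistory (
    dataset_presenter_id text, build_number integer,
    genome_source text, genome_version text,
    annotation_source text, annotation_version text);
create table datasetdatasource (
    dataset_presenter_id text, name text, taxon_id integer);
create table organism (taxon_id integer, public_abbrev text);
insert into datasethistory values ('DS_1', 50, 'GenBank', 'v1', 'GenBank', 'v1');
insert into datasetdatasource values ('DS_1', 'abc_primary_genome_RSRC', 7);
insert into organism values (7, 'abc');
"""

# The query from the hunk, with the alias of the final join left as a slot.
HISTORY_SQL = """
select h.build_number, o.public_abbrev, h.genome_source, h.genome_version,
       h.annotation_source, h.annotation_version
from datasethistory h, datasetdatasource dd, organism o
where h.dataset_presenter_id = dd.dataset_presenter_id
and dd.name like '%primary_genome_RSRC'
and h.annotation_version is not null
and o.taxon_id = {alias}.taxon_id
"""


def run_history_query(alias):
    """Run the history query with the given table alias in the last join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(HISTORY_SQL.format(alias=alias)).fetchall()
    finally:
        conn.close()


def stale_alias_fails():
    """Return True if the query is rejected when it still refers to `nt`."""
    try:
        run_history_query("nt")
    except sqlite3.OperationalError:
        return True
    return False
```

With the alias set to `dd` the query returns the single sample row; with `nt` SQLite rejects the statement because no table in the `from` clause carries that alias. Other database engines report the same class of error, though the message differs.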

Model/config/datasetLinks.xml

Lines changed: 14 additions & 14 deletions
@@ -1,6 +1,6 @@
 <?xml version="1.0" encoding="UTF-8"?>
 <links>
-<link type="genome" subtype="null">
+<link category="Genomes">
 <text>View the DEFAULT_PROJECT Genomic Sequence Page for an Example Sequence</text>
 <description><![CDATA[
 The Genomic Sequence Page is a view of a chromosome, contig, or scaffold. We provide summary information, a link to the Genome Browser, and a form for sequence retrieval. All Genomic Sequence Searches return this type of record.
@@ -9,7 +9,7 @@
 <url><![CDATA[/a/app/record/genomic-sequence/DEFAULT_SEQUENCE]]></url>
 </link>
 
-<link type="genome" subtype="null">
+<link category="Genomes">
 <text>View a partial region of an Example Sequence in the DEFAULT_PROJECT Genome Browser</text>
 <description><![CDATA[
 Each Sequence for this dataset can be browsed using the Genome Browser. Here we provide a sample region with tracks which are available for all sequences.
@@ -19,7 +19,7 @@
 </link>
 
 <!-- TODO: replace this link with a link to the dataset record for this organism -->
-<link type="genome" subtype="null">
+<link category="Genomes">
 <text>View the DEFAULT_PROJECT Organism Record Page</text>
 <description><![CDATA[
 The Organism Record page contains a summary of the different types of data associated with this organism. How many genes? Does this organism have RNA Sequence Data? ...
@@ -28,7 +28,7 @@
 <url><![CDATA[/a/app/record/organism/DEFAULT_ORGANISM_PK]]></url>
 </link>
 
-<link type="genome" subtype="null">
+<link category="Genomes">
 <text>Download Genome Data from DEFAULT_PROJECT</text>
 <description><![CDATA[
 Go to download directory for Current Relase of this particular organism.
@@ -37,7 +37,7 @@ Go to download directory for Current Relase of this particular organism.
 <url><![CDATA[/a/app/downloads/Current_Release/ORGANISM_FILE_NAME]]></url>
 </link>
 
-<link type="genome" subtype="null">
+<link category="Genomes">
 <text>Run BLAST on sequences in DEFAULT_PROJECT</text>
 <description><![CDATA[
 BLAST finds regions of similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. VEuPathDB BLAST accommodates inputs such as a single sequence or several FASTA-formatted sequences at once, i.e., is multi-query capable.
@@ -48,7 +48,7 @@ Go to download directory for Current Relase of this particular organism.
 
 
 
-<link type="est" subtype="null">
+<link category="EST">
 <text>Run BLAST on sequences in DEFAULT_PROJECT</text>
 <description><![CDATA[
 BLAST finds regions of similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. VEuPathDB BLAST accommodates inputs such as a single sequence or several FASTA-formatted sequences at once, i.e., is multi-query capable.
@@ -62,7 +62,7 @@ Go to download directory for Current Relase of this particular organism.
 
 
 
-<link type="protein_expression" subtype="null">
+<link category="Protein expression">
 <text>View peptides aligned to an Example Protein in the DEFAULT_PROJECT Genome Browser</text>
 <description><![CDATA[
 
@@ -73,7 +73,7 @@ Go to download directory for Current Relase of this particular organism.
 
 
 
-<link type="protein_expression" subtype="null">
+<link category="Protein expression">
 <text>View peptides aligned to an Example Protein in the DEFAULT_PROJECT Protein Browser</text>
 <description><![CDATA[
 
@@ -83,7 +83,7 @@ Go to download directory for Current Relase of this particular organism.
 </link>
 
 
-<link type="protein_expression" subtype="null">
+<link category="Protein expression">
 <text>View the DEFAULT_PROJECT Proteomics Section on an Example Gene Page</text>
 <description><![CDATA[
 
@@ -93,15 +93,15 @@ Go to download directory for Current Relase of this particular organism.
 </link>
 
 
-<link type="protein_expression" subtype="quantitative">
+<link category="Protein expression">
 <text>View All Quantitative Proteomics Searches at DEFAULT_PROJECT</text>
 <description><![CDATA[
 ]]>
 </description>
 <url><![CDATA[/a/app/search/transcript/GenesByQuantitativeProteomics]]></url>
 </link>
 
-<link type="transcript_expression" subtype="array">
+<link category="DNA Microarray Assay">
 <text>View All Microarray Searches at DEFAULT_PROJECT</text>
 <description><![CDATA[
 
@@ -111,7 +111,7 @@ Go to download directory for Current Relase of this particular organism.
 </link>
 
 
-<link type="transcript_expression" subtype="rnaseq">
+<link category="RNASeq">
 <text>View All RNA-Seq Searches at DEFAULT_PROJECT</text>
 <description><![CDATA[
 
@@ -120,7 +120,7 @@ Go to download directory for Current Relase of this particular organism.
 <url><![CDATA[/a/app/search/transcript/GenesByRNASeqEvidence]]></url>
 </link>
 
-<link type="transcript_expression" subtype="rnaseq">
+<link category="RNASeq">
 <text>View/Download this Data Set in the DEFAULT_PROJECT Genome Browser</text>
 <description><![CDATA[
 View JBrowse tracks associated with this dataset.
@@ -130,7 +130,7 @@ Go to download directory for Current Relase of this particular organism.
 </link>
 
 
-<link type="transcript_expression" subtype="chipseq">
+<link category="CHIP Seq">
 <text>View/Download this Data Set in the DEFAULT_PROJECT Genome Browser</text>
 <description><![CDATA[
 View JBrowse tracks associated with this dataset.
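
The hunks in this file replace the `type`/`subtype` attribute pair on each `<link>` with a single `category` attribute. The code that reads `datasetLinks.xml` is not part of this diff, so the following is only a hedged sketch of how links could be selected by category. The helper `links_for_category` and the two-entry sample document are hypothetical; the element and attribute names are taken from the hunks.

```python
import xml.etree.ElementTree as ET

# A cut-down sample in the post-commit format (two entries from the diff).
LINKS_XML = """<links>
  <link category="Genomes">
    <text>Download Genome Data from DEFAULT_PROJECT</text>
    <url><![CDATA[/a/app/downloads/Current_Release/ORGANISM_FILE_NAME]]></url>
  </link>
  <link category="RNASeq">
    <text>View All RNA-Seq Searches at DEFAULT_PROJECT</text>
    <url><![CDATA[/a/app/search/transcript/GenesByRNASeqEvidence]]></url>
  </link>
</links>"""


def links_for_category(xml_text, category):
    """Return (text, url) pairs for every <link> whose category matches exactly."""
    root = ET.fromstring(xml_text)
    return [
        (link.findtext("text"), link.findtext("url"))
        for link in root.findall("link")
        if link.get("category") == category
    ]
```

The comparison here is an exact, case-sensitive string match, so a lookup for `est` would not find a link tagged `EST`. Whether the real consumer normalizes case or whitespace is not shown in the diff.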

Model/lib/dst/profileGraphs.dst

Lines changed: 10 additions & 10 deletions
@@ -224,16 +224,16 @@ prop=includeProjectsExcludeEuPathDB
 , APIDBTUNING.DATASETPROPERTY dp
 , APIDBTUNING.DATASETPROPERTY dp2
 , APIDBTUNING.DATASETPROPERTY dp3
-, APIDBTUNING.datasetnametaxon dnt
+, APIDBTUNING.datasetdatasource dd
 where ps.DATASET_NAME = '${datasetName}'
 and ps.profile_type = 'values'
 and ps.dataset_name = ct.dataset_name
-and ps.dataset_name = dnt.name
-and dnt.dataset_presenter_id = dp.dataset_presenter_id
+and ps.dataset_name = dd.name
+and dd.name = dp.datasource_name
 and dp.property = 'profileSamplesHelp'
-and dnt.dataset_presenter_id = dp2.dataset_presenter_id
+and dd.dataset_presenter_id = dp2.dataset_presenter_id
 and dp2.property = 'datasetShortDisplayName'
-and dnt.dataset_presenter_id = dp3.dataset_presenter_id
+and dd.dataset_presenter_id = dp3.dataset_presenter_id
 and dp3.property = 'datasetDisplayName'
 ORDER BY ps.study_id, ps.NODE_ORDER_NUM
 ]]>
@@ -324,15 +324,15 @@ FROM apidbtuning.profilesamples ps
 , APIDBTUNING.DATASETPROPERTY dp
 , APIDBTUNING.DATASETPROPERTY dp2
 , APIDBTUNING.DATASETPROPERTY dp3
-, APIDBTUNING.datasetnametaxon dnt
+, APIDBTUNING.datasetdatasource dd
 WHERE ps.DATASET_NAME = '${datasetName}'
 AND ps.profile_type = 'values'
-and ps.dataset_name = dnt.name
-and dnt.dataset_presenter_id = dp.dataset_presenter_id
+and ps.dataset_name = dd.name
+and dd.name = dp.datasource_name
 and dp.property = 'profileSamplesHelp'
-and dnt.dataset_presenter_id = dp2.dataset_presenter_id
+and dd.dataset_presenter_id = dp2.dataset_presenter_id
 and dp2.property = 'datasetShortDisplayName'
-and dnt.dataset_presenter_id = dp3.dataset_presenter_id
+and dd.dataset_presenter_id = dp3.dataset_presenter_id
 and dp3.property = 'datasetDisplayName'
 ORDER BY ps.study_id, ps.NODE_ORDER_NUM
 ]]>
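
Both hunks in this file change more than the table name: the `profileSamplesHelp` property (`dp`) is now joined through `dd.name = dp.datasource_name`, while `dp2` and `dp3` are still joined on `dataset_presenter_id`. The sketch below runs that mixed join on a throwaway SQLite database. It is an illustration under assumptions, not the project's query: the select list, the `value` column, the sample rows, and the idea that one presenter can own several data sources with separate help text are all hypothetical, the join to `ct` is left out because that table is not visible in the hunk, and a bound parameter stands in for the `${datasetName}` template slot.

```python
import sqlite3

# Hypothetical layouts and rows: presenter DS_1 owns two data sources,
# each with its own profileSamplesHelp row.
SETUP = """
create table profilesamples (dataset_name text, profile_type text,
                             study_id integer, node_order_num integer);
create table datasetdatasource (name text, dataset_presenter_id text);
create table datasetproperty (dataset_presenter_id text, datasource_name text,
                              property text, value text);
insert into profilesamples values ('a_RSRC', 'values', 1, 1);
insert into datasetdatasource values ('a_RSRC', 'DS_1');
insert into datasetdatasource values ('b_RSRC', 'DS_1');
insert into datasetproperty values ('DS_1', 'a_RSRC', 'profileSamplesHelp', 'help A');
insert into datasetproperty values ('DS_1', 'b_RSRC', 'profileSamplesHelp', 'help B');
insert into datasetproperty values ('DS_1', null, 'datasetShortDisplayName', 'Short');
insert into datasetproperty values ('DS_1', null, 'datasetDisplayName', 'Long name');
"""

# Join conditions follow the post-commit hunk; the select list is assumed.
QUERY = """
select dp.value, dp2.value, dp3.value
from profilesamples ps
   , datasetproperty dp
   , datasetproperty dp2
   , datasetproperty dp3
   , datasetdatasource dd
where ps.dataset_name = ?
and ps.profile_type = 'values'
and ps.dataset_name = dd.name
and dd.name = dp.datasource_name
and dp.property = 'profileSamplesHelp'
and dd.dataset_presenter_id = dp2.dataset_presenter_id
and dp2.property = 'datasetShortDisplayName'
and dd.dataset_presenter_id = dp3.dataset_presenter_id
and dp3.property = 'datasetDisplayName'
order by ps.study_id, ps.node_order_num
"""


def profile_help(dataset_name):
    """Return (help, short name, display name) rows for one data source."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY, (dataset_name,)).fetchall()
    finally:
        conn.close()
```

Under these assumed rows, joining `dp` by data source name picks up only the help text stored for `a_RSRC`, whereas a join on `dataset_presenter_id` alone would also match the row stored for `b_RSRC`.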

Model/lib/perl/JbrowseRnaSeqJunctionTracks.pm

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ sub processOrganism {
 
 my $sql = "select count(*)
 from apidbtuning.datasetproperty p
-, apidbtuning.datasetnametaxon d
+, apidbtuning.datasetdatasource d
 , apidb.organism o
 where d.DATASET_PRESENTER_ID = p.DATASET_PRESENTER_ID
 and o.taxon_id = d.taxon_id

Model/lib/wdk/model/questions/params/datasetParams.xml

Lines changed: 3 additions & 3 deletions
@@ -117,8 +117,8 @@
 <![CDATA[
 select distinct organism_name as term, oa.COMPONENT_TAXON_ID as internal
 from apidbtuning.organismattributes oa
-, apidbtuning.datasetnametaxon dnt
-where dnt.taxon_id = oa.COMPONENT_TAXON_ID
+, apidbtuning.datasetdatasource dd
+where dd.taxon_id = oa.COMPONENT_TAXON_ID
 union
 select 'any', -1
 ]]>
@@ -145,7 +145,7 @@ select 'any', -1
 <column name="term"/>
 <sql>
 <![CDATA[
-select distinct dataset_presenter_id as term, dataset_presenter_id as internal from APIDBTUNING.datasetnametaxon
+select distinct dataset_presenter_id as term, dataset_presenter_id as internal from APIDBTUNING.datasetdatasource
 ]]>
 </sql>
 </sqlQuery>

Model/lib/wdk/model/questions/params/geneParams.xml

Lines changed: 15 additions & 15 deletions
@@ -6422,10 +6422,10 @@ products of your selected type (or types).<br><br>
 select distinct 'long_transcript_novelty' as ontology_term_name, cast(null as varchar(1)) as parent_ontology_term_name,
 'Support From Long Read Evidence' as display_name, 'Find genes based on overlapping models from long read RNA sequencing, for example from Oxford Nanopore or PacBio sequencing platforms. To be considered in this query, a model must have at least 5 supporting reads and be at least 20 bases long. Categories as follows: Novel in Collection (NIC): Prediction uses known splice donors and acceptors but reveals new connections (e.g., skipped exon isoforms); Incomplete Splice Match (ISM): Prediction matches subsection of a known transcript model, but has a novel putative start or end point; Novel Not In Collection (NNC): Prediction has at least one novel splice donor or acceptor; Genomic: Prediction has no overlapping splice junctions compared to known transcripts; Genomic: Prediction has no overlapping splice junctions compared to known transcripts; Known: Prediction exactly matches a known model. NOTE: The percentage add up to more than 100% because one existing gene model can have more than associated TALON model.' as description, cast(null as varchar(1)) as units,
 'string' as type, 0 as is_range, cast(null as varchar(1)) as precision, 7 as display_order
-from apidbtuning.datasetnametaxon dnt
+from apidbtuning.datasetdatasource dd
 where '$$gene_or_transcript$$' = 'gene_source_id'
-and dnt.name like '%nanopore_longReadRnaSeq_RSRC'
-and dnt.taxon_id in ($$organism_select_all$$)
+and dd.name like '%nanopore_longReadRnaSeq_RSRC'
+and dd.taxon_id in ($$organism_select_all$$)
 and '@PROJECT_ID@' != 'EuPathDB'
 UNION
 select 'intron_junction' as ontology_term_name, cast(null as varchar(1)) as parent_ontology_term_name,
@@ -6608,8 +6608,8 @@ products of your selected type (or types).<br><br>
 FROM sres.externaldatabaserelease edr
 , sres.externaldatabase ed
 , apidbtuning.datasetpresenter dsp
-, apidbtuning.datasetnametaxon dnt
-WHERE dnt.taxon_id = $$organismsWithSingleCell$$
+, apidbtuning.datasetdatasource dd
+WHERE dd.taxon_id = $$organismsWithSingleCell$$
 AND dsp.name like '%cellxgene%'
 AND dsp.name = dnt.name
 AND ed.name = dnt.name
@@ -8869,19 +8869,19 @@ end as term
 sdi.sort_order
 FROM apidbTuning.sampledisplayinfo sdi,
 apidbTuning.datasetPresenter dsp,
-apidbTuning.datasetNameTaxon dsnt,
+apidbTuning.datasetdatasource dd,
 apidbTuning.datasetContact dsc,
 (SELECT DISTINCT ta.organism,ta.taxon_id,ta.project_id
 FROM apidbtuning.transcriptattributes ta, apidb.massspecpeptide msf
 WHERE msf.protein_source_id = ta.protein_source_id
 and (ta.project_id = '@PROJECT_ID@' OR 'UniDB' = '@PROJECT_ID@')
 ) org
 WHERE sdi.dataset_name like '%_massSpec_%'
-AND sdi.dataset_name = dsnt.name
-AND dsc.dataset_presenter_id = dsnt.dataset_presenter_id
-AND dsp.dataset_presenter_id = dsnt.dataset_presenter_id
+AND sdi.dataset_name = dd.name
+AND dsc.dataset_presenter_id = dd.dataset_presenter_id
+AND dsp.dataset_presenter_id = dd.dataset_presenter_id
 AND Dsc.Is_Primary_Contact = true
-AND dsnt.taxon_id = org.taxon_id
+AND dd.taxon_id = org.taxon_id
 )
 SELECT replace(replace(term,',',''),(''''), '') as term,
 replace(replace(parentTerm,',', ''),(''''), '') as parentTerm,
@@ -10164,16 +10164,16 @@ end as term
 <column name="internal"/>
 <sql>
 <![CDATA[
-SELECT dsnt.name AS term
-, dsnt.name as internal
+SELECT dd.name AS term
+, dd.name as internal
 , dsp.display_name AS display
 FROM
 apidbtuning.datasetpresenter dsp
-, apidbtuning.datasetnametaxon dsnt
+, apidbtuning.datasetdatasource dd
 WHERE
-dsp.dataset_presenter_id = dsnt.dataset_presenter_id
+dsp.dataset_presenter_id = dd.dataset_presenter_id
 AND dsp.subtype in ('Broad_3k_array', 'Broad_75K_array', 'Broad_barcode', 'NIH_10k')
-ORDER BY dsnt.name DESC
+ORDER BY dd.name DESC
 ]]>
 </sql>
 </sqlQuery>

Model/lib/wdk/model/questions/params/sharedParams.xml

Lines changed: 5 additions & 5 deletions
@@ -3074,19 +3074,19 @@ This parameter allows you to apply the minimum number of peptides to each select
 <column name="internal"/>
 <sql>
 <![CDATA[
-SELECT dsnt.name AS term
-, dsnt.name as internal
+SELECT dd.name AS term
+, dd.name as internal
 , dsp.display_name AS display
 FROM
 apidbtuning.datasetpresenter dsp
-, apidbtuning.datasetnametaxon dsnt
+, apidbtuning.datasetdatasource dd
 WHERE
 dsp.dataset_presenter_id = dsnt.dataset_presenter_id
 AND dsp.subtype in ('Broad_3k_array', 'Broad_75K_array', 'Broad_barcode', 'NIH_10k')
-AND dsnt.name in (SELECT dataset_name
+AND dd.name in (SELECT dataset_name
 FROM apidbTuning.Ontology
 WHERE organism = '$$organismSinglePick$$')
-ORDER BY dsnt.name DESC
+ORDER BY dd.name DESC
 ]]>
 </sql>
 </sqlQuery>

Model/lib/wdk/model/records/geneAttributeQueries.xml

Lines changed: 5 additions & 5 deletions
@@ -604,20 +604,20 @@ GROUP BY source_id
 SELECT ga.source_id, ga.project_id, v.annotation_version as ds_annotation_version, v.dataset_presenter_id as dataset_id, v.description as attribution_partial
 FROM apidbtuning.geneattributes ga
 LEFT JOIN (
-SELECT ga.source_id, ga.project_id,dsh.annotation_version,dsh.annotation_source, dnt.dataset_presenter_id, dsp.description
+SELECT ga.source_id, ga.project_id,dsh.annotation_version,dsh.annotation_source, dd.dataset_presenter_id, dsp.description
 FROM (
 SELECT max(dsh.build_number) bld, dsh.dataset_presenter_id
 FROM apidbtuning.datasethistory dsh
 WHERE dsh.annotation_version is not null
 GROUP BY dsh.dataset_presenter_id
 ) dpb
-, apidbtuning.datasethistory dsh, apidbtuning.datasetnametaxon dnt
+, apidbtuning.datasethistory dsh, apidbtuning.datasetdatasource dd
 , apidbtuning.geneattributes ga, apidbtuning.datasetpresenter dsp
 WHERE dpb.bld = dsh.BUILD_NUMBER
 AND dpb.dataset_presenter_id = dsh.DATASET_PRESENTER_ID
-AND dnt.dataset_presenter_id = dsp.dataset_presenter_id
-AND dsh.dataset_presenter_id = dnt.dataset_presenter_id
-AND dnt.taxon_id = ga.taxon_id
+AND dd.dataset_presenter_id = dsp.dataset_presenter_id
+AND dsh.dataset_presenter_id = dd.dataset_presenter_id
+AND dd.taxon_id = ga.taxon_id
 ) v ON ga.source_id = v.source_id AND ga.project_id = v.project_id
 ]]>
 </sql>
