Our understanding of natural microbial environments combines in situ experimentation with studies of specific interactions in laboratory-based setups. The purpose of this work was to develop, build and demonstrate the use of a microbial culture chamber that enables both in situ and laboratory-based studies. The design uses an enclosed chamber bounded by two porous membranes, which allows the growth of two separate microbial populations to be compared while permitting free exchange of small molecules. Initially, we tested whether the presence of the macroalga Fucus vesiculosus inside the chamber affected colonization of the outer membranes by marine bacteria. The alga did indeed enrich the total population of colonizing bacteria by more than a factor of four. These findings led us to investigate the effect of the presence of the coccolithophorid alga Emiliania huxleyi on attachment and biofilm formation of the marine bacterium Phaeobacter inhibens DSM
The advent of next-generation sequencing technologies and corresponding bioinformatics tools allows transcriptomes to be defined in non-model organisms. Non-model organisms are of great ecological and biotechnological significance, and consequently understanding their unique metabolic pathways is essential. Several methods that integrate de novo assembly with genome-based assembly have been proposed.
Yet many open challenges remain in defining genes, particularly where genomes are unavailable or incomplete. Despite the large number of transcriptome assemblies that have been performed, quality control of the transcript-building process, particularly at the protein level, is rarely, if ever, performed.
To test and improve the quality of the automated transcriptome reconstruction, we used manually defined and curated genes, several of them experimentally validated. Several approaches to transcript construction were used, based on the available data: a draft genome, high-quality RNAseq reads, and ESTs. To maximize the contribution of the various data, we integrated methods including de novo and genome-based assembly, as well as EST clustering.
After each step, a set of manually curated genes was used for quality assessment of the transcripts. The interplay between the automated pipeline and the quality control indicated which additional processes were required to improve the transcriptome reconstruction.
We discovered that E. While individual tools missed genes and artificially joined overlapping transcripts, combining the results of several tools improved completeness and quality considerably. To the best of our knowledge, this is the first time that an automated transcript definition has been subjected to quality control using manually defined and curated genes and the process then improved accordingly. We recommend using a set of manually curated genes to troubleshoot transcriptome reconstruction.
Conventionally, genetic and transcriptional studies of non-model organisms have been restricted by the lack of reference genomes. Nevertheless, non-model organisms are of great ecological and economic significance; consequently, understanding their unique metabolic pathways by investigating their gene expression profiles is crucial. The advent and continuing improvement of next-generation sequencing (NGS), together with the development of corresponding bioinformatics tools, have boosted the number of sequenced transcriptomes of non-model organisms, and their automated assembly has become common [ 1 , 2 ].
Numerous software tools and pipelines have been used to build transcriptomes automatically, and several methods that integrate de novo assembly with genome-based assembly have been proposed for non-model organisms [ 3 ]. Two major alternatives can be employed: (1) aligning reads to the existing reference genome and then assembling the remaining unmapped reads, or (2) performing a de novo assembly first and then using the genome to improve the transcript assembly [ 3 ].
However, many open challenges in defining genes remain, particularly where genomes are unavailable or incomplete. Despite the large number of transcriptome assemblies that have been performed, quality control of the transcript-building process is rarely performed. Manually defined and curated transcripts or good-quality ESTs could be used to assess the quality of automated transcriptome assembly but, to the best of our knowledge, they have not been used for this purpose.
In non-model organisms in particular, it is critical to have genes built from the species being studied, as closely related well-annotated species might not be available.
Despite their great potential importance, the processes used to manually define and curate genes have not been documented until now (Ben-Dor S.). In this study, the transcriptome of the bloom-forming alga Emiliania huxleyi was built. Its intricate calcite coccoliths account for a third of total marine CaCO3 production, making it highly susceptible to future ocean acidification [ 6 ].
In addition to their role in the biogeochemistry of carbon and related climatic impacts, coccolithophores produce the sulfur-containing compound dimethylsulfoniopropionate (DMSP), the precursor of dimethylsulfide (DMS) gas, which is a major source of sulfur to the atmosphere, where it can influence aerosol formation and consequently cloud condensation nuclei [ 7 ].
The recently published genome assembly of E. A large number of available unassembled genomic reads, numerous repeats and duplications, and gaps in the genome indicated that the genome alone would not provide a good basis for building transcripts. We therefore opted for an integrative pipeline to build the transcriptome. To test and improve the quality of the automated transcriptome reconstruction, we used 63 manually defined and curated E. After each step in the automated definition pipeline, the presence of the manually defined genes was checked, allowing us to troubleshoot missing genes and improve the pipeline.
This is the first time that an automated transcript definition has been subjected to quality control using manually defined and curated genes and the process then improved. The experimental system was E. RNAseq was performed on six samples using the Illumina HiSeq: control (no virus) and infected with EhV or EhV, each at two time points (1 and 24 hours post infection). The sequences were trimmed because quality scores decreased in late sequencing cycles owing to the high GC content (Additional file 1: Figure S1).
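The trimming step can be illustrated with a minimal sketch of 3'-end quality trimming. The text does not name the trimming tool or the cutoff, so the Phred+33 encoding, the `min_q=20` threshold and the helper `trim_3prime` are all assumptions for illustration, not the procedure used in this study.

```python
def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q.

    Quality tends to drop in late sequencing cycles, i.e. toward the 3' end
    of a read, so we walk backwards from the end and cut at the last base
    that still meets the threshold. min_q=20 is an illustrative assumption.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("GCGCATGC", "IIIIII##"))  # ('GCGCAT', 'IIIIII')
```

Trimming only from the 3' end matches the described pattern of quality loss in late cycles; sliding-window trimmers are a common alternative when low-quality stretches also occur mid-read.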
The sample infected with EhV for 24 hours had the highest number of reads with adaptor sequences (about 15 million) and a relatively low number of reads, since many of the E. The available genome assembly, Emihu1, is a draft constructed from Sanger reads. In addition to the genome, publicly available ESTs could provide additional information.
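Reads that still contain adaptor sequence can be detected by exact matching, including adaptors that run off the 3' end of the read. This is a minimal sketch: the adaptor string `AGATCGGAAGAGC` (a commonly used Illumina adapter prefix), the `min_overlap` value and the helper names are assumptions, not the settings used in this study, and production trimmers also tolerate mismatches.

```python
from typing import Optional

# Assumed adapter: prefix of the widely used Illumina TruSeq adapter.
ADAPTER = "AGATCGGAAGAGC"


def adapter_start(read: str, adapter: str = ADAPTER,
                  min_overlap: int = 5) -> Optional[int]:
    """Return the index where the adapter begins in the read, or None.

    A full copy of the adapter is searched for first; otherwise the read's
    3' end is compared with progressively shorter adapter prefixes (longest
    first), down to min_overlap bases. Only exact matches are accepted.
    """
    full = read.find(adapter)
    if full != -1:
        return full
    for k in range(min(len(read), len(adapter) - 1), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


def clip_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 5) -> str:
    """Return the read truncated just before any detected adapter."""
    start = adapter_start(read, adapter, min_overlap)
    return read if start is None else read[:start]


print(clip_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # ACGTACGT
print(clip_adapter("ACGTACGTAGATCG"))            # ACGTACGT
```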
In light of this, three different approaches were applied to define E. The first was de novo assembly; the second was a genome-based alignment to an improved version of the genome assembly. The third approach utilized a collection of publicly available E.
All approaches were integrated at the end.

Figure: Genome assembly quality assessment. (A) There is full coverage of the gene in both ESTs and RNAseq reads, but part of it has no coverage when compared to the genome. (B) Genomic duplication: duplication of almost the entire segment can be seen.

Figure: Initial approaches for automated transcriptome building. (A) Three different approaches were applied to automatically define the transcriptome: the first, de novo assembly, uses only the reads; the second uses the reads and an improved version of the genome assembly; and the third is based on a publicly available EST collection. See Methods for details.

Reads from all samples were pooled for assembly. This resulted in fragments. To reduce redundancy, CAP3 [ 12 ] was used to cluster the fragments. The outcome was contigs and singlets. In addition to the current Emihu1 genome assembly, the JGI website database includes Sanger sequencing reads that were not included in the current assembly. These are called unplaced genomic reads and consist of bases (N's and A, C, G or T) in sequences.
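Redundancy among assembled fragments can be illustrated with a much simpler operation than CAP3's clustering: discarding any fragment that is contained, in either orientation, in a longer one. This sketch does not reproduce CAP3, which also merges overlapping fragments into contigs; `remove_contained` is a hypothetical helper shown only to make the idea of redundancy reduction concrete.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase ACGTN assumed)."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_contained(fragments: list[str]) -> list[str]:
    """Keep only fragments not contained in an already kept fragment.

    Fragments are processed longest first (ties broken alphabetically so
    the result is deterministic); a fragment is dropped if it, or its
    reverse complement, is a substring of one already kept.
    """
    kept: list[str] = []
    for frag in sorted(set(fragments), key=lambda s: (-len(s), s)):
        rc = revcomp(frag)
        if any(frag in k or rc in k for k in kept):
            continue
        kept.append(frag)
    return kept


frags = ["ACGTTGCA", "GTTG", "CAAC", "TTTT", "ACGTTGCA"]
print(remove_contained(frags))  # ['ACGTTGCA', 'TTTT']
```

Here "GTTG" is dropped because it occurs inside "ACGTTGCA", and "CAAC" is dropped because its reverse complement is "GTTG".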
The unplaced reads were assembled using Newbler, resulting in contigs. To evaluate the possible contribution of these contigs to the transcriptome definition, the reads of Sample 10BE, which had the lowest number of E. In view of the high number of reads aligned to the new contigs, these contigs were merged with the available genome assembly using Minimus2 [ 13 ] to create an improved genome version, Emihu1plus (Additional file 3).
The reads of each sample were aligned separately to the Emihu1plus genome using TopHat [ 14 ]. The total number of reads aligned to the genome per sample ranged from 44 to almost 52 million. The exception was Sample 10BE, as mentioned above, which has only 3.
After alignment, Cufflinks and Cuffcompare [ 15 ] were applied to all the TopHat outputs to define transcripts. In this process, potential transcripts were defined. Single ESTs that were not clustered were not utilized for further transcriptome reconstruction.
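Combining transcript models predicted separately for each sample requires collapsing models that describe the same locus. Cuffcompare compares full exon structures; the sketch below is a deliberately simplified stand-in that only merges overlapping intervals on the same sequence and strand. The tuple layout `(seqid, strand, start, end)` with closed coordinates and the scaffold name are assumptions for illustration.

```python
from collections import defaultdict


def merge_loci(intervals):
    """Merge overlapping (seqid, strand, start, end) intervals.

    Coordinates are treated as closed intervals, so (100, 200) and
    (200, 250) overlap. Intervals on different strands are never merged.
    """
    groups = defaultdict(list)
    for seqid, strand, start, end in intervals:
        groups[(seqid, strand)].append((start, end))

    merged = []
    for (seqid, strand), spans in sorted(groups.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:           # overlaps the current locus
                cur_end = max(cur_end, end)
            else:                          # gap: close the locus, start a new one
                merged.append((seqid, strand, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((seqid, strand, cur_start, cur_end))
    return merged


# Hypothetical transcript models from two samples on one scaffold.
models = [("scaffold_1", "+", 100, 200), ("scaffold_1", "+", 150, 300),
          ("scaffold_1", "+", 400, 500), ("scaffold_1", "-", 120, 180)]
print(merge_loci(models))
# [('scaffold_1', '+', 100, 300), ('scaffold_1', '+', 400, 500), ('scaffold_1', '-', 120, 180)]
```

Merging at the locus level like this would also reproduce one failure mode named in the text: overlapping but distinct transcripts being artificially joined.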
In parallel to the automated approach for transcript assembly, we manually defined a set of E. These genes were used to assess the quality of the automated pipeline. Protein sequences of the target gene from human, Arabidopsis thaliana, and yeast S. Hits were inspected for transcript evidence (ESTs).
If there were matching ESTs, they were assembled into transcripts and compared to the predictions, if any. If EST coverage was incomplete but a JGI predicted gene model existed, the BLAST results were used to correct the prediction accordingly. When the RNAseq reads became available, the putative transcripts were corrected on the basis of the reads where possible.
If no ESTs were available as an anchor for a predicted transcript, a combination of reads (if available), predictions based on BLAST hits, and the JGI predictions was used to construct a transcript.

Figure: Manual gene definition procedure. The procedure of manual gene definition is presented as a decision tree, with the start and end in purple. The procedure starts from the choice of the target gene (purple circle, top middle), which was taken from three species: human, Arabidopsis thaliana, and yeast. The chart flows from top to bottom, with decision points in pale blue diamonds and analyses in blue rectangles (database searches in bright blue, other analyses in gray-blue).

Transcripts were then constructed and extended as far as possible by running BLAST.
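The manual decision logic described above can be summarized as a small function. This is a simplified reading of the text, not a transcription of the figure: the boolean inputs and step labels are hypothetical, and branches the text does not describe (for example, ESTs with incomplete coverage and no JGI model) simply fall through to the shared downstream checks.

```python
def manual_definition_steps(has_ests: bool, ests_complete: bool,
                            has_jgi_model: bool, has_reads: bool) -> list[str]:
    """Return the ordered curation steps for one target gene (simplified)."""
    steps: list[str] = []
    if has_ests:
        steps.append("assemble matching ESTs into a transcript")
        if has_jgi_model:
            steps.append("compare with the JGI prediction")
            if not ests_complete:
                steps.append("fix the JGI model using BLAST results")
    else:
        # No EST anchor: combine whatever other evidence exists.
        evidence = ["BLAST-based prediction", "JGI prediction"]
        if has_reads:
            evidence.insert(0, "RNAseq reads")
        steps.append("construct a transcript from " + " + ".join(evidence))
    if has_ests and has_reads:
        steps.append("correct the transcript using RNAseq reads")
    # Checks applied to every candidate, following the text.
    steps += ["extend by BLAST", "translate and run BlastP",
              "check domain composition", "build alignments and trees"]
    return steps


for step in manual_definition_steps(has_ests=True, ests_complete=False,
                                    has_jgi_model=True, has_reads=True):
    print(step)
```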
The sequence was then translated, and a BlastP search was run at NCBI to determine whether the putative protein was closest to the target gene in other species and whether it had the proper domains. We found that in many proteins, repetitive sequences interrupted the canonical domain composition, and in some cases the domains themselves. After the protein sequence was finalized, multiple alignments and phylogenetic trees were constructed with protein sequences from representative species to confirm that the sequence belonged to the expected family, to identify its closest relatives and, in the case of multiple family members, to attempt to assign orthology.
Eighteen of the sequences were validated by real-time PCR with primers designed to the manually constructed sequences, or by Western blot (method of construction and validation status: Additional file 5). All but two of the sequences had reads in the RNAseq data of the current dataset, and the remaining two have reads in an additional dataset (not shown).
They include both globular and transmembrane proteins. They are a mix of short and long transcripts, ranging from to base pairs, with varying numbers of exons (from single-exon to 17-exon genes), and they are expressed at varying levels according to the RNAseq data (Additional file 5). The transcripts were compared to the standard genes using Blat [ 17 ] with a minimum hit score of , to ensure significant hits. Four of the genes had fewer than 10 reads in the RNAseq data and therefore could not be found in the read-based arms of the assembly.
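The minimum-score filter on Blat hits can be sketched from PSL output fields. The score below follows the `pslScore` formula in the UCSC kent source as we understand it (matches plus half the repeat matches, minus mismatches and gap openings). The threshold used in this study is not given, so `min_score=500` is a placeholder, and the hit values are invented.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    """Subset of PSL columns needed for scoring."""
    q_name: str
    t_name: str
    matches: int
    rep_matches: int
    mismatches: int
    q_num_insert: int   # number of gap openings in the query
    t_num_insert: int   # number of gap openings in the target


def psl_score(hit: PslHit, protein: bool = False) -> int:
    """Blat-style alignment score (assumed to mirror UCSC pslScore)."""
    size_mul = 3 if protein else 1
    return (size_mul * (hit.matches + (hit.rep_matches >> 1))
            - size_mul * hit.mismatches
            - hit.q_num_insert - hit.t_num_insert)


def significant_hits(hits: list[PslHit], min_score: int) -> list[PslHit]:
    """Keep hits whose score reaches the (study-specific) threshold."""
    return [h for h in hits if psl_score(h) >= min_score]


hits = [PslHit("transcript_1", "curated_geneA", 950, 10, 20, 2, 3),
        PslHit("transcript_2", "curated_geneA", 120, 0, 30, 4, 1)]
for h in significant_hits(hits, min_score=500):   # 500 is a placeholder
    print(h.q_name, psl_score(h))                  # transcript_1 930
```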
In the genome-based transcript collection, eleven of the 59 possible genes were missed. In the de novo assembly, twelve genes were missed. Of the four transcripts without enough RNAseq reads, two were found in the EST branch and two were not found at all. These regions were required to have a minimum length of base pairs and a minimum coverage of 50 reads, as the default Partek settings are permissive.
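Checking which curated genes each branch recovers, and how many the branches recover together, is simple set arithmetic. The sketch uses hypothetical gene identifiers and toy branch results rather than the study's data; `branch_recovery` is an illustrative helper.

```python
def branch_recovery(curated: set[str],
                    found: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Return (recovered, missed) counts per branch and for all branches combined."""
    report = {name: (len(curated & genes), len(curated - genes))
              for name, genes in found.items()}
    union = set().union(*found.values())
    report["combined"] = (len(curated & union), len(curated - union))
    return report


curated = {f"gene{i}" for i in range(1, 7)}            # hypothetical IDs
found = {"genome_based": {"gene1", "gene2", "gene3"},
         "de_novo": {"gene3", "gene4", "novel_x"},      # novel_x is not curated
         "est": {"gene5"}}
print(branch_recovery(curated, found))
# {'genome_based': (3, 3), 'de_novo': (2, 4), 'est': (1, 5), 'combined': (5, 1)}
```

The combined row illustrates why integrating branches helps: each branch misses different genes, so the union recovers more than any single one.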
Automated transcript definition improvement. Partek found regions with reads in places where Cufflinks could not define transcripts. Regions with a minimum coverage of 50 reads and a length of at least bp were selected for defining further transcripts.
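Selecting regions by coverage and length amounts to finding maximal runs of per-base depth at or above a threshold. This sketch is not Partek's implementation: the 50-read minimum comes from the text, while the `min_len=100` default is a placeholder because the length cutoff is not given.

```python
def covered_regions(coverage: list[int], min_cov: int = 50,
                    min_len: int = 100) -> list[tuple[int, int]]:
    """Return half-open, 0-based (start, end) runs with depth >= min_cov.

    Only runs at least min_len bases long are reported. min_len=100 is a
    placeholder, not the value used in the study.
    """
    regions = []
    start = None
    for i, depth in enumerate(coverage):
        if depth >= min_cov:
            if start is None:
                start = i                      # a run begins
        elif start is not None:
            if i - start >= min_len:           # run ends; keep if long enough
                regions.append((start, i))
            start = None
    if start is not None and len(coverage) - start >= min_len:
        regions.append((start, len(coverage)))  # run reaching the sequence end
    return regions


depth = [10] * 5 + [60] * 120 + [10] * 10 + [80] * 30 + [55] * 100
print(covered_regions(depth))  # [(5, 125), (135, 265)]
```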
Of the twenty-one genes found by Partek, four had been missed by Cufflinks, two of which had also been missed by CLC Assembly Cell. For the other 17, Partek added new fragments to transcripts defined with Cufflinks.
For each of the approaches, we characterized the transcripts and their translations. This process resulted in transcripts longer than base pairs, including contigs and singletons as defined by TGICL.
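Characterizing the translations of the transcripts can start from their longest open reading frame. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and searching all six frames; the transcript length cutoff is not given in the text, so `min_len=200` is a placeholder, and ORFs without a stop codon are kept because assembled transcripts may be incomplete.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_protein(seq: str) -> str:
    """Longest Met-initiated peptide over all six reading frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            peptide = "".join(CODON_TABLE.get(strand[i:i + 3], "X")
                              for i in range(frame, len(strand) - 2, 3))
            # Stop codons split the frame; each piece is read from its first Met.
            for piece in peptide.split("*"):
                m = piece.find("M")
                if m != -1 and len(piece) - m > len(best):
                    best = piece[m:]
    return best


def characterize(transcripts: dict[str, str],
                 min_len: int = 200) -> dict[str, tuple[int, int]]:
    """Map transcript ID to (nucleotide length, longest-ORF peptide length)."""
    return {tid: (len(s), len(longest_orf_protein(s)))
            for tid, s in transcripts.items() if len(s) > min_len}


print(longest_orf_protein("ATGAAATTTTAA"))  # MKF
```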