Next generation sequencing

FASTQ: derived from FASTA format with the addition of quality scores. Each read from a sequencer comprises four lines: an identifier line (starting with @), a sequence line, a separator line (starting with a + character, optionally followed by a repeat of the identifier) and finally a quality line with one character per base. This typically forms the input to a mapping program (along with a FASTA reference genome). A typical human exome FASTQ file might be around 10-15GB, which can be compressed to 5-6GB using gzip.
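The four-line record structure described above can be read with a short parser. This is a minimal sketch, assuming the common layout with no wrapped sequence or quality lines and Phred+33 quality encoding (the usual encoding in current Illumina output, though older data may differ). Records are read strictly four lines at a time, because a quality line can itself begin with @ and cannot safely be used to find the next record. The names read_fastq, phred_scores and open_fastq are hypothetical helpers, not part of any library.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # header line without the leading '@'
    sequence: str
    quality: str     # one ASCII quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield FASTQ records, assuming the four-line layout (no wrapped lines)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            sequence, separator, quality = next(stripped), next(stripped), next(stripped)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Convert quality characters to Phred scores (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

Because an open file is an iterable of lines, read_fastq(open_fastq("reads.fastq.gz")) (a hypothetical file name) streams records straight from the compressed file without unpacking it to disk first.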
SAM format: mapped/aligned sequences, with details about each alignment, mapping quality etc. This usually contains a subset of the raw reads (as some will have been discarded at the mapping stage). The SAM (or BAM) file is typically used as the input for variant calling algorithms and other analyses.
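Each SAM alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), preceded by header lines beginning with @. The sketch below reads the first six fields and uses FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary) to count mapped and unmapped reads. Whether unmapped reads are written to the output at all depends on the aligner and its settings, so the unmapped count reflects only reads recorded as unmapped. The function names are illustrative assumptions, not a library API.

```python
from typing import Iterable, Iterator, NamedTuple


class SamAlignment(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise FLAG field
    rname: str  # reference sequence name ('*' if unavailable)
    pos: int    # 1-based leftmost mapping position (0 if unavailable)
    mapq: int   # mapping quality (255 = unavailable)
    cigar: str  # alignment description, e.g. '76M' ('*' if unavailable)


UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def parse_sam(lines: Iterable[str]) -> Iterator[SamAlignment]:
    """Yield alignment records from SAM text, skipping '@' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"expected at least 11 tab-separated fields, got {len(fields)}")
        yield SamAlignment(fields[0], int(fields[1]), fields[2],
                           int(fields[3]), int(fields[4]), fields[5])


def mapping_summary(alignments: Iterable[SamAlignment]) -> dict[str, int]:
    """Count primary records as mapped or unmapped (each mate counted separately)."""
    counts = {"mapped": 0, "unmapped": 0}
    for aln in alignments:
        if aln.flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted via its primary record
        counts["unmapped" if aln.flag & UNMAPPED else "mapped"] += 1
    return counts
```

Skipping secondary and supplementary records matters because a read aligned to several places would otherwise be counted more than once.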
BAM format: binary version of SAM. A typical human exome BAM file might be around 2GB in size.
BED format: annotation format that describes genome regions, with the optional addition of annotation data for display of genome browser tracks.
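One detail worth knowing when working with BED files is that the first three columns (chrom, chromStart, chromEnd) use zero-based, half-open coordinates, whereas SAM and VCF positions are 1-based. The sketch below parses BED lines, skips track, browser and comment lines, converts coordinates, and totals the bases covered after merging overlaps (for example, to summarise the size of a set of capture target regions). It assumes tab-delimited input; read_bed and total_span are hypothetical helper names.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class BedRegion(NamedTuple):
    chrom: str
    start: int                  # chromStart: 0-based, inclusive
    end: int                    # chromEnd: 0-based, exclusive
    name: Optional[str] = None  # optional 4th column

    def length(self) -> int:
        return self.end - self.start

    def to_one_based(self) -> tuple[str, int, int]:
        """Same region as 1-based, fully closed coordinates (the SAM/VCF convention)."""
        return self.chrom, self.start + 1, self.end


def read_bed(lines: Iterable[str]) -> Iterator[BedRegion]:
    """Yield regions from tab-delimited BED text."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # not data lines
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else None
        yield BedRegion(fields[0], int(fields[1]), int(fields[2]), name)


def total_span(regions: Iterable[BedRegion]) -> int:
    """Total bases covered, merging overlapping regions on the same chromosome."""
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for r in regions:
        by_chrom.setdefault(r.chrom, []).append((r.start, r.end))
    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:  # gap: close the current merged interval
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                # overlapping or adjacent: extend it
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total
```

With half-open intervals the length is simply end minus start, which is why no +1 correction appears in length() or total_span().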
Other annotation formats: UCSC documents a number of other formats suitable for generating genome browser tracks.
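VCF: variant call format. This contains details about the variants called at each site in the genome (position, reference and alternative alleles, genotypes), including the number of reads at each variant site, plus a range of quality information. A typical human exome VCF file might contain about 20,000 lines.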
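VCF data lines have eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample genotype columns; meta-information lines start with ## and the column header line with #CHROM. Read depth is commonly reported through the reserved DP key in the INFO column. The sketch below parses those columns and tallies passing SNVs and indels above a depth cut-off. The threshold of 10 reads and the crude length-based SNV/indel rule are illustrative assumptions, not a recommended filter, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                # 1-based
    ref: str
    alts: list[str]         # ALT may list several comma-separated alleles
    qual: Optional[float]   # None when QUAL is '.'
    filters: str            # 'PASS', '.', or semicolon-separated filter names
    info: dict[str, str]    # flag keys (no '=') map to ''


def parse_info(field: str) -> dict[str, str]:
    if field == ".":
        return {}
    info = {}
    for entry in field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value
    return info


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # '##' meta-information and the '#CHROM' header line
        cols = line.rstrip("\r\n").split("\t")
        chrom, pos, _id, ref, alt, qual, filt, info = cols[:8]
        yield VcfRecord(chrom, int(pos), ref, alt.split(","),
                        None if qual == "." else float(qual),
                        filt, parse_info(info))


def variant_type(ref: str, alt: str) -> str:
    """Rough classification by allele length only (assumption, ignores symbolic alleles)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def summarise(records: Iterable[VcfRecord], min_depth: int = 10) -> dict[str, int]:
    """Count ALT alleles by type among unfiltered records with DP >= min_depth."""
    counts: dict[str, int] = {}
    for rec in records:
        if rec.filters not in ("PASS", "."):
            continue
        depth = int(rec.info.get("DP") or 0)  # missing DP treated as zero depth
        if depth < min_depth:
            continue
        for alt in rec.alts:
            kind = variant_type(rec.ref, alt)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```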

1000 Genomes Project
International consortium working towards generating sequence data for 1000 human genomes (pilot phase: 2 trios at high coverage, 179 low-coverage whole genomes, 697 exomes). VCF files and raw data are downloadable.
http://www.1000genomes.org/. Also see Nature 467:1061–1073.
National Institute of Environmental Health Sciences SNP project
Complete exome sequencing data for 88 EGP (Environmental Genome Project) samples, with VCF and BAM data available
http://snp.gs.washington.edu/niehsExome/
Personal Genome Project
Harvard University initiative for genome data sharing, aiming for 100,000 participants; currently only a limited quantity of data is available
http://www.personalgenomes.org/
Illumina’s demo data
e.g. one Yoruba human genome (sample NA18507) available, plus analysed indel and SNP information
http://www.illumina.com/HumanGenome/

Sequence capture arrays – exome, gene list, specific GWAS-hit regions etc.
PCR amplification – suitable for smaller-scale projects
Pooling to maximise throughput (“barcoded” or anonymous)
FAIRE-Seq: identify regions of open chromatin, where regulatory proteins bind (formaldehyde-assisted isolation of regulatory elements)
MAINE-Seq: identify regions of closed chromatin (MNase-mediated purification of mononucleosomes, followed by sequencing of the histone-bound DNA)
ChIP-Seq: identify where transcription factors (TFs) bind, using an antibody against the TF to pull down TF-bound nuclear DNA (chromatin immunoprecipitation sequencing)
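When samples are pooled with barcodes, as in the pooling approach listed above, reads must afterwards be assigned back to their samples (demultiplexing). The sketch below assumes, hypothetically, that each barcode is read inline at the 5' end of the read and that all barcodes have the same length; many platforms instead read the index in a separate index read, in which case the same matching would be applied to that index sequence. Up to one mismatch is tolerated by default and ties between samples are rejected as ambiguous. The barcode sequences and sample names in any usage are made up for illustration.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(sequence: str, barcodes: dict[str, str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode best matches the read prefix, or None."""
    hits = []
    for barcode, sample in barcodes.items():
        prefix = sequence[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read shorter than the barcode
        distance = hamming(prefix, barcode)
        if distance <= max_mismatches:
            hits.append((distance, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # equally good match to two samples: ambiguous
    return hits[0][1]


def demultiplex(reads: Iterable[tuple[str, str]], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Group read IDs by sample; unassigned reads go under 'undetermined'."""
    bins: dict[str, list[str]] = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for read_id, seq in reads:
        sample = assign_barcode(seq, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(read_id)
    return bins
```

Allowing a mismatch recovers reads with a sequencing error in the barcode, but it is only safe when the barcodes in the pool differ from each other at several positions.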

SEQanswers: an online forum – extremely useful for NGS information
http://seqanswers.com
Service providers: check with your university. This is a rapidly changing field and most universities are beginning to run systems in-house. Alternatively, commercial NGS services are available in many countries.
Illumina:
http://www.illumina.com
454:
http://www.454.com
ABI SOLiD:
http://tinyurl.com/ccdk8j